
Quality Control for CSL Styles

One of the most common questions we get about the CSL project, especially when we’re talking to companies looking to integrate CSL into their products, is about quality control. So, how do we ensure quality for what are now more than 1000 different citation styles?

Travis: The Automated First Wall of Defense

Whenever someone submits a style to the CSL repository on GitHub, the first “person” they interact with is not a person at all but a friendly bot called “Travis.” Travis, or “Travis CI” by its full name, automatically runs a series of tests on the repository. It checks whether all styles validate, i.e., don’t contain anything that CSL doesn’t allow. It checks whether there are any macros in the style that are defined but not used, or used but not defined. It also makes sure the style follows the naming convention we use in the repository.
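To make the macro check concrete, here is a minimal sketch in Python. It is not the repository's actual test suite, just an illustration of the idea. It assumes the standard CSL namespace, that macros are defined with `<macro name="…">`, and that they are called through a `macro="…"` attribute. The function name `check_macros` and the sample style are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by CSL 1.0 styles.
CSL_NS = "{http://purl.org/net/xbiblio/csl}"


def check_macros(style_xml):
    """Return (unused, undefined) macro names for one CSL style.

    Hypothetical helper for illustration; the real repository tests
    may work differently.
    """
    root = ET.fromstring(style_xml)
    # Macros the style defines: <macro name="...">
    defined = {
        m.get("name")
        for m in root.iter(CSL_NS + "macro")
        if m.get("name") is not None
    }
    # Macros the style calls: any element carrying macro="..."
    used = {
        el.get("macro")
        for el in root.iter()
        if el.get("macro") is not None
    }
    return sorted(defined - used), sorted(used - defined)


# Made-up sample style with one unused and one undefined macro.
SAMPLE = """<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0">
  <macro name="author"><names variable="author"/></macro>
  <macro name="unused-title"><text variable="title"/></macro>
  <citation>
    <layout>
      <text macro="author"/>
      <text macro="year"/>
    </layout>
  </citation>
</style>"""

unused, undefined = check_macros(SAMPLE)
print("defined but never used:", unused)
print("used but never defined:", undefined)
```

A style passes this particular check only when both lists come back empty.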

This is how Travis looks when it’s happy:

Travis Pass

An unhappy Travis looks like this:

Travis Fail

By clicking on “Details” you can look at the exact error message(s) that caused Travis to fail. In this case, the submitted style, zeitschrift-fur-theologie-und-kirche, specifies some macros that aren’t actually used, and Travis lists the test conditions that are violated.

Travis Details

Several Pairs of Watchful Eyes

There are a handful of people who can commit styles directly to the CSL repository (though the styles still go through a check by Travis before being widely distributed). Everyone else can contribute via a so-called “pull request,” basically a set of suggested changes. A couple of volunteers, mainly Rintze Zelle and me with occasional help from other people, review these submissions.

For new styles, we perform a very basic review: Does everything in the style make sense? Is it going to be used widely enough to warrant inclusion (we’re pretty liberal here, but aren’t accepting styles used by only a dozen people)? Are other requirements like documentation and ISSN (for journals) met?

For existing styles, we take a closer look at the proposed changes. While we mostly take the word of contributors for what citations should look like, we do go over the changes in the CSL code to make sure they will actually and consistently have the desired effect. For widely used styles like APA, Chicago Manual, or Vancouver/NLM, we take extra steps to ensure the changes get it right and will consult the respective style manual.

This process takes up a significant amount of volunteer time, especially since CSL attracts a huge number of submissions, many from people with little or no programming experience. In the last month alone, for example, 10 new people contributed citation styles and/or fixes to CSL. We’re very proud of that! But it does mean that we spend more time guiding people through the process than most other projects likely do.

Code Maintenance—CSL on a Diet

We’ll occasionally identify common inefficiencies, sometimes even outright errors, in CSL code and fix those by scripting, followed by manual checks and fixes. Everyone uses their favorite tool, ranging from simple sed commands to more elaborate Perl or Python scripts. In recent weeks we have removed thousands of lines of unused and/or superfluous code from CSL styles. While these changes rarely alter the behavior of styles, they make our styles more efficient and easier to modify down the road.

User Feedback: A Thousand Eyes and We Still Need More

Now, you’ll say: “None of this guarantees me that the CSL style actually matches what the journal wants.” And you’ll be right. There is absolutely no way we can do active quality control on 1000 different styles. However, there are hundreds of thousands of people using reference managers that rely on CSL styles. Many of them will notice when something is wrong with a style. And when they let us know, we’ll fix it—quickly, in almost all cases.

Feedback from Reference Managers

The “when they let us know” part is quite important. For a long time, at least 90% of all error reports for styles came from Zotero users via the Zotero forums, and that is probably still the single largest source of error reports for styles. But there are many other products using CSL styles now, and we want to hear from them. By far the biggest recent improvement in getting error reports has developed out of our talks with Mendeley: they have now set up a simple form through which their users can submit style error reports. Those reports are (automatically) shared between us and Mendeley’s customer support in a Google Doc, which has proven a very effective way of communicating errors. Unfortunately, apart from the occasional pandoc user, those two are almost the only sources of error reports that reach us. That is too bad, since there are at least another half a dozen reference managers using CSL.

So, if your reference manager uses CSL styles, we want to hear from you. We’d be happy to set up mechanisms similar to the one we currently have working with Mendeley. (colwiz, Docear, Paperpile, Qiqqa, ReadCube etc.: I’m looking at you.)

Having style errors go through the reference manager’s support system is crucial for two reasons. First, about a third of error reports are actually errors in the data stored in the reference managers. We’re not in a position to tell users how to enter data in every reference manager. Second, not every reference manager interprets CSL styles the same way. It is not uncommon for error reports to actually point to errors in the way CSL styles are interpreted or fields are mapped in a reference manager, so the reference manager’s developers need to know about these errors.

Feedback from Editors and Publishers

We are particularly happy when we hear from journals directly about their citation style. This is, surprisingly in my opinion, still quite rare. We recently heard from a copyeditor at The Lancet. An Elsevier employee is currently going through their list of journals systematically and submitting corrections (directly as patches, which is extra awesome, but we’d take simple error reports, too). We’ve had some smaller journals contract with me to write CSL styles for their house styles. Since they then go on to check on my work, those are guaranteed to be accurate.

Still, I think there’s a lot of room for improvement here. In particular, I wish more of the bigger publishers like Sage and Oxford University Press would be willing to work with us, at least on those styles used in several of their journals. There should be, as I’ve said before, more journals covered by generic styles. Publishers should also make it much easier to access lists of their journals with corresponding citation styles, along the lines of what BioMed Central does.


Beta Test Zotero

Zotero has just started a regular beta release. The beta version replaces what used to be called the “branch xpi”. It is built regularly from the current release branch. In less technical terms, that means that by using the beta version you’re testing features that will be in the next minor release. The current beta, for example, contains code changes that will be in version 4.0.18. The beta version is intended to be usable with minimal risk. It will typically not contain database upgrades, so the risk of data corruption is very low and you can easily revert to the regular version if you’re in a pinch. The beta channel is updated a lot more frequently than regular Zotero.
The principal reason for releasing a regular beta is to encourage wider testing of Zotero versions pre-release. If you’re interested enough in Zotero to read this blog, there is a good chance you should run the beta version.

You may want to run Zotero beta if…

      • You want to help Zotero by testing pre-release versions
      • You provide technical support for Zotero and want to be aware of new features before they land (more on that below)
      • You like having new and shiny things before anyone else has them

You (probably) shouldn’t run Zotero beta if…

      • You want maximal stability when using Zotero
      • You panic or get frustrated when something doesn’t work on your computer
      • You have no time or no patience to deal with and report occasional problems on the Zotero forums
      • You’re only using Zotero Standalone (the beta is currently only for the Firefox add-on)

OK, I want to be using the beta version, what now?

Install

First, install the add-on from here. You can simply install it over your existing Zotero; your database will remain untouched. Heed the warning on that page: I’ve never had any trouble running pre-release versions of Zotero and I’m not aware of anyone who has lost data doing so, but it’s beta software and you want to make sure to have regular and automated back-ups.

Check

You can see the currently installed version in the add-ons tab of Firefox or under “About Zotero” in the gears menu. As of this writing the current beta is 4.0.18-beta.r3+fadd486. This means you’re running the 3rd (r3) beta release for the 4.0.18 version of Zotero. The last part after the plus sign corresponds to the last commit to the Zotero source code that’s included in the version, so if you’re following commits (you should be looking at the 4.0 branch) you can easily check whether the version you’re running already contains an addition to the code.
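If you want to pull those pieces apart programmatically, the following is a small sketch in Python. The pattern is an assumption based only on the one version string quoted above, not an official description of Zotero's versioning scheme, and the names `BetaVersion` and `parse_beta_version` are made up for this example.

```python
import re
from typing import NamedTuple


class BetaVersion(NamedTuple):
    target: str        # release the beta leads up to, e.g. "4.0.18"
    beta_release: int  # number after "r", e.g. 3
    commit: str        # abbreviated commit hash after "+", e.g. "fadd486"


# Assumed format: <target>-beta.r<N>+<commit>, inferred from the single
# example "4.0.18-beta.r3+fadd486".
_PATTERN = re.compile(
    r"^(?P<target>\d+(?:\.\d+)*)-beta\.r(?P<rev>\d+)\+(?P<commit>[0-9a-f]+)$"
)


def parse_beta_version(version):
    """Split a beta version string into its three parts."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"not a recognised beta version string: {version!r}")
    return BetaVersion(match["target"], int(match["rev"]), match["commit"])


info = parse_beta_version("4.0.18-beta.r3+fadd486")
print(info.target, info.beta_release, info.commit)
```

The `commit` part is what you would compare against the commit list of the 4.0 branch to see whether a given change is already in your installed version.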

Report Problems

Zotero devs will very much appreciate any error reports from beta users. As with all Zotero errors, you should report them on the forums, and you should provide plenty of details and, if possible, an error report ID. Also mention that you’re running the beta version of Zotero. I had some problems trying out the beta last night, and you can see my report (with quick solution) here.

Talking about shiny things…

The current beta version contains two major improvements in handling PDFs, both coded by community developers. Thanks to Aurimas, “Retrieve Metadata” has improved significantly and you’re now less likely to get locked out by Google Scholar. Thanks to Emiliano (whose add-ons I praised in my last post), indexing of large (or many) files is now several orders of magnitude faster: before, a large PDF like War and Peace could take minutes to index and would freeze Zotero, whereas now it takes a couple of seconds.