One of the most common questions we get about the CSL project, especially when we’re talking to companies looking to implement CSL into their product, is about quality control. So, how do we ensure quality for now more than 1000 different citation styles?
Travis: The Automated First Wall of Defense
Whenever someone submits a style to the CSL repository on GithHub, the first “person” they interact with is—well it’s not a person but a friendly bot called “Travis.” Travis, or “Travis CI” by full name, automatically runs a series of tests on the repository: It checks whether all styles validate, i.e. don’t contain anything that CSL doesn’t allow. It checks whether there are any macros in the style that are not used or are used but not defined. It also makes sure the style follows the naming convention we use in the repository.
This is how Travis looks when it’s happy:
An unhappy Travis looks like this:
By clicking on “Details” you can look at the exact error message(s) that cause Travis to fail. In this case, the submitted style, zeitschrift-fur-theologie-und-kirche, specifies some macros that aren’t actually used—Travis lists the test conditions that are violated.
Several Pairs of Watchful Eyes
There are a handful of people who can commit styles directly to the CSL repository (though the styles still go through a check by Travis before being widely distributed). Everyone else can contribute via so called “pull request,” basically a set of suggested changes. A couple of volunteers, mainly Rintze Zelle and me with occasional help from other people, review these submissions.
For new styles, we perform a very basic review: Does everything in the style make sense? Is it going to be used widely enough to warrant inclusion (we’re pretty liberal here, but aren’t accepting styles used by a dozen people)? Are other requirements like documentation and ISSN (for journals) met?
For existing styles, we take a closer look at the proposed changes. While we mostly take the word of contributors for what citations should look like, we do go over the changes in the CSL code to make sure they will actually and consistently have the desired effect. For widely used styles like APA, Chicago Manual, or Vancouver/NLM, we take extra steps to assure the changes get it right and will consult the respective style manual.
This process takes up a significant amount of volunteer time, especially since CSL attracts a huge amount of submissions, many from people with little or no programming experience. In the last month alone, for example, 10 new people contributed citation styles and/or fixes to CSL. We’re very proud of that! But it does mean that we spend more time guiding people through the process than most other projects likely do.
Code Maintenance—CSL on a Diet
We’ll occasionally identify common inefficiencies, sometimes even outright errors, in CSL code and fix those by scripting—everyone using their favorite tool ranging from simple sed commands to more elaborate Perl or Python scripts— (followed by manual control and fixes). In recent weeks we have removed thousands of line of unused and/or superficial code from CSL styles. While these rarely change the behavior of styles, they make our styles more efficient and easier to modify down the road.
User Feedback: A Thousand Eyes and We Still Need More
Now, you’ll say: “None of this guarantees me that the CSL style actually matches what the journal wants.” And you’ll be right. There is absolutely no way we can do active quality control on 1000 different styles. However, there are hundreds of thousands of people using reference managers that rely on CSL styles. Many of them will notice when something is wrong with a style. And when they let us know, we’ll fix it—quickly, in almost all cases.
Feedback from Reference Managers
The “when they let us know” part is quite important. For a long time, at least 90% of all error reports for styles came from Zotero users, reported via their forums and that’s still probably the single largest source of error reports for styles. But there are many other products using CSL styles now and we want to hear from them. By far the biggest recent improvement for getting error reports has developed out of our talks with Mendeley: they have now set up a simple form through which their users can submit style error reports. Those are (automatically) shared between us and Mendeley’s customer support in a google doc and have proven a very effective way for communicating errors. Unfortunately, with rare exceptions, those two are almost the only source of error reports that reach us (with the occasional pandoc user). Which is too bad, since there are at least another half a dozen reference managers using CSL.
So, if your reference manager uses CSL styles, we want to hear from you. We’d be happy to set up similar mechanisms that we currently have working with Mendeley. (colwiz, Docear, Paperpile, Qiqqa, ReadCube etc.—I’m looking at you.)
Having style errors go through the reference manager’s support system is crucial for two reasons: For one, about a third of error reports are actually errors in the data stored in the reference managers. We’re not in a position to tell users how to put in data in every reference manager. Secondly, not every reference manager interprets CSL styles the same way. It is not uncommon for error reports to actually point to errors in the way CSL styles are interpreted or fields are mapped in a reference manager—so they need to know about these errors.
Feedback from Editors and Publishers
We are particularly happy when we hear from journals directly about their citation style. This—surprisingly in my opinion—still quite rare. We recently heard from a copyeditor at The Lancet. An Elsevier employee is currently going through their list of journals systematically and submitting corrections (directly as patches, which is extra awesome, but we’d take simple error reports, too). We’ve had some smaller journals contract with me to write CSL styles for their house styles. Since they then go on to check on my work, those are guaranteed to be accurate.
Still, I think there’s a lot of room for improvement here. In particular, I wish more of the bigger publishers like Sage and Oxford University Press would be willing to work with us at least on those styles used in multiple of their journals. There should be, as I’ve said before, more journals covered by generic styles. Publishers should also make it much easier to access lists of their journals with corresponding citations styles, along the lines of what BioMed Central does.