Tag Archives: citation styles

Quality Control for CSL Styles

One of the most common questions we get about the CSL project, especially when we’re talking to companies looking to integrate CSL into their products, is about quality control. So, how do we ensure quality for what are now more than 1,000 different citation styles?

Travis: The Automated First Wall of Defense

Whenever someone submits a style to the CSL repository on GitHub, the first “person” they interact with is, well, not a person but a friendly bot called “Travis.” Travis, or “Travis CI” by its full name, automatically runs a series of tests on the repository. It checks whether all styles validate, i.e., don’t contain anything that CSL doesn’t allow. It checks whether there are any macros in the style that are defined but not used, or used but not defined. It also makes sure the style follows the naming convention we use in the repository.
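To make the macro check concrete, here is a minimal sketch in Python of what such a test could look like. It is not the actual test suite Travis runs on the repository; the function name `macro_problems` and the sample style are made up for illustration. It relies only on the fact that CSL macros are defined with cs:macro and called through a `macro` attribute.

```python
import xml.etree.ElementTree as ET

CSL_NS = "{http://purl.org/net/xbiblio/csl}"


def macro_problems(style_xml):
    """Return (unused, undefined) macro names for one CSL style.

    Hypothetical helper for illustration, not the repository's real test.
    """
    root = ET.fromstring(style_xml)
    # Every <macro name="..."> element defines a macro.
    defined = {m.get("name") for m in root.iter(CSL_NS + "macro")}
    # Macros are called through a macro="..." attribute (e.g. on cs:text or cs:key).
    called = {el.get("macro") for el in root.iter() if el.get("macro")}
    return defined - called, called - defined


# A deliberately broken toy style: "leftover" is never called,
# and "issued-year" is called but never defined.
STYLE = """<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0">
  <macro name="author"><names variable="author"/></macro>
  <macro name="leftover"><text variable="title"/></macro>
  <citation>
    <layout>
      <text macro="author"/>
      <text macro="issued-year"/>
    </layout>
  </citation>
</style>"""

unused, undefined = macro_problems(STYLE)
print("unused:", sorted(unused))        # unused: ['leftover']
print("undefined:", sorted(undefined))  # undefined: ['issued-year']
```

A real check would run over every file in the repository and fail the build if either set is non-empty.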

This is how Travis looks when it’s happy:

[Screenshot: Travis Pass]

An unhappy Travis looks like this:

[Screenshot: Travis Fail]

By clicking on “Details” you can look at the exact error message(s) that caused Travis to fail. In this case, the submitted style, zeitschrift-fur-theologie-und-kirche, specifies some macros that aren’t actually used, and Travis lists the test conditions that are violated.

[Screenshot: Travis Details]

Several Pairs of Watchful Eyes

There are a handful of people who can commit styles directly to the CSL repository (though the styles still go through a check by Travis before being widely distributed). Everyone else can contribute via a so-called “pull request,” basically a set of suggested changes. A couple of volunteers, mainly Rintze Zelle and me with occasional help from other people, review these submissions.

For new styles, we perform a very basic review: Does everything in the style make sense? Is it going to be used widely enough to warrant inclusion (we’re pretty liberal here, but aren’t accepting styles used by only a dozen people)? Are other requirements like documentation and ISSN (for journals) met?

For existing styles, we take a closer look at the proposed changes. While we mostly take the word of contributors for what citations should look like, we do go over the changes in the CSL code to make sure they will actually and consistently have the desired effect. For widely used styles like APA, Chicago Manual, or Vancouver/NLM, we take extra steps to ensure the changes are right and will consult the respective style manual.

This process takes up a significant amount of volunteer time, especially since CSL attracts a huge number of submissions, many from people with little or no programming experience. In the last month alone, for example, 10 new people contributed citation styles and/or fixes to CSL. We’re very proud of that! But it does mean that we spend more time guiding people through the process than most other projects likely do.

Code Maintenance—CSL on a Diet

We’ll occasionally identify common inefficiencies, sometimes even outright errors, in CSL code and fix those by scripting, followed by manual checks and fixes. Everyone uses their favorite tool, ranging from simple sed commands to more elaborate Perl or Python scripts. In recent weeks we have removed thousands of lines of unused and/or superfluous code from CSL styles. While these removals rarely change the behavior of styles, they make our styles more efficient and easier to modify down the road.
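As an illustration of this kind of scripted cleanup, the following Python sketch removes macros that are never called from a style. It is not one of the scripts we actually ran; `drop_unused_macros` and the toy style are hypothetical, and a real script would also have to preserve comments and formatting, which this one does not attempt.

```python
import xml.etree.ElementTree as ET

CSL_URI = "http://purl.org/net/xbiblio/csl"
NS = "{" + CSL_URI + "}"
ET.register_namespace("", CSL_URI)  # serialize CSL as the default namespace


def drop_unused_macros(style_xml):
    """Remove macros that nothing calls; return (new_xml, removed_names).

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(style_xml)
    removed = []
    while True:
        called = {el.get("macro") for el in root.iter() if el.get("macro")}
        unused = [m for m in root.findall(NS + "macro")
                  if m.get("name") not in called]
        if not unused:
            break
        # Removing one macro can orphan another that only it called,
        # so repeat until nothing changes.
        for macro in unused:
            root.remove(macro)
            removed.append(macro.get("name"))
    return ET.tostring(root, encoding="unicode"), removed


STYLE = """<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0">
  <macro name="author"><names variable="author"/></macro>
  <macro name="old-date"><text variable="title"/></macro>
  <macro name="old-wrapper"><text macro="old-date"/></macro>
  <citation><layout><text macro="author"/></layout></citation>
</style>"""

cleaned, removed = drop_unused_macros(STYLE)
print(removed)  # ['old-wrapper', 'old-date']
```

Note that "old-date" only becomes removable after "old-wrapper" is gone, which is why the function loops. The output still deserves a manual look before it is committed.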

User Feedback: A Thousand Eyes and We Still Need More

Now, you’ll say: “None of this guarantees me that the CSL style actually matches what the journal wants.” And you’ll be right. There is absolutely no way we can do active quality control on 1000 different styles. However, there are hundreds of thousands of people using reference managers that rely on CSL styles. Many of them will notice when something is wrong with a style. And when they let us know, we’ll fix it—quickly, in almost all cases.

Feedback from Reference Managers

The “when they let us know” part is quite important. For a long time, at least 90% of all error reports for styles came from Zotero users via the Zotero forums, and that’s probably still the single largest source of error reports for styles. But there are many other products using CSL styles now, and we want to hear from them. By far the biggest recent improvement in getting error reports has developed out of our talks with Mendeley: they have now set up a simple form through which their users can submit style error reports. Those are (automatically) shared between us and Mendeley’s customer support in a Google Doc and have proven a very effective way of communicating errors. Unfortunately, with rare exceptions such as the occasional pandoc user, those two are almost the only sources of error reports that reach us. That is too bad, since there are at least another half a dozen reference managers using CSL.

So, if your reference manager uses CSL styles, we want to hear from you. We’d be happy to set up mechanisms similar to the ones we currently have working with Mendeley. (colwiz, Docear, Paperpile, Qiqqa, ReadCube, etc.: I’m looking at you.)

Having style errors go through the reference manager’s support system is crucial for two reasons. First, about a third of error reports are actually errors in the data stored in the reference manager, and we’re not in a position to tell users how to enter data in every reference manager. Second, not every reference manager interprets CSL styles the same way. It is not uncommon for error reports to actually point to errors in the way CSL styles are interpreted or fields are mapped in a reference manager, so its developers need to know about these errors.

Feedback from Editors and Publishers

We are particularly happy when we hear from journals directly about their citation style. This is, surprisingly in my opinion, still quite rare. We recently heard from a copyeditor at The Lancet. An Elsevier employee is currently going through their list of journals systematically and submitting corrections (directly as patches, which is extra awesome, but we’d take simple error reports, too). We’ve had some smaller journals contract with me to write CSL styles for their house styles. Since they then go on to check my work, those are guaranteed to be accurate.

Still, I think there’s a lot of room for improvement here. In particular, I wish more of the bigger publishers like Sage and Oxford University Press would be willing to work with us, at least on those styles used in several of their journals. There should be, as I’ve said before, more journals covered by generic styles. Publishers should also make it much easier to access lists of their journals with corresponding citation styles, along the lines of what BioMed Central does.

Writing CSL – Features and Best Practices

With the CSL repository growing steadily, and a very large number of different contributors — more than 250 since its beginning — helping by adding or improving styles, I figured this would be a good time to highlight some features of CSL and what they mean for best practices in coding a CSL style.

Groups

CSL allows you to group items that belong together and set a “delimiter” between them. Think of a common example, the colon between place and publisher: New York, NY: Columbia University Press, 1990. You can code this as

 <text variable="publisher-place" suffix=": "/>
 <text variable="publisher"/>
 <date variable="issued" form="text" date-parts="year" prefix=", "/>

The problem with this version is that if your item doesn’t have a publisher, you get a mess like New York, NY: , 1990. Using groups makes this a little longer, but much more robust:

 <group delimiter=", ">
   <group delimiter=": ">
     <text variable="publisher-place"/>
     <text variable="publisher"/>
   </group>
   <date variable="issued" form="text" date-parts="year"/>
 </group>

This will produce sensible output like New York, NY, 1990 regardless of which input variable may be missing.

Best Practice 1: Prefer groups and group delimiters to affixes for punctuation and spaces between items
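The difference between the two versions can be modeled in a few lines of Python. This is only a simplified sketch of the behavior described above, not how any CSL processor is implemented; the helper names `affixed`, `group`, `flat` and `grouped` are invented for the example.

```python
def affixed(value, prefix="", suffix=""):
    """Render a variable with affixes; an empty variable renders nothing."""
    return f"{prefix}{value}{suffix}" if value else ""


def group(parts, delimiter):
    """Join only the parts that rendered to something, like a group delimiter."""
    return delimiter.join(p for p in parts if p)


def flat(place, publisher, year):
    # Mirrors the affix-only version: punctuation hangs on individual items.
    return (affixed(place, suffix=": ")
            + affixed(publisher)
            + affixed(year, prefix=", "))


def grouped(place, publisher, year):
    # Mirrors the nested groups: delimiters appear only between present items.
    return group([group([place, publisher], ": "), year], ", ")


full = ("New York, NY", "Columbia University Press", "1990")
no_publisher = ("New York, NY", "", "1990")

print(flat(*full))             # New York, NY: Columbia University Press, 1990
print(grouped(*full))          # New York, NY: Columbia University Press, 1990
print(flat(*no_publisher))     # New York, NY: , 1990
print(grouped(*no_publisher))  # New York, NY, 1990
```

Both versions agree when all the data is there. They only diverge when something is missing, which is exactly the case that is easy to overlook when testing a style.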

Macros

CSL allows you to use macros in a citation style. Simply put, macros allow you to write a relatively long sequence only once. Say you need the style to give the year of publication or, if there is none, “n.d.”. This is the type of thing that you’re going to use multiple times in a style (e.g. in the citation, the bibliography, and maybe for sorting), each time taking eight lines of code. So instead of writing the whole thing multiple times, define

 <macro name="issued-year">
   <choose>
     <if variable="issued">
       <date variable="issued" form="text" date-parts="year"/>
     </if>
     <else>
       <text term="no date" form="short"/>
     </else>
   </choose>
 </macro>

and simply “call” the macro using <text macro="issued-year"/> where you would otherwise write the whole eight lines.

Code economy isn’t the only, or even the principal, advantage of using such macros. Their biggest advantage is that if someone else wants to change your code, e.g. for a new similar citation style, any change in a macro only needs to be made once. A huge time-saver! Finally, looking into the future, CSL was always intended to be a “modular” language: you can take one piece out of a citation style and plug it into another style. Relying extensively on macros in citation styles fosters such modularity. (Frank Bennett has written up some informal thoughts on this.)

Best Practice 2: Use macros extensively. Keep the cs:citation and cs:bibliography parts short and keep cs:choose elements in them to an absolute minimum

Terms

“Available at,” “no date,” “anonymous,” “pages”: citation styles contain many such words or short phrases. CSL defines many of them as “terms” and provides translations into dozens of languages. Strictly speaking, many of these wouldn’t be necessary. To provide, e.g., a URL preceded by “Available at”, <text variable="URL" prefix="Available at "/> works. But it’s, well, ugly. It also means that someone who wants to adapt the structure of your style but use it in Spanish will have to rewrite all such prefixes. Instead use

 <group delimiter=" ">
   <text term="available at" text-case="capitalize-first"/>
   <text variable="URL"/>
 </group>

and your style can be converted to any language by simply changing the default-locale setting in the first line of the style (or, if you leave it unset, the style will automatically appear in the language of the user’s installation).

But what if the term isn’t quite right? You may need “Available from” instead? You can redefine any term for any language/locale in the beginning of the style, in our example:

 <locale xml:lang="en">
   <terms>
     <term name="available at">available from</term>
   </terms>
 </locale>

Best Practice 3: Use terms and labels. Do not add terms or phrases in affixes or using text value=
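If you want to check an existing style against this recommendation, a rough lint along the following lines can help. This is a sketch under our own assumptions, not an official tool: `hard_coded_text` is a made-up name, and the heuristic (two or more letters in a row inside a prefix, suffix, or value attribute) will both miss some cases and flag some legitimate ones.

```python
import re
import xml.etree.ElementTree as ET

# Two or more letters in a row: a rough signal for a word rather than punctuation.
WORD = re.compile(r"[^\W\d_]{2,}")


def hard_coded_text(style_xml):
    """List (attribute, text) pairs that look like words baked into the style.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(style_xml)
    findings = []
    for el in root.iter():
        for attr in ("prefix", "suffix", "value"):
            text = el.get(attr, "")
            if WORD.search(text):
                findings.append((attr, text))
    return findings


STYLE = """<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0">
  <macro name="access">
    <text variable="URL" prefix="Available at "/>
    <text value="accessed"/>
    <text variable="publisher" suffix=": "/>
  </macro>
</style>"""

for attr, text in hard_coded_text(STYLE):
    print(attr, repr(text))
# prefix 'Available at '
# value 'accessed'
```

The purely punctuational suffix on the publisher is not reported. Anything that is reported is a candidate for a term, not an automatic error.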

Miscellanea

  • If you plan to submit the style to the repository, closely read and follow the requirements

  • Assume that users have titles stored in sentence case. Use text-case="title" as needed, but don’t use text-case="sentence"

  • Some styles have left-over clutter from an earlier version. In particular, you will often see periods (which all abbreviated terms have in CSL) first stripped using strip-periods="true" and then added again using suffix=".". Remove this and similar unnecessary code. Future users/coders will thank you.

Best Practice 4: Don’t use sentence case, and keep styles tidy by removing clutter, following CSL repository guidelines
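The strip-periods clutter mentioned above is easy to search for mechanically. Here is a small Python sketch that lists elements combining strip-periods="true" with a suffix that starts with a period. The function name `redundant_period_elements` and the sample style are hypothetical, and each hit should be reviewed by hand, since the combination is not wrong in every context.

```python
import xml.etree.ElementTree as ET


def redundant_period_elements(style_xml):
    """Return (tag, variable) for elements that strip periods and then re-add one.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(style_xml)
    hits = []
    for el in root.iter():
        strips = el.get("strip-periods") == "true"
        re_adds = el.get("suffix", "").startswith(".")
        if strips and re_adds:
            local_name = el.tag.split("}")[-1]  # drop the namespace part
            hits.append((local_name, el.get("variable")))
    return hits


STYLE = """<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0">
  <macro name="pages">
    <label variable="page" form="short" strip-periods="true" suffix=". "/>
    <text variable="page"/>
  </macro>
  <macro name="edition">
    <label variable="edition" form="short" suffix=" "/>
  </macro>
</style>"""

print(redundant_period_elements(STYLE))  # [('label', 'page')]
```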

Examples

Best practices are not ironclad commandments. Of all the “rules” above, only failing to conform to repository guidelines and forcing sentence case on a title will prevent a style from being accepted to the CSL repository. You can look at the styles for American Psychological Association or Vancouver. You’ll see a lot of rule-breaking, but also a general adherence to these recommendations, which should give you a sense of how a well-coded style looks.