Publish and share your code

I recently gave a brief talk for our local R Users group about why we should bother to share R scripts or, more generally, code. [By the way, if you live in Sevilla, you should check out this group, that is, if you think meeting in a bar to have some beers and talk about R sounds like fun (I do!).] I presented the talk not as an expert in code sharing but as someone who recently prepared her first complete R script (reflecting the analyses of a manuscript) for publication[1] and realized the challenges while embracing the value of publishing code.

The talk (which you can see here) started by listing some reasons why code is not made available. I think the key issues are that publishing code is not required yet takes time, that we do not think our code or analyses are novel or unique, and that we worry others will find mistakes. These are valid concerns, but they are perhaps also some of the reasons why we should actually publish our code. Data and code sharing are important for science. Being able to repeat someone else’s analyses, or to use the exact same methods to replicate a study, can improve our understanding of nature. Particularly in ecology, where replication is complicated by the complex nature of nature itself, knowing that you are using exactly the same regression analyses to predict a pattern with new data can be very helpful. If the results are different, then you know it is the data that differ. If, instead, you have to write your own code to recreate the (hopefully) same regression models from the typical verbal description found in a methods section, then you may hesitate to attribute differences only to the data. Being able to redo analyses is also great for learning: it provides the opportunity to see clearly what the person did and to try it yourself. Well-documented scripts can be useful teaching and learning tools. Even if what you did is not novel or unique, someone out there does not know how to use that method and will appreciate having your script to learn from.

Of course, publishing code does take time. In my case, it took probably a full week of work just to go from my original scripts to a final version that I felt comfortable submitting as supplementary material. But for me this was not wasted time (although I hope the next one will be faster). When preparing the script I realized some code could be simplified (making it easier to read) with user-defined functions and loops, and as a result I learned more about both. The final script was much shorter, looked nicer, and, because it avoided duplication (cutting and pasting code to do the same thing for different variables or data subsets), was also easier to check for errors. I did find some small mistakes; these did not change the conclusions of the paper, but they were mistakes nonetheless and I am glad to have spotted them. Publishing code is generally not required, and today there are few incentives to “waste” time this way, but more and more voices are asking for greater transparency and repeatability in science, and publishing data and code is great for both. Although today your chances of publishing a paper, getting funds, or finding a job are not yet affected by whether you share data and code, I think (hope?) this is going to change. And I vow to do my (very small) part to change it. As a reviewer, author, supervisor, and employer I will encourage and value data and code sharing.
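As a minimal sketch of the kind of simplification described above (this is not the script from the manuscript), the example below uses made-up data and a hypothetical helper, `group_means()`, to show how a user-defined function plus a loop can replace code that was cut and pasted once per variable. The data frame, the variable names, and the choice of a group mean as the analysis are all assumptions made for illustration.

```r
# Hypothetical example data: two groups and two response variables.
dat <- data.frame(
  group  = rep(c("a", "b"), each = 3),
  height = c(1, 2, 3, 4, 5, 6),
  mass   = c(10, 20, 30, 40, 50, 60)
)

# Cut-and-paste version: the same step is written out once per variable,
# so any change (or mistake) has to be repeated in every copy.
mean_height <- tapply(dat$height, dat$group, mean)
mean_mass   <- tapply(dat$mass, dat$group, mean)

# Function version: the step is written once and takes the variable name
# as an argument.
group_means <- function(data, variable, by = "group") {
  tapply(data[[variable]], data[[by]], mean)
}

# The loop applies the function to every variable of interest and stores
# each result in a named list.
variables <- c("height", "mass")
results <- list()
for (v in variables) {
  results[[v]] <- group_means(dat, v)
}

print(results)
```

With this layout, adding a third variable means adding one name to `variables`, and checking for errors means reading one function instead of several near-identical blocks.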

Reluctance to publish code is probably also associated with a fear of exposure. When you let others see the inner workings of your research, they can see your limitations as a scientist, and who likes to expose weaknesses? Even though I am proud of the final script I prepared, it may look like rookie code to others. It probably shows my limitations in code writing (could I have used a faster, simpler way to do something?) and my analytical limitations (could I have used a different, better method?). Even scarier, once data and code are available someone can find a mistake, a big one that does actually alter the conclusions of my paper! In the worst-case scenario, this may happen after publication, leading to corrections or retractions. But honest mistakes (and if you expose your code, mistakes are likely not intentional) are not a bad thing. You are not a lesser scientist because you make mistakes. Instead, I would argue that you are a better researcher for choosing to risk having your mistakes and limitations exposed.

All in all, I think publishing code is the way forward, and there are great tools to make the preparation and formatting easier, including R Markdown and Shiny. However, you do not need to learn new tools; a simple text file with your documented script is enough. Maybe it will not be fancy, but it will show what you did. Whatever format you choose, once it is ready there are different ways to make it available. Journals rarely say no to additional online supplementary materials, and if they do you can always post your code on your own website, blog, or a repository. So publishing code is not that hard to do, and it is good for you (you learn), good for others (they learn), and good for science. Let’s do it!

[1] The manuscript that includes the R script I mentioned here is currently under review. Once it is accepted I will post it here (linked to this entry). The script and data will also accompany the paper as supplementary material.



