As science becomes increasingly data- and computation-intensive, maintaining the ability to build on our own or others’ prior work requires that the process that takes data and other inputs all the way to the results presented in a paper be documented and made available in full detail.

This course will teach participants how to develop workflows that go from raw data to graphics and statistical analysis, using the programming language and statistical environment R. Over the course of the semester, participants will learn to write scripts that automate data formatting and analysis, making their studies replicable.

Course goals

  • Understanding and being able to use basic programming concepts
  • Automating data analysis
  • Working collaboratively and openly on code
  • Knowing how to generate dynamic documents
  • Being able to use a continuous test-driven development approach

Course format


There will be three 3-hour lectures, on January 16th, February 13th, and March 13th. In the other weeks, the lab will be open and the instructor will be present to help participants work towards the final project and the package presentation.

Package demonstrations

Each week towards the end of the semester, a group will present a package of their choice to the rest of the class. This presentation will include a general overview of the package as well as a hands-on part where everyone in the class will have the opportunity to use the package. A draft of the handout and exercises will need to be posted on GitHub at least one week before the presentation so that feedback can be provided.

Final project

It’s easier to learn programming by doing. Because many of the concepts covered in the class will be new, learning them will require a significant time commitment. Recognizing that time in graduate school is precious, and that the skills taught in this class should be directly applicable, students will work in groups to design a project that will improve or facilitate their current or future research. Working on the project will take most of our classroom time and will also take time outside of class. It is recommended that students program at least every other day when learning.


This schedule is tentative. Topics and coverage may change. Other topics will be covered in short lectures depending on the interests of the class and the needs of the final projects (potential topics include data manipulation, working with dates, and regular expressions).




