Skip to main content

Reproducible Analytical Pipelines Blog #1

Feb 24, 2022

SDD RAP journey

Publication teams across NHS Digital are transforming their code bases into Reproducible Analytical Pipelines (RAP) to improve the transparency and reproducibility of our statistical publications.

The principles and practices used in RAP (version control, testing, etc.) can be challenging for teams to adopt while still delivering their BAU work and so the Data Science RAP squad have been offering focused support to publication teams to guide and encourage them in their RAP transformation.

Most recently, we worked with four analysts in the Smoking, Drinking, and Drugs (SDD) publication team. The publication had previously been derived using SAS. The SDD team wanted to move this to Python, using best coding practice, version control, and a standard package structure.

At the start of a RAP transformation, everyone is keen to get stuck in straight away! So we tried to keep up-front classroom based learning to a minimum, focusing on the basics of getting familiar with Python and Git. RAP transformation is a journey through different levels of maturity and is best tackled over time. Since the SDD team was setting out on this journey, we focussed on getting the fundamentals right.

Every member of the SDD team took an introductory course in Python from Datacamp, and then we led them in some tutorials about using Git for version control. This was taken up enthusiastically, as everyone learned how to branch from a repository, push code, and resolve merge conflicts. At this stage of learning, everyone was also getting used to the tools: not only Git but also tools for Python development.

Learning the theory is important but true learning happens when you try to apply the knowledge in a real-world setting. The SDD team analysts showed great determination as they stated rewriting their code into Python. We went slowly at first while everyone found their feet but in a remarkably short amount of time the team were working at pace.

But how should a team of 4 analysts collaborate to build a Python package from scratch? Deciding how to split the coding work into independent tasks can be tricky, and so we gave some support on these design decisions. Peer review is probably the most important practice that the team used. In part, peer review helps to check the quality of code but more importantly it gives an opportunity for every member of the team to get an awareness of how the whole code works. When learning a new language, a frustrating “why doesn’t my code work?” could be resolved by a simple syntax error. In the first few weeks we were available for detailed code reviews, but also those ad-hoc 5-minute fixes to keep the team on track.

There was a lot of code to migrate! The SDD team took an agile approach of producing something as quickly as possible that could be tested end-to-end, to get a sense of achievement early on. We called this the “thin slice”, where we took a single statistic from the publication and tried to reproduce it correctly using Python. The next step was to review the code and improve it to coding best practice, most importantly, to identify sections of code that are and moving those sections of code into functions.

For those early steps, all members of the team needed to stay really close together and understand how the thin slice works, and where parameters, functions, derivations, etc. are all held in the package structure. Once the thin slice was complete and understood by all, it was easier for individuals to take the next coding tasks, whether it was adding another statistic end-to-end, or writing some functions or derivations to be used by all. It got quicker as the team got used to the rhythm of collaborative development: peer review and committing code.

The SDD also gave some great feedback that will help improve the RAP journey for future teams. Some of the suggestions that we will immediately apply in our work include:

  • Staggering the training sessions over time so that coding can begin sooner and happen alongside training.
  • Giving more time to the code review process, especially the discussion of design choices.
  • Making sure everyone has the defended time to learn and practice the new skills
  • Adding more video guidance on the design process, peer review, and branching strategies

We’d encourage all publication teams to go on their own RAP journey. We can offer focused support like we have done with the SDD team or take a lighter touch by using the content on the RAP Community of Practice and attending our RAP drop-in sessions on Thursday mornings.