Quality Assuring Analytical Outputs

RAP Quality Assurance Plan

The key resource for tools, guidance and templates in this section is derived from the "Aqua Book: guidance on producing quality analysis for government". It sets out best practice in quality assurance, drawn from engagement with organisations across the UK public and private sectors.

The book lists several assurance activities to provide confidence to the model developers, assurers and users that a given version of the model is fit for purpose (with 'purpose' defined in the model scope documentation). One way to assess the level of confidence in the model is through the checklist below. It helps to identify omissions or areas for improvement so that analysts and decision-makers can judge the risks of using outputs from the model as it stands.

You can also create a QA log to audit whether the QA plan has been followed, tracking who did the work, who reviewed it, and so on. The log should be completed during model development by the developer and post-development by those performing QA work. The QA checklist and log should be sent alongside any analysis submitted for clearance, so that the risk associated with any evidence generated by the model can be assessed.

Quality assurance checklist

Here is a quality assurance checklist adapted from the ONS's Quality Assurance of Code for Analysis and Research and updated for use on previous Data Science team projects. You can select the relevant steps for your project depending on its complexity and the required level of quality assurance.

Governance and IG

  • Do we have an approved commission?
  • Do we have IG approval to access the data?
  • Do we have agreed scope and requirements?
  • Do we have a clear and comprehensive project specification?
  • Do we know the key stakeholders?
  • Do we have a plausible delivery roadmap?
  • Do we know who will sign off on the project?
  • Do we have a QA plan?
  • Do we have a QA log?

Project management

  • The roles and responsibilities of team members are clearly defined.
  • An issue tracker (e.g. Jira or Trello) is used to record development tasks.
  • New issues or tasks are guided by user needs and user stories.
  • Issue templates are used to ensure proper logging of the title, description, labels and comments.
  • Acceptance criteria are noted for issues and tasks on the Jira board.
  • A decision log is kept, explaining why key choices were made.
  • Quality assurance standards and processes for the project are defined.

Data management

  • Do we have agreed data specifications for all inputs?
  • Do data owners agree that the data is fit for purpose?
  • Do we validate all input data?
  • Do we check for extreme and marginal values? (see the sketch after this list)
  • Input data are stored safely and are treated as read-only.
  • Input data are versioned. All changes to the data result in new versions being created, or changes are recorded as new records.
  • Input data are documented in a data register where possible, including where they come from and their importance to the analysis.
  • Input data have been profiled and checked against users' expectations.
  • Outputs from your analysis are disposable and are regularly deleted and regenerated while analysis develops. Your analysis code can reproduce them at any time.
  • Non-sensitive data are made available to users. If data are sensitive, dummy data is made available so that the code can be run by others.
  • Data quality is monitored, as per the government data quality framework.
  • Fields within input and output datasets are documented in a data dictionary.
  • Large or complex data are stored in a database.
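
As a minimal sketch of the input validation and extreme-value checks above, assuming the data arrive as a pandas DataFrame (the column names and plausible ranges are illustrative only):

```python
import pandas as pd


def validate_input_data(df: pd.DataFrame) -> pd.DataFrame:
    """Run basic checks on input data; raise an informative error on failure."""
    # Fail fast if expected columns are missing.
    expected_columns = {"org_code", "activity_date", "wait_days"}
    missing = expected_columns - set(df.columns)
    if missing:
        raise ValueError(f"Input data is missing columns: {sorted(missing)}")

    # Flag extreme or marginal values rather than silently dropping them.
    out_of_range = df[(df["wait_days"] < 0) | (df["wait_days"] > 730)]
    if not out_of_range.empty:
        raise ValueError(
            f"{len(out_of_range)} rows have wait_days outside the plausible range 0-730"
        )

    return df
```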

Project structure and clarity

  • Do we have a process flow diagram? (data flow diagram, etc., as appropriate)
  • Do we have a README.md in the package that explains how it works and where to start? (This file details the project purpose, basic installation instructions, and examples of usage).
  • A clear, standard directory structure is used to separate input data, outputs, code and documentation.
  • Where appropriate, guidance for prospective contributors is available including a code of conduct.
  • If the code's users are not familiar with the code, more instructions should be provided to guide lead users through example use cases.

Good coding practices

  • Do we have agreed coding standards?
  • Names used in the code are informative and concise.
  • Names used in the code are explicit, rather than implicit.
  • Code logic is clear and avoids unnecessary complexity.
  • Code follows a standard style, e.g. PEP8 for Python and Google or tidyverse for R (see the sketch after this list).
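
A short illustration of informative, explicit naming in a PEP8 style; the function and variable names are hypothetical:

```python
# Harder to review: short, implicit names and no statement of intent.
def calc(d, t):
    return [x for x in d if x > t]


# Easier to review: informative names and a docstring that states the intent.
def waits_over_threshold(wait_days, threshold_days):
    """Return the waiting times (in days) that exceed the threshold."""
    return [wait for wait in wait_days if wait > threshold_days]
```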

Version control

  • Code is version controlled using Git.
  • Code is committed regularly, preferably when a discrete unit of work has been completed.
  • An appropriate branching strategy is defined and used throughout development.
  • Code is open-sourced. Any sensitive data are omitted or replaced with dummy data.
  • Commit standards are followed, such as supplying an appropriate commit summary and message.
  • Commits are tagged at significant stages. This is used to indicate the state of code for specific releases or model versions.
  • Continuous integration is applied through tools such as GitHub Actions, to ensure that each change is integrated into the workflow smoothly.

Modular code

  • Individual pieces of logic are written as functions. Classes are used if more appropriate.
  • Code is grouped in themed files (modules) and is packaged for easier use.
  • Main analysis scripts import and run high level functions from the package (see the sketch after this list).
  • Low level functions and classes carry out one specific task. As such, there is only one reason to change each function.
  • Repetition in the code is minimised. For example, by moving reusable code into functions or classes.
  • Objects and functions are open for extension but closed for modification; functionality can be extended without modifying the source code.
  • Subclasses retain the functionality of their parent class while adding new functionality. Parent class objects can be replaced with instances of the subclass and still work as expected.
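
A minimal sketch of a main analysis script that only orchestrates high level functions from a package; the package and module names are hypothetical:

```python
# main_analysis.py - the script only wires together high level package functions.
from my_rap_package import ingest, process, report  # hypothetical package modules


def run_pipeline(config: dict) -> None:
    """Run the analysis end to end using functions defined in the package."""
    raw_data = ingest.load_input_data(config["input_path"])
    clean_data = process.clean_and_derive_fields(raw_data)
    report.write_outputs(clean_data, config["output_dir"])
```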

Code documentation

  • Has a static copy of the code been made available on Confluence and shared with stakeholders? (the code version used for QA)
  • Comments are used to describe why code is written in a particular way, rather than describing what the code is doing.
  • Comments are kept up to date, so they do not confuse the reader.
  • Code is not commented out to adjust which lines of code run.
  • All functions and classes are documented to describe what they do, what inputs they take and what they return.
  • Python code is documented using docstrings (see the example after this list).
  • Human-readable (preferably HTML) documentation is generated automatically from code documentation.
  • Documentation is hosted for easy access. GitHub Pages and Read the Docs provide a free service for hosting documentation publicly.
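
For Python, a docstring along these lines (NumPy style shown as one option; the function itself is hypothetical) covers what a function does, what inputs it takes and what it returns:

```python
def count_attendances_by_month(attendances, date_column="attendance_date"):
    """Count attendances per calendar month.

    Parameters
    ----------
    attendances : pandas.DataFrame
        One row per attendance; `date_column` must hold datetime values.
    date_column : str, optional
        Name of the column to aggregate on.

    Returns
    -------
    pandas.Series
        Number of attendances indexed by month.
    """
    return attendances[date_column].dt.to_period("M").value_counts().sort_index()
```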

Configuration

  • Credentials and other secrets are not written in code but are configured as environment variables (see the sketch after this list).
  • Configuration is written as code and is clearly separated from code used for analysis.
  • The configuration used to generate particular outputs, releases and publications is recorded.
  • If appropriate, multiple configuration files are used and interchangeable depending on system/local/user.
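
A minimal sketch of keeping secrets in environment variables and other settings in a separate configuration file; the variable and file names are illustrative, and the example assumes the project uses YAML configuration with PyYAML installed:

```python
import os

import yaml  # PyYAML

# Secrets come from the environment, never from the repository.
database_password = os.environ["ANALYSIS_DB_PASSWORD"]

# Non-secret settings live in a configuration file, separate from analysis code.
with open("config/config.yaml") as config_file:
    config = yaml.safe_load(config_file)

input_path = config["input_path"]
```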

Peer review

  • Code authors annotate source code before the review.
  • Peer review is conducted and recorded near to the code. Merge or pull requests are used to document review, when relevant.
  • Pair programming is used to review code and share knowledge.
  • A process is established for fixing defects found during the review process.
  • Users are encouraged to participate in peer review as a team building activity.

Testing

  • Core functionality is unit tested as code. See pytest for Python and testthat for R; a small example follows this list.
  • Code based tests are run regularly.
  • Bug fixes include implementing new unit tests to ensure that the same bug does not reoccur.
  • Informal tests are recorded near to the code.
  • Stakeholder or user acceptance signoffs are recorded near to the code.
  • Tests are automatically run and recorded using continuous integration or git hooks.
  • The whole process is tested from start to finish using one or more realistic end-to-end tests.
  • Test code is clean and readable. Tests make use of fixtures and parametrisation to reduce repetition.
  • Formal user acceptance testing is conducted and recorded.
  • Integration tests ensure that multiple units of code work together as expected.
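
A small pytest example of a parametrised unit test, reusing the hypothetical waits_over_threshold function sketched under "Good coding practices":

```python
import pytest

from my_rap_package.process import waits_over_threshold  # hypothetical import


@pytest.mark.parametrize(
    "wait_days, threshold_days, expected",
    [
        ([1, 10, 100], 50, [100]),   # one value over the threshold
        ([1, 2, 3], 50, []),         # no values over the threshold
        ([51, 52], 50, [51, 52]),    # all values over the threshold
    ],
)
def test_waits_over_threshold(wait_days, threshold_days, expected):
    assert waits_over_threshold(wait_days, threshold_days) == expected
```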

Dependency management

  • Required passwords, secrets and tokens are documented, but are stored outside of version control.
  • Required libraries and packages are documented, including their versions (see the sketch after this list).
  • Working operating system environments are documented.
  • Example configuration files are provided.
  • Where appropriate, code runs independent of operating system (e.g. suitable management of file paths).
  • Dependencies are managed separately for users, developers, and testers.
  • There are as few dependencies as possible.
  • Package dependencies are managed using an environment manager such as conda env, virtualenv for Python or renv for R.
  • Docker containers or virtual machine builds are available for the code execution environment and these are version controlled.
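
Pinned requirements or environment files are the usual way to document dependency versions; as a complementary sketch, the snippet below records the installed versions of a hypothetical list of direct dependencies alongside the project documentation:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of the project's direct dependencies.
DIRECT_DEPENDENCIES = ["pandas", "numpy", "pyyaml"]


def record_dependency_versions(path: str = "docs/dependency_versions.txt") -> None:
    """Write the installed versions of direct dependencies to a text file."""
    lines = []
    for package in DIRECT_DEPENDENCIES:
        try:
            lines.append(f"{package}=={version(package)}")
        except PackageNotFoundError:
            lines.append(f"{package} (not installed)")
    with open(path, "w") as output_file:
        output_file.write("\n".join(lines) + "\n")
```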

Logging

  • Misuse or failure in the code produces informative error messages.
  • Exception handling is in place so that critical errors are caught and reported clearly.
  • Code configuration is recorded when the code is run.
  • Pipeline route is recorded if decisions are made in code.
  • Sensitive data are filtered out before logs are saved or shared (see the sketch after this list).
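
A minimal sketch of informative error messages and of filtering sensitive fields before anything is logged or shared; the field names are illustrative:

```python
import json
import logging

logger = logging.getLogger(__name__)

SENSITIVE_FIELDS = {"nhs_number", "date_of_birth"}  # illustrative field names


def log_record_safely(record: dict) -> None:
    """Log a record with sensitive fields removed."""
    safe_record = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    logger.info("Processing record: %s", safe_record)


def load_config(path: str) -> dict:
    """Load run configuration, raising an informative error if it is missing."""
    try:
        with open(path) as config_file:
            return json.load(config_file)
    except FileNotFoundError as err:
        raise FileNotFoundError(
            f"Configuration file not found at '{path}'; check the path supplied to the pipeline."
        ) from err
```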

Project documentation

  • Does the methodology reflect the latest state of the code?
  • Have we explained the agreed uses and scope of the data (e.g. internal or analytical use)?
  • Have we documented and explained all inputs in the databases?
  • Have we included links to relevant external documentation?
  • Have we written all appropriate caveats?
  • The extent of analytical quality assurance conducted on the project is clearly documented.
  • Assumptions in the analysis and their quality are documented next to the code that implements them. These are also made available to users.
  • Copyright and licenses are specified for both documentation and code.
  • Instructions for how to cite the project are given.
  • Releases of the project used for reports, publications, or other outputs are versioned using a standard pattern such as semantic versioning.
  • A summary of changes to functionality is documented in a changelog following releases. The changelog is available to users.
  • Example usage of packages and underlying functionality is documented for developers and users.
  • Design certificates confirm that the design is compliant with requirements.
  • If appropriate, the software is fully specified.

Output

  • Do the users (analytical team) have the skills to run and operate the code?
  • Have we documented all the outputs?
  • Details of the database tables and views made available to users are documented.
  • An explanation of how to use the pipeline and its outputs is provided.
  • Sign-offs from reviewers and senior managers are recorded.


NHS England makes every effort to ensure that external links are accurate, up to date and relevant, however we cannot take responsibility for pages maintained by external providers.

NHS England is not affiliated with any of the websites or companies in the links to external websites.

If you come across any external links that do not work, we would be grateful if you could report them by raising an issue on our RAP Community of Practice GitHub.