codonPython package

Submodules

codonPython.age_bands module

codonPython.age_bands.age_band_10_years(age: int) → str

Place age into appropriate 10 year band

This function takes the age supplied as an argument and returns a string representing the relevant 10 year banding.

Parameters

age (int) – Age of the person

Returns

out – The 10 year age band

Return type

str

Examples

>>> age_band_10_years(3)
'0-9'
>>> age_band_10_years(None)
'Age not known'
>>> age_band_10_years(95)
'90 and over'
codonPython.age_bands.age_band_5_years(age: int) → str

Place age into appropriate 5 year band

This function takes the age supplied as an argument and returns a string representing the relevant 5 year banding.

Parameters

age (int) – Age of the person

Returns

out – The 5 year age band

Return type

str

Examples

>>> age_band_5_years(3)
'0-4'
>>> age_band_5_years(None)
'Age not known'
>>> age_band_5_years(95)
'90 and over'
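The banding behaviour shown in the examples is simple to reproduce. A minimal sketch (the `age_band_5_years_sketch` name and the integer-division approach are assumptions based on the examples above, not the package's actual implementation):

```python
def age_band_5_years_sketch(age):
    """Place an age into a 5 year band, mirroring the documented examples."""
    if age is None:
        return "Age not known"
    if age >= 90:
        return "90 and over"
    lower = (age // 5) * 5          # floor to the nearest multiple of 5
    return f"{lower}-{lower + 4}"
```

The 10 year variant follows the same pattern with `(age // 10) * 10`.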

codonPython.check_consistent_measures module

codonPython.check_consistent_measures.check_consistent_measures(data, geography_col: str = 'Org_Level', measure_col: str = 'Measure', measures_set: set = {}) → bool

Check every measure is in every geography level.

Parameters
  • data (pd.DataFrame) – DataFrame of data to check.

  • geography_col (str, default = "Org_Level") – Column name for the geography level.

  • measure_col (str, default = "Measure") – Column name for measure

  • measures_set (set, default = set()) – Set of measures that should be in every geography level. If empty, the existing global set is presumed to be correct.

Returns

Whether the checks have been passed.

Return type

bool

Examples

>>> check_consistent_measures(
...   pd.DataFrame({
...     "Geog" : ["National" ,"National", "Region", "Region", "Local", "Local",],
...     "measure" : ["m1", "m2", "m1", "m2", "m1", "m2",],
...     "Value_Unsuppressed" : [4, 2, 2, 1, 2, 1,],
...   }),
...   geography_col = "Geog",
...   measure_col = "measure",
...   measures_set = set({"m1", "m2"}),
... )
True
>>> check_consistent_measures(
...   pd.DataFrame({
...     "Org_Level" : ["National" ,"National", "Region", "Region", "Local", "Local",],
...     "Measure" : ["m1", "m3", "m1", "m2", "m1", "m2",],
...     "Value_Unsuppressed" : [4, 2, 2, 1, 2, 1,],
...   })
... )
False
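This check can be expressed as a groupby over geography levels. A minimal sketch under the same defaults (the `check_consistent_measures_sketch` name is hypothetical and the real implementation may differ):

```python
import pandas as pd

def check_consistent_measures_sketch(data, geography_col="Org_Level",
                                     measure_col="Measure", measures_set=None):
    """Return True if every required measure appears at every geography level."""
    if not measures_set:
        # presume the set of all measures in the data is the reference set
        measures_set = set(data[measure_col].unique())
    # the set of measures present at each geography level
    by_geography = data.groupby(geography_col)[measure_col].apply(set)
    return all(measures >= measures_set for measures in by_geography)
```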

codonPython.check_consistent_submissions module

codonPython.check_consistent_submissions.check_consistent_submissions(data, national_geog_level: str = 'National', geography_col: str = 'Org_Level', submissions_col: str = 'Value_Unsuppressed', measure_col: str = 'Measure') → bool

Check total submissions for each measure are the same across all geography levels except national.

Parameters
  • data (pd.DataFrame) – DataFrame of data to check.

  • national_geog_level (str, default = "National") – Geography level code for national values.

  • geography_col (str, default = "Org_Level") – Column name for the geography level.

  • submissions_col (str, default = "Value_Unsuppressed") – Column name for the submissions count.

  • measure_col (str, default = "Measure") – Column name for measure.

Returns

Whether the checks have been passed.

Return type

bool

Examples

>>> check_consistent_submissions(
...   pd.DataFrame({
...     "Geog" : ["N" ,"N", "Region", "Region", "Local", "Local",],
...     "measure" : ["m1", "m2", "m1", "m2", "m1", "m2",],
...     "submissions" : [4, 2, 2, 1, 2, 1,],
...   }),
...   national_geog_level = "N",
...   geography_col = "Geog",
...   submissions_col = "submissions",
...   measure_col = "measure",
... )
True
>>> check_consistent_submissions(
...   pd.DataFrame({
...     "Org_Level" : ["National" ,"National", "Region", "Region", "Local", "Local",],
...     "Measure" : ["m1", "m2", "m1", "m2", "m1", "m2",],
...     "Value_Unsuppressed" : [4, 2, 3, 1, 2, 1,],
...   })
... )
False
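A sketch of the same logic: exclude the national rows, total submissions per measure and level, then require one distinct total per measure (the `check_consistent_submissions_sketch` name is hypothetical, not the package's implementation):

```python
import pandas as pd

def check_consistent_submissions_sketch(data, national_geog_level="National",
                                        geography_col="Org_Level",
                                        submissions_col="Value_Unsuppressed",
                                        measure_col="Measure"):
    """Return True if each measure's total is identical across non-national levels."""
    non_national = data[data[geography_col] != national_geog_level]
    # total submissions per (measure, geography level)
    totals = non_national.groupby([measure_col, geography_col])[submissions_col].sum()
    # a measure passes when all of its per-level totals are the same number
    return bool(totals.groupby(level=0).nunique().eq(1).all())
```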

codonPython.check_nat_val module

codonPython.check_nat_val.check_nat_val(df: pandas.core.frame.DataFrame, breakdown_col: str = 'Breakdown', measure_col: str = 'Measure', value_col: str = 'Value_Unsuppressed', nat_val: str = 'National') → bool

Check national value less than or equal to sum of breakdowns.

This function checks that the national value is less than or equal to the sum of each organisation level breakdown. This function does not apply to values which are averages. This function does not apply to values which are percentages calculated from the numerator and denominator.

Parameters
  • df (pandas.DataFrame) – DataFrame of data to check.

  • breakdown_col (str, default = "Breakdown") – Column name for the breakdown level.

  • measure_col (str, default = "Measure") – Column name for measures

  • value_col (str, default = "Value_Unsuppressed") – Column name for values

  • nat_val (str, default = "National") – Value in breakdown column denoting national values

Returns

Whether the checks have been passed.

Return type

bool

Examples

>>> check_nat_val(
...   df = pd.DataFrame({
...     "Breakdown" : ['National', 'CCG', 'CCG', 'Provider', 'Provider',
... 'National' ,'CCG', 'CCG', 'Provider', 'Provider','National' ,'CCG', 'CCG',
... 'Provider', 'Provider',],
...     "Measure" : ['m1', 'm1', 'm1', 'm1', 'm1', 'm2', 'm2', 'm2', 'm2',
... 'm2', 'm3', 'm3', 'm3', 'm3', 'm3',],
...     "Value_Unsuppressed" : [9, 4, 5, 3, 6, 11, 2, 9, 7, 4, 9, 5, 4, 6,
... 3],
...   }),
...   breakdown_col = "Breakdown",
...   measure_col = "Measure",
...   value_col = "Value_Unsuppressed",
...   nat_val = "National",
... )
True
>>> check_nat_val(
...   df = pd.DataFrame({
...     "Breakdown" : ['National', 'CCG', 'CCG', 'Provider', 'Provider',
... 'National' ,'CCG', 'CCG', 'Provider', 'Provider','National' ,'CCG', 'CCG',
... 'Provider', 'Provider',],
...     "Measure" : ['m1', 'm1', 'm1', 'm1', 'm1', 'm2', 'm2', 'm2', 'm2',
... 'm2', 'm3', 'm3', 'm3', 'm3', 'm3',],
...     "Value_Unsuppressed" : [18, 4, 5, 3, 6, 11, 2, 9, 7, 4, 9, 5, 4, 6,
... 3],
...   }),
...   breakdown_col = "Breakdown",
...   measure_col = "Measure",
...   value_col = "Value_Unsuppressed",
...   nat_val = "National",
... )
False
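One way to implement the check: compare the national value for each measure against the summed values of every other breakdown level. A minimal sketch (hypothetical name; the package's implementation may differ):

```python
import pandas as pd

def check_nat_val_sketch(df, breakdown_col="Breakdown", measure_col="Measure",
                         value_col="Value_Unsuppressed", nat_val="National"):
    """Return True if, per measure, the national value <= each breakdown's sum."""
    national = df[df[breakdown_col] == nat_val].set_index(measure_col)[value_col]
    others = df[df[breakdown_col] != nat_val]
    # sum of values per (measure, breakdown level)
    sums = others.groupby([measure_col, breakdown_col])[value_col].sum()
    for (measure, _), breakdown_total in sums.items():
        if national.get(measure, 0) > breakdown_total:
            return False
    return True
```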

codonPython.check_null module

codonPython.check_null.check_null(dataframe: pandas.core.frame.DataFrame, columns_to_be_checked: list) → int

Checks a pandas dataframe for null values

This function takes a pandas dataframe supplied as an argument and returns an integer representing the number of null values found within the columns to check.

Parameters
  • dataframe (pandas.DataFrame) – Dataframe to read

  • columns_to_be_checked (list) – Given dataframe columns to be checked for null values

Returns

out – The number of null values found in the given columns

Return type

int

Examples

>>> check_null(dataframe = pd.DataFrame({'col1': [1,2], 'col2': [3,4]}),columns_to_be_checked = ['col1', 'col2'])
0
>>> check_null(dataframe = pd.DataFrame({'col1': [1,numpy.nan], 'col2': [3,4]}),columns_to_be_checked = ['col1'])
1
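The count described above maps directly onto pandas' `isnull`. A minimal sketch (hypothetical name; the real implementation may differ):

```python
import pandas as pd

def check_null_sketch(dataframe, columns_to_be_checked):
    """Count null values across the given columns."""
    return int(dataframe[columns_to_be_checked].isnull().sum().sum())
```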

codonPython.dateValidator module

codonPython.dateValidator.validDate(date_string: str) → bool

Validates string-type dates in the formats dd/mm/yyyy, dd-mm-yyyy or dd.mm.yyyy for years 1900-9999. Leap year support included.

Parameters

date_string (str) – Date to be validated

Returns

Whether the date is valid or not

Return type

boolean

Examples

>>> validDate("11/02/1996")
True
>>> validDate("29/02/2016")
True
>>> validDate("43/01/1996")
False
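One way to implement this validation is to delegate parsing to `datetime.strptime`, trying one separator at a time. A sketch (hypothetical name; the real implementation may use a regex, and note that `strptime` also accepts non-zero-padded days and months):

```python
from datetime import datetime

def validDate_sketch(date_string):
    """Validate dd/mm/yyyy, dd-mm-yyyy or dd.mm.yyyy dates from 1900 onwards."""
    for fmt in ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y"):
        try:
            return datetime.strptime(date_string, fmt).year >= 1900
        except ValueError:
            continue  # wrong separator, or an impossible date for this format
    return False
```

Leap years come for free, since `strptime` rejects 29 February in non-leap years.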

codonPython.file_utils module

codonPython.file_utils.compare(x, y, names=['x', 'y'], dups=False, same=False, comment=False)

This function returns a dictionary with:

  1. Same values between data frames x and y

  2. Values in x, not in y

  3. Values in y, not in x

(optional):

  4. Duplicates of x

  5. Duplicates of y

  6. Boolean of whether x and y are the same

Parameters
  • x (pandas.DataFrame) – DataFrame #1

  • y (pandas.DataFrame) – DataFrame #2

  • names (list) – A list of user preferred file names, e.g. ['File1', 'File2']. default = ['x', 'y']

  • dups (bool) – True to include a duplicates check for each file. default = False

  • same (bool) – True to activate. Outputs True if the DataFrames are the same. default = False

  • comment (bool) – True to activate. Prints out statistics of the comparison results, e.g. number of same values, number of duplicates, number of outliers, and whether the DataFrames are the same. default = False

Returns

out

Return type

dict

Examples

>>> c = compare(df1, df2, names = ['df1','df2'], dups = True, same = True, comment = True)
There are 133891 same values
There are 16531 outliers in df1
There are 20937 outliers in df2
There are 48704 duplicates in df1
There are 0 duplicates in df2
The DataFrames are not the same
>>> c = compare(df2, df2, names = ['df2','df2'], dups = True, same = True, comment = True)
There are 154444 same values
There are 0 outliers in df2
There are 0 outliers in df2
There are 0 duplicates in df2
There are 0 duplicates in df2
The DataFrames are the same
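The "same / in x only / in y only" split can be obtained with an outer merge and pandas' `indicator` flag. A minimal sketch that covers only the first three outputs and does not handle duplicate rows (hypothetical name; not the package's implementation):

```python
import pandas as pd

def compare_sketch(x, y, names=["x", "y"]):
    """Split rows into: present in both, only in x, only in y."""
    merged = x.merge(y, how="outer", indicator=True)
    split = {
        "same": merged[merged["_merge"] == "both"],
        names[0] + "_only": merged[merged["_merge"] == "left_only"],
        names[1] + "_only": merged[merged["_merge"] == "right_only"],
    }
    return {key: part.drop(columns="_merge") for key, part in split.items()}
```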

codonPython.file_utils.file_search(path='.', doctype='csv', like=[''], strict=False)

This function creates a list of all files of a given type that satisfy the criteria outlined in the like = [...] parameter. The function only searches for files in the specified folder of the current working directory that is set by the user.

Parameters
  • path (string) – Path to a folder in the current working directory. default = '.', i.e. the current working directory

  • doctype (string) – Document format to search for, e.g. 'csv' or 'xlsx'. default = 'csv'

  • like (list) – A list of words to filter the file search on. default = [''], i.e. no filter

  • strict (bool) – Set True to search for filenames containing all words from the 'like' list. default = False

Returns

Return type

list

Examples

>>> file_search(doctype = 'md')
['README.md', 'CONTRIBUTING.md']
>>> file_search(doctype = 'md', like = ['READ'])
['README.md']
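A sketch of the search behaviour shown above, using `os.listdir` (the `file_search_sketch` name is hypothetical, and the real function may handle paths and filtering differently):

```python
import os

def file_search_sketch(path=".", doctype="csv", like=None, strict=False):
    """List files in `path` with the given extension, filtered by `like` words."""
    like = like or [""]
    files = [f for f in os.listdir(path) if f.endswith("." + doctype)]
    if strict:
        # keep filenames containing every word in `like`
        return [f for f in files if all(word in f for word in like)]
    # keep filenames containing at least one word in `like`
    return [f for f in files if any(word in f for word in like)]
```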
codonPython.file_utils.import_files(path='.', doctype='csv', sheet='Sheet1', subdir=False, like=[''], strict=False)

This function imports all documents of a given format to a dictionary and returns this dictionary, keeping original file names.

Parameters
  • path (string) – Path to a folder in the current working directory. default = '.', i.e. the current working directory

  • doctype (string) – Document format to search for, e.g. 'csv' or 'xlsx'. default = 'csv'

  • sheet (string) – Sheet name of the xlsx file. default = 'Sheet1'

  • subdir (bool) – True to also import files from subdirectories. default = False

  • like (list) – A list of words to filter the file search on. default = [''], i.e. no filter

  • strict (bool) – Set True to search for filenames containing all words from the 'like' list. default = False

Returns

out

Return type

dict

Examples

>>> import_files()
File Data_AprF_2019 is successfully imported
File Data_AugF_2019 is successfully imported
File Data_JulF_2019 is successfully imported
File Data_JunF_2019_v1 is successfully imported
File Data_MayF_2019 is successfully imported
File Data_SepP_2019 is successfully imported
>>> import_files(like = ['Aug','Sep'])
File Data_AugF_2019 is successfully imported
File Data_SepP_2019 is successfully imported

codonPython.nhsd_colours module

codonPython.nhsd_colours.nhsd_colours()

Returns a dictionary full of the different official NHSD colours from the style guide: https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/nhs-digital-style-guidelines/how-we-look/colour-palette

Parameters

None

Returns

colour_dict – A dictionary containing sets of official NHS Digital branding colours (hexadecimal format) and fonts.

Return type

dict (Python dictionary)

codonPython.nhsd_colours.nhsd_seaborn_style()

Sets the seaborn style to be in line with NHS Digital guidelines, so that graphs produced with seaborn or matplotlib follow the NHSD style guide. Simply run this function.

Parameters

None

Returns

Return type

None

codonPython.nhsNumber module

codonPython.nhsNumber.nhsNumberGenerator(to_generate: int, random_state: int = None) → list

Generates up to 1M random NHS numbers compliant with modulus 11 checks as recorded in the data dictionary. https://www.datadictionary.nhs.uk/data_dictionary/attributes/n/nhs/nhs_number_de.asp?shownav=1

Parameters
  • to_generate (int) – number of NHS numbers to generate

  • random_state (int, default : None) – Optional seed for random number generation, for testing and reproducibility.

Returns

generated – List of randomly generated NHS numbers

Return type

list

Examples

>>> nhsNumberGenerator(2, random_state=42)
[8429141456, 2625792787]
codonPython.nhsNumber.nhsNumberValidator(number: int) → bool

Validate NHS Number according to modulus 11 checks as recorded in the data dictionary. https://www.datadictionary.nhs.uk/data_dictionary/attributes/n/nhs/nhs_number_de.asp?shownav=1

Parameters

number (int) – 10 digit integer to validate.

Returns

If the number passes modulus 11 checks a.k.a. is valid.

Return type

bool

Examples

>>> nhsNumberValidator(8429141456)
True
>>> nhsNumberValidator(8429141457)
False
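The modulus 11 algorithm referenced above weights the first nine digits from 10 down to 2 and derives a check digit from the weighted sum. A sketch following the NHS data dictionary description (the `nhsNumberValidator_sketch` name is hypothetical):

```python
def nhsNumberValidator_sketch(number):
    """Modulus 11 check for a 10 digit NHS number."""
    digits = [int(d) for d in str(number)]
    if len(digits) != 10:
        return False
    # weight the first nine digits 10, 9, ..., 2
    weighted_sum = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check_digit = 11 - (weighted_sum % 11)
    if check_digit == 11:
        check_digit = 0
    if check_digit == 10:
        return False  # 10 is never a valid check digit
    return check_digit == digits[9]
```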

codonPython.suppression module

codonPython.suppression.suppress_value(valuein: int, rc: str = '*', upper: int = 100000000) → str

Suppress values less than or equal to 7, round all non-national values.

This function suppresses the value if it is less than or equal to 7. If the value is 0 it remains 0. Values at national level remain unsuppressed. All other values are rounded to the nearest 5.

Parameters
  • valuein (int) – Metric value

  • rc (str) – Replacement character if value needs suppressing

  • upper (int) – Upper limit for suppression of numbers

Returns

out – Suppressed value (*), 0 or valuein if greater than 7 or national

Return type

str

Examples

>>> suppress_value(3)
'*'
>>> suppress_value(24)
'25'
>>> suppress_value(0)
'0'
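The suppression rules read directly as a small conditional. A sketch that covers the cases in the examples but ignores the `upper` limit and the national-level handling described above (hypothetical name; not the package's implementation):

```python
def suppress_value_sketch(valuein, rc="*"):
    """0 stays 0, 1-7 is suppressed, anything else is rounded to the nearest 5."""
    if valuein == 0:
        return "0"
    if 1 <= valuein <= 7:
        return rc
    # note: Python's round() uses banker's rounding at exact .5 boundaries
    return str(5 * round(valuein / 5))
```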

codonPython.tableFromSql module

codonPython.tableFromSql.tableFromSql(server: str, database: str, table_name: str, user: str = '', password: str = '', schema: str = None, index_col: str = None, coerce_float: bool = True, parse_dates: list = None, columns: list = None, chunksize: int = None)

Returns a SQL table in a DataFrame.

Convert a table stored in SQL Server 2016 into a pandas dataframe. Uses sqlalchemy and pandas.

Parameters
  • server (string) – Name of the SQL server

  • database (string) – Name of the SQL database

  • user (string, default: "") – If verification is required, name of the user

  • password (string, default: "") – If verification is required, password of the user

  • table_name (string) – Name of SQL table in database.

  • schema (string, default : None) – Name of SQL schema in database to query (if database flavor supports this). Uses default schema if None (default).

  • index_col (string or list of strings, default : None) – Column(s) to set as index(MultiIndex).

  • coerce_float (boolean, default : True) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point. Can result in loss of precision.

  • parse_dates (list or dict, default : None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string}, where the format string is strftime compatible for parsing string times, or is one of (D, s, ns, ms, us) for parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • columns (list, default : None) – List of column names to select from SQL table

  • chunksize (int, default : None) – If specified, returns an iterator where chunksize is the number of rows to include in each chunk.

Returns

Dataframe of the table requested from sql server

Return type

pd.DataFrame

Examples

# >>> tableFromSql("myServer2", "myDatabase2", "myTable2")
# pd.DataFrame
# >>> tableFromSql("myServer", "myDatabase", "myTable", schema="specialSchema", columns=["col_1", "col_3"])
# pd.DataFrame

codonPython.tolerance module

codonPython.tolerance.check_tolerance(t, y, to_exclude: int = 1, poly_features: list = [1, 2], alpha: float = 0.05, parse_dates: bool = False, predict_all: bool = False) → pandas.core.frame.DataFrame

Check that some future values are within a weighted least squares confidence interval.

Parameters
  • t (pd.Series) – N explanatory time points of shape (N, 1).

  • y (pd.Series) – The corresponding response variable values to t, of shape (N, 1).

  • to_exclude (int, default = 1) – How many of the last y values will have their tolerances checked.

  • poly_features (list, default = [1, 2]) – List of degrees of polynomial basis to fit to the data. One model will be produced for each number in the list, eg. the default will fit a linear and a second degree polynomial to the data and return both sets of results.

  • alpha (float, default = 0.05) – Alpha parameter for the weighted least squares confidence interval.

  • parse_dates (bool, default = False) – Set to True to parse string dates in t.

  • predict_all (bool, default = False) – Set to true to show predictions for all points of the dataset.

Returns

DataFrame containing:

"t" : Value for t
"yhat_u" : Upper confidence interval for y
"yobs" : Observed value for y
"yhat" : Predicted value for y
"yhat_l" : Lower confidence interval for y
"polynomial" : Max polynomial of the model fit to the data

Return type

pd.DataFrame

Examples

>>> check_tolerance(
...     t = pd.Series([1001,1002,1003,1004,1005,1006]),
...     y = pd.Series([2,3,4,4.5,5,5.1]),
...     to_exclude = 2,
... )
      t     yhat_u  yobs   yhat    yhat_l  polynomial
0  1005   6.817413   5.0  5.500  4.182587           1
1  1006   7.952702   5.1  6.350  4.747298           1
2  1005   9.077182   5.0  4.875  0.672818           2
3  1006  13.252339   5.1  4.975 -3.302339           2

Module contents