codonPython package

Submodules

codonPython.age_bands module

codonPython.age_bands.age_band_10_years(age: int) → str

Place age into appropriate 10 year band

This function takes the age supplied as an argument and returns a string representing the relevant 10 year banding.

Parameters

age (int) – Age of the person

Returns

out – The 10 year age band

Return type

str

Examples

>>> age_band_10_years(3)
'0-9'
>>> age_band_10_years(None)
'Age not known'
>>> age_band_10_years(95)
'90 and over'
codonPython.age_bands.age_band_5_years(age: int) → str

Place age into appropriate 5 year band

This function takes the age supplied as an argument and returns a string representing the relevant 5 year banding.

Parameters

age (int) – Age of the person

Returns

out – The 5 year age band

Return type

str

Examples

>>> age_band_5_years(3)
'0-4'
>>> age_band_5_years(None)
'Age not known'
>>> age_band_5_years(95)
'90 and over'
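The banding behaviour shown in the examples is simple to reproduce. A minimal sketch (the `age_band_5_years_sketch` name and the integer-division approach are assumptions based on the examples above, not the package's actual implementation):

```python
def age_band_5_years_sketch(age):
    """Place an age into a 5 year band, mirroring the documented examples."""
    if age is None:
        return "Age not known"
    if age >= 90:
        return "90 and over"
    lower = (age // 5) * 5          # floor to the nearest multiple of 5
    return f"{lower}-{lower + 4}"
```

The 10 year variant follows the same pattern with `(age // 10) * 10`.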

codonPython.check_consistent_measures module

codonPython.check_consistent_measures.check_consistent_measures(data, geography_col: str = 'Org_Level', measure_col: str = 'Measure', measures_set: set = {}) → bool

Check every measure is in every geography level.

Parameters
  • data (pd.DataFrame) – DataFrame of data to check.

  • geography_col (str, default = "Org_Level") – Column name for the geography level.

  • measure_col (str, default = "Measure") – Column name for measure

  • measures_set (set, default = set()) – Set of measures that should be in every geography level. If empty, the existing global set is presumed to be correct.

Returns

Whether the checks have been passed.

Return type

bool

Examples

>>> check_consistent_measures(
...   pd.DataFrame({
...     "Geog" : ["National" ,"National", "Region", "Region", "Local", "Local",],
...     "measure" : ["m1", "m2", "m1", "m2", "m1", "m2",],
...     "Value_Unsuppressed" : [4, 2, 2, 1, 2, 1,],
...   }),
...   geography_col = "Geog",
...   measure_col = "measure",
...   measures_set = set({"m1", "m2"}),
... )
True
>>> check_consistent_measures(
...   pd.DataFrame({
...     "Org_Level" : ["National" ,"National", "Region", "Region", "Local", "Local",],
...     "Measure" : ["m1", "m3", "m1", "m2", "m1", "m2",],
...     "Value_Unsuppressed" : [4, 2, 2, 1, 2, 1,],
...   })
... )
False
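This check can be expressed as a groupby over geography levels. A minimal sketch under the same defaults (the `check_consistent_measures_sketch` name is hypothetical and the real implementation may differ):

```python
import pandas as pd

def check_consistent_measures_sketch(data, geography_col="Org_Level",
                                     measure_col="Measure", measures_set=None):
    """Return True if every required measure appears at every geography level."""
    if not measures_set:
        # presume the set of all measures in the data is the reference set
        measures_set = set(data[measure_col].unique())
    # the set of measures present at each geography level
    by_geography = data.groupby(geography_col)[measure_col].apply(set)
    return all(measures >= measures_set for measures in by_geography)
```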

codonPython.check_consistent_submissions module

codonPython.check_consistent_submissions.check_consistent_submissions(data, national_geog_level: str = 'National', geography_col: str = 'Org_Level', submissions_col: str = 'Value_Unsuppressed', measure_col: str = 'Measure') → bool

Check total submissions for each measure are the same across all geography levels except national.

Parameters
  • data (pd.DataFrame) – DataFrame of data to check.

  • national_geog_level (str, default = "National") – Geography level code for national values.

  • geography_col (str, default = "Org_Level") – Column name for the geography level.

  • submissions_col (str, default = "Value_Unsuppressed") – Column name for the submissions count.

  • measure_col (str, default = "Measure") – Column name for measure.

Returns

Whether the checks have been passed.

Return type

bool

Examples

>>> check_consistent_submissions(
...   pd.DataFrame({
...     "Geog" : ["N" ,"N", "Region", "Region", "Local", "Local",],
...     "measure" : ["m1", "m2", "m1", "m2", "m1", "m2",],
...     "submissions" : [4, 2, 2, 1, 2, 1,],
...   }),
...   national_geog_level = "N",
...   geography_col = "Geog",
...   submissions_col = "submissions",
...   measure_col = "measure",
... )
True
>>> check_consistent_submissions(
...   pd.DataFrame({
...     "Org_Level" : ["National" ,"National", "Region", "Region", "Local", "Local",],
...     "Measure" : ["m1", "m2", "m1", "m2", "m1", "m2",],
...     "Value_Unsuppressed" : [4, 2, 3, 1, 2, 1,],
...   })
... )
False
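A sketch of the same logic: exclude the national rows, total submissions per measure and level, then require one distinct total per measure (the `check_consistent_submissions_sketch` name is hypothetical, not the package's implementation):

```python
import pandas as pd

def check_consistent_submissions_sketch(data, national_geog_level="National",
                                        geography_col="Org_Level",
                                        submissions_col="Value_Unsuppressed",
                                        measure_col="Measure"):
    """Return True if each measure's total is identical across non-national levels."""
    non_national = data[data[geography_col] != national_geog_level]
    # total submissions per (measure, geography level)
    totals = non_national.groupby([measure_col, geography_col])[submissions_col].sum()
    # a measure passes when all of its per-level totals are the same number
    return bool(totals.groupby(level=0).nunique().eq(1).all())
```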

codonPython.check_nat_val module

codonPython.check_nat_val.check_nat_val(df: pandas.core.frame.DataFrame, breakdown_col: str = 'Breakdown', measure_col: str = 'Measure', value_col: str = 'Value_Unsuppressed', nat_val: str = 'National') → bool

Check national value less than or equal to sum of breakdowns.

This function checks that the national value is less than or equal to the sum of each organisation level breakdown. This function does not apply to values which are averages. This function does not apply to values which are percentages calculated from the numerator and denominator.

Parameters
  • df (pandas.DataFrame) – DataFrame of data to check.

  • breakdown_col (str, default = "Breakdown") – Column name for the breakdown level.

  • measure_col (str, default = "Measure") – Column name for measures

  • value_col (str, default = "Value_Unsuppressed") – Column name for values

  • nat_val (str, default = "National") – Value in breakdown column denoting national values

Returns

Whether the checks have been passed.

Return type

bool

Examples

>>> check_nat_val(
...   df = pd.DataFrame({
...     "Breakdown" : ['National', 'CCG', 'CCG', 'Provider', 'Provider',
... 'National' ,'CCG', 'CCG', 'Provider', 'Provider','National' ,'CCG', 'CCG',
... 'Provider', 'Provider',],
...     "Measure" : ['m1', 'm1', 'm1', 'm1', 'm1', 'm2', 'm2', 'm2', 'm2',
... 'm2', 'm3', 'm3', 'm3', 'm3', 'm3',],
...     "Value_Unsuppressed" : [9, 4, 5, 3, 6, 11, 2, 9, 7, 4, 9, 5, 4, 6,
... 3],
...   }),
...   breakdown_col = "Breakdown",
...   measure_col = "Measure",
...   value_col = "Value_Unsuppressed",
...   nat_val = "National",
... )
True
>>> check_nat_val(
...   df = pd.DataFrame({
...     "Breakdown" : ['National', 'CCG', 'CCG', 'Provider', 'Provider',
... 'National' ,'CCG', 'CCG', 'Provider', 'Provider','National' ,'CCG', 'CCG',
... 'Provider', 'Provider',],
...     "Measure" : ['m1', 'm1', 'm1', 'm1', 'm1', 'm2', 'm2', 'm2', 'm2',
... 'm2', 'm3', 'm3', 'm3', 'm3', 'm3',],
...     "Value_Unsuppressed" : [18, 4, 5, 3, 6, 11, 2, 9, 7, 4, 9, 5, 4, 6,
... 3],
...   }),
...   breakdown_col = "Breakdown",
...   measure_col = "Measure",
...   value_col = "Value_Unsuppressed",
...   nat_val = "National",
... )
False
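One way to implement the check: compare the national value for each measure against the summed values of every other breakdown level. A minimal sketch (hypothetical name; the package's implementation may differ):

```python
import pandas as pd

def check_nat_val_sketch(df, breakdown_col="Breakdown", measure_col="Measure",
                         value_col="Value_Unsuppressed", nat_val="National"):
    """Return True if, per measure, the national value <= each breakdown's sum."""
    national = df[df[breakdown_col] == nat_val].set_index(measure_col)[value_col]
    others = df[df[breakdown_col] != nat_val]
    # sum of values per (measure, breakdown level)
    sums = others.groupby([measure_col, breakdown_col])[value_col].sum()
    for (measure, _), breakdown_total in sums.items():
        if national.get(measure, 0) > breakdown_total:
            return False
    return True
```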

codonPython.check_null module

codonPython.check_null.check_null(dataframe: pandas.core.frame.DataFrame, columns_to_be_checked: list) → int

Checks a pandas dataframe for null values

This function takes a pandas dataframe supplied as an argument and returns an integer representing the number of null values found within the columns to check.

Parameters
  • dataframe (pandas.DataFrame) – Dataframe to read

  • columns_to_be_checked (list) – Given dataframe columns to be checked for null values

Returns

out – The number of null values found in the given columns

Return type

int

Examples

>>> check_null(dataframe = pd.DataFrame({'col1': [1,2], 'col2': [3,4]}),columns_to_be_checked = ['col1', 'col2'])
0
>>> check_null(dataframe = pd.DataFrame({'col1': [1,numpy.nan], 'col2': [3,4]}),columns_to_be_checked = ['col1'])
1
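The count described above maps directly onto pandas' `isnull`. A minimal sketch (hypothetical name; the real implementation may differ):

```python
import pandas as pd

def check_null_sketch(dataframe, columns_to_be_checked):
    """Count null values across the given columns."""
    return int(dataframe[columns_to_be_checked].isnull().sum().sum())
```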

codonPython.dateValidator module

codonPython.dateValidator.validDate(date_string: str) → bool

Validates string-type dates in the formats dd/mm/yyyy, dd-mm-yyyy or dd.mm.yyyy for years 1900-9999. Leap year support included.

Parameters

date_string (str) – Date to be validated

Returns

Whether the date is valid or not

Return type

boolean

Examples

>>> validDate("11/02/1996")
True
>>> validDate("29/02/2016")
True
>>> validDate("43/01/1996")
False
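One way to implement this validation is to delegate parsing to `datetime.strptime`, trying one separator at a time. A sketch (hypothetical name; the real implementation may use a regex, and note that `strptime` also accepts non-zero-padded days and months):

```python
from datetime import datetime

def validDate_sketch(date_string):
    """Validate dd/mm/yyyy, dd-mm-yyyy or dd.mm.yyyy dates from 1900 onwards."""
    for fmt in ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y"):
        try:
            return datetime.strptime(date_string, fmt).year >= 1900
        except ValueError:
            continue  # wrong separator, or an impossible date for this format
    return False
```

Leap years come for free, since `strptime` rejects 29 February in non-leap years.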

codonPython.file_utils module

codonPython.file_utils.compare(x, y, names=['x', 'y'], dups=False, same=False, comment=False)

This function returns a dictionary with:

  1. Same values between data frames x and y

  2. Values in x, not in y

  3. Values in y, not in x

(optional):

  4. Duplicates of x

  5. Duplicates of y

  6. Boolean of whether x and y are the same

Parameters
  • x (pandas.DataFrame) – DataFrame #1

  • y (pandas.DataFrame) – DataFrame #2

  • names (list) – A list of user preferred file names, e.g. ['File1', 'File2']. default = ['x', 'y']

  • dups (bool) – True to include a duplicates check for each file. default = False

  • same (bool) – True to activate. Outputs True if the DataFrames are the same. default = False

  • comment (bool) – True to activate. Prints out statistics of the comparison results, e.g. number of same values, number of duplicates, number of outliers, and whether the DataFrames are the same. default = False

Returns

out

Return type

dict

Examples

>>> c = compare(df1, df2, names = ['df1','df2'], dups = True, same = True, comment = True)
There are 133891 same values
There are 16531 outliers in df1
There are 20937 outliers in df2
There are 48704 duplicates in df1
There are 0 duplicates in df2
The DataFrames are not the same
>>> c = compare(df2, df2, names = ['df2','df2'], dups = True, same = True, comment = True)
There are 154444 same values
There are 0 outliers in df2
There are 0 outliers in df2
There are 0 duplicates in df2
There are 0 duplicates in df2
The DataFrames are the same
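The "same / in x only / in y only" split can be obtained with an outer merge and pandas' `indicator` flag. A minimal sketch that covers only the first three outputs and does not handle duplicate rows (hypothetical name; not the package's implementation):

```python
import pandas as pd

def compare_sketch(x, y, names=["x", "y"]):
    """Split rows into: present in both, only in x, only in y."""
    merged = x.merge(y, how="outer", indicator=True)
    split = {
        "same": merged[merged["_merge"] == "both"],
        names[0] + "_only": merged[merged["_merge"] == "left_only"],
        names[1] + "_only": merged[merged["_merge"] == "right_only"],
    }
    return {key: part.drop(columns="_merge") for key, part in split.items()}
```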

codonPython.file_utils.file_search(path='.', doctype='csv', like=[''], strict=False)

This function creates a list of all files of a given type that satisfy the criteria outlined in the like = [...] parameter. The function only searches for files in the specified folder of the current working directory that is set by the user.

Parameters
  • path (string) – Path to a folder in the current working directory. default = '.', i.e. the current working directory

  • doctype (string) – Document format to search for, e.g. 'csv' or 'xlsx'. default = 'csv'

  • like (list) – A list of words to filter the file search on. default = [''], i.e. no filter

  • strict (bool) – Set True to search for filenames containing all words from the 'like' list. default = False

Returns

Return type

list

Examples

>>> file_search(doctype = 'md')
['README.md', 'CONTRIBUTING.md']
>>> file_search(doctype = 'md', like = ['READ'])
['README.md']
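A sketch of the search behaviour shown above, using `os.listdir` (the `file_search_sketch` name is hypothetical, and the real function may handle paths and filtering differently):

```python
import os

def file_search_sketch(path=".", doctype="csv", like=None, strict=False):
    """List files in `path` with the given extension, filtered by `like` words."""
    like = like or [""]
    files = [f for f in os.listdir(path) if f.endswith("." + doctype)]
    if strict:
        # keep filenames containing every word in `like`
        return [f for f in files if all(word in f for word in like)]
    # keep filenames containing at least one word in `like`
    return [f for f in files if any(word in f for word in like)]
```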
codonPython.file_utils.import_files(path='.', doctype='csv', sheet='Sheet1', subdir=False, like=[''], strict=False)

This function imports all documents of a given format to a dictionary and returns this dictionary, keeping original file names.

Parameters
  • path (string) – Path to a folder in the current working directory. default = '.', i.e. the current working directory

  • doctype (string) – Document format to search for, e.g. 'csv' or 'xlsx'. default = 'csv'

  • sheet (string) – Sheet name of the xlsx file. default = 'Sheet1'

  • subdir (bool) – True to also import files from subdirectories. default = False

  • like (list) – A list of words to filter the file search on. default = [''], i.e. no filter

  • strict (bool) – Set True to search for filenames containing all words from the 'like' list. default = False

Returns

out

Return type

dict

Examples

>>> import_files()
File Data_AprF_2019 is successfully imported
File Data_AugF_2019 is successfully imported
File Data_JulF_2019 is successfully imported
File Data_JunF_2019_v1 is successfully imported
File Data_MayF_2019 is successfully imported
File Data_SepP_2019 is successfully imported
>>> import_files(like = ['Aug','Sep'])
File Data_AugF_2019 is successfully imported
File Data_SepP_2019 is successfully imported

codonPython.nhsd_colours module

codonPython.nhsd_colours.nhsd_colours()

Returns a dictionary full of the different official NHSD colours from the style guide: https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/nhs-digital-style-guidelines/how-we-look/colour-palette

Parameters

None

Returns

colour_dict – A dictionary containing sets of official NHS Digital branding colours (hexadecimal format) and fonts.

Return type

dict (Python dictionary)

codonPython.nhsd_colours.nhsd_seaborn_style()

Sets the seaborn style to be in line with NHS Digital guidelines, so that graphs produced with seaborn or matplotlib follow the NHSD style guide. Simply run this function.

Parameters

None

Returns

Return type

None

codonPython.nhsNumber module

codonPython.nhsNumber.nhsNumberGenerator(to_generate: int, random_state: int = None) → list

Generates up to 1M random NHS numbers compliant with modulus 11 checks as recorded in the data dictionary. https://www.datadictionary.nhs.uk/data_dictionary/attributes/n/nhs/nhs_number_de.asp?shownav=1

Parameters
  • to_generate (int) – number of NHS numbers to generate

  • random_state (int, default : None) – Optional seed for random number generation, for testing and reproducibility.

Returns

generated – List of randomly generated NHS numbers

Return type

list

Examples

>>> nhsNumberGenerator(2, random_state=42)
[8429141456, 2625792787]
codonPython.nhsNumber.nhsNumberValidator(number: int) → bool

Validate NHS Number according to modulus 11 checks as recorded in the data dictionary. https://www.datadictionary.nhs.uk/data_dictionary/attributes/n/nhs/nhs_number_de.asp?shownav=1

Parameters

number (int) – 10 digit integer to validate.

Returns

If the number passes modulus 11 checks a.k.a. is valid.

Return type

bool

Examples

>>> nhsNumberValidator(8429141456)
True
>>> nhsNumberValidator(8429141457)
False
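The modulus 11 algorithm referenced above weights the first nine digits from 10 down to 2 and derives a check digit from the weighted sum. A sketch following the NHS data dictionary description (the `nhsNumberValidator_sketch` name is hypothetical):

```python
def nhsNumberValidator_sketch(number):
    """Modulus 11 check for a 10 digit NHS number."""
    digits = [int(d) for d in str(number)]
    if len(digits) != 10:
        return False
    # weight the first nine digits 10, 9, ..., 2
    weighted_sum = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check_digit = 11 - (weighted_sum % 11)
    if check_digit == 11:
        check_digit = 0
    if check_digit == 10:
        return False  # 10 is never a valid check digit
    return check_digit == digits[9]
```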

codonPython.suppression module

codonPython.suppression.suppress_value(valuein: int, rc: str = '*', upper: int = 100000000) → str

Suppress values less than or equal to 7, round all non-national values.

This function suppresses the value if it is less than or equal to 7. If the value is 0 it remains 0. Values at national level remain unsuppressed. All other values are rounded to the nearest 5.

Parameters
  • valuein (int) – Metric value

  • rc (str) – Replacement character if value needs suppressing

  • upper (int) – Upper limit for suppression of numbers

Returns

out – Suppressed value (*), 0 or valuein if greater than 7 or national

Return type

str

Examples

>>> suppress_value(3)
'*'
>>> suppress_value(24)
'25'
>>> suppress_value(0)
'0'
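The suppression rules read directly as a small conditional. A sketch that covers the cases in the examples but ignores the `upper` limit and the national-level handling described above (hypothetical name; not the package's implementation):

```python
def suppress_value_sketch(valuein, rc="*"):
    """0 stays 0, 1-7 is suppressed, anything else is rounded to the nearest 5."""
    if valuein == 0:
        return "0"
    if 1 <= valuein <= 7:
        return rc
    # note: Python's round() uses banker's rounding at exact .5 boundaries
    return str(5 * round(valuein / 5))
```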

codonPython.tableFromSql module

codonPython.tableFromSql.tableFromSql(server: str, database: str, table_name: str, user: str = '', password: str = '', schema: str = None, index_col: str = None, coerce_float: bool = True, parse_dates: list = None, columns: list = None, chunksize: int = None)

Returns a SQL table in a DataFrame.

Convert a table stored in SQL Server 2016 into a pandas dataframe. Uses sqlalchemy and pandas.

Parameters
  • server (string) – Name of the SQL server

  • database (string) – Name of the SQL database

  • user (string, default: "") – If verification is required, name of the user

  • password (string, default: "") – If verification is required, password of the user

  • table_name (string) – Name of SQL table in database.

  • schema (string, default : None) – Name of SQL schema in database to query (if database flavor supports this). Uses default schema if None (default).

  • index_col (string or list of strings, default : None) – Column(s) to set as index(MultiIndex).

  • coerce_float (boolean, default : True) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point. Can result in loss of precision.

  • parse_dates (list or dict, default : None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string}, where the format string is strftime compatible for parsing string times, or is one of (D, s, ns, ms, us) for parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • columns (list, default : None) – List of column names to select from SQL table

  • chunksize (int, default : None) – If specified, returns an iterator where chunksize is the number of rows to include in each chunk.

Returns

Dataframe of the table requested from sql server

Return type

pd.DataFrame

Examples

# >>> tableFromSql("myServer2", "myDatabase2", "myTable2")
# pd.DataFrame
# >>> tableFromSql("myServer", "myDatabase", "myTable", schema="specialSchema", columns=["col_1", "col_3"])
# pd.DataFrame

codonPython.tolerance module

codonPython.tolerance.check_tolerance(t, y, to_exclude: int = 1, poly_features: list = [1, 2], alpha: float = 0.05, parse_dates: bool = False, predict_all: bool = False) → pandas.core.frame.DataFrame

Check that some future values are within a weighted least squares confidence interval.

Parameters
  • t (pd.Series) – N explanatory time points of shape (N, 1).

  • y (pd.Series) – The corresponding response variable values to t, of shape (N, 1).

  • to_exclude (int, default = 1) – How many of the last y values will have their tolerances checked.

  • poly_features (list, default = [1, 2]) – List of degrees of polynomial basis to fit to the data. One model will be produced for each number in the list, eg. the default will fit a linear and a second degree polynomial to the data and return both sets of results.

  • alpha (float, default = 0.05) – Alpha parameter for the weighted least squares confidence interval.

  • parse_dates (bool, default = False) – Set to True to parse string dates in t.

  • predict_all (bool, default = False) – Set to true to show predictions for all points of the dataset.

Returns

DataFrame containing:

"t" : Value for t
"yhat_u" : Upper confidence interval for y
"yobs" : Observed value for y
"yhat" : Predicted value for y
"yhat_l" : Lower confidence interval for y
"polynomial" : Max polynomial of the model fit to the data

Return type

pd.DataFrame

Examples

>>> check_tolerance(
...     t = pd.Series([1001,1002,1003,1004,1005,1006]),
...     y = pd.Series([2,3,4,4.5,5,5.1]),
...     to_exclude = 2,
... )
      t     yhat_u  yobs   yhat    yhat_l  polynomial
0  1005   6.817413   5.0  5.500  4.182587           1
1  1006   7.952702   5.1  6.350  4.747298           1
2  1005   9.077182   5.0  4.875  0.672818           2
3  1006  13.252339   5.1  4.975 -3.302339           2

Module contents