codonPython package¶
Submodules¶
codonPython.age_bands module¶
- codonPython.age_bands.age_band_10_years(age: int) → str¶
Place age into appropriate 10 year band
This function takes the age supplied as an argument and returns a string representing the relevant 10 year banding.
- Parameters
age (int) – Age of the person
- Returns
out – The 10 year age band
- Return type
str
Examples
>>> age_band_10_years(3)
'0-9'
>>> age_band_10_years(None)
'Age not known'
>>> age_band_10_years(95)
'90 and over'
- codonPython.age_bands.age_band_5_years(age: int) → str¶
Place age into appropriate 5 year band
This function takes the age supplied as an argument and returns a string representing the relevant 5 year banding.
- Parameters
age (int) – Age of the person
- Returns
out – The 5 year age band
- Return type
str
Examples
>>> age_band_5_years(3)
'0-4'
>>> age_band_5_years(None)
'Age not known'
>>> age_band_5_years(95)
'90 and over'
codonPython.check_consistent_measures module¶
- codonPython.check_consistent_measures.check_consistent_measures(data, geography_col: str = 'Org_Level', measure_col: str = 'Measure', measures_set: set = {}) → bool¶
Check every measure is in every geography level.
- Parameters
data (pd.DataFrame) – DataFrame of data to check.
geography_col (str, default = "Org_Level") – Column name for the geography level.
measure_col (str, default = "Measure") – Column name for measure
measures_set (set, default = set()) – Set of measures that should be in every geography level. If empty, the existing global set is presumed to be correct.
- Returns
Whether the checks have been passed.
- Return type
bool
Examples
>>> check_consistent_measures(
...     pd.DataFrame({
...         "Geog" : ["National", "National", "Region", "Region", "Local", "Local",],
...         "measure" : ["m1", "m2", "m1", "m2", "m1", "m2",],
...         "Value_Unsuppressed" : [4, 2, 2, 1, 2, 1,],
...     }),
...     geography_col = "Geog",
...     measure_col = "measure",
...     measures_set = set({"m1", "m2"}),
... )
True
>>> check_consistent_measures(
...     pd.DataFrame({
...         "Org_Level" : ["National", "National", "Region", "Region", "Local", "Local",],
...         "Measure" : ["m1", "m3", "m1", "m2", "m1", "m2",],
...         "Value_Unsuppressed" : [4, 2, 2, 1, 2, 1,],
...     })
... )
False
codonPython.check_consistent_submissions module¶
- codonPython.check_consistent_submissions.check_consistent_submissions(data, national_geog_level: str = 'National', geography_col: str = 'Org_Level', submissions_col: str = 'Value_Unsuppressed', measure_col: str = 'Measure') → bool¶
Check that the total submissions for each measure are the same across all geography levels except national.
- Parameters
data (pd.DataFrame) – DataFrame of data to check.
national_geog_level (str, default = "National") – Geography level code for national values.
geography_col (str, default = "Org_Level") – Column name for the geography level.
submissions_col (str, default = "Value_Unsuppressed") – Column name for the submissions count.
measure_col (str, default = "Measure") – Column name for measure.
- Returns
Whether the checks have been passed.
- Return type
bool
Examples
>>> check_consistent_submissions(
...     pd.DataFrame({
...         "Geog" : ["N", "N", "Region", "Region", "Local", "Local",],
...         "measure" : ["m1", "m2", "m1", "m2", "m1", "m2",],
...         "submissions" : [4, 2, 2, 1, 2, 1,],
...     }),
...     national_geog_level = "N",
...     geography_col = "Geog",
...     submissions_col = "submissions",
...     measure_col = "measure",
... )
True
>>> check_consistent_submissions(
...     pd.DataFrame({
...         "Org_Level" : ["National", "National", "Region", "Region", "Local", "Local",],
...         "Measure" : ["m1", "m2", "m1", "m2", "m1", "m2",],
...         "Value_Unsuppressed" : [4, 2, 3, 1, 2, 1,],
...     })
... )
False
codonPython.check_nat_val module¶
- codonPython.check_nat_val.check_nat_val(df: pandas.core.frame.DataFrame, breakdown_col: str = 'Breakdown', measure_col: str = 'Measure', value_col: str = 'Value_Unsuppressed', nat_val: str = 'National') → bool¶
Check the national value is less than or equal to the sum of breakdowns.
This function checks that the national value is less than or equal to the sum of each organisation level breakdown. It does not apply to values that are averages, nor to percentages calculated from a numerator and denominator.
- Parameters
df (pandas.DataFrame) – DataFrame of data to check.
breakdown_col (str, default = "Breakdown") – Column name for the breakdown level.
measure_col (str, default = "Measure") – Column name for measures
value_col (str, default = "Value_Unsuppressed") – Column name for values
nat_val (str, default = "National") – Value in breakdown column denoting national values
- Returns
Whether the checks have been passed.
- Return type
bool
Examples
>>> check_nat_val(
...     df = pd.DataFrame({
...         "Breakdown" : ['National', 'CCG', 'CCG', 'Provider', 'Provider',
...                        'National', 'CCG', 'CCG', 'Provider', 'Provider',
...                        'National', 'CCG', 'CCG', 'Provider', 'Provider',],
...         "Measure" : ['m1', 'm1', 'm1', 'm1', 'm1', 'm2', 'm2', 'm2', 'm2',
...                      'm2', 'm3', 'm3', 'm3', 'm3', 'm3',],
...         "Value_Unsuppressed" : [9, 4, 5, 3, 6, 11, 2, 9, 7, 4, 9, 5, 4, 6, 3],
...     }),
...     breakdown_col = "Breakdown",
...     measure_col = "Measure",
...     value_col = "Value_Unsuppressed",
...     nat_val = "National",
... )
True
>>> check_nat_val(
...     df = pd.DataFrame({
...         "Breakdown" : ['National', 'CCG', 'CCG', 'Provider', 'Provider',
...                        'National', 'CCG', 'CCG', 'Provider', 'Provider',
...                        'National', 'CCG', 'CCG', 'Provider', 'Provider',],
...         "Measure" : ['m1', 'm1', 'm1', 'm1', 'm1', 'm2', 'm2', 'm2', 'm2',
...                      'm2', 'm3', 'm3', 'm3', 'm3', 'm3',],
...         "Value_Unsuppressed" : [18, 4, 5, 3, 6, 11, 2, 9, 7, 4, 9, 5, 4, 6, 3],
...     }),
...     breakdown_col = "Breakdown",
...     measure_col = "Measure",
...     value_col = "Value_Unsuppressed",
...     nat_val = "National",
... )
False
codonPython.check_null module¶
- codonPython.check_null.check_null(dataframe: pandas.core.frame.DataFrame, columns_to_be_checked: list) → int¶
Checks a pandas DataFrame for null values.
This function takes the pandas DataFrame supplied as an argument and returns an integer count of the null values found within the columns to be checked.
- Parameters
dataframe (pandas.DataFrame) – DataFrame to be checked
columns_to_be_checked (list) – Given dataframe columns to be checked for null values
- Returns
out – The number of null values found in the given columns
- Return type
int
Examples
>>> check_null(dataframe = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}), columns_to_be_checked = ['col1', 'col2'])
0
>>> check_null(dataframe = pd.DataFrame({'col1': [1, numpy.nan], 'col2': [3, 4]}), columns_to_be_checked = ['col1'])
1
codonPython.dateValidator module¶
- codonPython.dateValidator.validDate(date_string: str) → bool¶
Validates string dates of the form dd/mm/yyyy, dd-mm-yyyy or dd.mm.yyyy for years 1900-9999. Leap years are supported.
- Parameters
date_string (str) – Date to be validated
- Returns
Whether the date is valid or not
- Return type
boolean
Examples
>>> validDate("11/02/1996")
True
>>> validDate("29/02/2016")
True
>>> validDate("43/01/1996")
False
codonPython.file_utils module¶
- codonPython.file_utils.compare(x, y, names=['x', 'y'], dups=False, same=False, comment=False)¶
This function returns a dictionary with:
(1) Same values between DataFrames x and y
(2) Values in x, not in y
(3) Values in y, not in x
(optional): (4) Duplicates of x (5) Duplicates of y (6) Boolean of whether x and y are the same
- Parameters
x (pandas.DataFrame) – DataFrame #1
y (pandas.DataFrame) – DataFrame #2
names (list) – A list of user-preferred file names, e.g. ['File1', 'File2']; default = ['x', 'y']
dups (bool) – True to include a duplicates check for each file; default = False
same (bool) – True to activate. Outputs True if the DataFrames are the same; default = False
comment (bool) – True to activate. Prints out statistics of the comparison results, e.g. number of same values, number of duplicates, number of outliers and whether the DataFrames are the same; default = False
- Returns
out
- Return type
dict
Examples
>>> c = compare(df1, df2, names = ['df1', 'df2'], dups = True, same = True, comment = True)
There are 133891 same values
There are 16531 outliers in df1
There are 20937 outliers in df2
There are 48704 duplicates in df1
There are 0 duplicates in df2
The DataFrames are not the same
>>> c = compare(df2, df2, names = ['df2', 'df2'], dups = True, same = True, comment = True)
There are 154444 same values
There are 0 outliers in df2
There are 0 outliers in df2
There are 0 duplicates in df2
There are 0 duplicates in df2
The DataFrames are the same
- codonPython.file_utils.file_search(path='.', doctype='csv', like=[''], strict=False)¶
This function creates a list of all files of a given type that satisfy the criteria set by the like = [...] parameter. It only searches the folder of the current working directory specified by the user.
- Parameters
path (string) – Path to a folder in the current working directory; default = '.', i.e. the current working directory folder
doctype (string) – Document format to search for, e.g. 'csv' or 'xlsx'; default = 'csv'
like (list) – A list of words to filter the file search on; default = [''], i.e. no filter
strict (bool) – Set True to search for filenames containing all words from the 'like' list; default = False
- Returns
- Return type
list
Examples
>>> file_search(doctype = 'md')
['README.md', 'CONTRIBUTING.md']
>>> file_search(doctype = 'md', like = ['READ'])
['README.md']
- codonPython.file_utils.import_files(path='.', doctype='csv', sheet='Sheet1', subdir=False, like=[''], strict=False)¶
This function imports all documents of a given format to a dictionary and returns this dictionary, keeping original file names.
- Parameters
path (string) – Path to a folder in the current working directory; default = '.', i.e. the current working directory folder
doctype (string) – Document format to search for, e.g. 'csv' or 'xlsx'; default = 'csv'
sheet (string) – Sheet name of the xlsx file; default = 'Sheet1'
subdir (bool) – True to also import files from subdirectories; default = False
like (list) – A list of words to filter the file search on; default = [''], i.e. no filter
strict (bool) – Set True to search for filenames containing all words from the 'like' list; default = False
- Returns
out
- Return type
dict
Examples
>>> import_files()
File Data_AprF_2019 is successfully imported
File Data_AugF_2019 is successfully imported
File Data_JulF_2019 is successfully imported
File Data_JunF_2019_v1 is successfully imported
File Data_MayF_2019 is successfully imported
File Data_SepP_2019 is successfully imported
>>> import_files(like = ['Aug', 'Sep'])
File Data_AugF_2019 is successfully imported
File Data_SepP_2019 is successfully imported
codonPython.nhsd_colours module¶
- codonPython.nhsd_colours.nhsd_colours()¶
Returns a dictionary of the official NHSD colours from the style guide: https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/nhs-digital-style-guidelines/how-we-look/colour-palette A short usage sketch is given below this entry.
- Parameters
None –
- Returns
colour_dict – A dictionary containing sets of official NHS Digital branding colours (in hexadecimal format) and fonts.
- Return type
dict (Python dictionary)
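A minimal usage sketch, assuming only what the entry above states (the function takes no arguments and returns a dictionary of branding colours and fonts). The key names are not documented here, so the sketch inspects them rather than assuming any:

from codonPython.nhsd_colours import nhsd_colours

colours = nhsd_colours()        # dict of NHSD branding colours (hex strings) and fonts
print(sorted(colours.keys()))   # inspect the available keys before picking a colour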
- codonPython.nhsd_colours.nhsd_seaborn_style()¶
Sets the seaborn style to be in line with NHSD guidelines, so that graphs produced with seaborn or matplotlib come out looking as per the NHSD style guide. Simply run this function; a usage sketch is given below this entry.
- Parameters
None –
- Returns
- Return type
None
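A short usage sketch, assuming seaborn and matplotlib are installed (the plotted data is illustrative only):

import matplotlib.pyplot as plt
from codonPython.nhsd_colours import nhsd_seaborn_style

nhsd_seaborn_style()              # apply the NHSD-styled seaborn theme globally
plt.plot([1, 2, 3], [2, 4, 8])    # any subsequent matplotlib/seaborn plot picks up the style
plt.show()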
codonPython.nhsNumber module¶
- codonPython.nhsNumber.nhsNumberGenerator(to_generate: int, random_state: int = None) → list¶
Generates up to 1M random NHS numbers compliant with modulus 11 checks as recorded in the data dictionary: https://www.datadictionary.nhs.uk/data_dictionary/attributes/n/nhs/nhs_number_de.asp?shownav=1
- Parameters
to_generate (int) – number of NHS numbers to generate
random_state (int, default : None) – Optional seed for random number generation, for testing and reproducibility.
- Returns
generated – List of randomly generated NHS numbers
- Return type
list
Examples
>>> nhsNumberGenerator(2, random_state=42)
[8429141456, 2625792787]
- codonPython.nhsNumber.nhsNumberValidator(number: int) → bool¶
Validate an NHS number according to the modulus 11 checks recorded in the data dictionary: https://www.datadictionary.nhs.uk/data_dictionary/attributes/n/nhs/nhs_number_de.asp?shownav=1 A sketch of the check digit algorithm is given after the examples below.
- Parameters
number (int) – 10 digit integer to validate.
- Returns
If the number passes modulus 11 checks a.k.a. is valid.
- Return type
bool
Examples
>>> nhsNumberValidator(8429141456) True >>> nhsNumberValidator(8429141457) False
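For reference, a sketch of the standard NHS number modulus 11 check digit algorithm described in the data dictionary (an illustration of the check, not necessarily the package's exact implementation):

def modulus_11_check(number: int) -> bool:
    """Return True if a 10 digit NHS number passes the modulus 11 check."""
    digits = [int(d) for d in str(number)]
    if len(digits) != 10:
        return False
    # Weight the first nine digits by 10 down to 2 and sum the products.
    total = sum(w * d for w, d in zip(range(10, 1, -1), digits[:9]))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        return False  # a check digit of 10 is never valid
    return check == digits[9]

For example, modulus_11_check(8429141456) returns True, matching the nhsNumberValidator example above.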
codonPython.suppression module¶
- codonPython.suppression.suppress_value(valuein: int, rc: str = '*', upper: int = 100000000) → str¶
Suppress values less than or equal to 7 and round all non-national values.
This function suppresses the value if it is less than or equal to 7. If the value is 0 it remains 0, and if the value is at national level it remains unsuppressed. All other values are rounded to the nearest 5.
- Parameters
valuein (int) – Metric value
rc (str) – Replacement character if value needs suppressing
upper (int) – Upper limit for suppression of numbers
- Returns
out – Suppressed value (*), 0 or valuein if greater than 7 or national
- Return type
str
Examples
>>> suppress_value(3)
'*'
>>> suppress_value(24)
'25'
>>> suppress_value(0)
'0'
codonPython.tableFromSql module¶
- codonPython.tableFromSql.tableFromSql(server: str, database: str, table_name: str, user: str = '', password: str = '', schema: str = None, index_col: str = None, coerce_float: bool = True, parse_dates: list = None, columns: list = None, chunksize: int = None)¶
Returns a SQL table in a DataFrame.
Convert a table stored in SQL Server 2016 into a pandas dataframe. Uses sqlalchemy and pandas.
- Parameters
server (string) – Name of the SQL server
database (string) – Name of the SQL database
user (string, default: "") – If verification is required, name of the user
password (string, default: "") – If verification is required, password of the user
table_name (string) – Name of SQL table in database.
schema (string, default : None) – Name of SQL schema in database to query (if database flavor supports this). Uses default schema if None (default).
index_col (string or list of strings, default : None) – Column(s) to set as index(MultiIndex).
coerce_float (boolean, default : True) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point. Can result in loss of precision.
parse_dates (list or dict, default : None) –
- List of column names to parse as dates.
- Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
- Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.
columns (list, default : None) – List of column names to select from SQL table
chunksize (int, default : None) – If specified, returns an iterator where chunksize is the number of rows to include in each chunk.
- Returns
DataFrame of the table requested from the SQL server
- Return type
pd.DataFrame
Examples
# >>> tableFromSql("myServer2", "myDatabase2", "myTable2")
# pd.DataFrame
# >>> tableFromSql("myServer", "myDatabase", "myTable", schema="specialSchema", columns=["col_1", "col_3"])
# pd.DataFrame
codonPython.tolerance module¶
- codonPython.tolerance.check_tolerance(t, y, to_exclude: int = 1, poly_features: list = [1, 2], alpha: float = 0.05, parse_dates: bool = False, predict_all: bool = False) → pandas.core.frame.DataFrame¶
Check that some future values are within a weighted least squares confidence interval.
- Parameters
t (pd.Series) – N explanatory time points of shape (N, 1).
y (pd.Series) – The corresponding response variable values to X, of shape (N, 1).
to_exclude (int, default = 1) – How many of the last y values will have their tolerances checked.
poly_features (list, default = [1, 2]) – List of degrees of polynomial basis to fit to the data. One model will be produced for each number in the list, e.g. the default will fit a linear and a second degree polynomial to the data and return both sets of results.
alpha (float, default = 0.05) – Alpha parameter for the weighted least squares confidence interval.
parse_dates (bool, default = False) – Set to True to parse string dates in t.
predict_all (bool, default = False) – Set to true to show predictions for all points of the dataset.
- Returns
- DataFrame containing:
"t" : Value for t
"yhat_u" : Upper confidence interval for y
"yobs" : Observed value for y
"yhat" : Predicted value for y
"yhat_l" : Lower confidence interval for y
"polynomial" : Max polynomial of model fit to the data
- Return type
pd.DataFrame
Examples
>>> check_tolerance(
...     t = pd.Series([1001,1002,1003,1004,1005,1006]),
...     y = pd.Series([2,3,4,4.5,5,5.1]),
...     to_exclude = 2,
... )
      t     yhat_u  yobs   yhat    yhat_l  polynomial
0  1005   6.817413   5.0  5.500  4.182587           1
1  1006   7.952702   5.1  6.350  4.747298           1
2  1005   9.077182   5.0  4.875  0.672818           2
3  1006  13.252339   5.1  4.975 -3.302339           2