Using Python f-strings to run SQL queries
TLDR
- Python can be used to run SQL strings, and they can be parametrised using f-strings to make "dynamic SQL"
- You can load SQL files with Python, then run them.
- You can have Python run your SQL Stored Procedures
Why should we care?
- If your analytical pipeline is already written in SQL, this allows you to get the benefits of Python, without having to rewrite all that SQL code.
- Python can be used to help with parametrisation of your SQL code
- You could use Python to load the data into the database, and then run your SQL script - no need for other tools.
Pre-requisites
| Pre-requisite | Importance | Note |
|---|---|---|
| Are you ready for RAP? | Helpful | Doing some basic Python training will help you get the most out of this guide. |
In this guide we're going to take a look at what f-strings are in Python and how to use them in conjunction with SQL queries to import data from e.g. SQL Server and assign it to a `pandas` dataframe.
Why use f-strings?
F-strings were first introduced in PEP 498 and have since become the preferred way to format strings in Python. Key benefits are:
- They offer simple syntax
- They support arbitrary expressions
- Faster performance than other formatting options such as `str.format()` and `%`-formatting
- Support for multi-line formatting
Example of f-strings:
```python
>>> name = "Maria"
>>> age = 39
>>> print(f"Hello, nice to meet you. My name is {name} and I'm {age} years old.")
Hello, nice to meet you. My name is Maria and I'm 39 years old.
```
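To illustrate the last two points above, f-strings can evaluate arbitrary expressions inside the braces, and they can span multiple lines when combined with triple quotes:

```python
>>> print(f"Next year {name} will be {age + 1} years old.")
Next year Maria will be 40 years old.
>>> message = f"""Dear {name.upper()},
... thank you for your interest in f-strings."""
```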
Parametrising SQL queries using Python f-strings
You can embed SQL in your Python code and use f-strings to pass in parameters. For example, you can parametrise the publication year and month, and edit only those parameters when running your pipeline, reducing production burden and improving publication speed.
There are two ways to implement and read in data via SQL queries wrapped in f-strings, using `SQLAlchemy`.
Option 1: SQL query within code (e.g. Python) script
First, we need to create our SQL query, or have our SQL query ready to go e.g.:
```sql
SELECT * FROM [{database}].[{table}]
WHERE [PUBLICATION_YEAR] = {year}
AND [SPECIAL_LABEL] = {code_value}
```
Next, we create a function which uses `sqlalchemy` (with the `pyodbc` driver) to connect to the database and return the results of a SQL query as a `pandas` dataframe:
```python
import sqlalchemy as sa  # read in sqlalchemy package
import pandas as pd      # read in pandas
import pyodbc            # import ODBC drivers for SQL Server


def get_df_from_sql(query, server, database) -> pd.DataFrame:
    """
    Use sqlalchemy to connect to the NHSD server and database with the help
    of the mssql and pyodbc packages, then read the query results into a dataframe.

    Inputs:
        query: string containing a SQL query
        server: server name
        database: database name

    Output:
        pandas DataFrame
    """
    # create the database engine; fast_executemany speeds up bulk operations
    engine = sa.create_engine(
        f"mssql+pyodbc://{server}/{database}?driver=SQL+Server",
        fast_executemany=True,
    )
    # run the query and return the results as a pandas dataframe
    df = pd.read_sql_query(query, engine)
    return df
```
```python
def create_sql_query(database, table, year, code_value) -> str:
    """Creates a SQL query selecting the required fields for the benchmark tool data.

    Inputs:
        database: database name as defined in the config.toml file
        table: SQL table within the database
        year: the chosen year of the publication
        code_value: special value for the purposes of the publication

    Output:
        SQL query loaded up with parameters for the publication
    """
    query = f"""SELECT * FROM [{database}].[{table}]
    WHERE [PUBLICATION_YEAR] = {year}
    AND [SPECIAL_LABEL] = {code_value}
    """
    return query
```
The function returns the parametrised SQL `query` as a string. Then, when we call this function, all we need to do is insert the arguments for `database`, `table`, `year` and `code_value`, and assign the result to a variable.
```python
# call the function and assign the result to a variable, with arguments:
# database = NHS, table = patients, year = 2020, code_value = 3
sql_query_for_publication = create_sql_query("NHS", "patients", "2020", "3")
```
Now that we have our query loaded up with our chosen parameters, we can go ahead and use it to create our `pandas` dataframe.
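A minimal sketch of that call (assuming `server` and `database` hold the connection details for your own environment) might look like this:

```python
# hypothetical connection details - replace these with your own
server = "my_server"
database = "my_database"

# run the parametrised query and store the results in a pandas dataframe
df = get_df_from_sql(sql_query_for_publication, server, database)
```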
Option 2: SQL query read from a separate SQL script
Option 2 is similar to option 1 with a slight variation in where the SQL script is stored.
We create a separate file, a `my_sql_query.sql` script, and place our SQL query from above within it:
```sql
SELECT * FROM [{database}].[{table}]
WHERE [PUBLICATION_YEAR] = {year}
AND [SPECIAL_LABEL] = {code_value}
```
Then, in our Python code, we define a function that reads in the `.sql` file and fills in the parameters (because the query is read from a file, the placeholders are filled using `str.format()` rather than an f-string):

```python
def read_in_sql_query(sql_file_path: str, **sql_parameters) -> str:
    """Reads in a SQL query file with parameters and fills them with
    the values provided in the sql_parameters dictionary.

    Inputs:
        sql_file_path: the path to the file containing the query
        **sql_parameters: parameters and their values that are in the query

    Output:
        SQL query loaded up with parameters for the publication
    """
    with open(sql_file_path, 'r') as sql_file:
        sql_query = sql_file.read()
    new_sql_query = sql_query.format(**sql_parameters)
    return new_sql_query
```
We then call this function, pointing it at the `.sql` file and passing in the parameters as a dictionary:

```python
# location of the sql folder containing SQL scripts
sql_file_path = r'src\sql_scripts\my_sql_query.sql'

# parameters in the query and their values
sql_parameters = {"database": "NHS", "table": "patients", "year": "2020", "code_value": "3"}

# call the function
sql_query_for_publication = read_in_sql_query(sql_file_path, **sql_parameters)
```

As before, we can pass the filled-in query to `get_df_from_sql` to create our `pandas` dataframe.
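For example, reusing the hypothetical `server` and `database` values from Option 1:

```python
# run the query read in from the .sql file and store the results in a dataframe
df = get_df_from_sql(sql_query_for_publication, server, database)
```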
Write a dataframe to SQL Server database
If you have a dataframe ready to be written to a SQL Server table, then you can use the following function:
```python
def write_df_to_server(server, database, df_to_write, table_name) -> None:
    """Writes a pandas DataFrame to a table on a given SQL Server using SQLAlchemy.
    Requires the mssql and pyodbc packages.

    Parameters:
        server: server name
        database: database name
        df_to_write: dataframe to write to a SQL Server table
        table_name: SQL Server table name

    Returns:
        None - the dataframe is written to a SQL Server table.
    """
    engine = sa.create_engine(
        f"mssql+pyodbc://{server}/{database}?driver=SQL+Server",
        fast_executemany=True,
    )
    # if_exists='fail' raises an error if the table already exists
    df_to_write.to_sql(name=table_name, con=engine, if_exists='fail', index=False)
```
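As a rough sketch of how this could be called (the dataframe contents, server, database and table names below are placeholders, not values from this guide):

```python
import pandas as pd

# a small illustrative dataframe to write out
df_to_write = pd.DataFrame({"Patient_ID": ["1234590"], "NHSE_code": [5150]})

# hypothetical connection details and target table - replace with your own
write_df_to_server("my_server", "my_database", df_to_write, "my_table")
```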
Executing SQL Server stored procedures using Python
First, we create a stored procedure to use as an example:
```sql
CREATE PROCEDURE PatientsNHSERegion
    @PatientID nvarchar(100),
    @NHSECode int
AS
BEGIN
    SELECT h.NHSE_code,
           h.Patient_ID,
           h.Other_ID
    FROM hes.table AS h
    WHERE h.Patient_ID = @PatientID
      AND h.NHSE_code = @NHSECode;
END;
GO
```
Option 1: via sqlalchemy
```python
import pandas as pd
import sqlalchemy as sa

# named parameters (:param1, :param2) are bound to the stored procedure's arguments
query = sa.text("EXEC PatientsNHSERegion @PatientID = :param1, @NHSECode = :param2")
# server and database should hold your connection details, as in the functions above
engine = sa.create_engine(f"mssql+pyodbc://{server}/{database}?driver=SQL+Server", fast_executemany=True)
# this will store the results of the stored procedure in a dataframe
df = pd.read_sql_query(query, engine, params={"param1": "1234590", "param2": 5150})
```
Option 2: via pyodbc
When run in SQL Server, the stored procedure above returns the results of its SELECT query. To run it via Python instead, we can write the following script:
```python
import pyodbc as po

# Connection variables
server = ''
database = ''
username = ''  # username and password might not be required to connect
password = ''

# this can be wrapped in a function, as in the examples above, using the
# variables server, database, username and password
try:
    # Connection string
    cnxn = po.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' +
                      server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
    cursor = cnxn.cursor()

    # Prepare the stored procedure execution script and parameter values
    storedProc = "EXEC PatientsNHSERegion @PatientID = ?, @NHSECode = ?"
    params = ("1234590", 5150)

    # Execute the stored procedure with parameters
    cursor.execute(storedProc, params)

    # Iterate over the cursor and print each row
    row = cursor.fetchone()
    while row:
        print(str(row[0]) + " : " + str(row[1] or ''))
        row = cursor.fetchone()

    # Close the cursor and delete it
    cursor.close()
    del cursor

    # Close the database connection
    cnxn.close()
except Exception as e:
    print("Error: %s" % e)
```
Further resources and references
- String formatting: Input and output
- Executing SQL Server stored procedures via Python (citation for section above)
External Links Disclaimer
NHS England makes every effort to ensure that external links are accurate, up to date and relevant, however we cannot take responsibility for pages maintained by external providers.
NHS England is not affiliated with any of the websites or companies in the links to external websites.
If you come across any external links that do not work, we would be grateful if you could report them by raising an issue on our RAP Community of Practice GitHub.