COVID therapeutics dataset in OpenSAFELY-TPP, filtered on non-hospitalised patients.

This notebook displays information about the Therapeutics dataset within the OpenSAFELY-TPP database. It is part of the technical documentation of the OpenSAFELY platform to help users understand the underlying data and guide analyses.

If you want to see the Python code used to create this notebook, you can view it on GitHub.

Note: all row/patient counts are rounded to the nearest 5 and counts 1-7 and in some cases <=7 removed

In [1]:
import pyodbc
import os
import pandas as pd
import numpy as np
from datetime import date, datetime
from IPython.display import display, Markdown

import sys
sys.path.append('../lib/')
from sense_checking import (
    compare_two_values,
    counts_of_distinct_values,
    get_schema,
    identify_distinct_strings,
    multiple_records,
)

pd.set_option('display.max_colwidth', 250)


# get the server credentials from environ.txt
dbconn = os.environ.get('FULL_DATABASE_URL', None).strip('"')

Notebook run date

In [2]:
display(Markdown(f"""This notebook was run on {date.today().strftime('%Y-%m-%d')}. The information below reflects the state of this dataset in OpenSAFELY-TPP as at this date. The last final update of the data was on 28/6/2023."""))

This notebook was run on 2024-01-16. The information below reflects the state of this dataset in OpenSAFELY-TPP as at this date. The last final update of the data was on 28/6/2023.

In [3]:
## Import schema

table = "Therapeutics"
where = {"_non_hospitalised": "where COVID_indication='non_hospitalised'"}

get_schema(dbconn, table, where)

Total rows in Therapeutics where COVID_indication='non_hospitalised': 58590; number of distinct values of Patient_ID rounded to nearest 5 in the table below.

ColumnType MaxLength IsNullable Distinct_Values_non_hospitalised Missing_Values_non_hospitalised Missing_Values_Percentage_non_hospitalised
TableName ColumnName
Therapeutics Patient_ID bigint 8 False 54435 0 0.0
AgeAtReceivedDate int 4 True 98 0 0.0
Received datetime 8 True 558 0 0.0
Intervention varchar 1000 True 5 0 0.0
Diagnosis varchar 1000 True 1 0 0.0
CurrentStatus varchar 1000 True 4 0 0.0
FormName varchar 1000 True 40 0 0.0
TreatmentStartDate datetime 8 True 607 1-7 0.0
Region varchar 1000 True 7 0 0.0
MOL1_onset_of_symptoms varchar 1000 True 946 45030 76.85
MOL1_high_risk_cohort varchar 1000 True 70 45115 77.0
SOT02_onset_of_symptoms varchar 1000 True 2413 37400 63.84
SOT02_risk_cohorts varchar 1000 True 87 37505 64.01
CASIM05_date_of_symptom_onset varchar 1000 True 22 58535 99.9
CASIM05_risk_cohort varchar 1000 True 12 58535 99.9
COVID_indication varchar 1000 True 1 0 0.0
Count int 4 True 4 0 0.0
Der_LoadDate varchar 1000 True 1 0 0.0

Description of General Fields

In [4]:
columns = ["Diagnosis", "FormName", "Region", "Der_LoadDate", "AgeAtReceivedDate"]
threshold = 50

counts_of_distinct_values(dbconn, table, columns, threshold=threshold, where="COVID_indication='non_hospitalised'", include_counts=False)

columns = ["COVID_indication", "Intervention", "CurrentStatus", "Count"]

counts_of_distinct_values(dbconn, table, columns, threshold=threshold, where="COVID_indication='non_hospitalised'")

Summary of values in 'Diagnosis'

filtered on COVID_indication='non_hospitalised'

There were 1 different non-missing values

and 0 missing values.

Diagnosis
0 Covid-19

Summary of values in 'FormName'

filtered on COVID_indication='non_hospitalised'

There were 40 different non-missing values

and 0 missing values.

FormName
0 CASIM05_v1.1 NHS England - Interim Clinical Commissioning Policy: Neutralising monoclonal antibodies or antivirals for non-hospitalised patients with COVID-19 Casirivimab and imdevimab patient registration form
1 M4C PAX01_v1.3 - NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 (Post-pubescent children) û Nirmatrelvir plus ritonavir; Paxlovid - patient registration form
2 MOL01_v2.4 - NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
3 MOL01_v2.5 - NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
4 MOL01_v2.6 - NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
5 MOL1_v1.1 NHS England - Interim Clinical Commissioning Policy - Neutralising monoclonal antibodies or antivirals for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
6 MOL1_v1.2 NHS England - Interim Clinical Commissioning Policy - Neutralising monoclonal antibodies or antivirals for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
7 MOL1_v1.3 NHS England - Interim Clinical Commissioning Policy - Neutralising monoclonal antibodies or antivirals for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
8 MOL1_v1.4 NHS England - Interim Clinical Commissioning Policy - Neutralising monoclonal antibodies or antivirals for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
9 MOL1_v2.0 - NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
10 MOL1_v2.1 - NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
11 MOL1_v2.2 - NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
12 MOL1_v2.3 - NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 - Molnupiravir patient registration form
13 PAX01_v1.3 - NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 û Nirmatrelvir plus ritonavir; Paxlovid - patient registration form
14 PAX01_v1.4 - NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 û Nirmatrelvir plus ritonavir; Paxlovid - patient registration form
15 PAX1_v1.0 - NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 û PF-07321332 (may also be known as nirmatrelvir) plus ritonavir; Paxlovid - patient re...
16 PAX1_v1.1 - NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 û Nirmatrelvir plus ritonavir; Paxlovid - patient registration form
17 PAX1_v1.2 - NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 û Nirmatrelvir plus ritonavir; Paxlovid - patient registration form
18 REM05_v1.1 NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 û Remdesivir patient registration form
19 REM05_v1.2 NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 û Remdesivir patient registration form
20 REM05_v1.3 NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 (adults) û Remdesivir patient registration form
21 REM05_v1.4 NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 (adults) û Remdesivir patient registration form
22 SOT02_v1.2 - NHS England - Interim Clinical Commissioning Policy - Neutralising monoclonal antibodies or antivirals for non-hospitalised patients with COVID-19 - Sotrovimab patient registration form
23 SOT02_v1.3 - NHS England - Interim Clinical Commissioning Policy - Neutralising monoclonal antibodies or antivirals for non-hospitalised patients with COVID-19 - Sotrovimab patient registration form
24 SOT02_v2.0- NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 - Sotrovimab - patient registration form
25 SOT02_v2.1- NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 - Sotrovimab - patient registration form
26 SOT02_v2.2 - NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 (adults) - Sotrovimab - patient registration form
27 SOT02_v2.3 - NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 (adults) - Sotrovimab - patient registration form
28 SOT02_v2.4 - NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 (adults) - Sotrovimab - patient registration form
29 SOT04_v1.0 - NHS England - Interim Clinical Commissioning Policy û Antivirals or neutralising monoclonal antibodies for non-hospitalised patients with COVID-19 (children 12 to 17 years) - Sotrovimab - patient registration form
30 SOT04_v1.1 - NHS England - Interim Clinical Commissioning Policy û Treatments for non-hospitalised patients with COVID-19 (children 12 to 17 years) - Sotrovimab - patient registration form

There were 9 value(s) with <=7 occurrences (each), not shown above.

Summary of values in 'Region'

filtered on COVID_indication='non_hospitalised'

There were 7 different non-missing values

and 0 missing values.

Region
0 East of England
1 London
2 Midlands
3 North East and Yorkshire
4 North West
5 South East
6 South West

Summary of values in 'Der_LoadDate'

filtered on COVID_indication='non_hospitalised'

There were 1 different non-missing values

and 0 missing values.

Der_LoadDate
0 2023-06-28 09:23:03.760000000

Summary of values in 'AgeAtReceivedDate'

filtered on COVID_indication='non_hospitalised'

There were 98 different values, between '0' and '100' (after removing uncommon values)

and 0 missing values.

The most common value was '61' with 1340 occurrences (rounded to the nearest 5)

Summary of values in 'COVID_indication'

filtered on COVID_indication='non_hospitalised'

There were 1 different non-missing values

and 0 missing values.

COVID_indication row_count
0 non_hospitalised 58590

Summary of values in 'Intervention'

filtered on COVID_indication='non_hospitalised'

There were 5 different non-missing values

and 0 missing values.

Intervention row_count
0 Paxlovid 23455
1 Sotrovimab 21190
2 Molnupiravir 13565
3 Remdesivir 325
4 Casirivimab and imdevimab 55

Summary of values in 'CurrentStatus'

filtered on COVID_indication='non_hospitalised'

There were 4 different non-missing values

and 0 missing values.

CurrentStatus row_count
0 Approved 58535
1 Treatment Not Started 45

There were 2 value(s) with <=7 occurrences (each), not shown above.

Summary of values in 'Count'

filtered on COVID_indication='non_hospitalised'

There were 4 different non-missing values

and 0 missing values.

Count row_count
0 1 58490
1 2 90

There were 2 value(s) with <=7 occurrences (each), not shown above.

Description of Dates

In [5]:
columns = ["Received", "TreatmentStartDate"]

counts_of_distinct_values(dbconn, table, columns=columns, threshold=threshold, where="COVID_indication='non_hospitalised'", sort_values=True)

Summary of values in 'Received'

filtered on COVID_indication='non_hospitalised'

There were 558 different values, between '2021-12-16' and '2023-06-26' (after removing uncommon values)

and 0 missing values.

The most common value was '2022-04-11' with 425 occurrences (rounded to the nearest 5)

Summary of values in 'TreatmentStartDate'

filtered on COVID_indication='non_hospitalised'

There were 607 different values, between '2021-12-16' and '2023-06-26' (after removing uncommon values)

and 1-7 missing values.

The most common value was '2022-03-24' with 395 occurrences (rounded to the nearest 5)

Dates outside expected range

In [6]:
display(Markdown("## Past and future dates"))
for i in [0,1]:
    counts_of_distinct_values(dbconn, table, columns=[columns[i]], threshold=3, where=f"COVID_indication='non_hospitalised' AND CAST({columns[i]} AS DATE) >'2023-06-28'", sort_values=True)
    counts_of_distinct_values(dbconn, table, columns=[columns[i]], threshold=3, where=f"COVID_indication='non_hospitalised' AND CAST({columns[i]} AS DATE) <'2021-12-16'", sort_values=True)

Past and future dates

Summary of values in 'Received'

filtered on COVID_indication='non_hospitalised' AND CAST(Received AS DATE) >'2023-06-28'

There were no non-null values.

Summary of values in 'Received'

filtered on COVID_indication='non_hospitalised' AND CAST(Received AS DATE) <'2021-12-16'

There were no non-null values.

Summary of values in 'TreatmentStartDate'

filtered on COVID_indication='non_hospitalised' AND CAST(TreatmentStartDate AS DATE) >'2023-06-28'

There were 9 different non-missing values

and 0 missing values.

No.of rows per TreatmentStartDate Frequency (to nearest 5)

Note: counts with frequencies <=7 are not shown

Summary of values in 'TreatmentStartDate'

filtered on COVID_indication='non_hospitalised' AND CAST(TreatmentStartDate AS DATE) <'2021-12-16'

There were 39 different non-missing values

and 0 missing values.

No.of rows per TreatmentStartDate Frequency (to nearest 5)
0 1 35

Note: counts with frequencies <=7 are not shown

Date comparisons

In [7]:
columns = ["Received", "TreatmentStartDate"]

compare_two_values(dbconn, [table], columns=columns, where="COVID_indication='non_hospitalised'", include_counts=True)

Comparison of column values

filtered on COVID_indication='non_hospitalised'

comparison median difference (days) row_count %
0 Received < TreatmentStartDate 1 5710 9.7
1 Received = TreatmentStartDate 0 26750 45.7
2 Received > TreatmentStartDate 14 26130 44.6
3 TreatmentStartDate is missing 0 <=7 0.0
difference
Q1 Q3
comparison
Received < TreatmentStartDate 1.0 2.0
Received = TreatmentStartDate 0.0 0.0
Received > TreatmentStartDate 5.0 40.0
TreatmentStartDate is missing 0.0 0.0

Symptom onset dates and At-Risk groups

In [8]:
columns = ["MOL1_onset_of_symptoms", "SOT02_onset_of_symptoms", "CASIM05_date_of_symptom_onset"]
interventions = ['Molnupiravir', 'Sotrovimab', 'Casirivimab and imdevimab']
thresholds = [50, 50, 1]

for c, i, t in zip(columns, interventions, thresholds):
    counts_of_distinct_values(dbconn, table, columns=[c], threshold=t, where="COVID_indication='non_hospitalised'")
    counts_of_distinct_values(dbconn, table, columns=[c], threshold=t, where=f"COVID_indication='non_hospitalised' AND Intervention='{i}'")

columns = ["MOL1_high_risk_cohort", "SOT02_risk_cohorts", "CASIM05_risk_cohort"]

for c, i in zip(columns, interventions):
    counts_of_distinct_values(dbconn, table, columns=[c], threshold=50, where=f"COVID_indication='non_hospitalised' AND Intervention='{i}'")

Summary of values in 'MOL1_onset_of_symptoms'

filtered on COVID_indication='non_hospitalised'

There were 946 different values, between '02.01.22' and '9/1/22' (after removing uncommon values)

and 45030 missing values (to the nearest 5).

The most common value was '2021-12-28' with 95 occurrences (rounded to the nearest 5)

Summary of values in 'MOL1_onset_of_symptoms'

filtered on COVID_indication='non_hospitalised' AND Intervention='Molnupiravir'

There were 946 different values, between '02.01.22' and '9/1/22' (after removing uncommon values)

and 1-7 missing values.

The most common value was '2021-12-28' with 95 occurrences (rounded to the nearest 5)

Summary of values in 'SOT02_onset_of_symptoms'

filtered on COVID_indication='non_hospitalised'

There were 2413 different values, between '01/02/22' and '9/7/22' (after removing uncommon values)

and 37400 missing values (to the nearest 5).

The most common value was '2022-03-20' with 100 occurrences (rounded to the nearest 5)

Summary of values in 'SOT02_onset_of_symptoms'

filtered on COVID_indication='non_hospitalised' AND Intervention='Sotrovimab'

There were 2413 different values, between '01/02/22' and '9/7/22' (after removing uncommon values)

and 0 missing values.

The most common value was '2022-03-20' with 100 occurrences (rounded to the nearest 5)

Summary of values in 'CASIM05_date_of_symptom_onset'

filtered on COVID_indication='non_hospitalised'

There were 22 different values, between '2021-12-14' and '2021-12-16' (after removing uncommon values)

and 58535 missing values (to the nearest 5).

The most common value was '2021-12-16' with 10 occurrences (rounded to the nearest 5)

Summary of values in 'CASIM05_date_of_symptom_onset'

filtered on COVID_indication='non_hospitalised' AND Intervention='Casirivimab and imdevimab'

There were 22 different values, between '2021-12-14' and '2021-12-16' (after removing uncommon values)

and 0 missing values.

The most common value was '2021-12-16' with 10 occurrences (rounded to the nearest 5)

Summary of values in 'MOL1_high_risk_cohort'

filtered on COVID_indication='non_hospitalised' AND Intervention='Molnupiravir'

There were 70 different non-missing values

and 90 missing values (to the nearest 5).

MOL1_high_risk_cohort row_count
0 IMID 4535
1 solid cancer 2125
2 Patients with a haematological diseases and stem cell transplant recipients 1665
3 renal disease 1200
4 rare neurological conditions 1200
5 solid organ recipients 865
6 liver disease 385
7 primary immune deficiencies 315
8 HIV or AIDS 290
9 Downs syndrome 255
10 renal disease and solid organ recipients 70
11 solid cancer and IMID 60
12 renal disease and IMID 55
13 liver disease and IMID 50
14 sickle cell disease 45
15 IMID and Patients with a haematological diseases and stem cell transplant recipients 40
16 solid cancer and renal disease 35
17 solid cancer and Patients with a haematological diseases and stem cell transplant recipients 30
18 haematologic malignancy 30
19 IMID and primary immune deficiencies 25
20 IMID and rare neurological conditions 25
21 renal disease and Patients with a haematological diseases and stem cell transplant recipients 25
22 IMID and HIV or AIDS 15
23 solid cancer and rare neurological conditions 10
24 IMID and solid organ recipients 10
25 solid cancer and liver disease 10
26 renal disease and liver disease 10
27 liver disease and solid organ recipients 10
28 solid cancer and solid organ recipients 10

There were 41 value(s) with <=7 occurrences (each), not shown above.

Summary of values in 'SOT02_risk_cohorts'

filtered on COVID_indication='non_hospitalised' AND Intervention='Sotrovimab'

There were 87 different non-missing values

and 105 missing values (to the nearest 5).

SOT02_risk_cohorts row_count
0 IMID 6345
1 solid cancer 2980
2 Patients with a haematological diseases and stem cell transplant recipients 2970
3 rare neurological conditions 2070
4 renal disease 2005
5 solid organ recipients 1990
6 liver disease 575
7 immune deficiencies 545
8 HIV or AIDS 355
9 Downs syndrome 265
10 renal disease and solid organ recipients 150
11 primary immune deficiencies 105
12 renal disease and IMID 80
13 solid cancer and IMID 70
14 sickle cell disease 60
15 liver disease and IMID 55
16 solid cancer and Patients with a haematological diseases and stem cell transplant recipients 45
17 IMID and Patients with a haematological diseases and stem cell transplant recipients 45
18 IMID and immune deficiencies 40
19 IMID and solid organ recipients 30
20 solid cancer and renal disease 25
21 renal disease and Patients with a haematological diseases and stem cell transplant recipients 25
22 haematologic malignancy 20
23 liver disease and solid organ recipients 20
24 IMID and rare neurological conditions 15
25 IMID and HIV or AIDS 15
26 Patients with a haematological diseases and stem cell transplant recipients and immune deficiencies 10
27 liver disease and immune deficiencies 10
28 IMID and primary immune deficiencies 10
29 solid cancer and immune deficiencies 10
30 renal disease and liver disease 10
31 solid cancer and liver disease 10

There were 55 value(s) with <=7 occurrences (each), not shown above.

Summary of values in 'CASIM05_risk_cohort'

filtered on COVID_indication='non_hospitalised' AND Intervention='Casirivimab and imdevimab'

There were 12 different non-missing values

and 0 missing values.

CASIM05_risk_cohort row_count
0 IMID 15
1 solid organ recipients 10

There were 10 value(s) with <=7 occurrences (each), not shown above.

Distinct Risk Groups

In [9]:
columns = ["MOL1_high_risk_cohort", "SOT02_risk_cohorts", "CASIM05_risk_cohort"]
replacement = "Patients with a "
split_string = ' and '
merge_all = True

identify_distinct_strings(dbconn, table, columns, where=f"COVID_indication='non_hospitalised'", replacement=replacement, split_string=split_string, merge_all=merge_all)
0                      Downs syndrome
1                         HIV or AIDS
2                                IMID
3             haematologic malignancy
4             haematological diseases
5                 immune deficiencies
6                       liver disease
7         primary immune deficiencies
8        rare neurological conditions
9          rare neurological diseases
10                      renal disease
11                sickle cell disease
12                       solid cancer
13             solid organ recipients
14    stem cell transplant recipients
15                               None
dtype: object

Patients with multiple records

In [10]:
counts_of_distinct_values(dbconn, table, columns=["patient_id"], threshold=50, where=f"COVID_indication='non_hospitalised'", frequency_count=True)

Summary of values in 'patient_id'

filtered on COVID_indication='non_hospitalised'

There were 54435 different non-missing values (rounded to the nearest 5)

and 0 missing values.

No.of rows per patient_id Frequency (to nearest 5)
0 1 50550
1 2 3630
2 3 240
3 4 10

Note: counts with frequencies <=7 are not shown

Further investigation into patients with multiple records - which fields differ in each record?

In [11]:
fields_of_interest = ["AgeAtReceivedDate", "Received", "Intervention", "CurrentStatus", "TreatmentStartDate", "Region", "MOL1_high_risk_cohort", "SOT02_risk_cohorts", "CASIM05_risk_cohort"]
combinations = {1: ["Intervention", "Received"],
                2: ["Intervention", "TreatmentStartDate"],}

multiple_records(dbconn, table, fields_of_interest, combinations, where=f"COVID_indication='non_hospitalised'")

Patients appearing multiple times, and the fields in which they have different values in each appearance

filtered on COVID_indication='non_hospitalised'

Patient count
patient_ids_with_multiple_records 3885
Received 3705
TreatmentStartDate 3550
AgeAtReceivedDate 2165
Intervention 1815
Intervention_AND_Received 1710
Intervention_AND_TreatmentStartDate 1670
Region 150
SOT02_risk_cohorts 145
MOL1_high_risk_cohort 90
none_of_these 40
CurrentStatus 35

Fields with counts <=7:

CASIM05_risk_cohort