COVID-19 identification in primary care records from February 2020 to November 2021: classification of codes for OpenSAFELY studies

Background

Primary care records offer an opportunity to ascertain cases of COVID-19 which do not necessarily result in hospital admission or death. This could be useful for studying the burden of COVID-19 in the community, risk factors for SARS-CoV-2 infection separately to risk of severe COVID-19, risk factors for mortality and case fatality ratios among those infected, and post-viral effects in people who had COVID-19 that did not require hospitalisation.

There are over 100 primary care (CTV3) codes with terms related to COVID-19 used by TPP and available for selection in studies performed in the OpenSAFELY platform (https://opensafely.org/). The majority of these codes have been newly created for use in the current pandemic. The aim of this work was to assign these codes into categories related to the identification of COVID-19 in primary care, and to provide advice for studies using the OpenSAFELY platform that require people to be classified by their COVID-19 case status as defined in primary care records (either as an exposure or as an outcome).

Methods

An initial list of TPP primary care codes related to COVID-19 was obtained by searching the TPP database for terms containing "COV-2", "Coronavirus", or "COVID". The returned terms were cross-checked against the NHS Digital COVID-19 SNOMED CT and CTV3 codes, and any missing terms were added to the list. The resulting list was then reviewed by a team of clinicians, epidemiologists and statisticians, who identified distinct categories and assigned each term to one of them.
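The term search and cross-check described above can be sketched as follows. This is a minimal illustration only: the codes and term texts are made up, the function names are hypothetical, and case-insensitive matching is an assumption, since the settings of the original TPP database search are not stated.

```python
import re

# Search fragments quoted in the Methods text. Case-insensitive matching is
# an assumption; the original search settings are not documented here.
SEARCH_FRAGMENTS = ("COV-2", "Coronavirus", "COVID")
PATTERN = re.compile(
    "|".join(re.escape(fragment) for fragment in SEARCH_FRAGMENTS),
    re.IGNORECASE,
)


def find_covid_terms(terms):
    """Return the (code, term) pairs whose term text contains any fragment."""
    return [(code, term) for code, term in terms if PATTERN.search(term)]


def add_missing(found, reference):
    """Append reference entries whose code is not already in `found`."""
    seen = {code for code, _ in found}
    return found + [(code, term) for code, term in reference if code not in seen]


# Hypothetical rows: these are NOT real CTV3 codes or official term texts.
example_terms = [
    ("X0001", "Suspected COVID-19"),
    ("X0002", "SARS-CoV-2 antibody detected"),
    ("X0003", "Asthma review"),
    ("X0004", "Advice given about coronavirus"),
]
# Hypothetical stand-in for the NHS Digital reference list.
reference_terms = [
    ("X0001", "Suspected COVID-19"),
    ("X0005", "Self-isolation to prevent exposure"),
]

found = find_covid_terms(example_terms)
candidates = add_missing(found, reference_terms)
```

Here `candidates` holds the three matched terms plus the one reference entry that the text search missed. In the study itself, the resulting list was then categorised by clinical review rather than by code.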

An initial analysis of the probable case and suspected case sub-categories was then performed using OpenSAFELY data from February 2020 to November 2021. Two quantities were plotted: (1) the frequency of codes entered into TPP software by GPs over time, and (2) the proportion of people dying due to (a) COVID-19 and (b) causes other than COVID-19 (using ONS cause of death data) in the 80 days after a record of a positive test in either primary care TPP data or SGSS data.
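The mortality comparison in (2) amounts to a simple proportion per cause of death. The sketch below shows one way to compute it under stated assumptions: the record layout and field names are hypothetical, the denominator is everyone with a positive test record, and day 80 is treated as inside the follow-up window. It is not the analysis code used for the figures.

```python
from datetime import date, timedelta

FOLLOW_UP_DAYS = 80  # follow-up window stated in the Methods


def death_proportions(records, follow_up_days=FOLLOW_UP_DAYS):
    """Return (proportion dying of COVID-19, proportion dying of other causes).

    Each record is a dict with hypothetical keys:
      positive_test_date: date of the positive test record
      death_date:         date of death, or None if no death is recorded
      covid_death:        True if COVID-19 is the recorded cause of death
    """
    window = timedelta(days=follow_up_days)
    total = covid = other = 0
    for record in records:
        total += 1
        death_date = record["death_date"]
        if death_date is None:
            continue
        elapsed = death_date - record["positive_test_date"]
        # Assumption: deaths on day 0 through day 80 inclusive are counted.
        if timedelta(0) <= elapsed <= window:
            if record["covid_death"]:
                covid += 1
            else:
                other += 1
    if total == 0:
        return (0.0, 0.0)
    return (covid / total, other / total)


# Made-up example records, for illustration only.
start = date(2021, 1, 1)
records = [
    {"positive_test_date": start, "death_date": start + timedelta(days=10), "covid_death": True},
    {"positive_test_date": start, "death_date": start + timedelta(days=80), "covid_death": False},
    {"positive_test_date": start, "death_date": start + timedelta(days=81), "covid_death": True},
    {"positive_test_date": start, "death_date": None, "covid_death": False},
]
proportions = death_proportions(records)
```

Running the same function separately on people identified by a primary care code and on people identified by an SGSS test gives the two sets of proportions being compared.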

Results

A total of 187 terms were identified. These were assigned into the 14 categories/subcategories detailed in the table below. The 14 codelists for classifying COVID-19 are publicly available for inspection and re-use at codelists.opensafely.org.

| Codelist | Description | Count |
|---|---|---|
| Probable case: clinical code | Clinical diagnosis of COVID-19 made | 102493 |
| Probable case: positive test | Record of positive test result for SARS-CoV-2 (active infection) | 2505542 |
| Probable case: sequelae | Symptom or condition recorded as secondary to SARS-CoV-2 | 6973 |
| Suspected case: advice | General advice given about SARS-CoV-2 | 693747 |
| Suspected case: had test | Record of having had a test for active infection with SARS-CoV-2 | 624177 |
| Suspected case: had antigen test | Record of having had an antigen test for infection with SARS-CoV-2 | 48 |
| Suspected case: isolation code | Self- or household-isolation recorded | 128491 |
| Suspected case: non-specific clinical assessment | Clinical assessments plausibly related to COVID-19 | 75 |
| Suspected case: suspected codes | "Suspect" mentioned, or previous COVID-19 reported | 1124105 |
| Historic case | SARS-CoV-2 antibodies or immunity recorded | 139268 |
| Potential historic case | Has had a test for SARS-CoV-2 antibodies | 153787 |
| Exposure to disease | Record of contact/exposure/procedure | 36625 |
| Antigen test negative | Record of negative test result for SARS-CoV-2 | 13398671 |
| COVID-19 related but case status not specified | Healthcare contact related to COVID-19 but not case status | 21880912 |

"Probable" and "suspected" sub-categories are explored further here. Plots of code frequency (Figure 1) showed that "probable case: positive test" (n=2505542) was used far more frequently than "probable case: clinical code" (n=102493) and "probable case: sequelae" (n=6973) over the study period. "Probable case: positive test" was used less frequently than "SGSS positive test" (n=2743940) but followed a similar distribution. Suspected case sub-categories were used much more frequently than "probable case: positive test" early in the study period, although their totals over the whole period were lower: "advice" (n=693747), "isolation code" (n=128491) and "suspected codes" (n=1124105) (Figure 1). Plots of causes of death after each code (Figure 2) showed that, following a positive SGSS test, the proportion of deaths due to COVID-19 differed markedly from the proportion due to other causes. In contrast, COVID-19 deaths were not substantially higher than non-COVID deaths following "probable case: positive test" codes.

Conclusion

A relatively low level of COVID-19 related mortality in people identified as "probable cases" is consistent with these codes not identifying the most severe COVID-19 cases with high specificity. "Suspected case" codes were initially more widely used but do not seem to identify COVID-19 cases and should be used with care. Further work will include investigating code sensitivity and understanding how individual patient characteristics relate to the varying probability of being tested.

Figure 1: Frequency of primary care code use over time.

Figure 2: Comparison of mortality from COVID-19 or other causes in the 80 days following a positive test.

Technical details

This notebook was run on 2022-03-02. The information below is based on data extracted from the OpenSAFELY-TPP database on 2022-03-02.

If a clinical code appears in the primary care record on multiple dates, the earliest date is used.

This dataset was created using the study definition /analysis/study_definintion.py.

If there are multiple events per patient within the extraction period, the earliest 6 events are extracted.

Only patients registered at their practice continuously between 1 Feb 2020 and 28 Nov 2021 are included.
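The extraction rules above (earliest date per code, at most 6 events per patient, continuous registration) can be illustrated with a short sketch. It is one possible reading of those rules, not the logic of the study definition itself: the function names and data layout are hypothetical, and treating the extraction period as identical to the registration window is an assumption.

```python
from datetime import date

STUDY_START = date(2020, 2, 1)   # start of the registration window in the text
STUDY_END = date(2021, 11, 28)   # end of the registration window in the text
MAX_EVENTS = 6                   # earliest events kept per patient


def earliest_date_per_code(events):
    """Collapse (code, date) pairs to the earliest date seen for each code."""
    earliest = {}
    for code, when in events:
        if code not in earliest or when < earliest[code]:
            earliest[code] = when
    return earliest


def earliest_events(events, max_events=MAX_EVENTS, start=STUDY_START, end=STUDY_END):
    """Return up to `max_events` (date, code) pairs within the period, earliest first."""
    in_period = [(code, when) for code, when in events if start <= when <= end]
    per_code = earliest_date_per_code(in_period)
    ordered = sorted((when, code) for code, when in per_code.items())
    return ordered[:max_events]


def continuously_registered(periods, start=STUDY_START, end=STUDY_END):
    """True if a single registration period spans the whole window.

    `periods` is a list of (start_date, end_date); end_date is None when
    the registration is still open.
    """
    return any(
        reg_start <= start and (reg_end is None or reg_end >= end)
        for reg_start, reg_end in periods
    )


# Made-up events for one hypothetical patient (codes are placeholders).
events = [
    ("A", date(2020, 4, 2)),
    ("A", date(2020, 3, 15)),   # earlier record of the same code wins
    ("B", date(2021, 1, 10)),
    ("C", date(2019, 12, 1)),   # before the period, so ignored
]
first_events = earliest_events(events)
registered = continuously_registered([(date(2015, 6, 1), None)])
```

A patient whose registration started after 1 Feb 2020, or ended before 28 Nov 2021, would be excluded under this reading.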

Notes on OpenSAFELY

OpenSAFELY is a data analytics platform built by a mixed team of software developers, clinicians, and epidemiologists from the Oxford DataLab, London School of Hygiene and Tropical Medicine Electronic Health Record research group, health software company TPP and NHS England. It represents a fundamentally different way of conducting electronic health record (EHR) research: instead of sending EHR data to a third party for analysis, we've developed a system for conducting analyses within the secure environment where the data is already stored, so that the electronic health record data never leaves the NHS ecosystem.

Currently, OpenSAFELY uses the electronic health records of all patients registered at a GP practice using the SystmOne clinical information system run by TPP, covering around 22 million people. Additional data for these patients covering COVID-related tests, hospital admissions, ITU admissions, and registered deaths are also securely imported to the platform.

For more information, visit https://opensafely.org