
Job request: 23804

Organisation: The London School of Hygiene & Tropical Medicine
Workspace: covid_collateral_hf_update
ID: z7sroa5b4q7wl77e

This page shows the technical details of what happened when the authorised researcher Emily Herrett requested one or more actions to be run against real patient data in the project, within a secure environment.

By cross-referencing the list of jobs with the pipeline section below, you can infer which security level each output was written to. Researchers can never directly view outputs marked as highly_sensitive; they can only request that code be run against them. Outputs marked as moderately_sensitive can be viewed by an approved researcher by logging into a highly secure environment. Only outputs marked as moderately_sensitive can be requested for release to the public, via a controlled output review service.
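The cross-referencing described above can be sketched in Python. This is a minimal illustration, not OpenSAFELY tooling. It assumes project.yaml has already been parsed into nested dicts (for example with a YAML library). The `ACTIONS` excerpt and the `privacy_level` helper are hypothetical. Some declared outputs are wildcard patterns such as `output/tabfig/combinations*.csv`, so the sketch matches paths with `fnmatchcase`. Note that its `*` also matches `/`, which makes it looser than a strict glob.

```python
from fnmatch import fnmatchcase

# Hypothetical excerpt of the parsed project.yaml, written as a plain dict so the
# sketch needs no YAML library. Paths and levels are copied from the pipeline.
ACTIONS = {
    "generate_dataset_prepandemic": {
        "outputs": {"highly_sensitive": {"dataset": "output/dataset_prepandemic.csv"}},
    },
    "generate_drugprevalence_coms": {
        "outputs": {
            "moderately_sensitive": {
                "log1": "logs/102_A_cr_drug_prevalence_combinations.log",
                "data1": "output/tabfig/combinations*.csv",
                "data2": "output/tabfig/pillars*.csv",
            }
        },
    },
}


def privacy_level(path, actions=ACTIONS):
    """Return (action, level) for the first declared output pattern matching path, else None."""
    for action_name, spec in actions.items():
        for level, outputs in spec.get("outputs", {}).items():
            for pattern in outputs.values():
                # Declared outputs may be shell-style wildcards, e.g. "combinations*.csv".
                if fnmatchcase(path, pattern):
                    return action_name, level
    return None


print(privacy_level("output/tabfig/combinations_prepandemic.csv"))
print(privacy_level("output/dataset_prepandemic.csv"))
```

A path that matches no declared output returns `None`. The real job runner may treat such files differently, and this sketch does not attempt to model that.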

Jobs

Pipeline

project.yaml
version: '3.0'

# Ignore this `expectations` block. It is required but not used, and will be removed in future versions.
expectations:
  population_size: 10000

actions:
  generate_dataset_prepandemic:
    run: ehrql:v1 generate-dataset analysis/dataset_definition_prepandemic.py --output output/dataset_prepandemic.csv
    outputs:
      highly_sensitive:
        dataset: output/dataset_prepandemic.csv

  generate_dataset_pandemic:
    run: ehrql:v1 generate-dataset analysis/dataset_definition_pandemic.py --output output/dataset_pandemic.csv
    outputs:
      highly_sensitive:
        dataset: output/dataset_pandemic.csv

  generate_dataset_postpandemic:
    run: ehrql:v1 generate-dataset analysis/dataset_definition_postpandemic.py --output output/dataset_postpandemic.csv
    outputs:
      highly_sensitive:
        dataset: output/dataset_postpandemic.csv

  generate_dataset_escalation:
    run: ehrql:v1 generate-dataset analysis/dataset_definition_drug_escalation.py --output output/dataset_drug_escalation.csv
    outputs:
      highly_sensitive:
        dataset: output/dataset_drug_escalation.csv

  # Generate datasets for analysis: 001
  generate_analysis_datasets:
    run: stata-mp:latest analysis/001_cr_define_covariates_cohorts.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_dataset_escalation]
    outputs:
      highly_sensitive:
        log1: logs/001_cr_define_covariates_cohorts.log
        data1: output/prepandemic.dta 
        data2: output/pandemic.dta 
        data3: output/postpandemic.dta 
        data4: output/drug_escalation.dta 

  # Drug prevalence dataset: 102
  generate_drugprevalence:
    run: stata-mp:latest analysis/102_cr_drug_prevalence_or.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/102_cr_drug_prevalence_or.log
        data1: output/tabfig/prevalences_summary_prepandemic_redacted_rounded_or.csv
        data2: output/tabfig/prevalences_summary_pandemic_redacted_rounded_or.csv 
        data3: output/tabfig/prevalences_summary_postpandemic_redacted_rounded_or.csv

  # Drug prevalence dataset: 102A
  generate_drugprevalence_coms:
    run: stata-mp:latest analysis/102_A_cr_drug_prevalence_contraind_combinations.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/102_A_cr_drug_prevalence_combinations.log
        data1: output/tabfig/combinations*.csv
        data2: output/tabfig/pillars*.csv

  # Drug prevalence dataset: 102B
  generate_drugprevalence_duration:
    run: stata-mp:latest analysis/102_B_cr_drug_prevalence_duration.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/102_cr_drug_prevalence_duration.log
        data1: output/tabfig/prevalences_summary_prepandemic_redacted_rounded_duration.csv
        data2: output/tabfig/prevalences_summary_pandemic_redacted_rounded_duration.csv 
        data3: output/tabfig/prevalences_summary_postpandemic_redacted_rounded_duration.csv


  # Cohort rates: 103
  generate_rates:
    run: stata-mp:latest analysis/103_cr_cohort_rates_repeated.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/103_cohort_rates_repeated.log
        data1: output/tabfig/rates_repeated_prepandemic_redacted_rounded.csv 
        #data2: output/tabfig/rates_repeated_pandemic_redacted_rounded.csv 
        #data3: output/tabfig/rates_repeated_postpandemic_redacted_rounded.csv 

  # Cohort rates in diabetes: 103A
  generate_rates_A:
    run: stata-mp:latest analysis/103_A_cr_cohort_rates_repeated_diabetes.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/103_A_cohort_rates_repeated_diabetes.log
        data1: output/tabfig/rates_repeated_pandemic_redacted_rounded_diabetes.csv 
        data2: output/tabfig/rates_repeated_prepandemic_redacted_rounded_diabetes.csv 
        data3: output/tabfig/rates_repeated_postpandemic_redacted_rounded_diabetes.csv 

  # Cohort rates in those without diabetes: 103B
  generate_rates_B:
    run: stata-mp:latest analysis/103_B_cr_cohort_rates_repeated_nodiabetes.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/103_B_cohort_rates_repeated_nodiabetes.log
        data1: output/tabfig/rates_repeated_pandemic_redacted_rounded_nodiabetes.csv 
        data2: output/tabfig/rates_repeated_prepandemic_redacted_rounded_nodiabetes.csv 
        data3: output/tabfig/rates_repeated_postpandemic_redacted_rounded_nodiabetes.csv 

  # Cohort rates in each overall cohort: 103C
  generate_rates_C:
    run: stata-mp:latest analysis/103_C_cr_cohort_rates_repeated_stratified.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/103_C_cohort_rates_repeated_stratified.log
        data1: output/tabfig/rates_repeated_pandemic_redacted_rounded_stratified.csv 
        data2: output/tabfig/rates_repeated_prepandemic_redacted_rounded_stratified.csv 
        data3: output/tabfig/rates_repeated_postpandemic_redacted_rounded_stratified.csv 

  # Generate Table 1: 104
  generate_table1:
    run: stata-mp:latest analysis/104_cr_table1.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/104_cr_table1.log
        data1: output/tabfig/table1_prepandemic_redacted_rounded.csv 
        data2: output/tabfig/table1_pandemic_redacted_rounded.csv 
        data3: output/tabfig/table1_postpandemic_redacted_rounded.csv 

  # Generate Table 1 for the drug escalation cohort: 104B
  generate_table1_escalation:
    run: stata-mp:latest analysis/104_B_cr_table1_escalation.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/104_B_cr_table1_escalation.log
        data1: output/tabfig/table1_drug_escalation_redacted_rounded.csv 

  # Drug graphs: 105
  generate_druggraphs:
    run: stata-mp:latest analysis/105_cr_drug_prevalence_graphs.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets, generate_drugprevalence]
    outputs:
      moderately_sensitive:
        log1: logs/105_cr_graphs.log
        Figures1: output/tabfig/*_prevalences_by_drug_*.svg 
        Figures2: output/tabfig/prevalences_*.svg 

  # Rate graphs: 106
  generate_rategraphs:
    run: stata-mp:latest analysis/106_cr_graphs_rates.do
    needs: [generate_dataset_prepandemic, generate_dataset_pandemic, generate_dataset_postpandemic, generate_analysis_datasets, generate_rates]
    outputs:
      moderately_sensitive:
        log1: logs/106_cr_graphs_rates.log
        Figures1: output/tabfig/rates_*.svg 

  # Drug escalation rates and graphs (2_3): 107
  generate_drugescalation23:
    run: stata-mp:latest analysis/107_cr_cohorts_escalation_2_3.do
    needs: [generate_dataset_escalation, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/107_cr_cohorts_escalation.log
        dataset1: output/tabfig/escalation_rates_2_3_prepandemic_redacted_rounded.csv
        dataset2: output/tabfig/escalation_rates_2_3_pandemic_redacted_rounded.csv
        dataset3: output/tabfig/escalation_rates_2_3_postpandemic_redacted_rounded.csv
        Figures1: output/tabfig/escalation_2_3_*.svg 

  # Drug escalation rates and graphs (3_4): 108
  generate_drugescalation34:
    run: stata-mp:latest analysis/108_cr_cohorts_escalation_3_4.do
    needs: [generate_dataset_escalation, generate_analysis_datasets]
    outputs:
      moderately_sensitive:
        log1: logs/107_cr_cohorts_escalation_3_4.log
        dataset1: output/tabfig/escalation_rates_3_4_prepandemic_redacted_rounded.csv
        dataset2: output/tabfig/escalation_rates_3_4_pandemic_redacted_rounded.csv
        dataset3: output/tabfig/escalation_rates_3_4_postpandemic_redacted_rounded.csv
        Figures1: output/tabfig/escalation_3_4_*.svg 

  # Time series
  generate_dataset_timeseries:
    run: ehrql:v1 generate-dataset analysis/dataset_timeseries.py --output output/dataset_timeseries.csv
    outputs:
      moderately_sensitive:
        dataset: output/dataset_timeseries.csv

  # Measures
  measures:
    run: ehrql:v1 generate-measures analysis/measures.py 
      --output output/measures/measures.csv
      --
      --start-date "2018-01-01"
      --intervals 64
    outputs:
      moderately_sensitive:
        measure_csv: output/measures/measures.csv

  # Time series do-file: 109
  run_timeseries:
    run: stata-mp:latest analysis/109_time_series.do --dummy-data-file measures_dummy.csv
    needs: [generate_dataset_timeseries, measures]
    outputs:
      moderately_sensitive:
        log1: logs/time_series.log
        dataset: output/tabfig/measures_redacted_rounded.csv
        Figures1: output/tabfig/time_series_*.svg
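The `measures` action passes `--start-date "2018-01-01"` and `--intervals 64` after the `--` separator. Per the ehrQL command line, arguments after `--` are forwarded to the definition file, here analysis/measures.py. That file is not shown, so the interval length is an assumption. If it builds monthly intervals (a common ehrQL pattern is `months(n).starting_on(date)`), the run covers January 2018 through April 2023. The plain-Python sketch below checks that arithmetic. `monthly_intervals` is a hypothetical helper and not part of ehrQL.

```python
from datetime import date, timedelta


def monthly_intervals(start, n):
    """Return n consecutive (first_day, last_day) calendar-month intervals.

    Assumes start is the first day of a month, as "2018-01-01" is.
    """
    intervals = []
    year, month = start.year, start.month
    for _ in range(n):
        first = date(year, month, 1)
        # Move to the first day of the next month, then step back one day.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        last = date(year, month, 1) - timedelta(days=1)
        intervals.append((first, last))
    return intervals


intervals = monthly_intervals(date(2018, 1, 1), 64)
first, last = intervals[-1]
print(f"{first} to {last}")  # → 2023-04-01 to 2023-04-30
```

If measures.py uses weekly or quarterly intervals instead, the same 64 intervals would span a very different period.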

Timeline

  • Created:

  • Started:

  • Finished:

  • Runtime: 01:20:13

These timestamps are generated and stored using the UTC timezone on the TPP backend.

Job information

Status: Succeeded
Backend: TPP
Requested by: Emily Herrett
Branch: main
Force run dependencies: No
Git commit hash: 30fd947
Requested actions:
  • generate_drugprevalence_coms
  • generate_rates_A
  • generate_rates_B
  • generate_table1
  • generate_table1_escalation
  • run_timeseries
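The requested actions depend, through their `needs` lists in project.yaml, on upstream actions. The minimal Python sketch below resolves those dependencies with a depth-first topological sort, so every action appears after the actions it needs. It is an illustration, not the OpenSAFELY job runner. With Force run dependencies set to No, dependencies that have already succeeded are generally reused rather than rerun, so the list is an upper bound on what could be scheduled, not a record of what this job ran. `NEEDS`, `REQUESTED` and `run_order` are hypothetical names.

```python
# The four cohort inputs shared by most Stata actions in project.yaml.
COHORT_DEPS = [
    "generate_dataset_prepandemic",
    "generate_dataset_pandemic",
    "generate_dataset_postpandemic",
    "generate_analysis_datasets",
]

# `needs` lists copied from project.yaml for the requested actions and their ancestors.
NEEDS = {
    "generate_dataset_prepandemic": [],
    "generate_dataset_pandemic": [],
    "generate_dataset_postpandemic": [],
    "generate_dataset_escalation": [],
    "generate_analysis_datasets": [
        "generate_dataset_prepandemic",
        "generate_dataset_pandemic",
        "generate_dataset_postpandemic",
        "generate_dataset_escalation",
    ],
    "generate_drugprevalence_coms": COHORT_DEPS,
    "generate_rates_A": COHORT_DEPS,
    "generate_rates_B": COHORT_DEPS,
    "generate_table1": COHORT_DEPS,
    "generate_table1_escalation": COHORT_DEPS,
    "generate_dataset_timeseries": [],
    "measures": [],
    "run_timeseries": ["generate_dataset_timeseries", "measures"],
}

REQUESTED = [
    "generate_drugprevalence_coms",
    "generate_rates_A",
    "generate_rates_B",
    "generate_table1",
    "generate_table1_escalation",
    "run_timeseries",
]


def run_order(requested, needs):
    """Return requested actions plus all transitive dependencies, dependencies first."""
    order, visiting, done = [], set(), set()

    def visit(action):
        if action in done:
            return
        if action in visiting:
            raise ValueError(f"dependency cycle at {action!r}")
        visiting.add(action)
        # needs[action] raises KeyError for an undeclared action instead of hiding it.
        for dep in needs[action]:
            visit(dep)
        visiting.discard(action)
        done.add(action)
        order.append(action)

    for action in requested:
        visit(action)
    return order


order = run_order(REQUESTED, NEEDS)
print(len(order))  # → 13
```

A depth-first post-order was chosen because it visits each action once and detects cycles as a side effect. Any valid topological order would satisfy the `needs` constraints equally well.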
