Job request: 17776

Organisation:
The London School of Hygiene & Tropical Medicine
Workspace:
openprompt_longcovid_vaccines
ID:
x6obndz43jxvh6iy

This page shows the technical details of what happened when authorised researcher Alasdair Henderson requested one or more actions to be run against real patient data in the project, within a secure environment.

By cross-referencing the Requested actions listed under Job information with the Pipeline section below, you can infer the security level at which each output was written. Outputs marked as highly_sensitive can never be viewed directly by a researcher; researchers can only request that code be run against them. Outputs marked as moderately_sensitive can be viewed by an approved researcher who logs into a highly secure environment. Only outputs marked as moderately_sensitive can be requested for release to the public, via a controlled output review service.
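The cross-referencing described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the project: the dictionary is a hand-transcribed subset of the pipeline shown on this page, and the names `PIPELINE_OUTPUTS`, `REQUESTED` and `outputs_by_level` are hypothetical.

```python
# Illustrative subset of the pipeline on this page, transcribed by hand:
# action name -> {sensitivity level -> list of output paths}.
PIPELINE_OUTPUTS = {
    "generate_dataset_cases": {
        "highly_sensitive": ["output/dataset_cases.csv.gz"],
    },
    "time_update_data": {
        "highly_sensitive": [
            "output/timeupdate_dataset_lc_all.gz.parquet",
            "output/timeupdate_dataset_lc_dx.gz.parquet",
            "output/timeupdate_dataset_fracture.gz.parquet",
        ],
        "moderately_sensitive": [
            "output/data_properties/timeupdated_dataset_skim.txt",
        ],
    },
    "crude_rates_timeupdated": {
        "moderately_sensitive": [
            "output/tab022_tuv_rates_lc_all.csv",
            "output/tab023_tuv_rates_lc_dx.csv",
        ],
    },
}

# Three of the actions listed under "Requested actions" (hypothetical selection).
REQUESTED = ["generate_dataset_cases", "time_update_data", "crude_rates_timeupdated"]


def outputs_by_level(pipeline, requested):
    """Group the output paths of the requested actions by sensitivity level."""
    grouped = {}
    for action in requested:
        for level, paths in pipeline[action].items():
            grouped.setdefault(level, []).extend(paths)
    return grouped


grouped = outputs_by_level(PIPELINE_OUTPUTS, REQUESTED)
for level in sorted(grouped):
    print(f"{level}: {len(grouped[level])} output(s)")
    for path in grouped[level]:
        print(f"  {path}")
```

Under the rules stated above, only the paths grouped under moderately_sensitive could be put forward for output review; the highly_sensitive paths can only be consumed by later actions.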

Jobs

Pipeline

project.yaml
version: '3.0'

expectations:
  population_size: 20000

actions:

  generate_dataset_cases:
    run: >
      databuilder:v0 
        generate-dataset analysis/dataset_definition_cases.py --output output/dataset_cases.csv.gz
    outputs:
      highly_sensitive:
        dataset_cases: output/dataset_cases.csv.gz
  
  generate_dataset_controls:
    run: >
      databuilder:v0 
        generate-dataset analysis/dataset_definition_controls.py --output output/dataset_controls.csv.gz
    outputs:
      highly_sensitive:
        dataset_controls: output/dataset_controls.csv.gz
  
  generate_dataset_lc_pre_vacc:
    run: >
      databuilder:v0 
        generate-dataset analysis/dataset_definition_longcovid_prevaccine.py --output output/dataset_lc_pre_vacc.csv.gz
    outputs:
      highly_sensitive:
        dataset_controls: output/dataset_lc_pre_vacc.csv.gz
  
  clean_the_data: 
    run: > 
      r:latest
        analysis/010_cleandata.R
    needs: [generate_dataset_cases, generate_dataset_controls]
    outputs: 
      highly_sensitive:
        cleandata: output/clean_dataset.gz.parquet 
      moderately_sensitive: 
        txt1: output/data_properties/raw_dataset_skim.txt
        txt2: output/data_properties/raw_dataset_tabulate.txt
        txt3: output/data_properties/clean_dataset_skim.txt
        txt4: output/data_properties/clean_dataset_tabulate.txt

  clean_lc_pre_vacc_data: 
    run: >
      r:latest
        analysis/0101_clean_lcfirst_cohort.R
    needs: [generate_dataset_lc_pre_vacc]
    outputs: 
      highly_sensitive: 
        cleandata_lcfirst: output/clean_dataset_lc_first.gz.parquet
      moderately_sensitive: 
        txt5: output/data_properties/lcfirst_cohort_skim.txt
        txt6: output/data_properties/lcfirst_cohort_tabulate.txt
        tab1_lc_first: output/tables/tab1_full_description_lc_first.html

  time_update_data: 
    run: >
      r:latest
        analysis/011_timeupdate_data.R
    needs: [clean_the_data]
    outputs: 
      highly_sensitive: 
        timedata_longcovid: output/timeupdate_dataset_lc_all.gz.parquet
        timedata_longcovid_dx: output/timeupdate_dataset_lc_dx.gz.parquet
        timedata_fracture: output/timeupdate_dataset_fracture.gz.parquet
      moderately_sensitive: 
        txt3: output/data_properties/timeupdated_dataset_skim.txt

  summarise_timedata: 
    run: >
      r:latest
        analysis/012_timeupdated_summary.R
    needs: [time_update_data]
    outputs: 
      moderately_sensitive: 
        txt4: output/data_properties/timeupdated_lc_all_tabulate.txt
        lc_all_t_plot: output/supplementary/time_updated_t_byvaccines.pdf
        lc_dx_t_plot: output/supplementary/time_updated_t_byvaccines_lc_dx.pdf
  
  summarise_cohort_at_baseline: 
    run: >
      r:latest
        analysis/013_create_table1.R
    needs: [clean_the_data]
    outputs: 
      moderately_sensitive: 
        table1: output/tables/tab1_baseline_description.html
        table1_csv: output/tab1_baseline_data.csv
        table2: output/tables/tab2_fup_description.html
        table2_csv: output/tab2_fup_data.csv
        vaccine_lc_gap: output/supplementary/fig_vaccines_longcovid_gap.pdf
        vaccine_lc_gap_detail: output/supplementary/fig_vaccines_longcovid_gap_zoomed.pdf
        vaccine_lc_gap_csv: output/supplementary/vaccines_longcovid_gap.csv
        

  calculate_monthly_dynamics: 
    run: > 
      r:latest
        analysis/014_calculate_monthly_dynamics.R
    needs: [clean_the_data]
    outputs: 
      moderately_sensitive: 
        monthly_dynamics: output/data_monthly_dynamics.csv
        table_monthly_dynamics: output/tables/supptab01_monthly_dynamics.csv

  crude_rates:
    run: >
      r:latest
        analysis/020_cruderates.R
    needs: [clean_the_data]
    outputs:
      moderately_sensitive:
        cruderates: output/tab021_crude_lc_rates.csv

  crude_rates_timeupdated:
    run: >
      r:latest
        analysis/021_cruderates_timeupdated.R
    needs: [time_update_data]
    outputs:
      moderately_sensitive:
        t_cruderates_lc_all: output/tab022_tuv_rates_lc_all.csv
        t_cruderates_lc_dx: output/tab023_tuv_rates_lc_dx.csv
  
  output_crude_rates:
    run: >
      r:latest
        analysis/022_combine_cruderates.R
    needs: [crude_rates, crude_rates_timeupdated]
    outputs:
      moderately_sensitive:
        cruderates_redacted: output/tables/tab3_crude_rates_redacted.csv
        cruderates_plot: output/figures/fig3_crude_rates.pdf
  
  plot_incidence:
    run: >
      r:latest
        analysis/030_plotrates.R
    needs: [time_update_data, calculate_monthly_dynamics]
    outputs:
      moderately_sensitive:
        counts_line: output/figures/fig2_raw_counts_line.pdf
        counts_line_sex: output/figures/fig2a_raw_counts_line_bysex.pdf
        countscolumn: output/figures/fig2b_raw_counts_column.pdf
        countscolumn_sex: output/figures/fig2c_raw_counts_column_bysex.pdf
        stackedbar: output/figures/fig2e_longcovid_stacked_dx_rx.pdf
        multipanelfig: output/figures/fig2_longcovid_dynamics.pdf
        vaccinegap: output/supplementary/fig_agegap_vaccines.pdf
        monthly_plot_v1: output/figures/fig4a_outbreak_dynamics.pdf
        monthly_plot_v2: output/figures/fig4b_outbreak_dynamics_cumulative.pdf
        monthly_plot_v3: output/figures/fig4c_outbreak_dynamics_experimental.pdf
        monthly_plot_v4: output/figures/fig4d_outbreak_dynamics_log.pdf
        monthly_plot_v5: output/figures/fig4e_longcovid_and_national_cases.pdf

  plot_longcovid_flows:
    run: >
      r:latest
        analysis/032_pathways_to_longcovid.R
    needs: [clean_the_data]
    outputs:
      moderately_sensitive:
        tests_lc_table: output/tables/tab_tests_and_longcovid.html
        tests_density: output/supplementary/test_to_longcovid_density.pdf
        longcovid_flows: output/figures/fig5_longcovid_flows.pdf
        longcovid_flows_data: output/sankey_plot_data.csv

  poisson_rates_static:
    run: >
      r:latest
        analysis/040_poisson_regressions_staticvars.R
    needs: [clean_the_data]
    outputs:
      moderately_sensitive:
        poissonrates: output/tab023_poissonrates_static.csv

  poisson_rates_timeupdated:
    run: >
      r:latest
        analysis/041_poisson_regressions_timeupdated.R
    needs: [time_update_data]
    outputs:
      moderately_sensitive:
        poissonrates: output/tab023_poissonrates_timeupdated.csv

  poisson_plots:
    run: >
      r:latest
        analysis/042_plot_poisson_results.R
    needs: [poisson_rates_static, poisson_rates_timeupdated]
    outputs: 
      moderately_sensitive: 
        fig3a: output/figures/fig3a_crude_RRs.pdf
        fig3b: output/figures/fig3b_adjusted_RRs.pdf
        fig3c: output/figures/fig3c_longcovid_RRs.pdf
        fig3d: output/figures/fig3d_longcovid_models.pdf
        fig3e: output/figures/fig3e_vaccines.pdf
        fig3f: output/figures/fig3f_longcovid_vaccine_models.pdf
        fig3g: output/figures/fig3g_demographics.pdf
        poissonplots_all: output/figures/fig3h_rate_ratios_facet.pdf
        poissonplots_table: output/tables/tab4_poisson_rateratios.csv

Timeline

  • Created:

  • Started:

  • Finished:

  • Runtime: 17:19:16

These timestamps are generated and stored using the UTC timezone on the TPP backend.

Job information

Status
Succeeded
Backend
TPP
Requested by
Alasdair Henderson
Branch
main
Force run dependencies
No
Git commit hash
43141e3
Requested actions
  • generate_dataset_cases
  • generate_dataset_controls
  • clean_the_data
  • time_update_data
  • summarise_timedata
  • summarise_cohort_at_baseline
  • calculate_monthly_dynamics
  • crude_rates_timeupdated
  • output_crude_rates
  • poisson_rates_timeupdated
  • poisson_plots
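Some requested actions depend, through their `needs` entries in the pipeline, on actions that were not themselves requested. The sketch below finds them. It is a hedged illustration only: the `NEEDS` mapping is copied by hand from the pipeline on this page, the helper name `unrequested_dependencies` is hypothetical, and how the backend satisfies such dependencies (for example, by reusing outputs from an earlier run when Force run dependencies is No) is an assumption not stated on this page.

```python
# "needs" entries for each requested action, copied from the pipeline on this page.
NEEDS = {
    "generate_dataset_cases": [],
    "generate_dataset_controls": [],
    "clean_the_data": ["generate_dataset_cases", "generate_dataset_controls"],
    "time_update_data": ["clean_the_data"],
    "summarise_timedata": ["time_update_data"],
    "summarise_cohort_at_baseline": ["clean_the_data"],
    "calculate_monthly_dynamics": ["clean_the_data"],
    "crude_rates_timeupdated": ["time_update_data"],
    "output_crude_rates": ["crude_rates", "crude_rates_timeupdated"],
    "poisson_rates_timeupdated": ["time_update_data"],
    "poisson_plots": ["poisson_rates_static", "poisson_rates_timeupdated"],
}

# The keys of NEEDS are exactly the actions listed under "Requested actions".
REQUESTED = set(NEEDS)


def unrequested_dependencies(needs, requested):
    """Map each dependency that was not requested to the requested actions needing it."""
    missing = {}
    for action in sorted(requested):
        for dependency in needs[action]:
            if dependency not in requested:
                missing.setdefault(dependency, []).append(action)
    return missing


missing = unrequested_dependencies(NEEDS, REQUESTED)
for dependency, dependants in sorted(missing.items()):
    print(f"{dependency} (needed by {', '.join(dependants)})")
# prints:
# crude_rates (needed by output_crude_rates)
# poisson_rates_static (needed by poisson_plots)
```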
