Job request: 2577

Organisation: The London School of Hygiene & Tropical Medicine
Workspace: carehomes
ID: c2owqnsvpjoezmze

This page shows the technical details of what happened when authorised researcher Emily Nightingale requested one or more actions to be run against real patient data in the project, within a secure environment.

By cross-referencing the Requested Actions listed under Job information with the Pipeline section below, you can infer the security level at which each output was written. Outputs marked as highly_sensitive can never be viewed directly by a researcher; researchers can only request that code be run against them. Outputs marked as moderately_sensitive can be viewed by an approved researcher by logging into a highly secure environment. Only outputs marked as moderately_sensitive can be requested for release to the public, via a controlled output review service.
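As a minimal sketch of that cross-referencing, the Python below hand-transcribes the `outputs` blocks of two actions from the project.yaml under Pipeline into a dictionary and looks up what the requested action `data_setup` wrote at each level. The dictionary and the helper names `outputs_at_level` and `review_candidates` are illustrative assumptions, not part of any OpenSAFELY tooling, and the file is not parsed from the real project.yaml.

```python
# Hand-transcribed subset of the project.yaml "outputs" blocks
# (illustrative only; not read from the real file).
PIPELINE_OUTPUTS = {
    "data_clean": {
        "moderately_sensitive": ["data_clean_log.txt"],
        "highly_sensitive": ["input_clean.rds"],
    },
    "data_setup": {
        "moderately_sensitive": ["data_setup_log.txt", "carehome_size.png"],
        "highly_sensitive": [
            "community_incidence.rds",
            "analysisdata.rds",
            "ch_linelist.rds",
            "ch_agg_long.rds",
        ],
    },
}


def outputs_at_level(action: str, level: str) -> list[str]:
    """Return the files an action declares at one sensitivity level."""
    return PIPELINE_OUTPUTS.get(action, {}).get(level, [])


def review_candidates(requested_actions: list[str]) -> list[str]:
    """Collect the moderately_sensitive files: the only ones that may be
    viewed in the secure environment and put forward for output review."""
    files: list[str] = []
    for action in requested_actions:
        files.extend(outputs_at_level(action, "moderately_sensitive"))
    return files


if __name__ == "__main__":
    requested = ["data_setup"]  # taken from "Requested actions" on this page
    print("Viewable, may be submitted for review:", review_candidates(requested))
    print("Never viewable:", outputs_at_level("data_setup", "highly_sensitive"))
```

Keying by action first mirrors the layout of project.yaml, so extending the sketch to other actions is a matter of copying more `outputs` blocks across.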

Jobs

Action:

data_setup

Status:

Succeeded

Job identifier:

ymnoshxqbd7vh2om

Pipeline


version: '3.0'

expectations:
  population_size: 1000000

actions:
  generate_study_population:
    run: cohortextractor:latest generate_cohort --study-definition study_definition 
    outputs:
      highly_sensitive:
        cohort: input.csv
        
          
  check_hhid:
    needs: [generate_study_population]
    run: r:latest analysis/check_hhid.R input.csv
    outputs:
      moderately_sensitive:
        log: check_hhid.txt

  calc_coverage:
    needs: [generate_study_population]
    # last argument relates to MSOA TPP coverage >= X%
    run: r:latest analysis/calculate_tpp_coverage.R input.csv data/SAPE22DT15_mid_2019_msoa.csv 80
    outputs:
      moderately_sensitive:
        log: coverage_log.txt
        rds: tpp_coverage_included.rds
        rds2: tpp_coverage_all.rds
        csv: tpp_coverage_all.csv
        csv2: msoas_in_tpp.csv
        csv3: msoa_gt_100_cov.csv
        figure: total_vs_tpp_pop.png
        figure2: tpp_cov_filtered.png

  data_clean:
    needs: [generate_study_population, calc_coverage]
    # last argument relates to MSOA TPP coverage >= X%
    run: r:latest analysis/data_clean.R input.csv tpp_coverage_included.rds 80
    outputs:
      moderately_sensitive:
        log: data_clean_log.txt
      highly_sensitive:
        input_clean: input_clean.rds
        
  data_check_figs:
    needs: [data_clean]
    run: r:latest analysis/data_check_figs.R input_clean.rds data/msoa_shp.rds
    outputs:
      moderately_sensitive:
        figure1: tpp_coverage_msoa.png
        figure2: tpp_coverage_carehomes.png
        figure3: tpp_coverage_map.pdf
        figure4: age_dist.png
        figure5: infection_death_delays.png
        figure6: hh_size_dist.png

  data_setup:
    needs: [data_clean]
    # last argument relates to carehome TPP coverage >= X%
    run: r:latest analysis/data_setup.R input_clean.rds data/cases_rolling_nation.csv 90
    outputs:
      moderately_sensitive:
        log: data_setup_log.txt
        figure: carehome_size.png
      highly_sensitive:
        comm_prev: community_incidence.rds
        analysisdata: analysisdata.rds
        ch_linelist: ch_linelist.rds
        ch_agg_long: ch_agg_long.rds

  descriptive:
    needs: [data_clean, data_setup]
    run: r:latest analysis/descriptive.R 
    outputs:
      moderately_sensitive:
        # report: descriptive.pdf
        log: log_descriptive.txt
        data: ch_gp_permsoa.csv
        table: ch_chars_tab.csv
        figure1: ch_survival.png
        figure2: ch_survival_bytype.png
        figure3: first_event_type.png
        figure4: community_inc.png
        figure5: comm_vs_ch_risk.png
        figure6: comm_vs_ch_risk_log2.png
        figure7: compare_epidemics.png

  run_models:
    needs: [data_setup]
    run: r:latest analysis/run_models.R analysisdata.rds 0.0
    outputs:
      moderately_sensitive:
        output: output_model_run.txt
        log: log_model_run.txt
        coeffs: coeffs_all.rds
        figure: model_coeffs.pdf
        table: model_comp.csv
      highly_sensitive:
        fit: model_out.rds
        test: testdata.rds
        
  make_table:
    needs: [run_models]
    run: r:latest analysis/make_coeff_table.R coeffs_all.rds
    outputs:
      moderately_sensitive:
        table1: coeffs_table.csv
        table2: coeffs_table_all.csv
        
#  compare_models:
#    needs: [run_models]
#    run: r:latest analysis/compare_models.R model_out.rds
#    outputs:
#      moderately_sensitive:
#        log: output_model_comp.txt
#        coeffs: coeffs_all.rds
#        figure: model_coeffs.pdf
#        table: model_comp.csv

#  validate_models:
#    needs: [run_models]
#    run: r:latest analysis/validate_models.R fits.rds testdata.rds
#    outputs:
#      moderately_sensitive:
#        output: output_model_val.txt
#        report: test_pred_figs.pdf
        
  run_all:
    needs: [run_models, descriptive]
    # In order to be valid this action needs to define a run command and
    # some output. We don't really care what these are but the below seems to
    # do the trick.
    run: cohortextractor:latest --version
    outputs:
      moderately_sensitive:
        whatever: project.yaml
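The `needs:` keys above turn the pipeline into a directed acyclic graph. The sketch below, which assumes Python 3.9+ for the standard-library `graphlib` module, transcribes those edges by hand (the two commented-out actions are omitted) and computes which upstream actions `data_setup` depends on, plus one valid order in which they could run. The names `NEEDS`, `upstream` and `run_order` are hypothetical; this only illustrates the graph structure and is not how the OpenSAFELY job runner decides what to execute.

```python
from graphlib import TopologicalSorter

# "needs" edges transcribed from the project.yaml above
# (commented-out actions compare_models and validate_models omitted).
NEEDS = {
    "generate_study_population": [],
    "check_hhid": ["generate_study_population"],
    "calc_coverage": ["generate_study_population"],
    "data_clean": ["generate_study_population", "calc_coverage"],
    "data_check_figs": ["data_clean"],
    "data_setup": ["data_clean"],
    "descriptive": ["data_clean", "data_setup"],
    "run_models": ["data_setup"],
    "make_table": ["run_models"],
    "run_all": ["run_models", "descriptive"],
}


def upstream(action: str) -> set[str]:
    """All actions that `action` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(NEEDS[action])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(NEEDS[dep])
    return seen


def run_order(action: str) -> list[str]:
    """One valid execution order for `action` and everything it needs."""
    wanted = upstream(action) | {action}
    # Restrict the graph to the relevant actions; graphlib expects
    # a mapping of node -> its predecessors.
    graph = {a: [d for d in NEEDS[a] if d in wanted] for a in wanted}
    return list(TopologicalSorter(graph).static_order())


print(run_order("data_setup"))
# ['generate_study_population', 'calc_coverage', 'data_clean', 'data_setup']
```

For `data_setup` the order is unique because its dependencies form a chain; for `run_all`, `TopologicalSorter` may return any of several valid orders.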

Timeline

Created: 18 Jun 2021 11:22:55 UTC
Started: 18 Jun 2021 11:23:11 UTC
Finished: 18 Jun 2021 11:30:17 UTC
Runtime: 00:07:06

These timestamps are generated and stored using the UTC timezone on the TPP backend.
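The reported Runtime corresponds to Finished minus Started, not Finished minus Created. A small Python check, assuming the timestamp layout shown above (day, abbreviated month, year, time) and an English/C locale for `%b`; the helper names `parse_utc` and `hms` are illustrative:

```python
from datetime import datetime, timezone

FMT = "%d %b %Y %H:%M:%S"  # e.g. "18 Jun 2021 11:23:11"


def parse_utc(stamp: str) -> datetime:
    """Parse a Timeline timestamp (without the trailing 'UTC') as aware UTC."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)


def hms(seconds: int) -> str:
    """Format a duration in seconds as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


created = parse_utc("18 Jun 2021 11:22:55")
started = parse_utc("18 Jun 2021 11:23:11")
finished = parse_utc("18 Jun 2021 11:30:17")

runtime = int((finished - started).total_seconds())
queued = int((started - created).total_seconds())
print("Runtime:", hms(runtime))  # 00:07:06
print("Queued:", hms(queued))    # 00:00:16
```

The 16 seconds between Created and Started is time spent waiting to start and is not counted in Runtime.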

Job information

Status: Succeeded
Backend: TPP
Workspace: carehomes
Requested by: Emily Nightingale
Branch: master
Force run dependencies: No
Git commit hash: a3ac062
Requested actions: data_setup
