Job request: 2883

Organisation: The London School of Hygiene & Tropical Medicine
Workspace: carehomes
ID: cguqlb7prhmx5x6a

This page shows the technical details of what happened when the authorised researcher Emily Nightingale requested one or more actions to be run against real patient data within a secure environment.

By cross-referencing the list of jobs with the pipeline section below, you can infer what security level the outputs were written to.

The output security levels are:

highly_sensitive
- Researchers can never directly view these outputs
- Researchers can only request that code be run against them
moderately_sensitive
- Can be viewed by an approved researcher by logging into a highly secure environment
- These are the only outputs that can be requested for public release via a controlled output review service.
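
The cross-referencing described above can be sketched in a few lines of Python. This is only an illustration: the `PIPELINE` dictionary is a hand-copied subset of the project.yaml shown on this page (the `outputs` of the `data_setup` action), and the helper name `outputs_by_level` is hypothetical, not part of any OpenSAFELY tool.

```python
# Hand-copied from the `outputs` section of the data_setup action in the
# project.yaml on this page. Only the one requested action is included.
PIPELINE = {
    "data_setup": {
        "moderately_sensitive": {"log": "data_setup_log.txt"},
        "highly_sensitive": {
            "comm_prev": "community_incidence.rds",
            "analysisdata": "analysisdata.rds",
            "ch_linelist": "ch_linelist.rds",
            "ch_agg_long": "ch_agg_long.rds",
        },
    },
}


def outputs_by_level(pipeline, action):
    """Return {security_level: sorted file names} for one action (hypothetical helper)."""
    return {
        level: sorted(files.values())
        for level, files in pipeline[action].items()
    }


levels = outputs_by_level(PIPELINE, "data_setup")
for level, files in levels.items():
    print(f"{level}: {', '.join(files)}")
```

For the job listed on this page, the lookup shows that only the log file is written at the moderately_sensitive level; the four .rds files are highly_sensitive.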

Jobs

Action:

data_setup

Status:

Succeeded

Job identifier:

incbp7ycldidrymi

Pipeline

project.yaml

version: '3.0'

expectations:
  population_size: 1000000

actions:
  generate_study_population:
    run: cohortextractor:latest generate_cohort --study-definition study_definition 
    outputs:
      highly_sensitive:
        cohort: input.csv
        
          
  check_hhid:
    needs: [generate_study_population]
    run: r:latest analysis/check_hhid.R input.csv
    outputs:
      moderately_sensitive:
        log: check_hhid.txt

  calc_coverage:
    needs: [generate_study_population]
    # last argument relates to MSOA TPP coverage >= X%
    run: r:latest analysis/calculate_tpp_coverage.R input.csv data/SAPE22DT15_mid_2019_msoa.csv 80
    outputs:
      moderately_sensitive:
        log: coverage_log.txt
        rds: tpp_coverage_included.rds
        rds2: tpp_coverage_all.rds
        csv: tpp_coverage_all.csv
        csv2: msoas_in_tpp.csv
        csv3: msoa_gt_100_cov.csv
        figure: total_vs_tpp_pop.png
        figure2: tpp_cov_filtered.png

  data_clean:
    needs: [generate_study_population, calc_coverage]
    # last argument relates to MSOA TPP coverage >= X%
    run: r:latest analysis/data_clean.R input.csv tpp_coverage_included.rds 80
    outputs:
      moderately_sensitive:
        log: data_clean_log.txt
      highly_sensitive:
        input_clean: input_clean.rds
        
  data_check:
    needs: [data_clean]
    run: r:latest analysis/data_check.R input_clean.rds 
    outputs:
      moderately_sensitive:
        log: data_check_log.txt
        
  data_check_figs:
    needs: [data_clean]
    run: r:latest analysis/data_check_figs.R input_clean.rds data/msoa_shp.rds
    outputs:
      moderately_sensitive:
        figure1: tpp_coverage_msoa.png
        figure2: tpp_coverage_carehomes.png
        figure3: tpp_coverage_map.pdf
        figure4: age_dist.png
        figure5: infection_death_delays.png
        figure6: hh_size_dist.png

  data_setup:
    needs: [data_clean]
    # last argument relates to carehome TPP coverage >= X%
    run: r:latest analysis/data_setup.R input_clean.rds data/cases_rolling_nation.csv 90
    outputs:
      moderately_sensitive:
        log: data_setup_log.txt
      highly_sensitive:
        comm_prev: community_incidence.rds
        analysisdata: analysisdata.rds
        ch_linelist: ch_linelist.rds
        ch_agg_long: ch_agg_long.rds

  descriptive:
    needs: [data_clean, data_setup]
    run: r:latest analysis/descriptive.R 
    outputs:
      moderately_sensitive:
       # report: descriptive.pdf
        log: log_descriptive.txt
        table: ch_chars_tab.csv
        figure: carehome_size.png
        figure1: ch_survival.png
        figure2: ch_survival_bytype.png
        figure3: first_event_type.png
        figure4: community_inc.png
        figure5: comm_vs_ch_risk.png
        figure6: comm_vs_ch_risk_log2.png
        figure7: compare_epidemics.png

  run_models:
    needs: [data_setup]
    run: r:latest analysis/run_models.R analysisdata.rds 0.0
    outputs:
      moderately_sensitive:
        output: output_model_run.txt
        log: log_model_run.txt
        coeffs: coeffs_all.rds
        figure: model_coeffs.pdf
        table: model_comp.csv
      highly_sensitive:
        fit: model_out.rds
        test: testdata.rds
        
  make_table:
    needs: [run_models]
    run: r:latest analysis/make_coeff_table.R coeffs_all.rds
    outputs:
      moderately_sensitive:
        table1: coeffs_table.csv
        table2: coeffs_table_all.csv
        
#  compare_models:
#    needs: [run_models]
#    run: r:latest analysis/compare_models.R model_out.rds
#    outputs:
#      moderately_sensitive:
#        log: output_model_comp.txt
#        coeffs: coeffs_all.rds
#        figure: model_coeffs.pdf
#        table: model_comp.csv

#  validate_models:
#    needs: [run_models]
#    run: r:latest analysis/validate_models.R fits.rds testdata.rds
#    outputs:
#      moderately_sensitive:
#        output: output_model_val.txt
#        report: test_pred_figs.pdf
        
  run_all:
    needs: [run_models, descriptive]
    # In order to be valid this action needs to define a run command and
    # some output. We don't really care what these are but the below seems to
    # do the trick.
    run: cohortextractor:latest --version
    outputs:
      moderately_sensitive:
        whatever: project.yaml
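
The `needs` entries above define a dependency graph. As a rough sketch, the Python below walks that graph to list the actions that `data_setup` depends on, in an order where every action comes after its dependencies. The `NEEDS` dictionary is copied by hand from the project.yaml, and `run_order` is a hypothetical helper; it does not model how the job runner decides whether an already-completed dependency is re-run.

```python
# `needs` lists for data_setup and its upstream actions, copied by hand
# from the project.yaml above. Actions without a `needs` key get [].
NEEDS = {
    "generate_study_population": [],
    "calc_coverage": ["generate_study_population"],
    "data_clean": ["generate_study_population", "calc_coverage"],
    "data_setup": ["data_clean"],
}


def run_order(needs, target):
    """Depth-first walk: dependencies are appended before the action itself."""
    order = []

    def visit(action):
        if action in order:
            return  # already scheduled via another path
        for dependency in needs[action]:
            visit(dependency)
        order.append(action)

    visit(target)
    return order


order = run_order(NEEDS, "data_setup")
print(" -> ".join(order))
```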

Timeline

Created: 14 Jul 2021 14:12:55 UTC
Started: 14 Jul 2021 14:13:37 UTC
Finished: 14 Jul 2021 14:19:33 UTC
Runtime: 00:05:56

These timestamps are generated and stored using the UTC timezone on the TPP backend.
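
The runtime follows directly from the Started and Finished timestamps. A minimal Python check is sketched below; the format string and the `parse_utc` helper are assumptions made for this example, and `%b` relies on an English locale to parse "Jul".

```python
from datetime import datetime, timezone

# Assumed format for the timestamps as displayed on this page.
FMT = "%d %b %Y %H:%M:%S"


def parse_utc(text):
    """Parse a displayed timestamp and mark it as UTC (hypothetical helper)."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


created = parse_utc("14 Jul 2021 14:12:55")
started = parse_utc("14 Jul 2021 14:13:37")
finished = parse_utc("14 Jul 2021 14:19:33")

runtime = finished - started  # time spent running
queued = started - created    # time between creation and start

print("Runtime:", runtime)
print("Waited before start:", queued)
```

The difference between Finished and Started is 356 seconds, which matches the reported runtime of 00:05:56.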

Job request

Status: Succeeded
Backend: TPP
Workspace: carehomes
Requested by: Emily Nightingale
Branch: master
Force run dependencies: No
Git commit hash: 28d44a2
Requested actions: data_setup
