Job request: 17797

Organisation:
University of Bristol
Workspace:
vax-fourth-dose-rd-test
ID:
5lfcxnpgxm3wkmjd

This page shows the technical details of what happened when the authorised researcher Andrea Schaffer requested one or more actions to be run against real patient data within a secure environment.

By cross-referencing the list of jobs with the pipeline section below, you can infer which security level each job's outputs were written to.

The output security levels are:

  • highly_sensitive
    • Researchers can never directly view these outputs
    • Researchers can only request that code be run against them
  • moderately_sensitive
    • Can be viewed by an approved researcher by logging into a highly secure environment
    • These are the only outputs that can be requested for public release via a controlled output review service.
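
The cross-referencing described above can be sketched in a few lines of Python. This is only an illustration: the `PIPELINE` dict is a hand-copied excerpt of two actions from the pipeline shown further down this page, and the names `PIPELINE` and `output_levels` are hypothetical helpers, not part of any OpenSAFELY tool.

```python
# Hand-copied excerpt of the "outputs" blocks from the pipeline on this page.
# PIPELINE and output_levels are illustrative names, not an OpenSAFELY API.
PIPELINE = {
    "generate_study_pop_baseline": {
        "highly_sensitive": {"cohort": "output/input_baseline.feather"},
    },
    "demographics": {
        "moderately_sensitive": {
            "demographics_csv": "output/descriptive/demographics_*.csv",
            "fluvaccine_csv": "output/descriptive/fluvax_*.csv",
        },
    },
}


def output_levels(action):
    """Return {security_level: sorted output patterns} for one action."""
    return {
        level: sorted(patterns.values())
        for level, patterns in PIPELINE[action].items()
    }


# The only job in this request ran the "demographics" action.
for level, patterns in output_levels("demographics").items():
    for pattern in patterns:
        print(f"{level}: {pattern}")
```

For this job request, the sketch lists only moderately_sensitive patterns, which matches the `demographics` entry in the pipeline.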

Jobs

  • Action:
    demographics
    Status:
    Succeeded
    Job identifier:
    ji4bh2ulok6fktx4

Pipeline

project.yaml
######################################

# This script defines the project pipeline - it specifies the execution order for all the code in this
# repo using a series of actions.

######################################


version: '3.0'

expectations:
  population_size: 10000

actions:

# Generate study population and extract baseline characteristics at Sep 3, 2022
  generate_study_pop_baseline:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_baseline
      --output-dir=feather 
      --output-format=feather
    outputs:
      highly_sensitive:
        cohort: output/input_baseline.feather
      
# Data cleaning, defining exclusions, saving final study pop
  data_process_baseline:
    run: r:latest analysis/processing/data_process_baseline.R
    needs: [generate_study_pop_baseline]
    outputs:
      highly_sensitive:
        cohort: output/cohort/cohort_*.csv
      moderately_sensitive:
        descriptive: output/descriptive/total_*.csv     

#  # Extract outcomes pre-campaign (index date = Sep 3)
#   outcomes_sep:
#     run: cohortextractor:latest generate_cohort
#       --study-definition study_definition_outcomes_1
#       --index-date-range "2022-09-03" 
#       --output-dir=feather 
#       --output-format=feather
#     needs: [data_process_baseline]
#     outputs:
#       highly_sensitive:
#         cohort: output/index/input_*.feather

#  # Extract outcomes mid-campaign (index date = Oct 15)
#   outcomes_oct:
#     run: cohortextractor:latest generate_cohort
#       --study-definition study_definition_outcomes_1
#       --index-date-range "2022-10-15" 
#       --output-dir=feather 
#       --output-format=feather
#     needs: [data_process_baseline]
#     outputs:
#       highly_sensitive:
#         cohort: output/index/input*.feather

# Extract outcomes
  outcomes:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_outcomes_2
      --index-date-range "2022-09-03 to 2023-01-28 by week" 
      --output-dir=feather 
      --output-format=feather
    needs: [data_process_baseline]
    outputs:
      highly_sensitive:
        cohort: output/index/inpu*.feather

# Data cleaning of outcome data (control periods)
  # data_process_outcomes_1:
  #   run: r:latest analysis/processing/data_process_outcomes_1.R
  #   needs: [outcomes_sep, outcomes_oct]
  #   outputs:
  #     highly_sensitive:
  #       outcomes: output/cohort/outcomes*.csv

# Data cleaning of outcome data 
  data_process_outcomes_2:
    run: r:latest analysis/processing/data_process_outcomes_2.R
    needs: [outcomes]
    outputs:
      highly_sensitive:
        outcomes: output/cohort/outcome*.feather

# Split into separate datasets by start date
  data_process_by_start_date:
    run: r:latest analysis/processing/data_process_by_start_date.R
    needs: [data_process_outcomes_2]
    outputs:
      moderately_sensitive:
        outcomes: output/cohort_bydate/outcomes_*.csv

# Plots of COVID booster uptake by age
  booster_uptake:
    run: r:latest analysis/descriptive/cumulative_vax_byage.R
    needs: [data_process_baseline]
    outputs:
      moderately_sensitive:
        rates_csv: output/cumulative_rates/final_*.csv 
        plot: output/cumulative_rates/plot_*.png

# Aggregate data by age
  aggregate_outcomes:
    run: r:latest analysis/processing/aggregate_outcomes.R
    needs: [data_process_by_start_date]
    outputs:
      moderately_sensitive:
        outcomes: output/covid_outcomes/by_start_date/outcomes_*.csv
#        no_patients: output/descriptive/total_n_by_date.csv

# Outcome plots #
  # plot_outcomes:
  #  run: r:latest analysis/descriptive/plot_outcomes_byage.R
  #  needs: [aggregate_outcomes_byage]
  #  outputs:
  #     moderately_sensitive:
  #       plot: output/covid_outcomes/figures/plot_*.png

# Sharp analysis #
  sharp_analysis_lpm:
    run: r:latest analysis/statistical_analysis/sharp_analysis_lpm.R
    needs: [data_process_by_start_date]
    outputs:
      moderately_sensitive:
        predicted_csv: output/modelling/predicted_lpm*.csv
        coefficients1_csv: output/modelling/coef_lpm*.csv
        coefficients2_csv: output/modelling/final/coef_lpm*.csv
        plot: output/modelling/figures/plot_pred_lpm*.png

# Fuzzy analysis #
  fuzzy_analysis:
    run: r:latest analysis/statistical_analysis/fuzzy_analysis.R
    needs: [data_process_by_start_date]
    outputs:
      moderately_sensitive:
        coefficients_csv: output/modelling/iv/coef_iv*.csv
        final_csv: output/modelling/final/coef_i*.csv

# Check latest date of outcome
  # latest_date_outcomes:
  #  run: r:latest analysis/descriptive/latest_date_outcomes.R
  #  needs: [data_process_outcomes_2]
  #  outputs:
  #     moderately_sensitive:
  #       plot: output/descriptive/over*.png

# Discontinuity of demographics
  demographics:
    run: r:latest analysis/descriptive/demographics_byage.R
    needs: [generate_study_pop_baseline, data_process_baseline]
    outputs:
      moderately_sensitive:
        demographics_csv: output/descriptive/demographics_*.csv
        fluvaccine_csv: output/descriptive/fluvax_*.csv

# Check DOD
  # check_dod:
  #  run: r:latest analysis/descriptive/check_dod.R
  #  needs: [generate_study_pop_baseline, outcomes_sep, outcomes_nov, outcomes_oct, data_process_baseline, data_process_outcomes_1, data_process_outcomes_2]
  #  outputs:
  #     moderately_sensitive:
  #       demographics_csv: output/descriptive/dod*.csv
      
# Outcomes by week
  outcomes_by_week:
    run: r:latest analysis/descriptive/outcomes_by_week.R
    needs: [outcomes]
    outputs:
      moderately_sensitive:
        outcomes: output/descriptive/outcomes_by_*.csv
        plot: output/descriptive/outcomes_by_week.png
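
The `outcomes` action above passes `--index-date-range "2022-09-03 to 2023-01-28 by week"`. As a rough sketch of what that range covers, the snippet below lists the index dates under the assumption that the range is expanded inclusively in 7-day steps from the start date. The function name `weekly_index_dates` is hypothetical and this is not cohortextractor's own code.

```python
from datetime import date, timedelta


def weekly_index_dates(start, end):
    """List dates from start to end (inclusive) in 7-day steps.

    Assumption: this mirrors how a "by week" range is expanded; it is an
    illustration, not cohortextractor's implementation.
    """
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(weeks=1)
    return dates


dates = weekly_index_dates(date(2022, 9, 3), date(2023, 1, 28))
print(len(dates), dates[0].isoformat(), dates[-1].isoformat())
```

Under that assumption the range yields 22 index dates, starting on 2022-09-03 and ending on 2023-01-28, since the two endpoints are exactly 21 weeks apart.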

Timeline

  • Created:

  • Started:

  • Finished:

  • Runtime: 00:01:09

These timestamps are generated and stored using the UTC timezone on the TPP backend.

Job request

Status
Succeeded
Backend
TPP
Requested by
Andrea Schaffer
Branch
Additional-changes
Force run dependencies
No
Git commit hash
ad57f89
Requested actions
  • demographics
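
To see which upstream actions the requested action depends on, the `needs` entries from the pipeline can be walked recursively. The sketch below hand-copies the relevant `needs` lists into a dict. `NEEDS` and `run_order` are illustrative names, and the job runner's actual scheduling logic is not shown on this page.

```python
# "needs" lists copied from the pipeline for the requested action and its
# upstream actions. NEEDS and run_order are hypothetical helper names.
NEEDS = {
    "generate_study_pop_baseline": [],
    "data_process_baseline": ["generate_study_pop_baseline"],
    "demographics": ["generate_study_pop_baseline", "data_process_baseline"],
}


def run_order(action, needs=NEEDS):
    """Return the action and its transitive dependencies, dependencies first."""
    order = []

    def visit(name):
        if name in order:
            return
        for dep in needs[name]:
            visit(dep)
        order.append(name)

    visit(action)
    return order


print(run_order("demographics"))
```

The Jobs list on this page shows only `demographics`, with "Force run dependencies" set to No, so the two upstream actions returned by the sketch were presumably not re-run for this request.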
