Job request: 10221
- Organisation: Bennett Institute
- Workspace: covid-mortality-over-time-imd
- ID: 2ecuon3aezfyndmz
This page shows the technical details of what happened when the authorised researcher Linda Nab requested one or more actions to be run against real patient data within a secure environment.
By cross-referencing the list of jobs with the pipeline section below, you can infer which security level each output was written to.
The output security levels are:
- highly_sensitive
  - Researchers can never directly view these outputs.
  - Researchers can only request that code is run against them.
- moderately_sensitive
  - Can be viewed by an approved researcher by logging into a highly secure environment.
  - These are the only outputs that can be requested for public release via a controlled output review service.
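These levels are declared per action in the project.yaml shown in the Pipeline section. For illustration only, a minimal sketch of how an action might declare outputs at both levels; the action name, script, and file paths below are hypothetical and are not taken from this workspace:

  # Hypothetical action showing both output security levels
  some_analysis_action:
    run: r:latest analysis/some_script.R
    outputs:
      highly_sensitive:
        # patient-level data: never directly viewable by researchers
        cohort: output/input_example.csv.gz
      moderately_sensitive:
        # aggregated results: viewable in the secure environment and
        # eligible for a release request via output review
        table: output/tables/example_counts.csv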
Jobs
- Job identifier: 2nl2oap3a24sjsur
- Job identifier: bq4kvizmxlh7ni2k
- Job identifier: xgcyudxd7ubn7b3z
- Job identifier: gj464ci6a7slfp3b
- Job identifier: uqdx3vkfyyznk7xw
Pipeline
project.yaml:
version: '3.0'

expectations:
  population_size: 1000

actions:
  # Extract ethnicity
  generate_study_population_ethnicity:
    run: >
      cohortextractor:latest generate_cohort
      --study-definition study_definition_ethnicity
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/input_ethnicity.csv.gz

  # SECOND PART OF STUDY
  generate_study_population_flowchart:
    run: >
      cohortextractor:latest generate_cohort
      --study-definition study_definition_flowchart
      --skip-existing
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/input_flowchart.csv.gz

  # Process data flowchart
  process_data_flowchart:
    run: r:latest analysis/data_flowchart_process.R
    needs: [generate_study_population_flowchart]
    outputs:
      highly_sensitive:
        rds: output/processed/input_flowchart.rds

  # Skim data flowchart
  skim_data_flowchart:
    run: r:latest analysis/data_skim.R output/processed/input_flowchart.rds output/data_properties
    needs: [process_data_flowchart]
    outputs:
      moderately_sensitive:
        txt1: output/data_properties/input_flowchart_skim.txt
        txt2: output/data_properties/input_flowchart_coltypes.txt
        txt3: output/data_properties/input_flowchart_tabulate.txt

  # Numbers for flowchart
  calc_numbers_flowchart:
    run: r:latest analysis/flowchart.R
    needs: [process_data_flowchart]
    outputs:
      moderately_sensitive:
        cohort: output/tables/flowchart/wave1_flowchart.csv

  generate_study_population_wave1:
    run: >
      cohortextractor:latest generate_cohort
      --study-definition study_definition_wave1
      --skip-existing
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/input_wave1.csv.gz

  generate_study_population_wave1_imd:
    run: >
      cohortextractor:latest generate_cohort
      --study-definition study_definition_wave1_imd
      --skip-existing
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/input_wave1_imd.csv.gz

  generate_study_population_wave2:
    run: >
      cohortextractor:latest generate_cohort
      --study-definition study_definition_wave2
      --skip-existing
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/input_wave2.csv.gz

  generate_study_population_wave3:
    run: >
      cohortextractor:latest generate_cohort
      --study-definition study_definition_wave3
      --skip-existing
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/input_wave3.csv.gz

  # Join data
  join_cohorts_waves:
    run: >
      cohort-joiner:v0.0.7
      --lhs output/input_wave*.csv.gz
      --rhs output/input_ethnicity.csv.gz
      --output-dir=output/joined
    needs: [generate_study_population_wave1, generate_study_population_wave2, generate_study_population_wave3, generate_study_population_wave1_imd, generate_study_population_ethnicity]
    outputs:
      highly_sensitive:
        cohort: output/joined/input_wave*.csv.gz

  # Process data
  process_data:
    run: r:latest analysis/data_process.R
    needs: [join_cohorts_waves]
    outputs:
      highly_sensitive:
        rds: output/processed/input_wave*.rds

  # Skim data
  skim_data_wave1:
    run: r:latest analysis/data_skim.R output/processed/input_wave1.rds output/data_properties
    needs: [process_data]
    outputs:
      moderately_sensitive:
        txt1: output/data_properties/input_wave1_skim.txt
        txt2: output/data_properties/input_wave1_coltypes.txt
        txt3: output/data_properties/input_wave1_tabulate.txt

  # Skim data
  skim_data_wave1_imd:
    run: r:latest analysis/data_skim.R output/processed/input_wave1_imd.rds output/data_properties
    needs: [process_data]
    outputs:
      moderately_sensitive:
        txt1: output/data_properties/input_wave1_imd_skim.txt
        txt2: output/data_properties/input_wave1_imd_coltypes.txt
        txt3: output/data_properties/input_wave1_imd_tabulate.txt

  skim_data_wave2:
    run: r:latest analysis/data_skim.R output/processed/input_wave2.rds output/data_properties
    needs: [process_data]
    outputs:
      moderately_sensitive:
        txt1: output/data_properties/input_wave2_skim.txt
        txt2: output/data_properties/input_wave2_coltypes.txt
        txt3: output/data_properties/input_wave2_tabulate.txt

  skim_data_wave3:
    run: r:latest analysis/data_skim.R output/processed/input_wave3.rds output/data_properties
    needs: [process_data]
    outputs:
      moderately_sensitive:
        txt1: output/data_properties/input_wave3_skim.txt
        txt2: output/data_properties/input_wave3_coltypes.txt
        txt3: output/data_properties/input_wave3_tabulate.txt
Timeline
- Created:
- Started:
- Finished:
- Runtime: 06:32:30
These timestamps are generated and stored using the UTC timezone on the TPP backend.
Job request
- Status: Succeeded
- Backend: TPP
- Workspace: covid-mortality-over-time-imd
- Requested by: Linda Nab
- Branch: explore-old-imd-data-extract
- Force run dependencies: No
- Git commit hash: 947d309
- Requested actions:
  - generate_study_population_wave1_imd
  - join_cohorts_waves
  - process_data
  - skim_data_wave1
  - skim_data_wave1_imd
Code comparison
Compare the code used in this job request