Job request: 14136
- Organisation: Bennett Institute
- Workspace: opioids-covid-research
- ID: hcor5tul6nshyloz
This page shows the technical details of what happened when the authorised researcher Andrea Schaffer requested one or more actions to be run against real patient data within a secure environment.
By cross-referencing the list of jobs with the pipeline section below, you can infer what security level the outputs were written to.
The output security levels are:
- highly_sensitive
  - Researchers can never directly view these outputs
  - Researchers can only request that code be run against them
- moderately_sensitive
  - Can be viewed by an approved researcher by logging into a highly secure environment
  - These are the only outputs that can be requested for public release via a controlled output review service.
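To make that cross-referencing concrete, here is a minimal Python sketch for the three actions requested in this job request. The output patterns and security levels are transcribed by hand from the project.yaml in the pipeline section. The names `OUTPUTS`, `VIEWABLE_LEVELS`, `security_levels` and `viewable_actions` are hypothetical illustrations, not part of any OpenSAFELY tool.

```python
# Output patterns per security level, transcribed by hand from project.yaml
# for the three actions requested in this job request (hypothetical structure).
OUTPUTS = {
    "generate_study_population_test": {"highly_sensitive": ["output/i*.csv"]},
    "join_cohorts_test": {"highly_sensitive": ["output/data/input*.csv"]},
    "generate_measures_test": {"moderately_sensitive": ["output/data/measure*.csv"]},
}

# Only moderately_sensitive outputs can be viewed by an approved researcher
# and requested for release via output review.
VIEWABLE_LEVELS = {"moderately_sensitive"}


def security_levels(action: str) -> set[str]:
    """Return the set of security levels an action writes outputs to."""
    return set(OUTPUTS[action])


def viewable_actions() -> list[str]:
    """Return the actions with at least one output a researcher could view."""
    return [a for a in OUTPUTS if security_levels(a) & VIEWABLE_LEVELS]


for action in OUTPUTS:
    print(action, sorted(security_levels(action)))
print("Viewable:", viewable_actions())
```

Under this transcription, only `generate_measures_test` produces outputs a researcher could view; the other two jobs wrote only highly sensitive files.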
Jobs
- Job identifier: fkindqtgzzfw2oly
- Job identifier: jwp3tun2mlhekdu7
- Job identifier: amkjn2s5pyqq2t74
Pipeline
######################################
# This script defines the project pipeline - it specifies the execution order for all the code in this
# repo using a series of actions.
######################################
version: '3.0'

expectations:
  population_size: 10000

actions:

  # Extract data ----
  ## Cohort data
  generate_study_population_1:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition
      --index-date-range "2018-01-01 to 2018-12-01 by month"
      --output-dir=output
      --output-format=csv
    outputs:
      highly_sensitive:
        cohort: output/input_*.csv

  generate_study_population_2:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition
      --index-date-range "2019-01-01 to 2019-12-01 by month"
      --output-dir=output
      --output-format=csv
    outputs:
      highly_sensitive:
        cohort: output/input*.csv

  generate_study_population_3:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition
      --index-date-range "2020-01-01 to 2020-12-01 by month"
      --output-dir=output
      --output-format=csv
    outputs:
      highly_sensitive:
        cohort: output/inpu*.csv

  generate_study_population_4:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition
      --index-date-range "2021-01-01 to 2021-12-01 by month"
      --output-dir=output
      --output-format=csv
    outputs:
      highly_sensitive:
        cohort: output/inp*.csv

  generate_study_population_5:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition
      --index-date-range "2022-01-01 to 2022-03-01 by month"
      --output-dir=output
      --output-format=csv
    outputs:
      highly_sensitive:
        cohort: output/in*.csv

  ## Ethnicity
  generate_ethnicity_cohort:
    run: >
      cohortextractor:latest generate_cohort
      --study-definition study_definition_ethnicity
    outputs:
      highly_sensitive:
        cohort: output/input_ethnicity.csv

  # Data processing ----
  ## Add ethnicity
  join_cohorts:
    run: >
      cohort-joiner:v0.0.48
      --lhs output/input_*.csv
      --rhs output/input_ethnicity.csv
      --output-dir output/data
    needs: [generate_study_population_1, generate_study_population_2,
      generate_study_population_5, generate_study_population_3, generate_study_population_4,
      generate_ethnicity_cohort]
    outputs:
      highly_sensitive:
        cohort: output/data/input_*.csv

  ## Generate measures
  generate_measures:
    run: >
      cohortextractor:latest generate_measures
      --study-definition study_definition
      --output-dir output/data
    needs: [join_cohorts]
    outputs:
      moderately_sensitive:
        measure_csv: output/data/measure_*.csv

  ## Process data - time series
  process_data_ts:
    run: r:latest analysis/process/process_data_ts.R
    needs: [generate_measures, join_cohorts]
    outputs:
      moderately_sensitive:
        measure_csv: output/joined/final_*.csv

  ## Process data - table
  process_data_table:
    run: r:latest analysis/process/process_data_table.R
    needs: [generate_measures, join_cohorts]
    outputs:
      moderately_sensitive:
        measure_csv: output/joined/final*.csv

  # Results ---
  ## Time series
  timeseries:
    run: r:latest analysis/descriptive/time_series_stand.R
    needs: [process_data_ts]
    outputs:
      moderately_sensitive:
        table: output/time series/ts_*.csv

  ## Time series graphs
  # graphs:
  #   run: r:latest analysis/descriptive/graphs.R
  #   needs: [timeseries]
  #   outputs:
  #     moderately_sensitive:
  #       plot: output/time series/graphs/graph*.png

  ## Table
  table:
    run: r:latest analysis/descriptive/table_stand.R
    needs: [process_data_table]
    outputs:
      moderately_sensitive:
        table: output/tables/table_*.csv

  ##################### TESTING ########################
  generate_study_population_test:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_test
      --index-date-range "2020-01-01 to 2022-03-01 by month"
      --output-dir=output
      --output-format=csv
    outputs:
      highly_sensitive:
        cohort: output/i*.csv

  ## Add ethnicity
  join_cohorts_test:
    run: >
      cohort-joiner:v0.0.48
      --lhs output/i*.csv
      --rhs output/input_ethnicity.csv
      --output-dir output/data
    needs: [generate_study_population_test, generate_ethnicity_cohort]
    outputs:
      highly_sensitive:
        cohort: output/data/input*.csv

  ## Generate measures
  generate_measures_test:
    run: >
      cohortextractor:latest generate_measures
      --study-definition study_definition_test
      --output-dir output/data
    needs: [join_cohorts_test]
    outputs:
      moderately_sensitive:
        measure_csv: output/data/measure*.csv
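The cohort-extraction actions declare output patterns that differ only in where the wildcard starts (`output/input_*.csv`, `output/input*.csv`, `output/inpu*.csv`, `output/inp*.csv`, `output/in*.csv` and `output/i*.csv`). The sketch below checks which declared patterns match a few example paths. It uses Python's `fnmatch.fnmatchcase` as a stand-in for the pipeline's own glob matching, which may behave differently (for example in how `*` treats `/`). The file names and the helper `matching_actions` are hypothetical.

```python
from fnmatch import fnmatchcase

# Output patterns declared by the cohort-extraction actions in project.yaml.
PATTERNS = {
    "generate_study_population_1": "output/input_*.csv",
    "generate_study_population_2": "output/input*.csv",
    "generate_study_population_3": "output/inpu*.csv",
    "generate_study_population_4": "output/inp*.csv",
    "generate_study_population_5": "output/in*.csv",
    "generate_study_population_test": "output/i*.csv",
    "generate_ethnicity_cohort": "output/input_ethnicity.csv",
}

# Hypothetical example paths; the real file names are chosen by cohortextractor.
FILES = [
    "output/input_2018-01-01.csv",
    "output/input_ethnicity.csv",
    "output/data/input_2018-01-01.csv",
]


def matching_actions(path: str) -> list[str]:
    """Return every action whose declared output pattern matches path."""
    return [action for action, pattern in PATTERNS.items() if fnmatchcase(path, pattern)]


for path in FILES:
    print(path, "->", matching_actions(path))
```

Under these assumptions, a monthly extract such as `output/input_2018-01-01.csv` is matched by all six wildcard patterns, `output/input_ethnicity.csv` is matched by every pattern listed, and the joined files under `output/data/` match none of them.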
Timeline
- Created:
- Started:
- Finished:
- Runtime: 05:07:14
These timestamps are generated and stored using the UTC timezone on the TPP backend.
Job request
- Status: Succeeded
- Backend: TPP
- Workspace: opioids-covid-research
- Requested by: Andrea Schaffer
- Branch: main
- Force run dependencies: No
- Git commit hash: 742c3e9
- Requested actions:
  - generate_study_population_test
  - join_cohorts_test
  - generate_measures_test
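The three requested actions have to run in the order implied by their `needs` fields. A minimal sketch using Python's `graphlib.TopologicalSorter` (Python 3.9+) derives one valid order from the `needs` edges transcribed from project.yaml. It assumes only these edges matter and does not model how the job server treats the unrequested dependency `generate_ethnicity_cohort` when force run dependencies is set to No. The names `NEEDS`, `REQUESTED` and `execution_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# "needs" edges transcribed from project.yaml: each action maps to the
# actions it depends on.
NEEDS = {
    "generate_study_population_test": set(),
    "generate_ethnicity_cohort": set(),
    "join_cohorts_test": {"generate_study_population_test", "generate_ethnicity_cohort"},
    "generate_measures_test": {"join_cohorts_test"},
}

# Actions listed under "Requested actions" in this job request.
REQUESTED = ["generate_study_population_test", "join_cohorts_test", "generate_measures_test"]


def execution_order(needs: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every action follows its dependencies."""
    return list(TopologicalSorter(needs).static_order())


order = execution_order(NEEDS)
print(order)
# Dependencies that appear in the order but were not themselves requested.
print([a for a in order if a not in REQUESTED])
```

`static_order` may place independent actions (here the two extraction steps) in either order; the only guarantee is that each action comes after everything it needs.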