Job request: 14503

Organisation: Bennett Institute
Workspace: strepa_scarlet
ID: muylcqc5ntifqfmp

This page shows the technical details of what happened when the authorised researcher Christine Cunningham requested one or more actions to be run against real patient data within a secure environment.

By cross-referencing the list of jobs with the pipeline section below, you can infer what security level the outputs were written to.

The output security levels are:

  • highly_sensitive
    • Researchers can never directly view these outputs
    • Researchers can only request that code be run against them
  • moderately_sensitive
    • Can be viewed by an approved researcher by logging into a highly secure environment
    • These are the only outputs that can be requested for public release via a controlled output review service.
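
The cross-referencing described above can be sketched in a few lines of Python. This is only an illustration, not part of the job: the `PIPELINE_OUTPUTS` dictionary is a hand-copied subset of the `outputs` blocks from the project.yaml on this page, limited to the four requested actions, and the helper name `security_levels` is hypothetical.

```python
# Outputs declared in project.yaml for the four requested actions,
# copied by hand from the pipeline listing on this page.
PIPELINE_OUTPUTS = {
    "curation_monthly": {
        "highly_sensitive": {
            "cohort": "output/curation/input_report_2019-01-01.csv.gz",
        },
    },
    "dataset_report_monthly": {
        "moderately_sensitive": {
            "cohort_report": "output/curation/input_report_2019-01-01.html",
        },
    },
    "curation_weekly": {
        "highly_sensitive": {
            "cohort": "output/curation/input_report_2022-07-01.csv.gz",
        },
    },
    "dataset_report_weekly": {
        "moderately_sensitive": {
            "cohort_report": "output/curation/input_report_2022-07-01.html",
        },
    },
}


def security_levels(requested, pipeline_outputs):
    """Return (action, level, path) rows for each requested action."""
    rows = []
    for action in requested:
        for level, named_paths in pipeline_outputs[action].items():
            for path in named_paths.values():
                rows.append((action, level, path))
    return rows


requested_actions = [
    "curation_monthly",
    "dataset_report_monthly",
    "curation_weekly",
    "dataset_report_weekly",
]

rows = security_levels(requested_actions, PIPELINE_OUTPUTS)
for action, level, path in rows:
    print(f"{action}: {level} -> {path}")
```

Read this way, the two curation extracts are highly_sensitive, while the two HTML dataset reports are moderately_sensitive and so are the only outputs of this request that could be put forward for release.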

Jobs

Pipeline

project.yaml
version: '3.0'

expectations:
  population_size: 1000

actions:
  generate_study_population_report_ethnicity:
    run: cohortextractor:latest generate_cohort 
      --study-definition study_definition_ethnicity_report --output-dir output/report --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/report/input_ethnicity_report.csv.gz

  ### Curation check ###
  curation_monthly:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_report
      --index-date-range "2019-01-01 to 2019-01-01 by month"
      --param frequency=monthly
      --output-dir=output/curation
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/curation/input_report_2019-01-01.csv.gz

  dataset_report_monthly:
      run: python:latest python analysis/dataset_report.py
           --input-files output/curation/input_report_2019-01-01.csv.gz
           --output-dir output/curation/
           --granularity "year"
      needs: [curation_monthly]
      outputs:
        moderately_sensitive:
          # Only output the single summary file
          cohort_report: output/curation/input_report_2019-01-01.html

  curation_weekly:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_report
      --index-date-range "2022-07-01 to 2022-07-01 by week"
      --param frequency=weekly
      --output-dir=output/curation
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/curation/input_report_2022-07-01.csv.gz

  dataset_report_weekly:
      run: python:latest python analysis/dataset_report.py
           --input-files output/curation/input_report_2022-07-01.csv.gz
           --output-dir output/curation/
           --granularity "day"
      needs: [curation_weekly]
      outputs:
        moderately_sensitive:
          # Only output the single summary file
          cohort_report: output/curation/input_report_2022-07-01.html
  ### End curation check

  generate_study_population_report_monthly:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_report
      --index-date-range "2019-01-01 to 2022-06-01 by month"
      --param frequency=monthly
      --output-dir=output/report
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/report/input_*-01.csv.gz

  generate_study_population_report_weekly:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_report
      --index-date-range "2022-07-01 to 2023-01-08 by week"
      --param frequency=weekly
      --output-dir=output/report
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/report/input_*.csv.gz

  join_cohorts_report:
    run: >
      cohort-joiner:v0.0.38
        --lhs output/report/input_report_20*.csv.gz
        --rhs output/report/input_ethnicity_report.csv.gz
        --output-dir output/report/joined
    needs: [generate_study_population_report_monthly, generate_study_population_report_weekly, generate_study_population_report_ethnicity]
    outputs:
      highly_sensitive:
        cohort: output/report/joined/input_report_20*.csv.gz

  generate_measures_report:
    run: cohortextractor:latest generate_measures --study-definition study_definition_report --output-dir=output/report/joined
    needs: [join_cohorts_report]
    outputs:
      moderately_sensitive:
        measure_csv: output/report/joined/measure_event_*_rate.csv

  join_measures:
      run: python:latest python analysis/join_and_round.py
           --input-files output/report/joined/measure_*_rate.csv
           --output-dir output/report/joined
           --output-name "measure_all.csv"
           --skip-round
      needs: [generate_measures_report]
      outputs:
        moderately_sensitive:
          # Only output the single summary file
          measure_csv: output/report/joined/measure_all.csv

  top_5_table_report:
    run: >
      python:latest python analysis/report/top_5_report.py
      --input-file output/report/joined/measure_all.csv
      --output-dir output/report/joined
    needs: [join_measures]
    outputs:
      moderately_sensitive:
        tables: output/report/joined/top_5*.csv

  plot_measure_report:
    run: >
      python:latest python analysis/report/plot_measures_report.py
      --measure-path output/report/joined/measure_all.csv
      --output-dir output/report/joined
    needs: [join_measures]
    outputs:
      moderately_sensitive:
        measure: output/report/joined/*measures*.jpeg

  event_counts_report:
    run: >
      python:latest python analysis/report/event_counts.py --input-dir="output/report/joined" --output-dir="output/report" --measures="amoxicillin,azithromycin,clarithromycin,erythromycin,phenoxymethypenicillin"
    needs: [join_cohorts_report]
    outputs:
      moderately_sensitive:
        measure: output/report/event_counts_*.json

  # create_notebook:
  #   run: python:latest python analysis/report/create_notebook.py
  #   outputs:
  #     moderately_sensitive:
  #       notebook: output/report/report.ipynb

  # generate_notebook:
  #   run: jupyter:latest jupyter nbconvert /workspace/output/report/report.ipynb --execute --to html --output-dir=/workspace/output/report --ExecutePreprocessor.timeout=86400 --no-input
  #   needs: [create_notebook, event_counts_report, deciles_chart_report, top_5_table_report, plot_measure_report]
  #   outputs:
  #     moderately_sensitive:
  #       notebook: output/report/report.html
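
Several actions above declare their outputs and inputs with glob patterns. As a rough sketch of what those patterns would select, the snippet below applies Python's `fnmatch` to three file names. The 2022-07-08 file name is a hypothetical example that follows the naming seen in the pipeline, and the job runner's own matching rules are not shown on this page, so treat this as standard shell-style glob behaviour only.

```python
from fnmatch import fnmatchcase

# Patterns copied from the project.yaml above.
PATTERNS = {
    "monthly_outputs": "output/report/input_*-01.csv.gz",
    "weekly_outputs": "output/report/input_*.csv.gz",
    "join_lhs": "output/report/input_report_20*.csv.gz",
}

# Illustrative file names; the second one is hypothetical.
FILES = [
    "output/report/input_report_2019-01-01.csv.gz",
    "output/report/input_report_2022-07-08.csv.gz",
    "output/report/input_ethnicity_report.csv.gz",
]

# For each pattern, collect the files it matches (case-sensitive).
matches = {
    name: [f for f in FILES if fnmatchcase(f, pattern)]
    for name, pattern in PATTERNS.items()
}

for name, matched in matches.items():
    print(name)
    for f in matched:
        print("   ", f)
```

Under plain glob semantics, the weekly output pattern is broad enough to match the monthly files and the ethnicity file as well, whereas the `--lhs` pattern passed to cohort-joiner matches only the dated report files.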

Timeline

  • Created:

  • Started:

  • Finished:

  • Runtime: 23:30:18

These timestamps are generated and stored using the UTC timezone on the TPP backend.

Job request

Status: Succeeded
Backend: TPP
Workspace: strepa_scarlet
Requested by: Christine Cunningham
Branch: main
Force run dependencies: No
Git commit hash: c141d6a
Requested actions:
  • curation_monthly
  • dataset_report_monthly
  • curation_weekly
  • dataset_report_weekly
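
Two of the requested actions depend on the other two through their `needs` entries in the project.yaml. A minimal sketch of deriving a valid run order from those entries, using the standard-library `graphlib` module (Python 3.9+), might look like the following. The `NEEDS` mapping is copied by hand from the pipeline; this is not a description of how the job runner itself schedules work.

```python
from graphlib import TopologicalSorter

# Each action mapped to the actions it needs, as declared in project.yaml.
NEEDS = {
    "curation_monthly": [],
    "dataset_report_monthly": ["curation_monthly"],
    "curation_weekly": [],
    "dataset_report_weekly": ["curation_weekly"],
}

# static_order() yields every action after all of the actions it needs.
order = list(TopologicalSorter(NEEDS).static_order())
print(order)
```

Any order produced this way places each curation extract before the dataset report that reads it; the relative order of the monthly and weekly branches is not constrained, since neither depends on the other.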

Code comparison

Compare the code used in this job request

  • No previous job request available for comparison