Job request: 14503

Organisation: Bennett Institute
Workspace: strepa_scarlet
ID: muylcqc5ntifqfmp

This page shows the technical details of what happened when the authorised researcher Christine Cunningham requested one or more actions to be run against real patient data within a secure environment.

By cross-referencing the list of jobs with the pipeline section below, you can infer which security level each job's outputs were written to.

The output security levels are:

highly_sensitive
- Researchers can never directly view these outputs
- Researchers can only request that code be run against them
moderately_sensitive
- Can be viewed by an approved researcher by logging into a highly secure environment
- These are the only outputs that can be requested for public release via a controlled output review service.
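
The cross-referencing described above can be sketched in a few lines of Python. This is only an illustration: the PIPELINE_OUTPUTS dictionary is a hand-copied excerpt of the "outputs" entries from this page's project.yaml for the four requested actions, and the helper name security_levels is hypothetical, not part of any OpenSAFELY tool.

```python
# Hand-copied excerpt of the "outputs" sections of project.yaml for the four
# actions in this job request (an assumption: copied from this page, not
# loaded from the repository).
PIPELINE_OUTPUTS = {
    "curation_monthly": {
        "highly_sensitive": {
            "cohort": "output/curation/input_report_2019-01-01.csv.gz",
        },
    },
    "dataset_report_monthly": {
        "moderately_sensitive": {
            "cohort_report": "output/curation/input_report_2019-01-01.html",
        },
    },
    "curation_weekly": {
        "highly_sensitive": {
            "cohort": "output/curation/input_report_2022-07-01.csv.gz",
        },
    },
    "dataset_report_weekly": {
        "moderately_sensitive": {
            "cohort_report": "output/curation/input_report_2022-07-01.html",
        },
    },
}


def security_levels(requested_actions, pipeline_outputs):
    """Map each requested action to {security level: [output paths]}."""
    result = {}
    for action in requested_actions:
        by_level = pipeline_outputs[action]
        result[action] = {
            level: sorted(files.values()) for level, files in by_level.items()
        }
    return result


requested = [
    "curation_monthly",
    "dataset_report_monthly",
    "curation_weekly",
    "dataset_report_weekly",
]
levels = security_levels(requested, PIPELINE_OUTPUTS)

for action, by_level in levels.items():
    for level, paths in by_level.items():
        print(f"{action}: {level} -> {', '.join(paths)}")
```

Read this way, the two curation actions wrote only highly_sensitive outputs, while the two dataset_report actions wrote only moderately_sensitive outputs.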

Jobs

Action: curation_monthly
Status: Succeeded
Job identifier: v2and3buh7qtdlqy

Action: curation_weekly
Status: Succeeded
Job identifier: rhqnftww2aslzeew

Action: dataset_report_weekly
Status: Succeeded
Job identifier: rhjimqg74y2fqt65

Action: dataset_report_monthly
Status: Succeeded
Job identifier: lbtr5gymsurf7utj

Pipeline

project.yaml

version: '3.0'

expectations:
  population_size: 1000

actions:
  generate_study_population_report_ethnicity:
    run: cohortextractor:latest generate_cohort 
      --study-definition study_definition_ethnicity_report --output-dir output/report --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/report/input_ethnicity_report.csv.gz

  ### Curation check ###
  curation_monthly:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_report
      --index-date-range "2019-01-01 to 2019-01-01 by month"
      --param frequency=monthly
      --output-dir=output/curation
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/curation/input_report_2019-01-01.csv.gz

  dataset_report_monthly:
      run: python:latest python analysis/dataset_report.py
           --input-files output/curation/input_report_2019-01-01.csv.gz
           --output-dir output/curation/
           --granularity "year"
      needs: [curation_monthly]
      outputs:
        moderately_sensitive:
          # Only output the single summary file
          cohort_report: output/curation/input_report_2019-01-01.html

  curation_weekly:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_report
      --index-date-range "2022-07-01 to 2022-07-01 by week"
      --param frequency=weekly
      --output-dir=output/curation
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/curation/input_report_2022-07-01.csv.gz

  dataset_report_weekly:
      run: python:latest python analysis/dataset_report.py
           --input-files output/curation/input_report_2022-07-01.csv.gz
           --output-dir output/curation/
           --granularity "day"
      needs: [curation_weekly]
      outputs:
        moderately_sensitive:
          # Only output the single summary file
          cohort_report: output/curation/input_report_2022-07-01.html
  ### End curation check

  generate_study_population_report_monthly:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_report
      --index-date-range "2019-01-01 to 2022-06-01 by month"
      --param frequency=monthly
      --output-dir=output/report
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/report/input_*-01.csv.gz

  generate_study_population_report_weekly:
    run: cohortextractor:latest generate_cohort
      --study-definition study_definition_report
      --index-date-range "2022-07-01 to 2023-01-08 by week"
      --param frequency=weekly
      --output-dir=output/report
      --output-format=csv.gz
    outputs:
      highly_sensitive:
        cohort: output/report/input_*.csv.gz

  join_cohorts_report:
    run: >
      cohort-joiner:v0.0.38
        --lhs output/report/input_report_20*.csv.gz
        --rhs output/report/input_ethnicity_report.csv.gz
        --output-dir output/report/joined
    needs: [generate_study_population_report_monthly, generate_study_population_report_weekly, generate_study_population_report_ethnicity]
    outputs:
      highly_sensitive:
        cohort: output/report/joined/input_report_20*.csv.gz

  generate_measures_report:
    run: cohortextractor:latest generate_measures --study-definition study_definition_report --output-dir=output/report/joined
    needs: [join_cohorts_report]
    outputs:
      moderately_sensitive:
        measure_csv: output/report/joined/measure_event_*_rate.csv

  join_measures:
      run: python:latest python analysis/join_and_round.py
           --input-files output/report/joined/measure_*_rate.csv
           --output-dir output/report/joined
           --output-name "measure_all.csv"
           --skip-round
      needs: [generate_measures_report]
      outputs:
        moderately_sensitive:
          # Only output the single summary file
          measure_csv: output/report/joined/measure_all.csv

  top_5_table_report:
    run: >
      python:latest python analysis/report/top_5_report.py
      --input-file output/report/joined/measure_all.csv
      --output-dir output/report/joined
    needs: [join_measures]
    outputs:
      moderately_sensitive:
        tables: output/report/joined/top_5*.csv

  plot_measure_report:
    run: >
      python:latest python analysis/report/plot_measures_report.py
      --measure-path output/report/joined/measure_all.csv
      --output-dir output/report/joined
    needs: [join_measures]
    outputs:
      moderately_sensitive:
        measure: output/report/joined/*measures*.jpeg

  event_counts_report:
    run: >
      python:latest python analysis/report/event_counts.py --input-dir="output/report/joined" --output-dir="output/report" --measures="amoxicillin,azithromycin,clarithromycin,erythromycin,phenoxymethypenicillin"
    needs: [join_cohorts_report]
    outputs:
      moderately_sensitive:
        measure: output/report/event_counts_*.json

  # create_notebook:
  #   run: python:latest python analysis/report/create_notebook.py
  #   outputs:
  #     moderately_sensitive:
  #       notebook: output/report/report.ipynb

  # generate_notebook:
  #   run: jupyter:latest jupyter nbconvert /workspace/output/report/report.ipynb --execute --to html --output-dir=/workspace/output/report --ExecutePreprocessor.timeout=86400 --no-input
  #   needs: [create_notebook, event_counts_report, deciles_chart_report, top_5_table_report, plot_measure_report]
  #   outputs:
  #     moderately_sensitive:
  #       notebook: output/report/report.html
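
The "needs" entries in the pipeline above define a dependency graph between actions. The sketch below copies those entries for the active (not commented-out) actions into a dictionary and uses Python's standard graphlib module to derive one valid execution order, plus a small helper that lists everything a given action depends on. It is an illustration only: the helper names are hypothetical, and nothing here is meant to describe how the job runner actually schedules work.

```python
from graphlib import TopologicalSorter

# action -> actions it needs, copied from the "needs" keys in project.yaml.
# Actions without a "needs" key get an empty list.
NEEDS = {
    "generate_study_population_report_ethnicity": [],
    "curation_monthly": [],
    "dataset_report_monthly": ["curation_monthly"],
    "curation_weekly": [],
    "dataset_report_weekly": ["curation_weekly"],
    "generate_study_population_report_monthly": [],
    "generate_study_population_report_weekly": [],
    "join_cohorts_report": [
        "generate_study_population_report_monthly",
        "generate_study_population_report_weekly",
        "generate_study_population_report_ethnicity",
    ],
    "generate_measures_report": ["join_cohorts_report"],
    "join_measures": ["generate_measures_report"],
    "top_5_table_report": ["join_measures"],
    "plot_measure_report": ["join_measures"],
    "event_counts_report": ["join_cohorts_report"],
}


def run_order(needs):
    """Return one ordering in which every action follows its dependencies."""
    return list(TopologicalSorter(needs).static_order())


def transitive_needs(action, needs):
    """Return the set of all direct and indirect dependencies of an action."""
    seen = set()
    stack = list(needs[action])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(needs[dep])
    return seen


order = run_order(NEEDS)
print(order)
print(sorted(transitive_needs("dataset_report_weekly", NEEDS)))
print(sorted(transitive_needs("top_5_table_report", NEEDS)))
```

For the four actions in this job request, each dataset_report action depends only on its matching curation action, which is consistent with "Force run dependencies: No" still producing a complete run.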

Timeline

Created: 19 Jan 2023 09:27:15 UTC
Started: 19 Jan 2023 09:27:18 UTC
Finished: 19 Jan 2023 22:00:07 UTC
Runtime: 23:30:18

These timestamps are generated and stored using the UTC timezone on the TPP backend.
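
As a quick check on the timeline, the sketch below parses the three UTC timestamps and computes the queueing delay and the wall-clock interval between Started and Finished. The helper name parse_utc is hypothetical, and the "%b" month format assumes an English locale. The listed Runtime (23:30:18) is longer than the wall-clock interval computed here; this page does not say how Runtime is calculated, so any explanation (for example, that it adds up the runtimes of the four jobs) would be an assumption.

```python
from datetime import datetime, timezone

# Timestamp layout used on this page, e.g. "19 Jan 2023 09:27:15 UTC".
FMT = "%d %b %Y %H:%M:%S"


def parse_utc(text):
    """Parse a page timestamp and attach the UTC timezone explicitly."""
    naive = datetime.strptime(text.removesuffix(" UTC"), FMT)
    return naive.replace(tzinfo=timezone.utc)


created = parse_utc("19 Jan 2023 09:27:15 UTC")
started = parse_utc("19 Jan 2023 09:27:18 UTC")
finished = parse_utc("19 Jan 2023 22:00:07 UTC")

queue_delay = started - created  # time between request creation and start
elapsed = finished - started     # wall-clock time from start to finish

print(queue_delay)  # 0:00:03
print(elapsed)      # 12:32:49
```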

Job request

Status: Succeeded
Backend: TPP
Workspace: strepa_scarlet
Requested by: Christine Cunningham
Branch: main
Force run dependencies: No
Git commit hash: c141d6a
Requested actions: curation_monthly, dataset_report_monthly, curation_weekly, dataset_report_weekly

Code comparison

No previous job request available for comparison