
Job request: 3218

Organisation:
Bennett Institute
Workspace:
cohortextractor-v2-testing-long-covid
ID:
32quqg3umwlrsra5

This page shows the technical details of what happened when the authorised researcher Dave Evans requested one or more actions to be run against real patient data within a secure environment.

By cross-referencing the list of jobs with the pipeline section below, you can infer which security level each job's outputs were written to.

The output security levels are:

  • highly_sensitive
    • Researchers can never directly view these outputs
    • Researchers can only request that code be run against them
  • moderately_sensitive
    • Can be viewed by an approved researcher by logging into a highly secure environment
    • These are the only outputs that can be requested for public release via a controlled output review service.
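
The cross-referencing described above can be sketched in Python. This is only an illustration, not part of the job server: `PIPELINE` is a hand-transcribed subset of the project.yaml shown on this page, and `output_levels` is a hypothetical helper name introduced for the example.

```python
# Hand-transcribed subset of the project.yaml on this page. Assumption: in
# practice the file would be parsed with a YAML library rather than retyped.
PIPELINE = {
    "generate_cohort_v2": {
        "outputs": {
            "highly_sensitive": {"cohort": "output/v2/input_cohort.csv"},
        },
    },
    "count_by_strata_v2": {
        "needs": ["generate_cohort_v2"],
        "outputs": {
            "moderately_sensitive": {
                "table": "output/v2/counts_table.csv",
                "practice_summ": "output/v2/practice_summ.txt",
            },
        },
    },
}


def output_levels(pipeline, action):
    """Return {output path: security level} for one action (hypothetical helper)."""
    levels = {}
    for level, files in pipeline[action]["outputs"].items():
        for path in files.values():
            levels[path] = level
    return levels


print(output_levels(PIPELINE, "generate_cohort_v2"))
```

For this job request the only requested action is generate_cohort_v2, so its single output, output/v2/input_cohort.csv, was written at the highly_sensitive level.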

Jobs

Pipeline

project.yaml
version: '3.0'

expectations:
  population_size: 2000

actions:

  generate_cohort:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_cohort
    outputs:
      highly_sensitive:
        cohort: output/input_cohort.csv

  count_by_strata:
    run: python:latest python analysis/all_time_counts.py
    needs: [generate_cohort]
    outputs:
      moderately_sensitive:
        table: output/counts_table.csv
        practice_distribution: output/practice_distribution.csv
        per_week: output/code_use_per_week_long_covid.csv
        per_week_pvf: output/code_use_per_week_post_viral_fatigue.csv
        code_table: output/all_long_covid_codes.csv
        practice_summ: output/practice_summ.txt

  # to be run locally
  generate_report_notebook:
    run: jupyter:latest jupyter nbconvert /workspace/analysis/long_covid_coding_report.ipynb --execute --to html --output-dir=/workspace/released_outputs --ExecutePreprocessor.timeout=86400 --no-input
    outputs:
      moderately_sensitive:
        notebook: released_outputs/long_covid_coding_report.html

  # Uses V1 with ethnicity codes instead of categories, run locally to generate a
  # dummy data file for V2
  generate_cohort_v1_for_v2_comparison:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_cohort_v1_with_ethnicity_codes
    outputs:
      highly_sensitive:
        cohort: output/v2/input_cohort_v1_with_ethnicity_codes.csv

  # Cohort Extractor V2
  generate_cohort_v2:
    run: cohortextractor-v2:latest --cohort-definition analysis/study_definition_cohort_v2.py --output output/v2/input_cohort.csv --dummy-data-file analysis/dummy_data.csv
    outputs:
      highly_sensitive:
        cohort: output/v2/input_cohort.csv

  count_by_strata_v2:
    run: python:latest python analysis/all_time_counts_v2.py
    needs: [generate_cohort_v2]
    outputs:
      moderately_sensitive:
        table: output/v2/counts_table.csv
        practice_distribution: output/v2/practice_distribution.csv
        per_week: output/v2/code_use_per_week_long_covid.csv
        per_week_pvf: output/v2/code_use_per_week_post_viral_fatigue.csv
        code_table: output/v2/all_long_covid_codes.csv
        practice_summ: output/v2/practice_summ.txt
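
The `needs` keys in the pipeline define which actions depend on which. The following is a minimal sketch, using only the `needs` entries shown above, of how the upstream actions of a requested action could be collected. `NEEDS` and `upstream_actions` are hypothetical names for illustration, and the sketch does not claim to reproduce how the job runner itself decides what to run.

```python
# `needs` relationships transcribed from the project.yaml above; actions
# without a `needs` key map to an empty list.
NEEDS = {
    "generate_cohort": [],
    "count_by_strata": ["generate_cohort"],
    "generate_report_notebook": [],
    "generate_cohort_v1_for_v2_comparison": [],
    "generate_cohort_v2": [],
    "count_by_strata_v2": ["generate_cohort_v2"],
}


def upstream_actions(needs, action):
    """Return every action that `action` depends on, dependencies first.

    Assumes the `needs` graph has no cycles.
    """
    ordered = []

    def visit(name):
        for dep in needs[name]:
            if dep not in ordered:
                visit(dep)
                ordered.append(dep)

    visit(action)
    return ordered


print(upstream_actions(NEEDS, "count_by_strata_v2"))  # ['generate_cohort_v2']
print(upstream_actions(NEEDS, "generate_cohort_v2"))  # []
```

The requested action in this job request, generate_cohort_v2, declares no `needs`, so it has no upstream actions.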

Timeline

  • Created:

  • Started:

  • Finished:

  • Runtime: 01:07:30

These timestamps are generated and stored using the UTC timezone on the TPP backend.
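
The runtime above can be converted to a duration for further calculation. This is a small sketch that assumes the displayed value is in HH:MM:SS form; `parse_runtime` is a hypothetical helper name.

```python
from datetime import timedelta


def parse_runtime(text):
    """Convert an HH:MM:SS runtime string into a timedelta (format is assumed)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


runtime = parse_runtime("01:07:30")
print(int(runtime.total_seconds()))  # 4050
```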

Job request

Status
Succeeded
Backend
TPP
Requested by
Dave Evans
Branch
simplified-for-ce2
Force run dependencies
No
Git commit hash
c9260c7
Requested actions
  • generate_cohort_v2
