Job request: 17778

Organisation: The London School of Hygiene & Tropical Medicine
Workspace: healthcare_utilisation_openprompt
ID: lycwjomegfdd3pbv

This page shows the technical details of what happened when authorised researcher Liang-Yu Lin requested one or more actions to be run against real patient data in the project, within a secure environment.

By cross-referencing the indicated Requested Actions with the Pipeline section below, you can infer what security level various outputs were written to. Outputs marked as highly_sensitive can never be viewed directly by a researcher; a researcher can only request that code be run against them. Outputs marked as moderately_sensitive can be viewed by an approved researcher by logging into a highly secure environment. Only outputs marked as moderately_sensitive can be requested for release to the public, via a controlled output review service.
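
The cross-referencing described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the job request: the dictionary is a hand-transcribed subset of the outputs listed in the Pipeline section, and the names PIPELINE_OUTPUTS, outputs_by_level and releasable are hypothetical helpers introduced here.

```python
# Hand-transcribed subset of the outputs in project.yaml (illustrative only).
PIPELINE_OUTPUTS = {
    "generate_long_covid_exposure_dataset": {
        "highly_sensitive": ["output/dataset_exp_lc_unmatched.csv"],
    },
    "test_matching": {
        "highly_sensitive": [
            "output/matched_cases_stp.csv",
            "output/matched_matches_stp.csv",
            "output/matched_combined_stp.csv",
        ],
        "moderately_sensitive": ["output/matching_report_stp.txt"],
    },
    "report_03_hurdle_model": {
        "moderately_sensitive": ["output/st03_monthly_visits_crude_hurdle.csv"],
    },
}


def outputs_by_level(requested_actions, pipeline_outputs):
    """Group the output paths of the requested actions by security level."""
    grouped = {}
    for action in requested_actions:
        for level, paths in pipeline_outputs.get(action, {}).items():
            grouped.setdefault(level, []).extend(paths)
    return grouped


def releasable(grouped):
    """Return the paths that could be requested for release.

    Per the page text, only moderately_sensitive outputs qualify.
    """
    return sorted(grouped.get("moderately_sensitive", []))


requested = [
    "generate_long_covid_exposure_dataset",
    "test_matching",
    "report_03_hurdle_model",
]
grouped = outputs_by_level(requested, PIPELINE_OUTPUTS)
for path in releasable(grouped):
    print(path)
```

For the three actions chosen here, the only candidates for release are the matching report and the hurdle model table; every CSV written under highly_sensitive stays inside the secure environment.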

Jobs

Action: generate_long_covid_exposure_dataset
Status: Succeeded
Job identifier: ycu2pw7aayrivcqj

Action: generate_list_gp_use_long_covid_dx
Status: Succeeded
Job identifier: kuuwtnbgewsg5rna

Action: generate_dataset_comparator_exclude_gp_no_long_covid
Status: Succeeded
Job identifier: frtk5wokczepbe2f

Action: test_matching
Status: Succeeded
Job identifier: icnqoqstbz3l4r2u

Action: import_matched_exposure
Status: Succeeded
Job identifier: 62ricfdul3asa5ja

Action: import_matched_controls
Status: Succeeded
Job identifier: vavmp27jldrzaoiw

Action: report01_matched_datasets
Status: Succeeded
Job identifier: dca4o5hxwzldgyr4

Action: report_03_hurdle_model
Status: Succeeded
Job identifier: zdlc5axfdhysfo5z

Pipeline

project.yaml

version: '3.0'

expectations:
  population_size: 500

actions:

  generate_long_covid_exposure_dataset:
    run: 
      databuilder:v0 generate-dataset
        analysis/dataset_definition_unmatched_exp_lc.py
        --output output/dataset_exp_lc_unmatched.csv
    outputs:
      highly_sensitive:
        cohort: output/dataset_exp_lc_unmatched.csv

  generate_list_gp_use_long_covid_dx:
    run: 
      databuilder:v0 generate-dataset
        analysis/dataset_definition_lc_gp_list.py
        --output output/dataset_lc_gp_list.csv
    outputs:
      highly_sensitive:
        cohort: output/dataset_lc_gp_list.csv

  generate_dataset_comparator_exclude_gp_no_long_covid:
    needs: [generate_list_gp_use_long_covid_dx]
    run: 
      databuilder:v0 generate-dataset
        analysis/dataset_definition_unmatched_comparator.py
        --output output/dataset_comparator_unmatched.csv
    outputs:
      highly_sensitive:
        cohort: output/dataset_comparator_unmatched.csv

  test_matching:
    run:
      python:latest python analysis/match_test.py
    needs: [generate_dataset_comparator_exclude_gp_no_long_covid, generate_long_covid_exposure_dataset]
    outputs: 
      highly_sensitive:
        matched_cases: output/matched_cases_stp.csv
        matched_matches: output/matched_matches_stp.csv
        matched_all: output/matched_combined_stp.csv
      moderately_sensitive: 
        matching_report: output/matching_report_stp.txt

  import_matched_exposure:
    run: >
       databuilder:v0
        generate-dataset analysis/dataset_definition_matched_cases.py
        --output output/matched_cases_with_ehr.csv
    needs: [test_matching]
    outputs: 
      highly_sensitive:
        cohort: output/matched_cases_with_ehr.csv

  import_matched_controls:
    run: >
       databuilder:v0
        generate-dataset analysis/dataset_definition_matched_control.py
        --output output/matched_control_with_ehr.csv
    needs: [test_matching]
    outputs: 
      highly_sensitive:
        cohort: output/matched_control_with_ehr.csv

  generate_historical_exp_data:
    run: 
      databuilder:v0 generate-dataset analysis/dataset_definition_hx_unmatched_exp_lc.py
        --output output/hx_unmatched_exp.csv
    outputs:
      highly_sensitive:
        hx_cohort: output/hx_unmatched_exp.csv
  
  generate_historical_comp_data_exclude_gp_no_long_covid:
    needs: [generate_list_gp_use_long_covid_dx]
    run: 
      databuilder:v0 generate-dataset analysis/dataset_definition_hx_unmatched_com_no_lc.py
        --output output/hx_dataset_comp_unmatched.csv
    outputs:
      highly_sensitive:
        hx_cohort: output/hx_dataset_comp_unmatched.csv

  historical_matching:
    run:
      python:latest python analysis/match_historical.py
    needs: [generate_historical_exp_data, generate_historical_comp_data_exclude_gp_no_long_covid]
    outputs: 
      highly_sensitive:
        matched_cases: output/matched_cases_historical.csv
        matched_matches: output/matched_matches_historical.csv
        matched_all: output/matched_combined_historical.csv
      moderately_sensitive: 
        matching_report: output/matching_report_historical.txt

  import_matched_historical_exposure:
    run: >
       databuilder:v0
        generate-dataset analysis/dataset_definition_hx_matched_exp_lc.py
        --output output/hx_matched_cases_with_ehr.csv
    needs: [historical_matching]
    outputs: 
      highly_sensitive:
        cohort: output/hx_matched_cases_with_ehr.csv

  import_matched_historical_controls:
    run: >
       databuilder:v0
        generate-dataset analysis/dataset_definition_hx_matched_comp.py
        --output output/hx_matched_control_with_ehr.csv
    needs: [historical_matching]
    outputs: 
      highly_sensitive:
        cohort: output/hx_matched_control_with_ehr.csv

# Reporting:

  report01_matched_datasets:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st01_report_matched.R
    outputs: 
      moderately_sensitive: 
        matched_table: output/st01_matched_numbers_table.csv
        explore_vax_fig: output/st1_exporing_vax_index_date.png
        missing_table: output/missing_distribution_table.csv
        missing_pattern: output/missing_pattern_current.png

  report02_hx_matched_datasets:
    needs: [import_matched_historical_exposure, import_matched_historical_controls]
    run: 
      r:latest analysis/st02_report_matched_historical.R
    outputs: 
      moderately_sensitive: 
        matched_table: output/hx_matched_numbers_table.csv

  report_03_hurdle_model:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st03_hurdle_model.R
    outputs: 
      moderately_sensitive: 
        model_table: output/st03_monthly_visits_crude_hurdle.csv
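
The needs entries in the pipeline define a dependency graph between actions. The sketch below, which is illustrative and not taken from the project, transcribes the needs of the eight requested actions into a dictionary and uses Python's standard graphlib module to derive one execution order consistent with them. The names NEEDS and upstream are hypothetical, and how the job runner actually schedules actions is not described on this page.

```python
from graphlib import TopologicalSorter

# "needs" of the eight requested actions, transcribed from project.yaml.
NEEDS = {
    "generate_long_covid_exposure_dataset": [],
    "generate_list_gp_use_long_covid_dx": [],
    "generate_dataset_comparator_exclude_gp_no_long_covid": [
        "generate_list_gp_use_long_covid_dx",
    ],
    "test_matching": [
        "generate_dataset_comparator_exclude_gp_no_long_covid",
        "generate_long_covid_exposure_dataset",
    ],
    "import_matched_exposure": ["test_matching"],
    "import_matched_controls": ["test_matching"],
    "report01_matched_datasets": [
        "import_matched_exposure",
        "import_matched_controls",
    ],
    "report_03_hurdle_model": [
        "import_matched_exposure",
        "import_matched_controls",
    ],
}


def upstream(action, needs):
    """Return every action that must run, directly or indirectly, before `action`."""
    seen = set()
    stack = list(needs.get(action, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(needs.get(dep, []))
    return seen


# One valid order: each action appears after everything it needs.
order = list(TopologicalSorter(NEEDS).static_order())
for name in order:
    print(name)
```

Calling upstream("report_03_hurdle_model", NEEDS) returns six actions, all of which appear in this job request's Requested actions list, so the request covers its own dependencies.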

Timeline

Created: 25 May 2023 15:23:18 UTC
Started: 25 May 2023 15:23:41 UTC
Finished: 26 May 2023 08:05:11 UTC
Runtime: 17:16:12

These timestamps are generated and stored using the UTC timezone on the TPP backend.
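
As a small worked example, the UTC timestamps above can be parsed and subtracted with Python's standard library to get the queueing delay and the wall-clock span between Started and Finished. This is a sketch under the assumption that the timestamp strings follow the exact format shown and that month names are parsed in an English locale; parse_utc is a hypothetical helper. The Runtime field is reported separately by the page and this sketch does not attempt to reproduce it.

```python
from datetime import datetime, timezone

# Format of the timestamps as displayed on this page (assumed fixed).
FORMAT = "%d %b %Y %H:%M:%S UTC"


def parse_utc(text):
    """Parse a timestamp such as '25 May 2023 15:23:18 UTC' as an aware datetime."""
    return datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)


created = parse_utc("25 May 2023 15:23:18 UTC")
started = parse_utc("25 May 2023 15:23:41 UTC")
finished = parse_utc("26 May 2023 08:05:11 UTC")

queued_for = started - created   # time between creation and start
wall_clock = finished - started  # span from start to finish

print(queued_for)  # 0:00:23
print(wall_clock)  # 16:41:30
```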

Job information

Status: Succeeded
Backend: TPP
Workspace: healthcare_utilisation_openprompt
Requested by: Liang-Yu Lin
Branch: main
Force run dependencies: No
Git commit hash: 5f26293
Requested actions:
  generate_long_covid_exposure_dataset
  generate_list_gp_use_long_covid_dx
  generate_dataset_comparator_exclude_gp_no_long_covid
  test_matching
  import_matched_exposure
  import_matched_controls
  report01_matched_datasets
  report_03_hurdle_model
