
Job request: 19535

Organisation: The London School of Hygiene & Tropical Medicine
Workspace: healthcare_utilisation_openprompt
ID: 6r2fmupujc4crbe5

This page shows the technical details of what happened when the authorised researcher Liang-Yu Lin requested one or more actions to be run against real patient data in the project, within a secure environment.

By cross-referencing the list of jobs with the pipeline section below, you can infer which security level each output was written to. Researchers can never directly view outputs marked as highly_sensitive; they can only request that code runs against them. Outputs marked as moderately_sensitive can be viewed by an approved researcher by logging into a highly secure environment. Only outputs marked as moderately_sensitive can be requested for release to the public, via a controlled output review service.
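The cross-referencing described above can be sketched in a few lines of Python. This is only an illustration: the `pipeline` dict is a hand-transcribed fragment of two actions from the project.yaml shown further down, and the helper names `outputs_by_level` and `releasable` are hypothetical, not part of any OpenSAFELY tooling.

```python
# Fragment of the pipeline, transcribed by hand as a plain dict (two actions only).
pipeline = {
    "test_matching": {
        "outputs": {
            "highly_sensitive": {
                "matched_cases": "output/matched_cases_stp.csv",
                "matched_matches": "output/matched_matches_stp.csv",
                "matched_all": "output/matched_combined_stp.csv",
            },
            "moderately_sensitive": {
                "matching_report": "output/matching_report_stp.txt",
            },
        },
    },
    "import_matched_exposure": {
        "outputs": {
            "highly_sensitive": {
                "cohort": "output/matched_cases_with_ehr.csv.gz",
            },
        },
    },
}


def outputs_by_level(actions):
    """Map each output path to the (action name, security level) that wrote it."""
    levels = {}
    for action, spec in actions.items():
        for level, files in spec.get("outputs", {}).items():
            for path in files.values():
                levels[path] = (action, level)
    return levels


def releasable(actions):
    """Paths that could be requested for release: moderately_sensitive only."""
    return sorted(
        path
        for path, (_, level) in outputs_by_level(actions).items()
        if level == "moderately_sensitive"
    )


for path, (action, level) in outputs_by_level(pipeline).items():
    print(f"{level:22} {action:26} {path}")
```

In this fragment, only the matching report from test_matching is marked moderately_sensitive, so it is the only path `releasable(pipeline)` returns.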

Jobs

Pipeline

project.yaml
version: '3.0'

expectations:
  population_size: 5000

actions:
# Contemporary comparison data management:
# # Before matching data management:
  generate_long_covid_exposure_dataset:
    run: 
      databuilder:v0 generate-dataset
        analysis/dataset_definition_unmatched_exp_lc.py
        --output output/dataset_exp_lc_unmatched.csv.gz
    outputs:
      highly_sensitive:
        cohort: output/dataset_exp_lc_unmatched.csv.gz

  generate_list_gp_use_long_covid_dx:
    run: 
      databuilder:v0 generate-dataset
        analysis/dataset_definition_lc_gp_list.py
        --output output/dataset_lc_gp_list.csv.gz
    outputs:
      highly_sensitive:
        cohort: output/dataset_lc_gp_list.csv.gz

  generate_dataset_comparator_exclude_gp_no_long_covid:
    needs: [generate_list_gp_use_long_covid_dx]
    run: 
      databuilder:v0 generate-dataset
        analysis/dataset_definition_unmatched_comparator.py
        --output output/dataset_comparator_unmatched.csv.gz
    outputs:
      highly_sensitive:
        cohort: output/dataset_comparator_unmatched.csv.gz
# # OS matching
  test_matching:
    run:
      python:latest python analysis/match_test.py
    needs: [generate_dataset_comparator_exclude_gp_no_long_covid, generate_long_covid_exposure_dataset]
    outputs: 
      highly_sensitive:
        matched_cases: output/matched_cases_stp.csv
        matched_matches: output/matched_matches_stp.csv
        matched_all: output/matched_combined_stp.csv
      moderately_sensitive: 
        matching_report: output/matching_report_stp.txt
# # After matching data management
  import_matched_exposure:
    run: >
       databuilder:v0
        generate-dataset analysis/dataset_definition_matched_cases.py
        --output output/matched_cases_with_ehr.csv.gz
    needs: [test_matching]
    outputs: 
      highly_sensitive:
        cohort: output/matched_cases_with_ehr.csv.gz

  import_matched_controls:
    run: >
       databuilder:v0
        generate-dataset analysis/dataset_definition_matched_control.py
        --output output/matched_control_with_ehr.csv.gz
    needs: [test_matching]
    outputs: 
      highly_sensitive:
        cohort: output/matched_control_with_ehr.csv.gz


  # import_matched_exposure_drug_cost:
  #   run: >
  #      databuilder:v0
  #       generate-dataset analysis/dataset_definition_matched_cases_drug_costs.py
  #       --output output/matched_cases_with_drug_costs.csv.gz
  #   needs: [test_matching]
  #   outputs: 
  #     highly_sensitive:
  #       cohort: output/matched_cases_with_drug_costs.csv.gz

  # import_matched_controls_drug_costs:
  #   run: >
  #      databuilder:v0
  #       generate-dataset analysis/dataset_definition_matched_control_drug_costs.py
  #       --output output/matched_control_with_drug_costs.csv.gz
  #   needs: [test_matching]
  #   outputs: 
  #     highly_sensitive:
  #       cohort: output/matched_control_with_drug_costs.csv.gz


# Historical comparison data management:
# # Before matching data management:
  generate_historical_exp_data:
    run: 
      databuilder:v0 generate-dataset analysis/dataset_definition_hx_unmatched_exp_lc.py
        --output output/hx_unmatched_exp.csv.gz
    outputs:
      highly_sensitive:
        hx_cohort: output/hx_unmatched_exp.csv.gz
  
  generate_historical_comp_data_exclude_gp_no_long_covid:
    needs: [generate_list_gp_use_long_covid_dx]
    run: 
      databuilder:v0 generate-dataset analysis/dataset_definition_hx_unmatched_com_no_lc.py
        --output output/hx_dataset_comp_unmatched.csv.gz
    outputs:
      highly_sensitive:
        hx_cohort: output/hx_dataset_comp_unmatched.csv.gz
# # OS matching
  historical_matching:
    run:
      python:latest python analysis/match_historical.py
    needs: [generate_historical_exp_data, generate_historical_comp_data_exclude_gp_no_long_covid]
    outputs: 
      highly_sensitive:
        matched_cases: output/matched_cases_historical.csv
        matched_matches: output/matched_matches_historical.csv
        matched_all: output/matched_combined_historical.csv
      moderately_sensitive: 
        matching_report: output/matching_report_historical.txt

# # After matching data management
  import_matched_historical_exposure:
    run: >
       databuilder:v0
        generate-dataset analysis/dataset_definition_hx_matched_exp_lc.py
        --output output/hx_matched_cases_with_ehr.csv.gz
    needs: [historical_matching]
    outputs: 
      highly_sensitive:
        cohort: output/hx_matched_cases_with_ehr.csv.gz

  import_matched_historical_controls:
    run: >
       databuilder:v0
        generate-dataset analysis/dataset_definition_hx_matched_comp.py
        --output output/hx_matched_control_with_ehr.csv.gz
    needs: [historical_matching]
    outputs: 
      highly_sensitive:
        cohort: output/hx_matched_control_with_ehr.csv.gz

# Reporting: demographic distribution 

# Contemporary comparison: 
  report01_matched_datasets:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st01_report_matched.R
    outputs: 
      moderately_sensitive: 
        matched_table: output/st01_matched_numbers_table.csv  # Table 1
        explore_vax_fig: output/st1_exporing_vax_index_date.png # Check vax date
        missing_table: output/missing_distribution_table.csv # Missing pattern table
        missing_pattern: output/missing_pattern_current.png # Missing pattern plot

# Historical comparison
  report02_hx_matched_datasets:
    needs: [import_matched_historical_exposure, import_matched_historical_controls]
    run: 
      r:latest analysis/st02_report_matched_historical.R
    outputs: 
      moderately_sensitive: 
        matched_table: output/st02_hx_matched_numbers_table.csv

# Contemporary comparison
# # Basic: including two-part (non-cluster) analysis

  report_02_basic_model_output:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st02_model_curr_nb_reg.R
    outputs: 
      moderately_sensitive: 
        model_non_clustered_results: output/st_02_non_cluster_model.csv



# # Advanced: cluster analysis
  report_03_model_02_clustered_analysis:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st03_model_02_cluster_analysis.R
    outputs: 
      moderately_sensitive: 
        # model_compares_poisson: output/st03_model_02_rm_models.csv.gz
        gee_outputs: output/st03_model_02_gee_models.csv
        
        
  report_03_hurdle_model:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st03_hurdle_model_rate_ratio.R
    outputs: 
      moderately_sensitive: 
        model_selection: output/sup_st03_0_model_comparison.csv
        hurdle_all: output/st_03_result_monthly_visit_hurdle.csv
        
  report_03_1_hurdle_model_predict:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st03_hurdle_model_predict.R
    outputs: 
      moderately_sensitive: 
        hurdle_all: output/st_03_result_cumulative_visit_hurdle.csv
        hurdle_gp: output/st_03_gp_result_cumulative_visit_hurdle.csv
        hurdle_hos: output/st_03_hos_result_cumulative_visit_hurdle.csv
        hurdle_ae: output/st_03_ae_result_cumulative_visit_hurdle.csv
        
  report_04_hurdle_model_plot:
    needs: [report_03_1_hurdle_model_predict]
    run: 
      r:latest analysis/st04_plot_hurdle_visit.R
    outputs: 
      moderately_sensitive: 
        crude: output/st_04_crude_healthcare_visit.png
        partial: output/st_04_partial_adj_healthcare_visit.png
        full: output/st_04_full_adj_healthcare_visit.png
  
  report_03_2_hurdle_model_sugroup_predict:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st03_sub_hurdle_model_predict.R
    outputs: 
      moderately_sensitive: 
        subgroup_hos_visit: output/st_03_subgroup_result_hos_cumulative_visit_hurdle.csv
        
  report_03_3_twopart_model_predict:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st03_two_part_model.R
    outputs: 
      moderately_sensitive: 
        total_costs: output/st_04_result_cumulative_cost_full_2pm.csv
        
  report_03_4_twopart_model_subgroup_predict:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st03_twopart_sub_model.R
    outputs: 
      moderately_sensitive: 
        sub_costs: output/st_04_result_sub_cumulative_cost_2pm.csv       

  report_04_plot_twopart_costs:
    needs: [report_03_3_twopart_model_predict]
    run: 
      r:latest analysis/st04_fig_plot_twopm_cost.R
    outputs: 
      moderately_sensitive: 
        plot_total_costs: "output/st_fig_04_cumulative_costs.png"
        
  report_05_historical_did_model:
    needs: [import_matched_historical_exposure, import_matched_historical_controls]
    run: 
      r:latest analysis/st05_did_model.R
    outputs:
      highly_sensitive: 
        fitted_did: output/predicted_did_counts.csv
      moderately_sensitive: 
        model_dispersion: output/sup_st01_model_compare.csv
        stats_output: output/st05_did_stats.csv
        summarised_did_predicted: output/st05_summarised_did_predicted_results.csv
        
# Supplementary materials

  report_sup_exploring_outcome_dist:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st_sup_01_data_exploration.R
    outputs: 
      moderately_sensitive: 
        zero_percent_fig: output/st_sup_1_5_explore_zero_percentage.png
        monthly_oucome_tb: output/st_sup_1_5_monthly_outcome_distribution.csv
        visit_explore: output/st_sup_1_5_cat_visits_summary.csv
        
  report_sup_model_selection:
    needs: [import_matched_exposure, import_matched_controls]
    run: 
      r:latest analysis/st_sup_01_model_selection.R
    outputs: 
      moderately_sensitive: 
        model_01_poisson: output/st_sup_model_selection.csv

Timeline

  • Created:

  • Started:

  • Finished:

  • Runtime: 00:03:25

These timestamps are generated and stored using the UTC timezone on the TPP backend.

Job information

Status: Failed
Backend: TPP
Requested by: Liang-Yu Lin
Branch: main
Force run dependencies: No
Git commit hash: e4cf04c
Requested actions:
  • generate_dataset_comparator_exclude_gp_no_long_covid
  • test_matching
  • import_matched_exposure
  • import_matched_controls
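
The requested actions depend on one another through the `needs` entries in project.yaml. As a minimal sketch of what those entries imply, the snippet below derives one valid execution order using Python's standard-library `graphlib` (Python 3.9+). The `needs` dict is transcribed by hand from the pipeline, and `run_order` is a hypothetical helper; this does not describe how the job runner actually schedules work.

```python
from graphlib import TopologicalSorter

# `needs` for the four requested actions and their upstream actions,
# transcribed from project.yaml (action name -> actions it depends on).
needs = {
    "generate_list_gp_use_long_covid_dx": [],
    "generate_long_covid_exposure_dataset": [],
    "generate_dataset_comparator_exclude_gp_no_long_covid": [
        "generate_list_gp_use_long_covid_dx",
    ],
    "test_matching": [
        "generate_dataset_comparator_exclude_gp_no_long_covid",
        "generate_long_covid_exposure_dataset",
    ],
    "import_matched_exposure": ["test_matching"],
    "import_matched_controls": ["test_matching"],
}


def run_order(graph):
    """Return one ordering in which every action follows all of its dependencies."""
    return list(TopologicalSorter(graph).static_order())


for step, action in enumerate(run_order(needs), start=1):
    print(step, action)
```

Several orderings are valid. The only guarantees are that test_matching comes after both unmatched datasets and that the two import_matched_* actions come after test_matching.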
