Job request: 21342
- Organisation:
 - The London School of Hygiene & Tropical Medicine
 - Workspace:
 - openprompt-cohort-profile
 - ID:
 - 7qgs42i3nfyr3ozn
 
This page shows the technical details of what happened when the authorised researcher Alasdair Henderson requested one or more actions to be run against real patient data within a secure environment.
By cross-referencing the list of jobs with the pipeline section below, you can infer what security level the outputs were written to.
The output security levels are:
- 
                highly_sensitive
                
- Researchers can never directly view these outputs
 - Researchers can only request code is run against them
 
 - 
                moderately_sensitive
                
- Can be viewed by an approved researcher by logging into a highly secure environment
 - These are the only outputs that can be requested for public release via a controlled output review service.
 
 
Jobs
- 
                
- Job identifier:
 - 
                    
                    
vpxiuuiz2b5wn6xs 
 - 
                
- Job identifier:
 - 
                    
                    
ksexeyddb42jsfgy 
 - 
                
- Job identifier:
 - 
                    
                    
jb6luro4r6hbkj7u 
 - 
                
- Job identifier:
 - 
                    
                    
4pu454khipyxhtdc 
 - 
                
- Job identifier:
 - 
                    
                    
shsfpip4iher5vaz 
 - 
                
- Job identifier:
 - 
                    
                    
cidpmi6nbjroh4ua 
 - 
                
- Job identifier:
 - 
                    
                    
aq2cxhmdmez5keyv 
 - 
                
- Job identifier:
 - 
                    
                    
k3delh5e5t7vyrmv 
 - 
                
- Job identifier:
 - 
                    
                    
pqhywyw2utyu23wb 
 - 
                
- Job identifier:
 - 
                    
                    
6mnzyriona2sk2qn 
 - 
                
- Job identifier:
 - 
                    
                    
whg2rjefuethdf65 
 - 
                
- Job identifier:
 - 
                    
                    
hk4jcv6vzzljmamp 
 - 
                
- Job identifier:
 - 
                    
                    
cymj56du3cenjwjs 
 - 
                
- Job identifier:
 - 
                    
                    
dyz3a5zyt7j3e4mi 
 - 
                
- Job identifier:
 - 
                    
                    
ees67fvv3hfwwbsb 
 
Pipeline
Show project.yaml
version: '3.0'
expectations:
  population_size: 1000
actions:
  create_dummy_data: 
    run: >
      ehrql:v0
        create-dummy-tables 
        analysis/dataset_definition.py output/dummydata 
        -- 
        --day=0
    outputs: 
      highly_sensitive:
        openprompt_dummy: output/dummydata/open_prompt.csv
  edit_dummy_data:
    run: > 
      r:latest
        analysis/00_dummy_data_editing/edit_automatic_dummy_data.R
    needs: [create_dummy_data]
    outputs: 
      highly_sensitive: 
        openprompt_dummy_edited: output/dummydata/dummy_edited/open_prompt.csv
  scrape_all_data: 
    run: >
      ehrql:v0
        generate-dataset
        analysis/scrape_data_response_dates.py
        --output output/openprompt_all.csv
        --dummy-tables output/dummydata/dummy_edited
    needs: [edit_dummy_data]
    outputs:
      highly_sensitive: 
        openprompt_all: output/openprompt_all.csv
  generate_openprompt_survey1: 
    run: >
      ehrql:v0
        generate-dataset 
        analysis/dataset_definition.py 
        --output output/openprompt_survey1.csv
        --dummy-tables output/dummydata/dummy_edited
        --
        --day=0
        --window=5
    needs: [edit_dummy_data]
    outputs:
      highly_sensitive:
        openprompt_survey1: output/openprompt_survey1.csv
  generate_openprompt_survey2: 
    run: >
      ehrql:v0
        generate-dataset 
        analysis/dataset_definition.py 
        --output output/openprompt_survey2.csv
        --dummy-tables output/dummydata/dummy_edited
        --
        --day=30
        --window=5
    needs: [edit_dummy_data]
    outputs:
      highly_sensitive:
        openprompt_survey2: output/openprompt_survey2.csv
  generate_openprompt_survey3: 
    run: >
      ehrql:v0
        generate-dataset 
        analysis/dataset_definition.py 
        --output output/openprompt_survey3.csv
        --dummy-tables output/dummydata/dummy_edited
        --
        --day=60
        --window=5
    needs: [edit_dummy_data]
    outputs:
      highly_sensitive:
        openprompt_survey3: output/openprompt_survey3.csv
  generate_openprompt_survey4: 
    run: >
      ehrql:v0
        generate-dataset 
        analysis/dataset_definition.py 
        --output output/openprompt_survey4.csv
        --dummy-tables output/dummydata/dummy_edited
        --
        --day=90
        --window=5
    needs: [edit_dummy_data]
    outputs:
      highly_sensitive:
        openprompt_survey4: output/openprompt_survey4.csv
  extract_linked_tpp_info: 
    run: >
      ehrql:v0
        generate-dataset 
        analysis/add_tpp_data.py 
        --output output/openprompt_linked_tpp.csv.gz
    outputs:
      highly_sensitive:
        linked_tpp_data: output/openprompt_linked_tpp.csv.gz
  
  extract_tpp_demographics: 
    run: >
      ehrql:v0
        generate-dataset 
        analysis/extract_tpp_demographics.py 
        --output output/opensafely_tpp_demographics.csv.gz
    outputs:
      highly_sensitive:
        tpp_demog_data: output/opensafely_tpp_demographics.csv.gz
  
  datacombine_and_figure1:
      run: >
        r:latest analysis/01_data_import/101_data_combine.R
      needs: [scrape_all_data, generate_openprompt_survey1, generate_openprompt_survey2, generate_openprompt_survey3, generate_openprompt_survey4]
      outputs: 
        highly_sensitive: 
          openprompt_combined: output/openprompt_raw.gz.parquet
        moderately_sensitive:
          openprompt_raw_skim: output/data_properties/op_raw_skim.txt
          openprompt_raw_tab: output/data_properties/op_raw_tabulate.txt
          openprompt_mapped_skim: output/data_properties/op_mapped_skim.txt
          openprompt_mapped_tab: output/data_properties/op_mapped_tabulate.txt
          raw_summ_base_s: output/data_properties/op_baseline_skim.txt
          raw_summ_base_t: output/data_properties/op_baseline_tabulate.txt
          raw_summ_survey1_s: output/data_properties/op_survey1_skim.txt
          raw_summ_survey1_t: output/data_properties/op_survey1_tabulate.txt
          raw_summ_survey2_s: output/data_properties/op_survey2_skim.txt
          raw_summ_survey2_t: output/data_properties/op_survey2_tabulate.txt
          raw_summ_survey3_s: output/data_properties/op_survey3_skim.txt
          raw_summ_survey3_t: output/data_properties/op_survey3_tabulate.txt
          raw_summ_survey4_s: output/data_properties/op_survey4_skim.txt
          raw_summ_survey4_t: output/data_properties/op_survey4_tabulate.txt
          check_days_after_baseline: output/data_properties/sample_day_lags.pdf
          p1a_jpeg: output/plots/p1a_index_dates.jpeg
          p1b_jpeg: output/plots/p1b_recorded_question_responses.jpeg
          p1d_jpeg: output/plots/p1c_ltfu.jpeg
          p1c_jpeg: output/plots/p1d_anyresponse_hist.jpeg
          p1_jpeg: output/plots/p1_fup.jpeg
          p1_pdf: output/plots/p1_fup.pdf
          p1b_data: output/tables/fig1b_data.csv
          p1c_data: output/tables/fig1c_data.csv
          p1d_data: output/tables/fig1d_data.csv
  
  import_linked_tpp: 
    run: >
      r:latest analysis/01_data_import/102_import_linked_tpp.R
    needs: [extract_linked_tpp_info]
    outputs: 
      highly_sensitive: 
        linked_tpp_data_edited: output/openprompt_linked_tpp_edited.gz.parquet
      moderately_sensitive:
        openprompt_tpp_skim: output/data_properties/op_tpp_skim.txt
        openprompt_tpp_tab: output/data_properties/op_tpp_tabulate.txt
  export_table1_stats: 
    run: >
      r:latest analysis/02_baseline_descriptive/201_table1stats.R
    needs: [datacombine_and_figure1, import_linked_tpp]
    outputs: 
      moderately_sensitive: 
        table1: output/tables/table1.html
        table1_stats: output/tables/table1_stats.csv
# Then need to run analysis/02_baseline_descriptive/002-map-making.R to produce the Figure 2 for the paper
  export_table2_fup_stats: 
    run: >
      r:latest analysis/03_followup_responses/301_followup_table.R
    needs: [datacombine_and_figure1]
    outputs:
      moderately_sensitive: 
        table2: output/tables/table2_followup.html
        table2_stats: output/tables/table2_fup_stats.csv
# Then need to run analysis/03_followup_responses/302_followup_plots.R
  export_data_agreement: 
    run: > 
      r:latest analysis/04_data_agreement/401_data_agreement.R
    needs: [datacombine_and_figure1, import_linked_tpp]
    outputs: 
      moderately_sensitive: 
        venndiagram_jpeg: output/plots/venn.jpeg
        venndiagram_pdf: output/plots/venn.pdf
        agreementdata: output/tables/table3_agreement.csv
        
  export_data_representativeness: 
    run: > 
      r:latest analysis/05_representativeness/501_demographic_representativeness.R
    needs: [datacombine_and_figure1, extract_tpp_demographics]
    outputs: 
      moderately_sensitive: 
        table3_stats: output/tables/table3_rep_stats.csv
        table3: output/tables/table3_represetativeness.html
  
# Then need to run analysis/05_representativeness/502_representativeness_plot.R to produce the Figure 4 for the paper
Timeline
- 
  
    
  
  
Created:
 - 
  
    
  
  
Started:
 - 
  
    
  
  
Finished:
 - 
  
  
Runtime: 02:04:51
 
These timestamps are generated and stored using the UTC timezone on the TPP backend.
Job request
- Status
 - 
            Succeeded
 - Backend
 - TPP
 - Workspace
 - openprompt-cohort-profile
 - Requested by
 - Alasdair Henderson
 - Branch
 - main
 - Force run dependencies
 - Yes
 - Git commit hash
 - b155383
 - Requested actions
 - 
            
- 
                  
run_all 
 - 
                  
 
Code comparison
Compare the code used in this job request