Job request: 14690

Organisation: University of Manchester
Workspace: cc_rf
ID: jjcljx66btc3thbm

This page shows the technical details of what happened when the authorised researcher Ya-Ting Yang requested one or more actions to be run against real patient data in the project, within a secure environment.

By cross-referencing the list of jobs with the pipeline section below, you can infer the security level each output was written to. Researchers can never directly view outputs marked as highly_sensitive; they can only request that code runs against them. Outputs marked as moderately_sensitive can be viewed by an approved researcher by logging into a highly secure environment. Only outputs marked as moderately_sensitive can be requested for release to the public, via a controlled output review service.
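
The cross-referencing described above can be sketched in a few lines of Python. This is only an illustration: the `PIPELINE` dictionary is a hand-copied excerpt of two actions from the project.yaml on this page, and the helper names `outputs_by_level` and `releasable` are hypothetical, not part of any job-server tooling.

```python
# Hand-copied excerpt of the `outputs` blocks from project.yaml (illustration only).
PIPELINE = {
    "model_RF_process": {
        "moderately_sensitive": {"html": "output/model_RF_process.html"},
        "highly_sensitive": {
            "rds1": "output/train_X.rds",
            "rds2": "output/train_Y.rds",
            "rds3": "output/valid_X.rds",
            "rds4": "output/valid_Y.rds",
        },
    },
    "model_RandomForest": {
        "moderately_sensitive": {"html": "output/model_RandomForest.html"},
    },
}


def outputs_by_level(action):
    """Return {security_level: sorted list of output paths} for one action."""
    return {level: sorted(paths.values()) for level, paths in PIPELINE[action].items()}


def releasable(action):
    """Paths that may be requested for release: moderately_sensitive outputs only."""
    return outputs_by_level(action).get("moderately_sensitive", [])


if __name__ == "__main__":
    # The action run in this job request.
    print(outputs_by_level("model_RandomForest"))
    # An upstream action that writes to both security levels.
    print(releasable("model_RF_process"))
```

For the job listed on this page, the only output is an HTML report at the moderately_sensitive level; the training and validation data it depends on stay at highly_sensitive.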

Jobs

Action: model_RandomForest
Status: Succeeded
Job identifier: jzg5ouu4bguloqrt

Pipeline

project.yaml

version: '3.0'

expectations:
  population_size: 1000

actions:

# study cohort

  generate_study_population_covid_primarycare:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_covid_primarycare
    outputs:
      highly_sensitive:
        cohort: output/input_covid_primarycare.csv
  
  generate_study_population_covid_SGSS:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_covid_SGSS
    outputs:
      highly_sensitive:
        cohort: output/input_covid_SGSS.csv

  generate_study_population_covid_admission:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_covid_admission
    outputs:
      highly_sensitive:
        cohort: output/input_covid_admission.csv

  process_1: 
    run: r:latest analysis/process_1.R
    needs: [generate_study_population_covid_primarycare, generate_study_population_covid_SGSS,generate_study_population_covid_admission]
    outputs:
      highly_sensitive:
        case: output/case_covid_hosp.csv 
        control: output/control_covid_infection.csv 

# matching

  matching: # R MatchIt matching with replacement
    run: r:latest -e 'rmarkdown::render("analysis/matching.Rmd", knit_root_dir = "/workspace", output_dir="/workspace/output")'
    needs: [process_1]
    outputs:
      moderately_sensitive:
        html: output/matching.html
      highly_sensitive: 
        rds1: output/matched_patients.rds
        rds2: output/unmatched_cases.rds
        csv: output/matched_patients_id.csv
        
  check_unmatched:
    run: r:latest -e 'rmarkdown::render("analysis/check_unmatched.Rmd", knit_root_dir = "/workspace", output_dir="/workspace/output")'
    needs: [matching]
    outputs:
      moderately_sensitive:
        html: output/check_unmatched.html

  extract_variables: # confounders
    run: cohortextractor:latest generate_cohort --study-definition study_definition_outcome --with-end-date-fix
    needs: [matching]
    outputs:
      highly_sensitive:
        cohort: output/input_outcome.csv

  process_Rmatching: #  confounders
    run: r:latest analysis/process_Rmatching.R
    needs: [extract_variables,matching]
    outputs:
      highly_sensitive:
        cohort1: output/matched_outcome.rds
        cohort2: output/matched_outcome_check.rds # filter died & de-registered again
        rds1: output/abtype79.rds
        rds2: output/comor17.rds

# extract ab for RF
  extract_variables_ab_time:   # exposure variables
    run: cohortextractor:latest generate_cohort --study-definition study_definition_ab_time --with-end-date-fix
    needs: [matching]
    outputs:
      highly_sensitive:
        cohort: output/input_ab_time.csv

  process_ab_time: # exposures
    run: r:latest -e 'rmarkdown::render("analysis/process_ab_time.Rmd", knit_root_dir = "/workspace", output_dir="/workspace/output")'
    needs: [extract_variables_ab_time,process_Rmatching]
    outputs:
      moderately_sensitive:
        html: output/process_ab_time.html
      highly_sensitive: 
        rds: output/matched_ab.rds

  model_RF_process: # distinct patient, check variables
    run: r:latest -e 'rmarkdown::render("analysis/model_RF_process.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [process_ab_time,process_Rmatching]
    outputs:
      moderately_sensitive:
        html: output/model_RF_process.html
      highly_sensitive: 
        rds1: output/train_X.rds
        rds2: output/train_Y.rds
        rds3: output/valid_X.rds
        rds4: output/valid_Y.rds

  model_RF_training: #
    run: r:latest -e 'rmarkdown::render("analysis/model_RF_training.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [model_RF_process]
    outputs:
      moderately_sensitive:
        html: output/model_RF_training.html

  model_RandomForest: #
    run: r:latest -e 'rmarkdown::render("analysis/model_RandomForest.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [model_RF_process]
    outputs:
      moderately_sensitive:
        html: output/model_RandomForest.html

  check_ab_time:  
    run: r:latest -e 'rmarkdown::render("analysis/check_ab_time.Rmd", knit_root_dir = "/workspace", output_dir="/workspace/output")'
    needs: [process_ab_time]
    outputs:
      moderately_sensitive:
        html: output/check_ab_time.html
      # highly_sensitive: 
      #   rds: output/matched_patients_monthly_ab.rds

  check_RF_grid: 
    run: r:latest -e 'rmarkdown::render("analysis/check_RF_grid.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [process_ab_time]
    outputs:
      moderately_sensitive:
        html: output/check_RF_grid.html

  check_RF: 
    run: r:latest -e 'rmarkdown::render("analysis/check_RF.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [process_ab_time]
    outputs:
      moderately_sensitive:
        html: output/check_RF.html

  model_RF: 
    run: r:latest -e 'rmarkdown::render("analysis/model_RF.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [process_ab_time]
    outputs:
      moderately_sensitive:
        html: output/model_RF.html

  model_RF_process_subclass: # random sampling by subclass
    run: r:latest -e 'rmarkdown::render("analysis/model_RF_process_subclass.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [process_ab_time]
    outputs:
      moderately_sensitive:
        html: output/model_RF_process_subclass.html

  model_RF_process_check_sample: # check sample method
    run: r:latest -e 'rmarkdown::render("analysis/model_RF_process_check_sample.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [process_ab_time, process_Rmatching]
    outputs:
      moderately_sensitive:
        html: output/model_RF_process_check_sample.html

# check

  process_filter_ab: # filter ab users
    run: r:latest -e 'rmarkdown::render("analysis/process_filter_ab.Rmd", knit_root_dir = "/workspace", output_dir="/workspace/output")'
    needs: [process_Rmatching]
    outputs:
      moderately_sensitive:
        html: output/process_filter_ab.html
      highly_sensitive: 
        csv: output/matched_patients_id_ab.csv

  extract_variables_ab_yr1: 
    run: cohortextractor:latest generate_cohort --study-definition study_definition_ab_yr1 --with-end-date-fix
    needs: [process_filter_ab]
    outputs:
      highly_sensitive:
        cohort: output/input_ab_yr1.csv

  extract_variables_ab_yr2: 
    run: cohortextractor:latest generate_cohort --study-definition study_definition_ab_yr2 --with-end-date-fix
    needs: [process_filter_ab]
    outputs:
      highly_sensitive:
        cohort: output/input_ab_yr2.csv

  extract_variables_ab_yr3: 
    run: cohortextractor:latest generate_cohort --study-definition study_definition_ab_yr3 --with-end-date-fix
    needs: [process_filter_ab]
    outputs:
      highly_sensitive:
        cohort: output/input_ab_yr3.csv

  extract_variables_ab_yr3_15d: 
    run: cohortextractor:latest generate_cohort --study-definition study_definition_ab_yr3_15d --with-end-date-fix
    needs: [process_filter_ab]
    outputs:
      highly_sensitive:
        cohort: output/input_ab_yr3_15d.csv


  process_merge_ab: # merge 1-2-3 year ab 
    run: r:latest -e 'rmarkdown::render("analysis/process_merge_ab.Rmd", knit_root_dir = "/workspace", output_dir="/workspace/output")'
    needs: [process_Rmatching,extract_variables_ab_yr3_15d, extract_variables_ab_yr3,extract_variables_ab_yr2,extract_variables_ab_yr1]
    outputs:
      moderately_sensitive:
        html: output/process_merge_ab.html
      highly_sensitive: 
        rds: output/matched_patients_monthly_ab.rds

  check_ab_yr1:
    run: r:latest -e 'rmarkdown::render("analysis/check_ab_yr1.Rmd", knit_root_dir = "/workspace", output_dir="/workspace/output")'
    needs: [extract_variables_ab_yr1,matching,process_Rmatching]
    outputs:
      moderately_sensitive:
        html: output/check_ab_yr1.html

  check_ab_yr3:
    run: r:latest -e 'rmarkdown::render("analysis/check_ab_yr3.Rmd", knit_root_dir = "/workspace", output_dir="/workspace/output")'
    needs: [process_Rmatching]
    outputs:
      moderately_sensitive:
        html: output/check_ab_yr3.html
 
  check_abtype:
    run: r:latest -e 'rmarkdown::render("analysis/check_abtype.Rmd", knit_root_dir = "/workspace", output_dir="/workspace/output")'
    needs: [process_Rmatching]
    outputs:
      moderately_sensitive:
        html: output/check_abtype.html

  check_process_1: 
    run: r:latest -e 'rmarkdown::render("analysis/check_process_1.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [generate_study_population_covid_primarycare,generate_study_population_covid_SGSS,generate_study_population_covid_admission]
    outputs:
      moderately_sensitive:
        html: output/check_process_1.html

  # check_RF: 
  #   run: r:latest -e 'rmarkdown::render("analysis/check_RF.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
  #   needs: [process_Rmatching]
  #   outputs:
  #     moderately_sensitive:
  #       html: output/check_RF.html
  
  # check_RF_grid: 
  #   run: r:latest -e 'rmarkdown::render("analysis/check_RF_grid.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
  #   needs: [process_Rmatching]
  #   outputs:
  #     moderately_sensitive:
  #       html: output/check_RF_grid.html
  
  check_RF_yr1: 
    run: r:latest -e 'rmarkdown::render("analysis/check_RF_yr1.Rmd", knit_root_dir = "/workspace", output_dir = "output")'
    needs: [extract_variables_ab_yr1,matching,process_Rmatching]
    outputs:
      moderately_sensitive:
        html: output/check_RF_yr1.html
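
To see which actions sit upstream of the requested action, the `needs` lists in the pipeline above can be walked transitively. The sketch below is an assumption-laden illustration: `NEEDS` is transcribed by hand from the relevant part of project.yaml, and `upstream` is a hypothetical helper. It says nothing about how the job runner itself schedules or reuses earlier outputs.

```python
# `needs` lists transcribed by hand from project.yaml (only the actions
# reachable from model_RandomForest are included).
NEEDS = {
    "generate_study_population_covid_primarycare": [],
    "generate_study_population_covid_SGSS": [],
    "generate_study_population_covid_admission": [],
    "process_1": [
        "generate_study_population_covid_primarycare",
        "generate_study_population_covid_SGSS",
        "generate_study_population_covid_admission",
    ],
    "matching": ["process_1"],
    "extract_variables": ["matching"],
    "process_Rmatching": ["extract_variables", "matching"],
    "extract_variables_ab_time": ["matching"],
    "process_ab_time": ["extract_variables_ab_time", "process_Rmatching"],
    "model_RF_process": ["process_ab_time", "process_Rmatching"],
    "model_RandomForest": ["model_RF_process"],
}


def upstream(action, needs=NEEDS):
    """Return the set of all actions that `action` depends on, directly or indirectly."""
    seen = set()
    stack = list(needs[action])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(needs[current])
    return seen


if __name__ == "__main__":
    for name in sorted(upstream("model_RandomForest")):
        print(name)
```

Under this reading of the file, model_RandomForest depends on ten earlier actions, from the three cohort extractions through matching to model_RF_process.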

Timeline

Created: 24 Jan 2023 20:46:29 UTC
Started: 24 Jan 2023 20:47:27 UTC
Finished: 25 Jan 2023 04:35:19 UTC
Runtime: 07:47:52

These timestamps are generated and stored using the UTC timezone on the TPP backend.
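
The runtime figure can be checked against the timestamps with a short calculation. This is a minimal sketch using only the Python standard library; the helper names `parse_utc` and `hms` are made up for the example, and the `%b` month parsing assumes an English locale.

```python
from datetime import datetime, timezone

# Layout of the timestamps shown on this page; "UTC" is matched literally.
FMT = "%d %b %Y %H:%M:%S UTC"


def parse_utc(text):
    """Parse a timestamp such as '24 Jan 2023 20:46:29 UTC' as an aware datetime."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


def hms(delta):
    """Format a timedelta as HH:MM:SS."""
    total = int(delta.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


created = parse_utc("24 Jan 2023 20:46:29 UTC")
started = parse_utc("24 Jan 2023 20:47:27 UTC")
finished = parse_utc("25 Jan 2023 04:35:19 UTC")

if __name__ == "__main__":
    print("Queued for:", hms(started - created))
    print("Runtime:   ", hms(finished - started))  # 07:47:52, matching the page
```

The runtime is measured from Started to Finished, so the 58 seconds between Created and Started are not included in it.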

Job information

Status: Succeeded
Backend: TPP
Workspace: cc_rf
Requested by: Ya-Ting Yang
Branch: CC_ML
Force run dependencies: No
Git commit hash: be00445
Requested actions: model_RandomForest
