Skip to content

Job request: 16421

Organisation:
DataLab
Workspace:
winter-pressures
ID:
z4isns5hpnuckaqy

This page shows the technical details of what happened when authorised researcher Colm Andrews requested one or more actions to be run against real patient data in the project, within a secure environment.

By cross-referencing the indicated Requested Actions with the Pipeline section below, you can infer what security level various outputs were written to. Outputs marked as highly_sensitive can never be viewed directly by a researcher; they can only request that code runs against them. Outputs marked as moderately_sensitive can be viewed by an approved researcher by logging into a highly secure environment. Only outputs marked as moderately_sensitive can be requested for release to the public, via a controlled output review service.

Jobs

Pipeline

Show project.yaml
version: "3.0"

expectations:
  population_size: 5000

actions:

  monthly_practice_listsize_2018:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_listsize_monthly
        --index-date-range '2018-06-01 to 2019-05-31 by month' --output-dir=output/listsize
        --output-format=feather
    outputs:
      highly_sensitive:
        extract: output/listsize/input_listsize_monthl*.feather

  monthly_practice_listsize_2019:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_listsize_monthly
        --index-date-range '2019-06-01 to 2020-05-31 by month' --output-dir=output/listsize
        --output-format=feather
    outputs:
      highly_sensitive:
        extract: output/listsize/input_listsize_month*.feather

  monthly_practice_listsize_2020:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_listsize_monthly
        --index-date-range '2020-06-01 to 2021-05-31 by month' --output-dir=output/listsize
        --output-format=feather
    outputs:
      highly_sensitive:
        extract: output/listsize/input_listsize_mont*.feather

  monthly_practice_listsize_2021:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_listsize_monthly
        --index-date-range '2021-06-01 to 2022-12-31 by month' --output-dir=output/listsize
        --output-format=feather
    outputs:
      highly_sensitive:
        extract: output/listsize/input_listsize_mon*.feather

  monthly_practice_listsize_measures:
    run: cohortextractor:latest generate_measures --study-definition study_definition_listsize_monthly --output-dir=output/listsize
    needs:
    - monthly_practice_listsize_2018
    - monthly_practice_listsize_2019
    - monthly_practice_listsize_2020
    - monthly_practice_listsize_2021
    outputs:
      moderately_sensitive:
        measure_csv: output/listsize/measure_*.csv


  # Other data
  # ----------
  # Add actions for other data to this section. Prefix them with a suitable name; place
  # scripts in a similarly named sub-directory of the analysis directory; write outputs
  # to a similarly named sub-directory of the output directory.
  #
  # For example, let's call our other data "metrics". We would prefix our actions
  # "metrics_"; we would place our scripts in analysis/metrics; we would write outputs
  # to output/metrics.

  metrics_monthly_sro_data:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_sro_monthly
        --index-date-range '2018-06-01 to 2022-12-31 by month' --output-dir=output/metrics/monthly
        --output-format=feather
    outputs:
      highly_sensitive:
        extract: output/metrics/monthly/input_sro_monthly*.feather

  metrics_monthly_sro_data_measures:
    run: cohortextractor:latest generate_measures --study-definition study_definition_sro_monthly --output-dir=output/metrics/monthly
    needs:
    - metrics_monthly_sro_data
    outputs:
      moderately_sensitive:
        measure_csv: output/metrics/monthly/measure_*.csv

  metrics_monthly_kids_data:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_kids_monthly
        --index-date-range '2018-06-01 to 2022-12-31 by month' --output-dir=output/metrics/monthly
        --output-format=feather
    outputs:
      highly_sensitive:
        extract: output/metrics/monthly/input_kids_monthly*.feather

  metrics_monthly_kids_data_measures:
    run: cohortextractor:latest generate_measures --study-definition study_definition_kids_monthly --output-dir=output/metrics/monthly
    needs:
    - metrics_monthly_kids_data
    outputs:
      moderately_sensitive:
        measure_csv: output/metrics/monthly/measure*.csv

  metrics_generate_deciles_charts:
    run: >
      deciles-charts:v0.0.33
        --input-files output/metrics/monthly/measure_*.csv
        --output-dir output/metrics
    config:
      show_outer_percentiles: true
    needs:
      - metrics_monthly_sro_data_measures
      - metrics_monthly_kids_data_measures
    outputs:
      moderately_sensitive:
        deciles_charts: output/metrics/deciles_chart_*.png
        deciles_tables: output/metrics/deciles_table_*.csv

  metrics_generate_deciles_charts_alternative:
    run: >
      python:latest
        python
        -m analysis.deciles_alternative
        --input-files output/metrics/monthly/measure_*.csv
        --output-dir output/metrics/shaded
    config:
      show_outer_percentiles: true
    needs: 
      - metrics_monthly_sro_data_measures
      - metrics_monthly_kids_data_measures
    outputs:
      moderately_sensitive:
        deciles_charts: output/metrics/shaded/deciles_chart*.png
        deciles_tables: output/metrics/shaded/deciles_table*.csv

  # Metrics data extraction
  # ------------
  metrics_generate_sro_dataset_winter:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_sro
      --param start_date='2021-12-01' --param end_date='2022-03-30' --output-dir=output/metrics
      --output-format=feather --output-file output/metrics/input_sro_2021-12-01.feather
    outputs:
      highly_sensitive:
        extract: output/metrics/input_sro_2021-12-01.feather

  metrics_generate_sro_dataset_summer:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_sro
      --param start_date='2021-06-01' --param end_date='2021-09-30' --output-dir=output/metrics
      --output-format=feather --output-file output/metrics/input_sro_2021-06-01.feather
    outputs:
      highly_sensitive:
        extract: output/metrics/input_sro_2021-06-01.feather

  metrics_generate_kids_dataset_winter:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_kids
      --param start_date='2021-12-01' --param end_date='2022-03-30' --output-dir=output/metrics
      --output-format=feather --output-file output/metrics/input_kids_2021-12-01.feather
    outputs:
      highly_sensitive:
        extract: output/metrics/input_kids_2021-12-01.feather

  metrics_generate_kids_dataset_summer:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_kids
      --param start_date='2021-06-01' --param end_date='2021-09-30' --output-dir=output/metrics
      --output-format=feather --output-file output/metrics/input_kids_2021-06-01.feather
    outputs:
      highly_sensitive:
        extract: output/metrics/input_kids_2021-06-01.feather

  metrics_generate_endpopulation_dataset_winter:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_endpopulation
      --param start_date='2021-12-01' --param end_date='2022-03-30' --output-dir=output/metrics
      --output-format=feather --output-file output/metrics/input_endpopulation_2021-12-01.feather
    outputs:
      highly_sensitive:
        extract: output/metrics/input_endpopulation_2021-12-01.feather

  metrics_generate_endpopulation_dataset_summer:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_endpopulation
      --param start_date='2021-06-01' --param end_date='2021-09-30' --output-dir=output/metrics
      --output-format=feather --output-file output/metrics/input_endpopulation_2021-06-01.feather
    outputs:
      highly_sensitive:
        extract: output/metrics/input_endpopulation_2021-06-01.feather

  metrics_generate_sro_measures:
    run: cohortextractor:latest generate_measures --param start_date='2021-06-01'
      --param end_date='2021-09-30' --study-definition study_definition_sro --output-dir=output/metrics
    needs:
    - metrics_generate_sro_dataset_summer
    - metrics_generate_sro_dataset_winter
    outputs:
      moderately_sensitive:
        measure_csv: output/metrics/mea*.csv

  metrics_generate_kids_measures:
    run: cohortextractor:latest generate_measures --param start_date='2021-06-01'
      --param end_date='2021-09-30' --study-definition study_definition_kids --output-dir=output/metrics
    needs:
    - metrics_generate_kids_dataset_summer
    - metrics_generate_kids_dataset_winter
    outputs:
      moderately_sensitive:
        measure_csv: output/metrics/me*.csv

  metrics_generate_endpopulation_measures:
    run: cohortextractor:latest generate_measures --param start_date='2021-06-01'
      --param end_date='2021-09-30' --study-definition study_definition_endpopulation
      --output-dir=output/metrics
    needs:
    - metrics_generate_endpopulation_dataset_summer
    - metrics_generate_endpopulation_dataset_winter
    outputs:
      moderately_sensitive:
        measure_csv: output/metrics/m*.csv

  identify_practices_to_remove:
    run: r:latest analysis/metrics/identify_practices_to_remove.R
    needs:
    - metrics_generate_sro_measures
    - metrics_generate_kids_measures
    - metrics_generate_endpopulation_measures
    outputs:
      highly_sensitive:
        csv: output/metrics/practices_to_remove.csv

  metrics_create_seasonal_data:
    run: r:latest analysis/metrics/create_seasonal_data.R
    needs:
    - metrics_generate_sro_measures
    - metrics_generate_kids_measures
    outputs:
      highly_sensitive:
        csv: output/metrics/season_data_*.csv

  metrics_create_seasonal_measures:
    run: r:latest analysis/metrics/seasonal_measures.R
    needs:
    - metrics_create_seasonal_data
    - appointments_generate_seasonal_summaries_by_booked_month
    - appointments_generate_seasonal_summaries_by_start_month
    - identify_practices_to_remove
    outputs:
      moderately_sensitive:
        raw_csv: output/*/*/summer_winter_all_metrics.csv
        plot_csv: output/*/*/summer_winter_*_histogram_data.csv
        plot_png: output/*/*/summer_winter_*_histogram.png
        plot_csv_redacted: output/*/*/redacted/summer_winter_*_histogram_data_redacted.csv
        plot_png_redacted: output/*/*/redacted/summer_winter_*_histogram_redacted.png
        
  # Epi  
  # -----------------
  epi_generate_winter_outcomes:
    run: cohortextractor:latest generate_cohort --study-definition study_definition_epi
        --param start_date='2021-12-01' --param end_date='2022-03-30' --output-dir=output/epi/
        --output-format=feather
    outputs:
      highly_sensitive:
        extract: output/epi/*.feather

  epi_model_irr:
    run: r:latest analysis/epi/model_irr.R
    needs: 
      - epi_generate_winter_outcomes
      - classify_practices_by_decile
    outputs:
      moderately_sensitive:
        data: output/epi/irr_data.csv
  
  epi_plot_irr:
    run: r:latest analysis/epi/plot_irr.R
    needs: 
      - epi_model_irr
    outputs:
      moderately_sensitive:
        single_plots: output/epi/plots/*.png
        combined_plots: output/epi/plots/combined/*.png


  # Appointments data
  # -----------------
  appointments_generate_dataset_sql:
    run: >
      sqlrunner:latest
        analysis/appointments/dataset_query.sql
        --output output/appointments/dataset_long.csv.gz
        --dummy-data-file analysis/appointments/dummy_dataset_long.csv.gz
    outputs:
      highly_sensitive:
        dataset: output/appointments/dataset_long.csv.gz
        

  appointments_generate_monthly_measures_by_booked_month:
    run: >
      python:latest
        python
        -m analysis.appointments.generate_appointment_monthly_measures
        --value-thresholds 0 2
        --index-cols booked_month practice
    needs: [appointments_generate_dataset_sql]
    outputs:
      highly_sensitive:
        monthly_measure: output/appointments/measure_monthly_*_by_booked_month.csv


  appointments_generate_monthly_measures_by_start_month:
    run: >
      python:latest
        python
        -m analysis.appointments.generate_appointment_monthly_measures
        --value-thresholds 0 2
        --index-cols start_month practice
    needs: [appointments_generate_dataset_sql]
    outputs:
      highly_sensitive:
        monthly_measure: output/appointments/measure_monthly_*_by_start_month.csv

  calculate_total_num_appointments_monthly:
    run: r:latest analysis/calculate_monthly_counts.R
    needs:
      - appointments_generate_monthly_measures_by_booked_month
      - appointments_generate_monthly_measures_by_start_month
    outputs:
      moderately_sensitive:
        overall_csvs: output/appointments/measure_monthly_overall_*.csv
  
  appointments_generate_seasonal_summaries_by_start_month:
    run: >
      python:latest
        python
        -m analysis.appointments.generate_appointment_seasonal_summaries
        --value-thresholds 0 2
        --start-date 2021-06-01
        --end-date 2022-03-31
        --index-cols start_month practice
    needs: [appointments_generate_dataset_sql]
    outputs:
      highly_sensitive:
        monthly_measure: output/appointments/measure_seasonal_*_by_start_month.csv


  appointments_generate_seasonal_summaries_by_booked_month:
    run: >
      python:latest
        python
        -m analysis.appointments.generate_appointment_seasonal_summaries
        --value-thresholds 0 2
        --start-date 2021-06-01
        --end-date 2022-03-31
        --index-cols booked_month practice
    needs: [appointments_generate_dataset_sql]
    outputs:
      highly_sensitive:
        monthly_measure: output/appointments/measure_seasonal_*_by_booked_month.csv

  ### Some of the measures need to be normalised by list size, specifically
  ### anything that starts with 'measure_monthly_num'
  appointments_generate_normalised_counts:
    run: r:latest analysis/normalise_counts.R
    needs:
      - appointments_generate_monthly_measures_by_booked_month
      - appointments_generate_monthly_measures_by_start_month
      - monthly_practice_listsize_measures
    outputs:
      highly_sensitive:
        normalised_csvs: output/appointments/measure_monthly_normalised_*.csv
      moderately_sensitive:
        check_csvs: output/check/*CHECK.csv

  appointments_generate_deciles_charts:
    run: >
      deciles-charts:v0.0.33
        --input-files output/appointments/measure_monthly_*.csv
        --output-dir output/appointments
    config:
      show_outer_percentiles: true
    needs:
      - appointments_generate_monthly_measures_by_booked_month
      - appointments_generate_monthly_measures_by_start_month
      - appointments_generate_normalised_counts
    outputs:
      moderately_sensitive:
        deciles_charts: output/appointments/deciles_chart_*.png
        deciles_tables: output/appointments/deciles_table_*.csv


  appointments_generate_deciles_charts_alternative:
    run: >
      python:latest
        python
        -m analysis.deciles_alternative
        --input-files output/appointments/measure_monthly_*.csv
        --output-dir output/appointments/shaded
    config:
      show_outer_percentiles: true
    needs: 
      - appointments_generate_monthly_measures_by_booked_month
      - appointments_generate_monthly_measures_by_start_month
      - appointments_generate_normalised_counts
    outputs:
      moderately_sensitive:
        deciles_charts: output/appointments/shaded/deciles_chart*.png
        deciles_tables: output/appointments/shaded/deciles_table*.csv


  # Combining data
  # -----------------
  combine_seasonal_data:
    run: r:latest analysis/combine_seasonal_data.R
    needs:
    - metrics_create_seasonal_measures
    - appointments_generate_seasonal_summaries_by_booked_month
    - appointments_generate_seasonal_summaries_by_start_month
    outputs:
      moderately_sensitive:
        csv: output/combined/combined_seasonal_data.csv

  classify_practices_by_decile:
      run: r:latest analysis/classify_practices_by_decile.R
      needs:
      - combine_seasonal_data
      outputs:
        moderately_sensitive:
          csv: output/combined/combined_seasonal_data_with_deciles.csv

  summarise_seasonal_comparisons:
      run: r:latest analysis/summarise_seasonal_comparisons.R
      needs:
      - combine_seasonal_data
      outputs:
        moderately_sensitive:
          csv: output/combined/seasonal_summaries.csv
          redacted_csv: output/combined/seasonal_summaries_nondisclosive.csv



  # Tests
  # -----------------
  run_python_tests:
      run: python:latest python -m pytest --junit-xml=output/pytest.xml --verbose
      outputs:
        moderately_sensitive:
          log: output/pytest.xml

  run_r_tests:
    run: r:latest tests/testthat/run-all.R
    outputs:
      moderately_sensitive:
        log: output/tests/run-all.log

  # Presentations and documents
  # ---------------------------
  generate_preliminary_report:
    run: r:latest -e 'rmarkdown::render("reports/preliminary_report.rmd", output_dir = "/workspace/output/docs", knit_root_dir = "/workspace",)'
    needs: 
      - metrics_create_seasonal_measures
      # - metrics_generate_deciles_charts
      - metrics_generate_deciles_charts_alternative
      # - appointments_generate_deciles_charts
      - appointments_generate_deciles_charts_alternative
      - appointments_generate_seasonal_summaries_by_booked_month
      - appointments_generate_seasonal_summaries_by_start_month
      - summarise_seasonal_comparisons
      - identify_practices_to_remove
      - epi_plot_irr
    outputs:
      moderately_sensitive:
        report: output/docs/preliminary_report.html

  create_release:
    run: r:latest analysis/create_release.R
    needs:
      - generate_preliminary_report
      # - appointments_generate_deciles_charts
      - appointments_generate_deciles_charts_alternative
      # - metrics_generate_deciles_charts
      - metrics_generate_deciles_charts_alternative
      - metrics_create_seasonal_measures
      - summarise_seasonal_comparisons
      - epi_plot_irr
    outputs:
      moderately_sensitive:
        html: output/release/*.html
        txt: output/meta-release/*.txt
        plot_data: output/release/*/*.csv
        plot_png: output/release/*/*.png

Timeline

  • Created:

  • Started:

  • Finished:

  • Runtime: 11:55:32

These timestamps are generated and stored using the UTC timezone on the TPP backend.

Job information

Status
Succeeded
Backend
TPP
Workspace
winter-pressures
Requested by
Colm Andrews
Branch
main
Force run dependencies
No
Git commit hash
6ad46cb
Requested actions
  • epi_generate_winter_outcomes
  • epi_model_irr
  • epi_plot_irr
  • generate_preliminary_report
  • create_release