Module information

Module details

Title: Applied Statistics, Data Science and Quality in Clinical Bioinformatics
Type: Specialist
Module code: S-BG-S3
Credits: 15
Phase: 3
Requirement: Compulsory

Aim of this module

This module will develop trainees’ familiarity with the data types encountered in genomic laboratories and their skills in holding and statistically evaluating data, using programmatic methods to examine, interrogate and draw conclusions from data, and communicating the insight derived from data to support clinical decision making.

Much of this module lends itself to a major project that covers many of the competencies detailed below. An example project is developing and validating a bioinformatics tool or pipeline and then bringing it into service.

Work-based content

Training activities

Activity 1 (learning outcome 1)

Retrieve data from a REST application programming interface (API) and manipulate it to create dataframes in both Python 3 (pandas) and R

Type: DTA
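As a minimal sketch of the first half of this activity, the Python below fetches JSON from a REST endpoint using only the standard library and flattens nested objects into one dictionary per record, which is a shape `pandas.DataFrame` accepts directly. The endpoint URL, the `results` key and the field names are hypothetical assumptions; real genomics APIs differ in how they nest and paginate their responses.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url: str, timeout: float = 30.0):
    """GET `url` and decode its JSON body (no retries, paging or rate limiting)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def _flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level, joining keys with '.'."""
    flat = {}
    for name, value in obj.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(_flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value  # lists and scalars are kept as-is
    return flat


def flatten_records(payload: dict, key: str = "results") -> list:
    """One flat dict per item in payload[key] (hypothetical response layout)."""
    return [_flatten(item) for item in payload.get(key, [])]


# Hypothetical usage (the URL is a placeholder, not a real endpoint):
# import pandas as pd
# df = pd.DataFrame(flatten_records(fetch_json("https://api.example.org/variants")))
```

`pandas.json_normalize` offers similar dotted-column flattening, and in R the jsonlite package's `fromJSON` plays the same role, so the two dataframes can be built from the same response and compared.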
Activity 2 (learning outcomes 2, 3)

Hold genomics data in dataframes and perform statistical analyses using both Python 3 (pandas) and R

Type: DTA
Activity 3 (learning outcome 1)

Design and describe a relational data model for genomics

Type: DTA
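A relational model for genomics might store samples and distinct variants once each and link them through a table of genotype calls. The sketch below is one hypothetical design (all table and column names are illustrative assumptions), written for SQLite through Python's built-in `sqlite3` module so that it runs without a database server; the same DDL carries over to PostgreSQL or MariaDB with small changes, for example in how auto-incrementing keys are declared.

```python
import sqlite3

# Hypothetical minimal model: samples and distinct variants are stored once,
# and genotype_call links them (a many-to-many relationship).
SCHEMA = """
CREATE TABLE sample (
    sample_id  TEXT PRIMARY KEY,
    assay      TEXT NOT NULL
);
CREATE TABLE variant (
    variant_id INTEGER PRIMARY KEY,          -- SQLite assigns this automatically
    chrom      TEXT    NOT NULL,
    pos        INTEGER NOT NULL,
    ref        TEXT    NOT NULL,
    alt        TEXT    NOT NULL,
    UNIQUE (chrom, pos, ref, alt)             -- one row per distinct allele
);
CREATE TABLE genotype_call (
    sample_id  TEXT    NOT NULL REFERENCES sample (sample_id),
    variant_id INTEGER NOT NULL REFERENCES variant (variant_id),
    zygosity   TEXT    NOT NULL CHECK (zygosity IN ('het', 'hom', 'hemi')),
    depth      INTEGER,
    PRIMARY KEY (sample_id, variant_id)
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def samples_carrying(conn, chrom, pos, ref, alt):
    """Return (sample_id, zygosity) for every sample called with this allele."""
    query = """
        SELECT s.sample_id, g.zygosity
        FROM genotype_call AS g
        JOIN sample  AS s ON s.sample_id  = g.sample_id
        JOIN variant AS v ON v.variant_id = g.variant_id
        WHERE v.chrom = ? AND v.pos = ? AND v.ref = ? AND v.alt = ?
        ORDER BY s.sample_id
    """
    return conn.execute(query, (chrom, pos, ref, alt)).fetchall()
```

Keeping variants in their own table means an allele seen in many samples is described once, and the foreign keys stop a genotype call from referring to a sample that was never registered.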
Activity 4 (learning outcome 1)

Collect a complex dataset and store it in an open-source relational database management system suitable for further genomic analysis

Type: DTA
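Storing a collected dataset is largely a matter of validating types and inserting rows in a single transaction, so that a bad file leaves the database unchanged. The sketch below assumes a hypothetical CSV export of per-sample QC metrics (the columns and values are illustrative, not from any real assay) and loads it into SQLite with `executemany` and parameterised queries.

```python
import csv
import io
import sqlite3

# Hypothetical export of per-sample QC metrics; values are illustrative only.
CSV_TEXT = """run_id,sample_id,mean_depth,pct_bases_30x
RUN01,S1,212.5,98.1
RUN01,S2,188.0,96.4
RUN02,S3,95.2,81.7
"""


def load_qc_metrics(conn: sqlite3.Connection, handle) -> int:
    """Create the table if needed and bulk-insert rows from a CSV file handle."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS qc_metric (
               run_id        TEXT NOT NULL,
               sample_id     TEXT PRIMARY KEY,
               mean_depth    REAL NOT NULL,
               pct_bases_30x REAL NOT NULL
           )"""
    )
    # Convert numeric fields up front so malformed values fail before any insert.
    rows = [
        (r["run_id"], r["sample_id"], float(r["mean_depth"]), float(r["pct_bases_30x"]))
        for r in csv.DictReader(handle)
    ]
    with conn:  # commits on success, rolls back the whole batch on error
        conn.executemany("INSERT INTO qc_metric VALUES (?, ?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_qc_metrics(conn, io.StringIO(CSV_TEXT))
    for run_id, n, min_depth in conn.execute(
        "SELECT run_id, COUNT(*), MIN(mean_depth) FROM qc_metric GROUP BY run_id"
    ):
        print(run_id, n, min_depth)
```

Because `sample_id` is the primary key, reloading the same file raises `sqlite3.IntegrityError` and the transaction is rolled back, rather than silently duplicating samples.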
Activity 5 (learning outcomes 2, 3)

Analyse the variation in a genomics dataset by deriving summary statistics programmatically, and justify the choice of summary statistics

Type: DTA
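When justifying a choice of summary statistics, the shape of the data matters: the mean and standard deviation suit roughly symmetric data, whereas the median and interquartile range (IQR) are robust to outliers and skew, which are common in metrics such as read depth. The sketch below computes both sets, plus the standard error of the mean, with Python's `statistics` module; the function name is illustrative.

```python
import math
import statistics


def summarise(values):
    """Location and spread statistics for a numeric sample (at least two values)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": sd,
        "sem": sd / math.sqrt(len(values)),  # standard error of the mean
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
    }
```

With the made-up depths `[30, 32, 31, 29, 33, 30, 31, 400]`, the single outlier pulls the mean to 77.0 while the median stays at 31.0 and the IQR is 2.75, which is the kind of evidence that justifies reporting the median and IQR for such a metric.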
Activity 6 (learning outcomes 3, 4)

Visualise the variation across multiple aspects of a genomics dataset programmatically, using multiple plots

Type: DTA
Activity 7 (learning outcome 5)

Review and critique an existing quality control process for a diagnostic assay

Type: DTA
Activity 8 (learning outcomes 4, 5)

Determine an appropriate threshold for a quality control metric for genomic data, and explain the quality control metric and threshold to a laboratory colleague who is not a bioinformatician

Type: DTA
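Two common ways to derive a lower threshold from historical, accepted samples are a parametric cut-off (mean minus k standard deviations, which assumes approximate normality) and an empirical percentile (which assumes only that the history is representative). Neither is prescribed by this module; the sketch below, with illustrative function names, computes both and the fraction of historical samples each would have failed.

```python
import statistics


def lower_threshold_sd(history, k=3.0):
    """Mean minus k sample standard deviations (assumes rough normality)."""
    return statistics.fmean(history) - k * statistics.stdev(history)


def lower_threshold_percentile(history, pct=1):
    """Empirical pct-th percentile of historical values (pct from 1 to 99)."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    cut_points = statistics.quantiles(history, n=100, method="inclusive")
    return cut_points[pct - 1]


def failure_rate(history, threshold):
    """Fraction of historical samples that would fall below `threshold`."""
    return sum(value < threshold for value in history) / len(history)
```

For a skewed or bounded metric (for example a percentage close to 100), the two approaches can disagree sharply; the failure rate each implies on past accepted samples is a concrete number to discuss with a laboratory colleague who is not a bioinformatician.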
Activity 9 (learning outcomes 3, 4, 5)

Plan the implementation of an improvement to an existing quality control process for a diagnostic assay

Type: DTA
Activity 10 (learning outcome 5)

Review tests that have failed next generation sequencing (NGS) quality control metric thresholds, and identify the reasons for failure and the downstream consequences

Type: ETA
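A first pass over failed tests can be automated by listing every metric that breaches its threshold, including metrics that are missing altogether; identifying the underlying cause and the downstream consequences still needs expert review. The metric names, limits and the `Threshold` class in this sketch are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    higher_is_better: bool = True  # False for metrics such as duplication rate


def failed_metrics(sample_metrics, thresholds):
    """Return (metric, observed, limit) for every threshold the sample breaches.

    A metric missing from `sample_metrics` is reported with observed value None,
    so that absent data is never silently treated as a pass.
    """
    failures = []
    for t in thresholds:
        observed = sample_metrics.get(t.metric)
        if observed is None:
            failures.append((t.metric, None, t.limit))
        elif (observed < t.limit) if t.higher_is_better else (observed > t.limit):
            failures.append((t.metric, observed, t.limit))
    return failures
```

Returning the observed value alongside the limit shows how far a sample missed, which helps separate marginal failures from gross ones when judging whether a repeat is needed.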
Activity 11 (learning outcome 3)

Describe the differences between two genomics datasets by applying statistical methods, including several of the following:

  • Pearson, Spearman or distance correlation coefficients
  • Significance testing by t-test or chi-squared test
  • Linear regression analysis
  • Confidence intervals
  • p-values and effect sizes

Type: DTA
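Several of the methods listed can be written in a few lines, which helps when explaining what a library call returns. The sketch below implements Pearson and Spearman correlation (Spearman as Pearson on average ranks) and a Pearson chi-squared test on a 2 × 2 table, using the fact that a chi-squared statistic with one degree of freedom is the square of a standard normal variable. It is illustrative only; validated statistical libraries should be preferred in practice.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient for paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test, no continuity correction, for [[a, b], [c, d]].

    All row and column totals must be non-zero. With 1 degree of freedom the
    statistic is a squared standard normal, so the p-value is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

SciPy's `scipy.stats.chi2_contingency` and R's `chisq.test` apply Yates' continuity correction to 2 × 2 tables by default, so their statistics will differ from this uncorrected version unless that correction is switched off (`correction=False` and `correct = FALSE` respectively).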
Activity 12 (learning outcome 4)

Summarise and present the results of a statistical data analysis of genomic data to laboratory colleagues, using appropriate visualisation of the data to support your explanation

Type: DTA
Activity 13 (learning outcomes 3, 4)

Select one widely used and one not-yet-established metric used for variant interpretation, and explain to Clinical Scientists in Genomics how they are calculated, how to apply them appropriately for clinical use and what their limitations are

Type: DTA
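Population allele frequency is one widely used metric in variant interpretation: in resources such as gnomAD it is the allele count (AC) divided by the allele number (AN). One limitation is that the same frequency is far less certain when AN is small. The sketch below is a hypothetical helper, not any database's own method; the Wilson score interval it adds is one common choice for making that uncertainty visible.

```python
import math
from statistics import NormalDist


def allele_frequency(allele_count, allele_number, confidence=0.95):
    """Return (AF, lower, upper): AF = AC / AN with a Wilson score interval."""
    if allele_number <= 0 or not 0 <= allele_count <= allele_number:
        raise ValueError("need 0 <= allele_count <= allele_number and allele_number > 0")
    n = allele_number
    p = allele_count / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp only to absorb floating-point rounding at the 0 and 1 boundaries.
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)
```

For example, zero observations with AN = 1,000 still leaves an upper 95% bound of roughly 0.0038, so absence from a small cohort says little about very low frequencies.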
Activity 14 (learning outcomes 4, 6)

Present the opportunities and challenges in applying machine learning to genomics

Type: DTA
Activity 15 (learning outcomes 4, 7)

Complete or revise a data protection impact assessment for a data analysis process and make recommendations for action where required

Type: DTA

Assessments

Complete 3 Case-Based Discussions

Complete 3 DOPS or OCEs

Direct Observation of Practical Skills Titles

  • Select an appropriate statistical test to answer a specific question about a dataset, and justify its use.
  • Generate a quality control report for an assay.
  • Plot the results of a statistical data analysis.

Observed Clinical Event Titles

  • Present the results of a statistical data analysis to clinicians, clinical scientists (non-bioinformatician) or technologists.
  • Demonstrate the use of a pathogenicity prediction algorithm to clinicians, clinical scientists (non-bioinformatician) or technologists.
  • Explain a quality control report to a non-bioinformatician.

Learning outcomes

  1. Arrange and store data for programmatic analysis.
  2. Perform programmatic data analysis.
  3. Apply statistical methods to derive meaningful conclusions from data to support clinical decision making.
  4. Summarise the results of data analysis for stakeholders.
  5. Appraise laboratory quality control systems.
  6. Evaluate the potential of emerging methods in data science and their application to Clinical Bioinformatics Genomics.
  7. Practise in accordance with data protection legislation.

Clinical experiences

Clinical experiences help you to develop insight into your practice and a greater understanding of your specialty's impact on patient care. Clinical experiences should be included in your training plan and you may be asked to help organise your experiences. Reflections and observations from your experiences may help you to advance your practice and can be used to develop evidence to demonstrate your awareness and appreciation of your specialty.

Activities

  1. Attend an information governance meeting to understand the application of information governance guidance in a clinical setting.
  2. Observe how data is entered into a hospital system, such as a patient administration system or an electronic health record system, and appreciate the manual and automated aspects of the process.
  3. Observe the review of a QC report by laboratory staff to gain insight into how quality is maintained and the process for failing samples.
  4. Appreciate emerging international bioinformatics standards and the impact of their adoption in the NHS.
  5. Attend a Genetic Counselling appointment where risk and statistics are being explained to a patient.

Academic content (MSc in Clinical Science)

Important information

The academic parts of this module will be detailed and communicated to you by your university. Please contact them if you have questions regarding this module and its assessments. The module titles in your MSc may not be exactly identical to the work-based modules shown in the e-portfolio. Your modules will, however, be aligned to ensure that your academic and work-based learning are complementary.

Learning outcomes

On successful completion of this module the trainee will be able to:

  1. Demonstrate the application of SQL, R and a high-level programming language to perform data analyses.
  2. Apply integrative knowledge of fundamental statistical concepts.
  3. Critically evaluate and select appropriate statistical tests for genomic datasets.
  4. Design, build, populate and query genomics databases.
  5. Critically evaluate, select and apply effective data visualisation methods suitable for genomics datasets.

Indicative content

Databases

  • Designing and using relational databases
  • Common relational database management systems (RDBMS), including MySQL/MariaDB and PostgreSQL
  • Structured query language (SQL) commands
  • Database programmatic access

Data analysis

  • Python 3 and data analysis packages such as NumPy and pandas
  • R for data analysis, RStudio and the tidyverse

Statistics

  • Common statistical concepts in genomics and bioinformatics
  • Normal distribution, standard deviation and standard error of the mean
  • Sample size and power calculations
  • Odds ratios and effect sizes
  • Linear and logistic regression
  • Correct selection of statistical tests
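Two of the topics above lend themselves to short worked calculations: an odds ratio with a Woolf (log-scale) confidence interval, and an approximate per-group sample size for comparing two means, n = 2((z_{1-α/2} + z_{1-β})σ/δ)^2, using the normal approximation. The sketch below uses only the standard library; the function names are illustrative, and dedicated power software that uses the t distribution may give slightly larger sample sizes.

```python
import math
from statistics import NormalDist


def odds_ratio(a, b, c, d, confidence=0.95):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a Woolf (log) interval.

    For example rows = exposed / unexposed and columns = cases / controls.
    All four cells must be positive for the log-scale standard error.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (
        estimate,
        math.exp(math.log(estimate) - z * se_log),
        math.exp(math.log(estimate) + z * se_log),
    )


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group n to detect a mean difference `delta` between two
    equal-sized groups with common standard deviation `sd` (two-sided test,
    normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)
```

For a difference of one standard deviation at α = 0.05 and 80% power, `n_per_group(1.0, 1.0)` returns 16, and halving the difference roughly quadruples the requirement to 63. `odds_ratio(20, 80, 10, 90)` gives an odds ratio of 2.25 whose 95% interval (about 0.99 to 5.09) just includes 1, a useful reminder that an effect size and its uncertainty should be reported together.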

Data visualisation

  • Plotting data with ggplot and matplotlib

Machine learning

  • Machine learning principles
  • Critical evaluation of machine learning applications

Module assigned to

Specialties

  • SBI1-1-22: Clinical Bioinformatics Genomics [2022]
  • SBI1-1-23: Clinical Bioinformatics Genomics [2023]
  • SBI1-1-24: Clinical Bioinformatics Genomics [2024]