Training activity information

Details

Analyse the variation in a genomics dataset by deriving summary statistics programmatically, and justify the choice of summary statistics

Type

Developmental training activity (DTA)

Evidence requirements

Evidence the activity has been undertaken by the trainee.

Reflection on the activity at one or more time points after the event, including learning from the activity and/or areas of the trainee's practice for development.

An action plan to implement learning and/or to address skills or knowledge gaps identified.

Considerations

  • For example:
    • Mean
    • Median
    • Quartiles
    • Determine if the data are normally distributed
    • Cumulative distribution analysis
    • Standard deviation
    • Identify outliers
    • Correlation
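
As a minimal sketch of how several of the statistics listed above might be derived programmatically, the example below uses only the Python standard library. The read-depth values, the `summarise` helper and the 1.5 × IQR outlier rule are illustrative assumptions, not a prescribed method; R or a library such as pandas would serve equally well.

```python
import statistics

# Hypothetical per-sample read depths at one locus (made-up values for illustration).
depths = [28, 30, 31, 29, 32, 30, 27, 33, 31, 95]


def summarise(values):
    """Return basic summary statistics and IQR-based outliers for a list of numbers."""
    # Quartiles computed with the "inclusive" method (data treated as the full population).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: flag values more than 1.5 * IQR beyond the quartiles.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "outliers": [v for v in values if v < lower or v > upper],
    }


summary = summarise(depths)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up dataset the single extreme depth pulls the mean well above the median, which is the kind of observation that can be used to justify choosing a robust statistic over one that is sensitive to outliers.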

Reflective practice guidance

The guidance below supports reflection at different time points by offering questions to help you reflect on this training activity. The questions are guidance only and should not be treated as a mandatory checklist. Trainees are not expected to answer every question listed.

Before action

  • What statistical knowledge is needed to select and interpret summary statistics? What programming skills in Python or R are required?
  • How will you improve your ability to programmatically derive and interpret summary statistics for genomics data? How will you learn to justify the selection of specific statistical measures? What is your current understanding of different types of genomic variation and relevant summary statistics?
  • Will you review different types of summary statistics and their relevance to genomic variation? Will you practise using Python or R to calculate these statistics? Have you discussed the dataset and appropriate analyses with your training officer? What challenges do you foresee in justifying your choice of statistics? How do you feel about applying statistical concepts programmatically?

In action

  • What programming language and libraries are you using to derive the summary statistics? Which specific summary statistics are you calculating and why did you choose these? How are you handling missing data or outliers?
  • Are you able to successfully calculate the chosen summary statistics? Do the results make sense in the context of the genomics data? Are you documenting your code and the rationale behind your choices?
  • If the initial summary statistics don’t reveal the variation you expected, are you considering calculating other statistics? Are you visualising the data to gain a better understanding of its distribution?
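
The questions above about missing data and understanding the distribution could be approached along the following lines. This is a sketch under stated assumptions: the allele-fraction values are invented, `drop_missing` and `ecdf` are hypothetical helper names, and simply discarding missing values is only one of several possible strategies, each of which needs its own justification.

```python
import math

# Hypothetical variant allele fractions; None and NaN stand in for missing calls.
raw = [0.48, None, 0.51, 0.02, float("nan"), 0.49, 0.50, 0.97]


def drop_missing(values):
    """Keep only real numbers, discarding None and NaN entries."""
    return [v for v in values if v is not None and not math.isnan(v)]


def ecdf(values):
    """Return (value, cumulative proportion) pairs for the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


clean = drop_missing(raw)
print(f"kept {len(clean)} of {len(raw)} values")
for value, proportion in ecdf(clean):
    print(f"{value:.2f}  {proportion:.2f}")
```

Recording how many values were dropped, as the first printed line does, helps document the rationale behind the analysis; the cumulative proportions can then be plotted or inspected to see where the data cluster.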

On action

  • Describe the genomics dataset you analysed. What summary statistics did you derive programmatically?
  • What programming skills did you use or develop to derive the summary statistics? How did you approach the task of choosing appropriate summary statistics for the dataset? What factors did you consider? Were there any unexpected patterns or issues revealed by the summary statistics? What did you learn from these? How did your understanding of the data (‘reflect-in-action’) influence the choice of summary statistics? How does the ability to derive and justify summary statistics relate to your future role in interpreting genomic data?
  • What areas of programmatic summary statistic generation do you want to explore further? How can you apply your understanding of different summary statistics to various types of genomic data? What specific actions will you take to improve your skills in choosing and deriving summary statistics? What resources or support would help you to enhance your skills in this area?

Beyond action

  • Have you had to analyse variation in other genomics datasets since this training activity? How did your ability to derive summary statistics programmatically prove useful?
  • Can you recall instances where you had to justify your choice of summary statistics to colleagues? How confident were you in your reasoning based on this experience?
  • How will your understanding of summary statistics and their appropriate application contribute to your ability to interpret and explain genomic data in the future?

Relevant learning outcomes

Outcome 2: Perform programmatic data analysis.

Outcome 3: Apply statistical methods to derive meaningful conclusions from data to support clinical decision making.