Training activity: 2 (S-BG-S1-1) — Scientist Training Programme | Curriculum Library

Details

Generate a genomic sequence alignment in an appropriate format

Type

Entrustable training activity (ETA)

Evidence requirements

Evidence the activity has been undertaken by the trainee repeatedly, consistently, and effectively over time, in a range of situations. This may include occasions where the trainee has not successfully achieved the outcome of the activity themselves. For example, it may not have been appropriate to undertake the task in the circumstances, or the trainee may have recognised their own limitations and sought help or advice to ensure the activity reached an appropriate conclusion.

Reflection at multiple timepoints on the trainee's learning journey for this activity.

Considerations

Alignment methodologies, e.g., alignment against a reference sequence, indel realignment or de novo assembly
SAM/BAM/CRAM files
Genomic architecture e.g., genes, introns, exons etc.
Reference genome and different genome builds
Quality metrics used for sequence alignment and their caveats, e.g., difficulties with short reads, pseudogenes, soft/hard clipping and CIGAR strings
Store data in appropriate formats according to local and national standards, taking into consideration patient confidentiality
Data integrity and patient safety
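The CIGAR and clipping considerations above can be made concrete with a small parser. The Python sketch below assumes CIGAR strings follow the SAM specification (a length followed by one of the operations M, I, D, N, S, H, P, = or X); the function names are hypothetical and for illustration only. It shows which operations consume read bases and which consume reference positions, and it tallies soft-clipped bases (kept in the stored sequence) separately from hard-clipped bases (removed from it). A production pipeline would normally obtain these fields through an established library rather than hand-written code.

```python
import re

# One CIGAR operation: a length followed by an operation character (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# A valid CIGAR is one or more such operations with nothing else in between.
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")

QUERY_CONSUMING = set("MIS=X")  # operations that consume bases of the read
REF_CONSUMING = set("MDN=X")    # operations that consume reference positions


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '5S90M2I3M' into (length, op) pairs."""
    if cigar == "*":  # '*' means no CIGAR is available, e.g. an unmapped read
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def query_length(cigar: str) -> int:
    """Read bases described by the CIGAR; should equal len(SEQ) when SEQ is stored."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING)


def reference_span(cigar: str) -> int:
    """Number of reference positions covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def clipping_summary(cigar: str) -> dict[str, int]:
    """Count soft-clipped (kept in SEQ) and hard-clipped (removed from SEQ) bases."""
    summary = {"soft_clipped": 0, "hard_clipped": 0}
    for n, op in parse_cigar(cigar):
        if op == "S":
            summary["soft_clipped"] += n
        elif op == "H":
            summary["hard_clipped"] += n
    return summary


if __name__ == "__main__":
    for cigar in ("5S90M2I3M", "10H50M1D40M"):
        print(cigar, query_length(cigar), reference_span(cigar), clipping_summary(cigar))
```

For `5S90M2I3M` this reports 100 read bases, a 93-base reference span and 5 soft-clipped bases: the 2-base insertion adds to the read but not to the reference span.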

Reflective practice guidance

The guidance below is provided to support reflection at different time points, with questions to aid your reflection on this training activity. The questions are for guidance only and should not be considered a mandatory checklist; trainees are not expected to answer every question listed.

Before action

What is the objective of creating a genomic sequence alignment in an appropriate format (e.g., BAM)? What specific alignment software or pipeline, reference genome version, and criteria for a ‘successful’ alignment (e.g., mapping rate) are expected?
What do you already know about performing sequence alignment and the concept of mapping reads to a reference genome? What possible challenges might you face, e.g., computational resource requirements for large datasets, choosing appropriate alignment parameters, or dealing with complex genomic regions? How might you handle these challenges? When would you need to ask for help if you encounter persistent alignment issues or error messages? How do you feel about generating sequence alignments?
What specific skills do you want to develop, such as improving your proficiency in a particular alignment tool or understanding different alignment algorithms? What specific insights do you hope to gain about the impact of alignment parameters on resulting data, or the nuances of aligning short vs. long reads?
If previous alignment attempts were problematic, what caused the issues and how do you plan to address them? What important information do you need to consider before embarking on the activity, e.g., ensuring access to the correct reference genome, understanding sample-specific requirements (e.g., paired-end reads, library type) and reviewing best practice guidelines for alignment?
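Mapping rate, mentioned above as one possible criterion for a successful alignment, is usually derived from the SAM FLAG field. The Python sketch below assumes plain-text SAM input (BAM or CRAM would normally be read with a dedicated tool, and `samtools flagstat` reports comparable counts) and counts only primary records, skipping secondary (0x100) and supplementary (0x800) alignments so that each read is counted once. The function name and the choice to return a fraction are illustrative assumptions, not a prescribed method; local pipelines may define mapping rate differently, for example by excluding QC-failed or duplicate reads.

```python
# SAM FLAG bits used here (values from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rate(sam_lines) -> float:
    """Fraction of primary read records that are mapped, from SAM text lines."""
    total = mapped = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second mandatory column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not (flag & UNMAPPED):
            mapped += 1
    return mapped / total if total else 0.0


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6\tSO:unsorted",
        "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
        "read1\t256\tchr2\t500\t0\t10M\t*\t0\t0\t*\t*",  # secondary: ignored
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",    # unmapped
        "read3\t16\tchr1\t200\t60\t10M\t*\t0\t0\tGTACGTACGT\t*",
    ]
    print(f"mapping rate: {mapping_rate(example):.1%}")
```

In the toy example three reads have primary records and two of them are mapped, so the rate is two thirds; the secondary record for `read1` does not inflate either count.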

In action

As you are generating the sequence alignment, make a note of anything that feels surprising or different from what you anticipate. For example, does the alignment process take significantly longer than expected, does the software produce unexpected error messages, or do you find that the mapping rate is unusually low for the given data? Consider how this experience compares with previous experiences of similar activities, such as other sequence alignment tasks or general bioinformatics processing. Does it feel more or less familiar, or are there new computational or data challenges you have not anticipated?

Identify how any unexpected developments, such as a slow alignment or persistent errors, impact your immediate actions. Do you immediately initiate a detailed debugging session, consult specific API documentation for the problematic library, or search for relevant solutions on developer forums? Do you adapt or change your alignment approach or strategy as a result? For instance, do you refactor a section of the code, decide to use a different alignment algorithm, or adjust your data handling for the input files? Do you find it difficult to adapt your strategy when faced with a persistent problem? Does it affect your confidence in your ability to resolve the issue independently? Do you feel positive you can reach a successful conclusion?

Do you recognise when you might need to seek immediate advice or help, such as when a performance issue requires specialist knowledge of computational optimisation or a bug points to a fundamental architectural flaw you cannot resolve? Identify what you learn as a result of the unexpected development. For instance, do you learn a new debugging technique, a specific workaround for a known software issue, or a more efficient parameter setting for a particular alignment scenario?

On action

Summarise the key steps involved in generating the genomic sequence alignment. Describe the tools or software you used and the input and output formats, noting if the output format was appropriate. Were there any specific events, actions, or interactions that felt important during the alignment process?

What specific learning can you take from generating the alignment? For example, what strengths did you demonstrate in using alignment tools or understanding parameters? What skills or knowledge gaps were evident regarding different alignment strategies or handling large files in formats like BAM/SAM? How did this experience compare against previous times you have generated sequence alignments? Were any previous actions for development in this area achieved? Do you feel your practice has improved? Identify any challenges you experienced, such as computational issues, understanding specific output flags, or dealing with complex regions, and how you reacted to them. Did these challenges affect your ability to deal with the situation? Were you able to overcome them? Was there anything significant about this activity, such as needing to seek advice or clarification on tool parameters or format interpretation?
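Understanding output flags, one of the challenges mentioned above, is easier when the FLAG integer is broken into its individual bits. The Python sketch below uses the twelve bits defined in the SAM specification; the short names are informal labels chosen for readability (the specification describes each bit in words), and the function name is hypothetical.

```python
# The twelve FLAG bits defined by the SAM specification, with informal names.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all defined bits set in a SAM FLAG value."""
    # FLAG is a 16-bit field; only the low 12 bits currently have a meaning.
    if not 0 <= flag < 2**16:
        raise ValueError(f"FLAG outside the 16-bit range: {flag}")
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # 99 and 147 commonly appear together for the two reads of a properly paired fragment.
    for flag in (99, 147, 4):
        print(flag, decode_flag(flag))
```

For example, 99 decodes to paired, proper_pair, mate_reverse_strand and first_in_pair, while 147 decodes to paired, proper_pair, reverse_strand and second_in_pair.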

Identify the specific actions or ‘next steps’ you will take based on this experience to support your learning. What will you do differently next time you generate a genomic sequence alignment? Has anything changed in terms of what you would do if faced with a similar situation again? Do you need to practise using specific alignment tools, understanding format complexities, or troubleshooting common issues further?

Beyond action

Reflect on the various instances where you have generated genomic sequence alignments. Have you reviewed your reflections from those previous attempts, particularly concerning challenging alignments? What actions did you previously identify to improve your alignment strategies or handle specific issues (e.g., dealing with indels, repetitive regions)? Have you worked on implementing those changes in subsequent alignment tasks? Do you feel confident applying refined alignment techniques based on the lessons learned from multiple previous experiences? Has discussing alignment strategies or difficult cases with others transformed your approach to generating alignments?

How has performing sequence alignment repeatedly refined your skills and efficiency in this area over time? How does the knowledge gained from past alignment experiences prepare you for assessments such as viewing an alignment BAM file? Do your accumulated experiences help you recognise when an alignment task might be particularly complex or require input from others due to its nature or specific data characteristics, indicating it is beyond your current scope?

Relevant learning outcomes

#	Outcome
1	Explain the structure of the human genome and the impact of variation on human development, health, and disease.
2	Evaluate sources of information about variation in the human genome including access, application and clinical impact.
3	Select appropriate tools for next generation sequencing (NGS) analysis of inherited and acquired disease.
4	Analyse NGS data in a clinical setting applying appropriate quality control and data validation.

Diagnostic Sequencing —
Training activity: 2
