Training activity information
Details
Prepare data for AI use
Type
Developmental training activity (DTA)
Evidence requirements
Evidence the activity has been undertaken by the trainee.
Reflection on the activity at one or more time points after the event, including learning from the activity and/or areas of the trainee's practice for development.
An action plan to implement learning and/or to address skills or knowledge gaps identified.
Considerations
- Identify features
- Feature normalisation
- Test and train split
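The three considerations above can be sketched together as follows. This is a minimal illustration using only the Python standard library, not a prescribed method: the records, the field names (`age`, `systolic_bp`, `outcome`), the 25% test fraction and the seed are all hypothetical assumptions. It shows one common ordering: choose the features, split the data, then fit the normalisation on the training split only so that no information from the test split leaks into the scaling.

```python
import random
import statistics

# Hypothetical toy records; the field names and values are illustrative only.
records = [
    {"age": 34, "systolic_bp": 118, "outcome": 0},
    {"age": 51, "systolic_bp": 135, "outcome": 1},
    {"age": 47, "systolic_bp": 128, "outcome": 0},
    {"age": 62, "systolic_bp": 150, "outcome": 1},
    {"age": 29, "systolic_bp": 110, "outcome": 0},
    {"age": 73, "systolic_bp": 160, "outcome": 1},
    {"age": 45, "systolic_bp": 122, "outcome": 0},
    {"age": 58, "systolic_bp": 141, "outcome": 1},
]

FEATURES = ["age", "systolic_bp"]  # identified features (an assumption)
TARGET = "outcome"


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then split it."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(rows, features):
    """Compute the mean and standard deviation of each feature."""
    params = {}
    for name in features:
        column = [row[name] for row in rows]
        mean = statistics.fmean(column)
        std = statistics.pstdev(column) or 1.0  # guard against zero spread
        params[name] = (mean, std)
    return params


def apply_scaler(rows, params):
    """Apply z-score normalisation using previously fitted parameters."""
    return [
        [(row[name] - mean) / std for name, (mean, std) in params.items()]
        for row in rows
    ]


train_rows, test_rows = train_test_split(records)
scaler = fit_scaler(train_rows, FEATURES)   # fitted on training rows only
X_train = apply_scaler(train_rows, scaler)
X_test = apply_scaler(test_rows, scaler)    # reuses the training parameters
y_train = [row[TARGET] for row in train_rows]
y_test = [row[TARGET] for row in test_rows]
```

Fixing the seed makes the split repeatable, which is one small step towards reproducible preparation.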
Reflective practice guidance
The guidance below supports reflection at different time points, offering questions to help you reflect on this training activity. The questions are provided for guidance and should not be treated as a mandatory checklist. Trainees are not expected to answer every question listed.
Before action
- Consider the typical data requirements for training and deploying AI models.
- How might decisions in data preparation influence the safety and effectiveness of the AI solution?
- Why might reproducibility of data preparation be important where different individuals or organisations want to use the same model?
- What types of data might be relevant for AI applications in healthcare? What steps are typically involved in preparing data for machine learning (e.g., cleaning, transformation, feature engineering)?
- What specific insights do you hope to gain about the practical challenges and best practices in preparing healthcare data for AI?
- How will this activity develop your skills in data handling, preprocessing, and feature engineering for machine learning?
- What is your current understanding of the data preparation pipeline for AI?
- Discuss with your training officer the specific dataset you will be working with and the intended AI application.
- Research common data pre-processing techniques for different data types (e.g., numerical, categorical, textual).
- Consider potential issues such as missing values, outliers, and data imbalances, and how to address them.
- How do you feel about working with data and performing the necessary steps to make it suitable for AI algorithms?
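As a starting point for the issues named above (missing values, outliers and data imbalances), the sketch below shows one simple check for each using the Python standard library. The measurements and labels are hypothetical, and median imputation and the 1.5 × IQR rule are illustrative choices only; the appropriate technique depends on the dataset and the intended AI application.

```python
import statistics
from collections import Counter

# Hypothetical measurements; None marks a missing value.
values = [4.1, 4.3, None, 4.0, 4.4, 12.9, 4.2, None, 4.5]
# Hypothetical class labels for the same nine records.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1]

observed = [v for v in values if v is not None]

# Missing values: replace each gap with the median of the observed values.
median = statistics.median(observed)
imputed = [median if v is None else v for v in values]

# Outliers: flag values outside 1.5 * IQR of the quartiles.
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < lower or v > upper]

# Imbalance: count each class and report the minority share.
class_counts = Counter(labels)
minority_fraction = min(class_counts.values()) / len(labels)

print("imputed:", imputed)
print("outliers:", outliers)
print("class counts:", dict(class_counts))
```

A flagged value is not necessarily an error. In healthcare data an extreme measurement may be clinically meaningful, so whether to keep, cap or remove it is a decision to record and justify.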
In action
- How are you approaching the task of preparing data for AI use? What initial steps are you taking to understand the data and the requirements for AI models?
- What decisions are you making about data cleaning, transformation, and feature engineering techniques to apply?
- Which aspects of data preparation (e.g., handling missing values, scaling, encoding categorical variables) feel more familiar, and which require more conscious effort and learning?
- How effectively are you transforming the raw data into a format suitable for AI algorithms?
- What challenges are you encountering in dealing with data quality issues or determining the most appropriate pre-processing steps?
- What insights are you gaining about the importance of data preparation in the overall AI workflow as you proceed?
- How does this data preparation activity connect with your existing knowledge of data analysis and machine learning pipelines?
- If your initial data preparation steps are not yielding satisfactory results or are causing issues with subsequent AI tasks, what alternative techniques could you consider?
- Do you need to consult documentation or seek advice on best practices for preparing specific types of data for AI?
- Are you ensuring that your data preparation methods are appropriate for the intended AI use case and within your current technical capabilities?
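Encoding categorical variables, mentioned above, can be illustrated with a small one-hot encoding sketch. The ward names and helper functions are hypothetical, and mapping an unseen category to an all-zero vector is just one possible design choice; libraries used in practice offer their own options for this case.

```python
def fit_categories(values):
    """Learn the sorted set of categories seen in the training data."""
    return sorted(set(values))


def one_hot(value, categories):
    """Return a 0/1 vector; a category not seen in training maps to all zeros."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical categorical column from a training split.
train_wards = ["cardiology", "oncology", "cardiology", "renal"]

categories = fit_categories(train_wards)
encoded = [one_hot(ward, categories) for ward in train_wards]

# A value that only appears later, for example at deployment time.
unseen = one_hot("neurology", categories)
```

Learning the category list from the training data and reusing it unchanged keeps the encoded columns in the same order for training, testing and deployment.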
On action
- Describe the data you prepared for AI use and the key steps involved in the preparation process. What data cleaning, transformation, or feature engineering techniques did you apply? What challenges did you encounter in preparing the data?
- How reproducible are your preparation steps? How would you make them more reproducible? How would you translate your steps into a specification that another individual or organisation could use?
- What new knowledge or skills did you gain regarding the importance and techniques of data preparation for AI? Were there any unexpected issues with the data that you had to address? What did you learn from these? How did this activity enhance your understanding of the critical role of data quality in AI applications? How will your experience in data preparation inform your future work with AI and machine learning?
- What specific data preparation techniques do you want to improve or learn more about? How will you apply your data preparation skills in future AI-related projects? What actions will you take to enhance your proficiency in preparing data for machine learning and AI? What support or resources might you need to further develop your data preparation skills?
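One way to think about turning preparation steps into a specification that another individual or organisation could use is to record every choice as data rather than leaving it implicit in code. The sketch below is a hypothetical example: the keys, values and the `fingerprint` helper are assumptions made for illustration, not a standard format.

```python
import hashlib
import json

# Hypothetical specification; every key and value here is illustrative.
spec = {
    "features": ["age", "systolic_bp"],
    "imputation": {"strategy": "median"},
    "scaling": {"method": "z-score", "fit_on": "training split only"},
    "split": {"test_fraction": 0.25, "seed": 42},
}


def fingerprint(obj):
    """Hash a canonical JSON form so identical specs give identical digests."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


spec_id = fingerprint(spec)

# Simulate sharing: serialise the spec, then load it again elsewhere.
round_trip = json.loads(json.dumps(spec))
print("spec id:", spec_id[:12])
print("unchanged after sharing:", fingerprint(round_trip) == spec_id)
```

Storing the digest alongside a prepared dataset gives a quick way to confirm which version of the specification produced it.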
Beyond action
- Have you reviewed the data preparation steps you undertook? Have you worked with other datasets for AI since completing this training activity? How did your previous experience inform your approach?
- Has your understanding of data preparation challenges and best practices influenced your approach to data management in other contexts? Has this training activity given you a greater appreciation for the importance of data quality and governance in AI applications?
- What transferable skills (e.g., data cleaning, feature engineering, understanding data requirements for AI) did you develop that will be valuable in future AI projects? What specific aspects of data preparation for AI (e.g., handling missing data, dealing with bias) might you want to learn more about?
Relevant learning outcomes
| # | Outcome |
|---|---|
| 3 | Plan the implementation of AI and machine learning solutions. |