Training activity information
Details
Clean and prepare a healthcare dataset for an AI study and make recommendations for appropriate AI algorithms by undertaking Exploratory Data Analysis (EDA)
Type
Developmental training activity (DTA)
Evidence requirements
Evidence the activity has been undertaken by the trainee.
Reflection on the activity at one or more time points after the event, including learning from the activity and/or areas of the trainee's practice for development.
An action plan to implement learning and/or to address skills or knowledge gaps identified.
Considerations
- Characterisation of the problem and/or question to be solved
- Existing methods and gold standards
- Legislation
- Feature engineering and selection
- Missing data and outliers
- Statistical methods for representing and summarising datasets
- Dimensionality reduction
- Classification vs regression
- Supervised vs unsupervised machine learning
- Appropriate algorithms
- Processing appropriate to data type
- Best practice, sharing knowledge and output
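Two of the considerations above, missing data and outliers and statistical methods for summarising datasets, can be illustrated with a minimal sketch. It uses only the Python standard library and assumes a single numeric column in which `None` marks a missing value. The function name `summarise_numeric`, the 1.5 × IQR outlier rule and the blood pressure readings are illustrative assumptions, not part of this training activity; a flagged value is a prompt for clinical review, not automatic removal.

```python
import statistics


def summarise_numeric(values):
    """Summarise one numeric column; None marks a missing entry."""
    present = sorted(v for v in values if v is not None)
    # Quartiles of the observed values (inclusive method, as for a full sample).
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed 1.5 x IQR fences
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": median,
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
systolic = [118, 122, None, 130, 125, 119, 300, None, 127, 121]
summary = summarise_numeric(systolic)
print(summary["missing"], summary["median"], summary["outliers"])
```

Here the reading of 300 falls outside the upper fence and is flagged, while the two missing entries are counted but excluded from the statistics.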
Reflective practice guidance
The guidance below supports reflection at different time points by offering questions to prompt your thinking about this training activity. The questions are provided for guidance and should not be treated as a mandatory checklist. Trainees are not expected to answer every question listed.
Before action
- What do you need to know before starting this task? This includes understanding data cleaning techniques, EDA methodologies, and different types of AI algorithms and their suitability for various data types and problems.
- What do you anticipate you will learn from this experience? Consider developing skills in data pre-processing, exploratory data analysis, and linking data characteristics to appropriate AI approaches. Reflect on your current knowledge of data science and machine learning.
- What actions will you take in preparation for this experience? Will you review data cleaning and EDA techniques? Will you research different AI algorithms relevant to healthcare data? Will you discuss the study objectives and dataset with your supervisor? Consider potential challenges in handling missing data, outliers, or selecting suitable algorithms and how you might address them. Identify how you feel about embarking on this training activity.
In action
- As you clean the data, what specific techniques are you applying to handle missing values, outliers, or inconsistencies? Why are you choosing these methods?
- During EDA, what visualisations and statistical summaries are you generating? What decisions are you making about which aspects of the data to explore further?
- Which data cleaning and EDA techniques feel more intuitive based on your experience, and where do you need to consciously apply specific knowledge or refer to resources?
- How effective do you believe your data preparation is in making the dataset suitable for AI modelling? What patterns or insights are you uncovering through EDA that inform your algorithm recommendations? What challenges are you facing in handling data quality issues or selecting appropriate algorithms?
- What are you learning about the characteristics of this healthcare dataset and the specific considerations for preparing it for AI? How does this relate to your understanding of data pre-processing and machine learning requirements?
- If you encounter unexpected data issues or are unsure about the best algorithms to recommend, what alternative data cleaning or EDA techniques could you try? Would consulting literature on AI applications in healthcare or discussing with a data scientist be helpful at this point? Are your recommendations for AI algorithms justified by your EDA findings?
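As one possible illustration of the missing-value handling the questions above ask about, the sketch below fills gaps with the column median and keeps a flag recording which entries were imputed, so the missingness pattern remains available for later analysis. Median imputation is only one option and is an assumption here, not a recommendation of this activity; the function name `impute_median` and the age values are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None with the median of observed values and flag imputed entries."""
    present = [v for v in values if v is not None]
    fill = statistics.median(present)
    imputed = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return imputed, was_missing


# Hypothetical patient ages in years, invented for illustration.
ages = [34, None, 51, 47, None, 62]
imputed, was_missing = impute_median(ages)
print(imputed)
print(was_missing)
```

Keeping the `was_missing` flag alongside the imputed column makes it possible to check during EDA whether missingness itself is associated with the outcome, which helps justify the chosen method.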
On action
- Describe the healthcare dataset you worked with, the data cleaning and preparation steps you took, the EDA techniques you applied, and the AI algorithms you recommended.
- What did you learn about the challenges and importance of data cleaning and preparation for AI studies? How did you apply EDA techniques to gain insights from the data? What factors did you consider when recommending appropriate AI algorithms? Did your initial understanding of the data change after performing EDA?
- What specific data cleaning, preparation, or EDA techniques do you want to learn more about? How will you improve your ability to recommend suitable AI algorithms based on data characteristics? What are your next steps in further developing your skills in preparing data for AI? Do you require any further resources on data cleaning, EDA, or the selection of AI algorithms?
Beyond action
- Have you revisited the dataset you cleaned and prepared, and the AI algorithms you recommended? How has your understanding of data preparation for AI and algorithm selection evolved? Have you worked on other AI projects since?
- How has this activity enhanced your understanding of the data requirements for AI and machine learning in healthcare in your current practice? Have the data analysis skills been transferable?
- What transferable skills, such as data analysis and critical thinking, did you develop? What further learning in data cleaning, EDA techniques, and AI algorithms would be valuable?
Relevant learning outcomes
| # | Outcome |
|---|---|
| 10 | Design, develop, train and validate AI models using alphanumeric and imaging datasets. |