Critical infrastructure for training and innovation with synthetic health data
Project Participants
Status: Ongoing
Opportunity
Problem: Training and innovation in digital health and data science require access to realistic data. Providing that access is difficult while maintaining appropriate privacy and confidentiality controls.
Gap: We lack infrastructure for training and innovation with synthetic health data. Current commercial synthetic data solutions target application development, not research. Open-source solutions for research don’t support best practice for engineering reproducible data and machine learning pipelines.
Proposal: Rapid alpha-prototype evaluation of existing open-source synthetic data tools: (1) generate synthetic versions of Australian health data sets and (2) test their usability for training and innovation using real-world projects from industry.
Impact:
- Unlock innovation opportunities for health service and technology partners, giving Australian industry a means to test methods on realistic data.
- Scale delivery of high-quality experiential learning to undergraduate, microcredential, CPD, master's and research students in digital health and data science.
Project Objectives:
- Gap analysis of tools for data pipelines in production informatics, data science and health research with medical-device-grade traceability and reproducibility.
- Trial capstone projects to support experiential learning with realistic data challenges for undergraduate, postgraduate and professional short-course students.
- Trial datathon and industry projects using synthetic data supporting innovation over retrospective data for clinical and informatics research and development.
- Evaluation of algorithm accuracy when trained on synthetic vs real data. Utility assessment for research and innovation. Roadmap for tooling and platform.
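One way to picture the synthetic vs real accuracy evaluation is a "train on synthetic, test on real" comparison. The sketch below is illustrative only and is not the project's method: the data, the toy Gaussian generator, the nearest-centroid classifier and all function names are hypothetical stand-ins. It trains the same simple model once on real rows and once on synthetic rows, then scores both on held-out real rows.

```python
import random
import statistics

def make_real_data(n, rng):
    """Hypothetical stand-in for a real data set: one feature, binary label."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        mean = 120.0 if label == 0 else 150.0  # illustrative values only
        rows.append((rng.gauss(mean, 10.0), label))
    return rows

def synthesize(rows, n, rng):
    """Toy generator (assumption): sample each class from a Gaussian fitted to the real rows."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    params = {y: (statistics.mean(xs), statistics.stdev(xs)) for y, xs in by_label.items()}
    labels = [y for _, y in rows]
    out = []
    for _ in range(n):
        y = rng.choice(labels)  # draws labels in the real class proportions
        mu, sd = params[y]
        out.append((rng.gauss(mu, sd), y))
    return out

def fit_centroids(rows):
    """Nearest-centroid 'model': the mean feature value of each class."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {y: statistics.mean(xs) for y, xs in by_label.items()}

def accuracy(centroids, rows):
    """Fraction of rows whose nearest class centroid matches the true label."""
    correct = 0
    for x, y in rows:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += predicted == y
    return correct / len(rows)

rng = random.Random(0)
real_train = make_real_data(500, rng)
real_test = make_real_data(500, rng)  # held-out real data, used for both models
synthetic_train = synthesize(real_train, 500, rng)

acc_real = accuracy(fit_centroids(real_train), real_test)
acc_synth = accuracy(fit_centroids(synthetic_train), real_test)
print(f"trained on real:      {acc_real:.3f}")
print(f"trained on synthetic: {acc_synth:.3f}")
print(f"accuracy gap:         {acc_real - acc_synth:+.3f}")
```

A small gap between the two scores suggests the synthetic data preserves the signal this particular model needs. It says nothing on its own about privacy, which would need a separate assessment.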