Critical infrastructure for training and innovation with synthetic health data

Project Participants

Status: Ongoing


Problem: Training and innovation in digital health and data science require access to realistic data. Providing that access is difficult while maintaining appropriate privacy and confidentiality controls.

Gap: We lack infrastructure for training and innovation. Current commercial synthetic data solutions target application development, not research. Open-source solutions for research do not support best practice for engineering reproducible data and machine learning pipelines.

Proposal: Rapid alpha-prototype evaluation of existing open-source synthetic data tools: (1) generate synthetic versions of Australian health data sets and (2) test their usability for training and innovation using real-world projects from industry.


  • Unlock innovation opportunities for health service and technology partners, providing Australian industry a means to test methods on realistic data.
  • Scale delivery of high-quality experiential learning to undergraduate, microcredential, CPD, masters and research students in digital health and data science.


Project Objectives:

  • Gap analysis of tools for data pipelines in production informatics, data science and health research with medical-device-grade traceability and reproducibility.
  • Trial capstone projects to support experiential learning with realistic data challenges for undergraduate, postgraduate and professional short course students.
  • Trial datathon and industry projects using synthetic data supporting innovation over retrospective data for clinical and informatics research and development.
  • Evaluation of algorithm accuracy when trained on synthetic versus real data, utility assessment for research and innovation, and a roadmap for tooling and platform.
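
The last objective, comparing algorithm accuracy when trained on synthetic versus real data, can be illustrated with a minimal sketch. This is not the project's evaluation method. It assumes one common approach: train the same model once on real data and once on synthetic data, then score both on the same held-out real data. Everything here is hypothetical: the toy Gaussian records stand in for a health data set, the `shift` parameter stands in for the imperfection of a synthetic data generator, and the nearest-centroid classifier stands in for whatever algorithm is under evaluation.

```python
import random
from statistics import mean


def make_records(rng, n, shift=0.0):
    """Toy stand-in for a data set: a list of (features, label) pairs.

    `shift` moves the class centres, mimicking a synthetic generator
    that does not reproduce the real distribution exactly (assumption).
    """
    records = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 * label + shift
        features = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
        records.append((features, label))
    return records


def fit_centroids(records):
    """Nearest-centroid classifier: store the mean feature vector per label."""
    centroids = {}
    for label in {lab for _, lab in records}:
        rows = [f for f, lab in records if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def accuracy(centroids, records):
    """Fraction of records whose predicted label matches the true label."""
    return mean(1.0 if predict(centroids, f) == lab else 0.0
                for f, lab in records)


def compare_real_vs_synthetic(seed=0):
    """Train on real and on synthetic data, test both on held-out real data."""
    rng = random.Random(seed)
    real_train = make_records(rng, 500)
    real_test = make_records(rng, 500)
    synthetic_train = make_records(rng, 500, shift=0.3)

    real_score = accuracy(fit_centroids(real_train), real_test)
    synth_score = accuracy(fit_centroids(synthetic_train), real_test)
    return {
        "train_real_test_real": real_score,
        "train_synth_test_real": synth_score,
        "gap": real_score - synth_score,
    }


if __name__ == "__main__":
    for name, value in compare_real_vs_synthetic().items():
        print(f"{name}: {value:.3f}")
```

A small gap between the two scores would suggest the synthetic data preserves enough signal for that task. It says nothing about privacy, which would need a separate assessment.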
