CAIRE4Aus Feasibility Study

Project Participants

Status: Ongoing

Opportunity

The CAIRE4Aus project aims to establish a global-first clinical data repository of electronic health record (EHR) data that links primary, secondary, and tertiary care and spans diverse data modalities. This will:

  • Enable researchers to access diverse, high-quality clinical text data to advance healthcare innovation.
  • Provide a trusted foundation for large-scale data initiatives, supporting not only CAIRE4Aus but also future initiatives.
  • Create a scalable model for privacy-preserving clinical data sharing, strengthening public trust and accelerating research.

In this feasibility study, we address a critical enabling requirement for building this repository: robust de-identification and anonymisation strategies that allow clinical data, specifically clinical texts, to be stored and shared safely.

Project Objectives

The project team will:

  • Develop reliable de-identification methods and robust protocols to prepare clinical text for safe inclusion in shareable datasets.
  • Benchmark existing de-identification tools on local health service datasets and enhance methods for detecting personally identifiable information (PII).
  • Annotate a representative sample of clinical texts to support model development and validation, leveraging an existing clinical text corpus created with Austin Health.
  • Explore anonymisation techniques, such as the “hide in plain sight” approach, to ensure PII is appropriately obfuscated.
  • Establish foundational privacy protocols that will underpin the creation of a secure clinical data lake.
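Two of the technical activities listed above, benchmarking PII detectors and “hide in plain sight” anonymisation, can be illustrated with a minimal Python sketch. It is not the project's tooling. It assumes a detector has already produced non-overlapping character-offset spans, and the `Span` type, the function names and the tiny surrogate lists are all hypothetical. Benchmarking here uses strict exact-match scoring, which is only one common convention; token-level or partial-overlap scoring are also used. The anonymisation step replaces each detected identifier with a realistic surrogate of the same type, so that any identifier the detector missed is harder to tell apart from the fake values around it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # PII category, e.g. "NAME" or "DATE"


def span_prf(gold: list[Span], predicted: list[Span]) -> tuple[float, float, float]:
    """Strict entity-level precision, recall and F1: a predicted span counts
    as correct only if its offsets and label exactly match a gold span."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical surrogate pools; a real system would need much larger,
# locally appropriate lists for every PII category it handles.
SURROGATES = {
    "NAME": ["Alex Morgan", "Jordan Lee", "Sam Taylor"],
    "DATE": ["03/02/2019", "17/11/2020", "28/06/2018"],
}


def hide_in_plain_sight(text: str, spans: list[Span], rng: random.Random) -> str:
    """Replace each detected PII span with a same-type surrogate.

    Assumes spans do not overlap. Repeated mentions of the same value get the
    same surrogate, so the note stays internally consistent."""
    pieces = []
    cursor = 0
    chosen: dict[tuple[str, str], str] = {}
    for span in sorted(spans, key=lambda s: s.start):
        key = (span.label, text[span.start:span.end])
        if key not in chosen:
            # Fall back to a plain tag for categories with no surrogate pool.
            chosen[key] = rng.choice(SURROGATES.get(span.label, ["[REDACTED]"]))
        pieces.append(text[cursor:span.start])
        pieces.append(chosen[key])
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

For example, if a detector finds both mentions of a patient's name in a note but misses the date, strict scoring gives precision 1.0 and recall 2/3. After substitution, the missed date survives in a note whose other identifiers are realistic fakes, rather than standing out next to obvious redaction tags.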

