Enhancing Australia’s capability for secure and responsible sharing of human genome research data

Note: this project was completed in 2023. The Australian BioCommons is continuing to work in this space through our Human Genome Informatics activities.


Background

Affordable DNA sequencing at scale has enabled the genomes of hundreds of thousands of people around the world to be determined. This has led to a better understanding of the causes of complex diseases, improved diagnosis and early disease detection, and more ways to identify tailored treatments.

To achieve these outcomes, genomic information from one individual needs to be compared with the genomes of many other people with similar conditions, so that cohorts are large enough to produce statistically meaningful results. This often spans multiple initiatives and jurisdictions, at national or global scale, and requires genomic data to be findable, searchable, shareable, and linkable to analytical capabilities.

Due to the sensitive nature of genomic information, the privacy of individuals must always be protected, and any data processing must be done ethically, securely and safely.

Many human genome sequencing and analysis efforts across Australia have developed in-house solutions, based on differing technologies, for storing and warehousing genome data and describing the contents of their collections. Most rely on largely manual, laborious processes for managing data and providing access to bona fide researchers.

The contents of each collection are often inaccessible to outside users. Although there is a desire to share data for research wherever possible, most efforts have no efficient way to expose their collection contents to researchers or to distribute the data, so sharing currently carries a substantial burden.

PROJECT OUTLINE

The Australian BioCommons, with support from the Australian Research Data Commons (ARDC) and Bioplatforms Australia, has brought together a multidisciplinary team of organisations representing many of the largest human genome sequencing and analysis efforts in Australia to deliver the Human Genomes Platform Project, a $3.3M nationally funded collaborative project.


The project is designed to enhance capability for securely and responsibly sharing human genome research data nationally and internationally, ensuring maximum value can be derived from these assets. It is investigating best-practice technologies developed globally for human genome data sharing, and deploying Australian-first technologies in the form of a ‘services toolbox’ (visualised to the right) to improve the FAIRness (findability, accessibility, interoperability and reusability) of genomic data at the organisations that hold most of the human genomes collected for research in Australia.

The toolbox will build on the browse and search functions already in use at participating repositories, implementing standards and APIs from the Global Alliance for Genomics and Health (GA4GH) to achieve controlled access and sharing. It will also bring the data holdings at each repository into closer alignment with the European Genome-phenome Archive (EGA), a leading international repository for human genome data.

Research gains will include fundamental improvements in data management, as well as access to new capabilities, especially the identification of cohorts within and across data holdings to enable new science and its translation.

Critically, the project will establish and implement a working template that any other institution can adopt and deploy.

AIMS

The specific aims of the project are to investigate and subsequently implement:

  • systems for identifying cohorts of human genomes across multiple participating repositories

  • semi-automated systems that can be used by Data Access Committees (DACs) at participating repositories to expedite user approvals

  • federated identity and access management systems with assurance levels appropriate for human genome data

  • systems for streamlined encryption and uploading of genome files to international repositories such as the EGA

The project is also exploring the feasibility of deploying one or more Local EGA nodes in Australia from technical, policy and funding perspectives.

Underpinning the project is a documentation and training component that enables other researchers and clinicians to use the systems, and IT infrastructure providers to deploy them elsewhere.

OUTPUTS

Outputs from the project to date include:


The project forms part of the Human Genome Informatics initiative and is funded by the National Collaborative Research Infrastructure Strategy (NCRIS) via the Australian Research Data Commons (https://doi.org/10.47486/PL032) and Bioplatforms Australia, with contributions from each partner organisation.


Project partners: