Enhancing Australia’s capability for secure and responsible sharing of human genome research data
Note: this project was completed in 2023. Australian BioCommons is continuing to work in this space through our Human Genome Informatics activities.
Background
Affordable DNA sequencing at scale has enabled the genomes of hundreds of thousands of people around the world to be sequenced. This has led to a better understanding of the causes of complex diseases, improved diagnosis and earlier disease detection, and more opportunities to identify tailored treatments.
To achieve these outcomes, genomic information from one individual needs to be compared with many other genomes from similar cases, forming cohorts large enough to produce statistically meaningful results. This is often done across multiple projects and jurisdictions, at national or global scale, and requires the genomic data to be findable, searchable, shareable, and linkable to analytical capabilities.
Due to the sensitive nature of genomic information, the privacy of individuals must always be protected, and any data processing must always be done ethically, securely and safely.
Many human genome sequencing and analysis efforts across Australia have developed in-house solutions, built on different technologies, for storing genome data and describing the contents of their collections. Most rely on largely manual, laborious processes for managing data and providing access to bona fide researchers.
The contents of each collection are often invisible to outside users. Although there is a desire to share data for research wherever possible, most groups have no efficient way to expose their collection contents to researchers or to distribute the data, so sharing currently carries a substantial burden.
PROJECT OUTLINE
The Australian BioCommons, with support from the Australian Research Data Commons (ARDC) and Bioplatforms Australia, has brought together a multidisciplinary team comprising organisations that represent many of the largest human genome sequencing and analysis efforts in Australia to deliver a $3.3M, nationally funded collaborative project – the Human Genomes Platform Project.
The project is designed to enhance capability for securely and responsibly sharing human genome research data nationally and internationally, ensuring maximum value can be derived from these valuable assets. It is investigating best-practice technologies developed globally for human genome data sharing, and deploying Australian-first technologies in the form of a ‘services toolbox’ (visualised to the right) to improve the FAIRness of genomic data at the organisations that hold most of the human genomes collected for research in Australia.
The toolbox will build on the browse and search functions already in use at participating repositories, achieving controlled access and sharing by implementing standards and APIs from the Global Alliance for Genomics and Health (GA4GH). It will also bring the data holdings at each repository into closer alignment with the European Genome-phenome Archive (EGA), the global repository for human genome data.
Research gains will include fundamental improvements in data management and access to new capabilities, especially the identification of cohorts within and across data holdings to enable new science and translation.
Critically, the project will establish and implement a working template any other institution can adopt and deploy.
AIMS
The specific aims of the project are to investigate and subsequently implement:
systems for identifying cohorts of human genomes across multiple participating repositories
semi-automated systems that can be used by Data Access Committees (DACs) at participating repositories to expedite user approvals
federated identity and access management systems with assurance levels appropriate for human genome data
systems for streamlining the encryption and upload of genome files to international repositories such as the EGA
The project is also exploring the feasibility of deploying one or more Local EGA nodes in Australia from technical, policy and funding perspectives.
Underpinning the project is a documentation and training component that enables researchers and clinicians to use the systems, and IT infrastructure providers to deploy them elsewhere.
OUTPUTS
Outputs from the project to date include:
Federated Identity and Access Management (IAM) Discovery Phase Report
Data Access Committee (DAC) Automation Discovery Phase Report
Instructions for deploying REMS (Resource Entitlement Management System) on AWS
Webinar: Genomic data - improving discovery and access management
Final project showcase on YouTube
The project forms part of the Human Genome Informatics initiative and is supported by National Collaborative Research Infrastructure Strategy (NCRIS) funding delivered through the Australian Research Data Commons (https://doi.org/10.47486/PL032) and Bioplatforms Australia, together with contributions from each partner organisation.
Project partners: