Interfacing with international omics data repositories
Life sciences researchers continue to generate increasingly large omics datasets, and many research funders and journals require this data to be stored in a publicly accessible manner, following the FAIR (findability, accessibility, interoperability, and reusability) principles. To meet these requirements, best practice for omics data and associated contextual metadata is generally considered to be submission to international data repositories. Examples include the repositories managed by the DNA Data Bank of Japan (DDBJ), the European Bioinformatics Institute (EMBL-EBI), and the National Center for Biotechnology Information (NCBI), and others identified by the Global Biodata Coalition as core global biodata resources.
Ongoing engagement between BioCommons and Australian life scientists has identified several challenges that some researchers face, either in submitting data to these repositories or in retrieving data from them in a timely manner.
In alignment with our mission to provide access to services that support digital asset stewardship and management, retention, integration and publication solutions as they evolve, we are devising approaches that can help Australian researchers overcome these challenges.
Data submission
Our Omics Data Publishing to International Repositories from Australia report outlines the challenges faced by the Australian research community in submitting data to various international repositories and includes a set of recommendations to address them.
We are now following the blueprint set out by the recommendations to:
Establish collaborations with international repository management teams to design new training offerings and improve existing documentation
Raise awareness of existing training resources
Deploy local services that simplify data or metadata submission for Australian life science researchers
Ongoing activities
Collaborating with EMBL-EBI:
We are collaborating with EMBL-EBI and the European Nucleotide Archive (ENA) to investigate the value of bringing ENA team members to Australia for a variety of events in early 2025. These include training workshops for data submission to ENA, an opportunity to provide feedback on existing documentation, and hands-on sessions to trial community-driven documentation design to improve the usability and accessibility of EMBL-EBI resources.
Simplifying data submission:
We are working to deploy the Galaxy ENA Upload Tool in Galaxy Australia, which will allow its 30,000+ users to submit raw sequencing reads, consensus sequences and their associated metadata to ENA without leaving the Galaxy interface.
Collating training material relevant to data management:
There are many excellent self-paced training materials available that explain how to collect and record contextual metadata, and submit data and associated metadata to various international repositories. We curate a collection of recommended resources for data management that are housed in the Australian BioCommons Learning Library.
Data retrieval
Uploading data to international repositories is only one part of the research data life cycle. Many researchers also obtain data for their analyses from various international data repositories, and our consultations have revealed that some researchers encounter challenges when attempting to retrieve data at scale.
We are running a series of community town hall-style meetings in Q3 2024 to investigate this problem further. The first town hall was held in July.
Get involved
BioCommons’ awareness of the challenges facing Australian life sciences researchers when interfacing with international repositories has stemmed from our broad-ranging engagement activities. New engagements are welcome at any time through our research-domain-focused mailing groups, or through the contact us form.
More opportunities to get involved will be listed here as they arise. Be sure to subscribe to the BioCommons monthly newsletter to stay in the know!