Complete: Australian Reference Genome Atlas: enabling discovery and use of genomic data from Australian native and agricultural species
Note: This activity completed in February 2024 with the launch of the ARGA portal. BioCommons continues to support ARGA through a collaboration with the Atlas of Living Australia.
BACKGROUND
Genomes, and their associated downstream applications, are powerful tools for discovering and applying knowledge about species behaviour and biology.
Genomics-based approaches can improve our understanding of species’ taxonomy, provide information about past and (potential) future evolutionary processes, complement current ecological survey and study methods, inform crop and livestock breeding, and support conservation management.
Australia is home to at least 157,000 native animal, plant and fungal species, many of which are unique to the continent. Farmed species, including livestock, poultry, fish, crustaceans, shellfish, grains, legumes, oilseeds, fruits, vegetables and forestry trees, are also extremely important to the Australian economy, collectively generating $67 billion in 2019–20.
THE CHALLENGE
Genomic data from some of these species are publicly available but stored in a multitude of largely disconnected online databases, some well known (like EMBL-EBI or NCBI), others less so. There are also significant amounts of genomic data stored in herbaria, museums and other organisational repositories that were undiscoverable at the time of writing. Data for many species are actively being generated through genome sequencing and assembly projects in Australia (eg. Bioplatforms Australia’s Framework Data Initiatives) and elsewhere (eg. the Vertebrate Genomes Project).
To realise a vision in which genomic approaches are applied widely across conservation and agriculture, data from as many Australian-relevant species as possible need to be findable and combinable for subsequent analyses. This includes taking contextual information about those species into account when locating appropriate genomic data (eg. which populations of a species within a given range of latitudes are drought tolerant or fire resistant?). A lack of appropriate resources and infrastructure makes this approach laborious and time-consuming.
PROJECT OUTLINE
To consolidate genomic data from Australian species and make them more readily available to researchers, Australian BioCommons collaborated with the Atlas of Living Australia (ALA), Bioplatforms Australia and the Australian Research Data Commons (ARDC) to establish ARGA, the Australian Reference Genome Atlas.
ARGA is a ‘one-stop shop’ that locates and aggregates, in one place, descriptions of relevant genomic data from Australian native and agricultural taxa (eg. genome assemblies, genome annotations, barcodes, raw data and other ‘omics’ data). Although these data are housed in many repositories around the world, they can be discovered in an integrated way through the ARGA web portal.
ARGA enables scientists to search for these data by taxonomic rank (genus, species, subspecies), by functional classification (eg. drought/salt/fire tolerance, conservation status) and by geographical classification (eg. location and altitude).
ARGA also facilitates downstream comparative analysis of genomic data: researchers can download data from wherever they reside for offline analysis, or push the data directly to Galaxy Australia for analysis in the cloud.
This activity was funded by the National Collaborative Research Infrastructure Strategy (NCRIS) via Bioplatforms Australia, the Atlas of Living Australia and the Australian Research Data Commons.
The project is documented on the Atlas of Living Australia website.
Project partners: