New high memory computers fast-track insights into large genomes
Galaxy Australia’s new high memory servers have delivered an impressive leap forward for scientists with large datasets and computationally intensive analyses. In a promising test of the new infrastructure capacity, the genome of Australia’s national floral emblem, the Golden Wattle (Acacia pycnantha), was assembled this month on Galaxy Australia, demonstrating game-changing efficiency for the service.
Completing this massive task seamlessly in less than 24 hours vindicates the recent expansion of the data integration and analysis capabilities of the Galaxy Australia platform. The assembly tool, Flye, was used to process 50 GB of raw nanopore data, representing approximately one billion base pairs from the whole Acacia genome.
Large and complex in comparison to something like a microbial genome, the plant genome was an ideal candidate to test the previous limits. This type of analysis could take months using conventional institutional compute resources and, worse still, such jobs sometimes fail at the end of the long processing time. Due to memory limitations, this type of analysis wasn’t previously supported by Galaxy Australia.
The capacity upgrade followed capital investment by the Australian Research Data Commons (ARDC) and Australian BioCommons in the Melbourne node of the ARDC Nectar Research Cloud at the University of Melbourne. High memory servers funded by an equivalent investment at Galaxy Australia’s QCIF node will also come online this year.
Rapid genome assembly not only increases research productivity, but it also facilitates iterative pipeline optimisation and increases the opportunity to gain more insights quickly. Research becomes more reproducible when it’s possible to update an analysis with a new tool version, or tweak and repeat analyses at the request of a research partner or reviewer.
The tests were performed using unique and valuable data that has been years in the making. Researchers at the Royal Botanic Gardens Victoria have been compiling a Golden Wattle reference genome as part of a national consortium tasked with generating whole genome sequences of iconic Australian flora and fauna. Genomics for Australian Plants was initiated by Bioplatforms Australia in partnership with researchers from the Australian State and National Herbaria and Botanic Gardens, and will ultimately ensure that the Golden Wattle genome is sequenced, assembled and shared.
David Cantrill, Executive Director Science, Royal Botanic Gardens Victoria, said:
“The genome of Golden Wattle will help us identify which genes are important using comparative genomics for Acacia species of conservation concern, and will boost biogeographic studies on widespread species. We want to identify the genes affecting traits of economic significance such as bioactive compounds or salinity resistance.”
The testing of the new Galaxy Australia capability is just the beginning of a comprehensive program of work to ensure all researchers can utilise the powerful new machines. Running whole pipelines that include quality testing of input data, trying different tools on data subsets and submitting jobs with a range of different parameters will keep the Galaxy Australia team busy for some weeks. Testing various tools on other large datasets and optimising workflows will ensure that researchers around the country can soon bring their own large genomes and assemble them like never before.
The ARDC and Bioplatforms Australia are enabled by the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS).