Rapid genome assembly on Gadi at NCI

The Australian BioCommons is identifying community-supported bioinformatics tools used to assemble reference genomes for non-model organisms, and then coordinating the installation, optimisation and documentation of these tools across Australian computing facilities, including the national (tier 1) high performance computing centres. A major aim is to provide reusable and reproducible methods that can be applied across these and other infrastructures available to the genome assembly community.

The first tool considered by this activity was Canu, a long-read assembly package for Nanopore and PacBio sequencing data. Collaboration between researchers from the Genomics for Australian Plants (GAP) consortium and specialists at the National Computational Infrastructure (NCI) reduced the assembly time for the Golden Wattle (Acacia pycnantha Benth.) from more than 2 weeks on institutional resources to 3 days on the Gadi supercomputer. This was achieved using a wrapper script that makes the distributed jobs generated by Canu compatible with the scheduler on Gadi, allowing the tool to make use of multiple nodes. The Gadi-optimised implementation of Canu is described in detail on the BioCommons GitHub Canu repository.
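
To give a rough sense of what a scheduler wrapper of this kind does, the sketch below translates a generic job description into a PBS-style submission command. It is an illustration only and is not the wrapper script used on Gadi: the JobRequest fields, the build_qsub_command function, the project code "ab12", the queue name "normal" and the resource string format are all assumptions made for this example. The actual implementation is the one documented in the BioCommons GitHub Canu repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobRequest:
    """Hypothetical description of one job emitted by an assembler stage."""
    name: str
    script: str
    cpus: int
    mem_gb: int
    walltime_hours: int


def build_qsub_command(job: JobRequest, project: str, queue: str = "normal") -> List[str]:
    """Translate a JobRequest into a PBS-style qsub argument list.

    The flag layout here is an assumed, simplified form; a real wrapper
    would follow the target facility's scheduler documentation.
    """
    resources = (
        f"ncpus={job.cpus},mem={job.mem_gb}GB,"
        f"walltime={job.walltime_hours:02d}:00:00"
    )
    return [
        "qsub",
        "-N", job.name,     # job name
        "-P", project,      # project code to charge (hypothetical value below)
        "-q", queue,        # queue name (assumed)
        "-l", resources,    # requested CPUs, memory and walltime
        job.script,
    ]


if __name__ == "__main__":
    # Example values are invented for illustration.
    request = JobRequest("overlap_001", "overlap.sh", cpus=48, mem_gb=190, walltime_hours=12)
    print(" ".join(build_qsub_command(request, project="ab12")))
```

Building the command as a list of arguments, rather than a single string, keeps the translation step easy to test and avoids shell quoting problems if the list is later passed to a process launcher.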

The success of this work has led to multiple additional activities:

  • Completion of the Waratah (Telopea speciosissima) genome assembly for GAP during a user test of the optimised Canu installation

  • Sharing of the optimised Canu with BioCommons stakeholder researchers

  • Additional optimisation and troubleshooting on Gadi for larger mammalian genomes (> 3 Gb) to support Oz Mammals Genomics (OMG)

  • Benchmarking activities for Canu to support merit applications by the bioinformatics community.

Australian BioCommons regularly engages with Australian bioscience research communities to document challenges and define requirements for shared bioinformatics resources. Please join the discussion with the Genome Assembly community to develop a vision for shared national infrastructure that will support your research. For further information: contact@biocommons.org.au

Christina Hall
Genomics