BioCommons ‘Bring Your Own Data’ Expansion Project

Note: This project was completed in 2023. Work is ongoing in the BioCLI and Workflow Commons projects.

This ARDC- and BioCommons-sponsored project delivered a key component of BioCommons’ vision for an ecosystem of data analysis and digital asset stewardship platforms.

Project timeline: July 2020 to December 2023

Web-based bioinformatics workbenches

Online access to best-practice life science tools, workflows, data and training, underpinned by compute and storage that we manage for you.

Project achievements

Established the Australian Apollo Service for real-time community curation and editing of genome annotations

Established the Australian Alphafold Service allowing AI prediction of a protein’s 3D structure from its amino acid sequence
Expanded Galaxy Australia’s capacity and capability
- Increased on-demand support for more users by moving core services to AARNet
- Increased capacity thanks to Pawsey, NCI and Microsoft Azure joining The University of Melbourne and QCIF as infrastructure providers
- Access to specialised high-memory servers to speed up assembly jobs, and GPU capability to underpin Alphafold
- Job scheduling to the most appropriate compute resources based on criteria like system load, available memory, user identity, and the type of job being run
Established Galaxy Australia’s Training Infrastructure as a Service allowing instructors to apply for dedicated training capacity and monitor the status of trainees’ jobs
Established the Galaxy Australia Genome Lab: a user-friendly view of Galaxy Australia providing rapid access to sophisticated genome assembly and annotation resources

Improved access to proteomics tools by:
- Co-developing an initial release of the Galaxy Australia Proteomics Lab with the Australian Proteomics Bioinformatics community
- Making Monash Proteomic Analyst Suites available in Galaxy Australia

Command line for life scientists

Community-curated life science workflows, tools, training and support across Australian command line infrastructures.

Project achievements

Established the Australian Fgenesh++ Service for automatic prediction of genes in eukaryotic genomes
Installed and optimised partner-developed bioinformatics workflows across command line infrastructures at NCI, Pawsey, QRIScloud, and the University of Melbourne Research Computing Services
- Made these workflows easier to discover and reuse by registering them with WorkflowHub
Developed services, documentation and configurations making it easier to run Nextflow pipelines, including:
- An Australian Seqera Platform Pilot Project with early adopters
- Configurations for running pipelines developed by the nf-core community at NCI-Gadi and Pawsey-Nimbus, reducing the need for development and support and easing the maintenance burden
- A template to aid beginners in developing their own Nextflow workflows
Provided access to more bioinformatics tools on national computing infrastructures through:
- The software sharing and configuration technologies CernVM-FS, BioContainers and Singularity Registry HPC
- The BioImage, a purpose-built environment for command-line bioinformatics, currently available on the Pawsey Nimbus cloud
Established Tool Finder, a searchable table detailing which versions of bioinformatics software are installed across Australian computational infrastructures
Made it simpler to move between workflow languages using the Janis system:
- An abstraction layer for describing workflows, and a tool that can translate workflows between languages such as CWL, WDL, Galaxy and Nextflow
- Example translations that are available in the GitHub repository
Supported webinars and workshops on the use of command line bioinformatics tools and workflows

Data infrastructure for life scientists

Making it easier for life scientists to access, analyse, visualise and share data produced by data-generating facilities or research consortia.

Project achievements

Connected Galaxy Australia to external data sources. It is now simpler to move data between Galaxy Australia and:
- Cloud storage services like Dropbox and ownCloud, thanks to Australian contributions to the Galaxy codebase
- The Bioplatforms Australia Data Portal (see our How-to Guide)
Community-endorsed de novo genome assembly workflows
- HiFi Genome Assembly on Galaxy Australia, developed in consultation with the Bioplatforms Australia Threatened Species Initiative
- HiFi de novo Genome Assembly using Nextflow, developed by the Australian Genome Research Facility
- Large genome assembly tutorial and workflows

Deployed Vertebrate Genomes Project assembly workflows in Galaxy Australia
Deployed methods and tools for analysing and visualising new omics data types, including:
- Tandem mass tag proteomics data in collaboration with Monash proteomics
- Single cell data via Galaxy Australia workflows and a How-to Guide developed by Griffith University’s Central Facility for Genomics
- Shotgun metagenomics data via Galaxy Australia workflows and a How-to Guide, adapted by Griffith University’s Central Facility for Genomics from a Galaxy Training Network tutorial

The Australian BioCommons BYOD Expansion Project was funded through NCRIS investments from
Bioplatforms Australia and the Australian Research Data Commons (http://doi.org/10.47486/PL105)
that were matched by co-investments from AARNet, Melbourne Bioinformatics, NCI, Pawsey,
QCIF via the Queensland Government RICF fund, The University of Sydney, AGRF, Griffith University and Monash University.
