BioCommons ‘Bring Your Own Data’ Expansion Project
Note: This project was completed in 2023. Work is ongoing in the BioCLI and Workflow Commons projects.
This ARDC- and BioCommons-sponsored project delivered a key component of BioCommons’ vision for an ecosystem of data analysis and digital asset stewardship platforms.
Project timeline: July 2020 to December 2023
Web-based bioinformatics workbenches
Online access to best-practice life science tools, workflows, data and training, underpinned by compute and storage that we manage for you.
Project achievements
Established the Australian Apollo Service for real-time community curation and editing of genome annotations
Established the Australian AlphaFold Service, allowing AI prediction of a protein’s 3D structure from its amino acid sequence
Expanded Galaxy Australia’s capacity and capability
Increased on-demand support for more users by moving core services to AARNet
Increased capacity thanks to Pawsey, NCI and Microsoft Azure joining The University of Melbourne and QCIF as infrastructure providers
Access to specialised high-memory servers to speed up assembly jobs, and GPU capability to underpin AlphaFold
Job scheduling to the most appropriate compute resources based on criteria like system load, available memory, user identity, and the type of job being run
Established Galaxy Australia’s Training Infrastructure as a Service, allowing instructors to apply for dedicated training capacity and monitor the status of trainees’ jobs
Established the Galaxy Australia Genome Lab: a user-friendly view of Galaxy Australia providing rapid access to sophisticated genome assembly and annotation resources
Improved access to proteomics tools by:
Co-developing an initial release of the Galaxy Australia Proteomics Lab with the Australian Proteomics Bioinformatics community
Making Monash Proteomic Analyst Suites available in Galaxy Australia
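The job scheduling achievement listed above names four criteria: system load, available memory, user identity, and the type of job being run. As an illustration only, the following Python sketch shows how such criteria could be combined into a routing decision. It is not Galaxy Australia's actual scheduler: the `Destination` and `Job` classes, the `choose_destination` function, the destination names and all numbers are hypothetical, and the rule that GPU resources are kept for GPU jobs is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Destination:
    """A hypothetical compute resource that jobs can be routed to."""
    name: str
    load: float                  # fraction of capacity in use, 0.0 to 1.0
    free_memory_gb: int          # memory currently available
    has_gpu: bool = False
    training_only: bool = False  # reserved for users in a training session


@dataclass
class Job:
    """A hypothetical job description."""
    tool: str
    memory_gb: int
    needs_gpu: bool = False        # stands in for "type of job"
    user_is_trainee: bool = False  # stands in for "user identity"


def choose_destination(job: Job, destinations: List[Destination]) -> Optional[Destination]:
    """Return the least-loaded destination that satisfies the job, or None."""
    candidates = [
        d for d in destinations
        if d.free_memory_gb >= job.memory_gb                 # available memory
        and d.has_gpu == job.needs_gpu                       # job type (assumed rule)
        and (job.user_is_trainee or not d.training_only)     # user identity
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda d: d.load)             # system load


# Illustrative values only; these are not real Galaxy Australia resources.
destinations = [
    Destination("general", load=0.7, free_memory_gb=64),
    Destination("high-memory", load=0.4, free_memory_gb=1024),
    Destination("gpu", load=0.2, free_memory_gb=128, has_gpu=True),
]

assembly = Job(tool="assembler", memory_gb=500)
print(choose_destination(assembly, destinations).name)
```

In this sketch a large assembly job can only land on the high-memory destination, while a GPU job is routed to the GPU destination regardless of how lightly loaded the others are. A production scheduler would also need queueing, quotas and failure handling, which are left out here.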
Command line for life scientists
Community-curated life science workflows, tools, training and support across Australian command line infrastructures.
Project achievements
Established the Australian Fgenesh++ Service for automatic prediction of genes in eukaryotic genomes
Installed and optimised partner-developed bioinformatics workflows across command line infrastructures at NCI, Pawsey, QRIScloud, and the University of Melbourne Research Computing Services, and made these workflows easier to discover and reuse by registering them with WorkflowHub
Developed services, documentation and configurations making it easier to run Nextflow pipelines, including:
An Australian Seqera Platform Pilot Project with early adopters
Configurations for running pipelines developed by the nf-core community at NCI-Gadi and Pawsey-Nimbus, reducing the development, support and maintenance burden
A template to aid beginners in developing their own Nextflow workflows
Provided access to more bioinformatics tools on national computing infrastructures through:
Software sharing and configuration technologies: CernVM-FS, BioContainers and Singularity Registry HPC
The BioImage, a purpose-built environment for bioinformatics on the command line, currently available on the Pawsey Nimbus cloud
Established Tool Finder, a searchable table detailing which versions of bioinformatics software are installed across Australian computational infrastructures
Made it simpler to move between workflow languages using the Janis system:
An abstraction layer for describing workflows, and a tool that can translate workflows between languages such as CWL, WDL, Galaxy and Nextflow
Example translations, available in the GitHub repository
Supported webinars and workshops on the use of command line bioinformatics tools and workflows:
Portable, reproducible and scalable bioinformatics workflows using Nextflow and Pawsey Nimbus Cloud
Getting started with whole genome mapping and variant calling on the command line
Launch, monitor and manage data pipelines on any infrastructure with Nextflow Tower
Portable pipelines: build once and run everywhere with Janis
High performance bioinformatics: submitting your best NCMAS application
Introductory RNASeq webinar (Getting started with RNASeq) and subsequent online workshop (RNASeq: reads to differential genes and pathways)
Data infrastructure for life scientists
Making it easier for life scientists to access, analyse, visualise and share data coming from data generating facilities, or generated by research consortia.
Project achievements
Connected Galaxy Australia to external data sources, making it simpler to move data between Galaxy Australia and:
Cloud storage services like Dropbox and ownCloud, thanks to Australian contributions to the Galaxy codebase
The Bioplatforms Australia Data Portal (see our How-to Guide)
Community-endorsed de novo genome assembly workflows:
HiFi Genome Assembly on Galaxy Australia, developed in consultation with the Bioplatforms Australia Threatened Species Initiative
HiFi de novo Genome Assembly using Nextflow, developed by the Australian Genome Research Facility
Deployment of Vertebrate Genomes Project assembly workflows in Galaxy Australia
Methods and tools deployed for analysing and visualising new omics data types, including:
Tandem mass tag proteomics data in collaboration with Monash proteomics
Single cell data via Galaxy Australia workflows and a How-to Guide developed by Griffith University’s Central Facility for Genomics
Shotgun metagenomics data via Galaxy Australia workflows and a How-to Guide, adapted by Griffith University’s Central Facility for Genomics from a Galaxy Training Network tutorial
The Australian BioCommons BYOD Expansion Project was funded through NCRIS investments from Bioplatforms Australia and the Australian Research Data Commons (http://doi.org/10.47486/PL105) that were matched by co-investments from AARNet, Melbourne Bioinformatics, NCI, Pawsey, QCIF via the Queensland Government RICF fund, The University of Sydney, AGRF, Griffith University and Monash University.
Project partners