BioCLI: Improving command-line infrastructure for life scientists
Vision
A collection of command-line environments and services, tailored to the needs of Australian life science researchers, deployed on the compute infrastructures they use, and supporting both research and training.
The BioCLI Project aims to empower life scientists with user-focused CLI environments and services that reduce friction in the processing and analysis of molecular data at scale.
Challenges
Data complexity: Handling the growing scale and complexity of ‘omics data and analyses requires the flexibility, scalability, and control uniquely afforded by the command-line interface (CLI).
Varied methods and expertise: The diversity of bioinformatics data, scales of work, and available tools presents significant challenges when configuring CLI environments. Most life scientists lack the required expertise and need to be empowered with the resources, skills, and knowledge to navigate CLI environments and handle substantial workloads confidently and efficiently.
Current activities
Streamlining access and execution
Making it easier to access and execute bioinformatics software and workflows via the CLI by:
Enabling Nextflow plugins at NCI. Nextflow plugins are popular in the Nextflow community, but they can be tricky to set up on different systems. A new version of Nextflow on NCI allows users to run plugins such as nf-schema, which streamlines workflow execution and parameter validation.
Developing a custom Nextflow task monitor for national HPC job schedulers, including detailed cost reporting
Configuring national HPC systems to better accommodate complex and long-running bioinformatics workflows (e.g. specialised nodes on Pawsey’s Setonix)
Developing a simple Nextflow workflow template that helps newcomers construct and configure workflows for execution on HPC and cloud systems.
Reference datasets
Enabling access to curated datasets that are needed for standard processing and analysis.
BioImage
A welcoming interface to HPC environments specifically created for bioinformatics users.
Porting the BioImage to NCI’s Nirin Cloud
Training program
Empowering users to work at the CLI.
Webinar - What exactly is bioinformatics?
Hardware access
Facilitating wide access to specialist hardware by:
Developing code to run GPU-enabled structural biology tools such as AlphaFold on Setonix, Pawsey’s HPC system
Project timeline
January 2024 - December 2026
Project partners
Australian BioCommons is collaborating with our partners at Sydney Informatics Hub, the National Computational Infrastructure (NCI), and the Pawsey Supercomputing Research Centre to deliver the BioCLI Project.