Uncovering the secrets of parasites through AI and Cloud computing

Adult Schistosoma haematobium. Image credit: Sara Li, NIH NIAID Schistosomiasis Resource Center, Biomedical Research Institute, Rockville, MD.

An exciting novel approach to investigating human and animal parasites is underway. New software and computational capabilities enabling automated structure-based genome annotation are overcoming some of the challenges of working with non-model organisms. A partnership between Australian BioCommons and Microsoft has resulted in University of Melbourne researchers being able to quickly leverage new methods through a large grant to use the Azure Cloud, with significant implications for the research community worldwide.

Conventional, primary sequence-based methods of annotation leave a large proportion of non-model organisms’ genes and proteins “unannotatable” because they lack known homologs in other species. But a new world of possibilities opens up when protein structure, rather than sequence alone, guides the search, drawing on AlphaFold, the AI system that accurately predicts the 3D structure of proteins.

Parasites that cause major human and animal suffering are prime candidates for this new structure-based approach. The research of Dr Neil Young from Prof Robin B Gasser’s laboratory at The University of Melbourne’s Faculty of Veterinary and Agricultural Sciences focuses on a range of socioeconomically important parasites. The team is poised to fast-track our understanding of how these complex organisms work by using AlphaFold to annotate all proteins encoded in the genomes of parasitic flatworms and their intermediate hosts (snails).

AlphaFold 2 is now available via the easy-to-use Galaxy Australia interface, with the platform also providing access to a rich catalogue of computational resources, including the GPU clusters required to power AlphaFold. Further, Australian BioCommons has been working to deploy Galaxy Australia on commercial cloud resources, to enable massive scale-up of the platform and access to specialised resources. The new Australian AlphaFold Service brings these significant developments together: researchers’ jobs are now running on the Azure Cloud thanks to an Australian BioCommons collaboration with BizData and Microsoft on Azure.

The collaboration connected the Microsoft team with researchers who could get to work immediately: the University of Melbourne team’s proposal to perform automated structure-based proteome annotation on a genome-wide scale was granted a massive computational boost. Jobs have already begun capitalising on the generous 12,000-hour A100 GPGPU allocation on Microsoft’s Azure cloud (equivalent to approximately $65,000 AUD).

By applying new capabilities in structure-based homology, this project will overcome the problems associated with primary sequence-based methods that have long dogged researchers seeking to annotate protein-coding genes and their products in silico. A substantial improvement in the accuracy of the functional annotation of nine parasite proteomes is expected, and researchers worldwide can immediately apply the same approach to any other group of eukaryotic organisms.

Read more about the work of Robin B Gasser’s Lab.

All Australian researchers can now access AlphaFold 2 through Galaxy Australia simply by applying through the Australian AlphaFold Service.

Australian BioCommons’ Australian AlphaFold Service is part of Galaxy Australia, which is managed by QCIF, Melbourne Bioinformatics and AARNet. The AlphaFold Service is specifically underpinned by scalable computational resources procured from Microsoft Azure. This service is supported by funding from the Queensland Government’s Research Infrastructure Co-investment Fund (RICF), Bioplatforms Australia (BPA) and Australian Research Data Commons (ARDC).


Galaxy Australia is underpinned by computational resources provided by AARNet, Nectar Research Cloud, University of Melbourne, QCIF, National Computational Infrastructure and the Pawsey Supercomputing Research Centre. These efforts are supported by funding from The University of Melbourne, the Queensland Government’s Research Infrastructure Co-investment Fund (RICF), Bioplatforms Australia (BPA) and Australian Research Data Commons (ARDC). BPA and ARDC are enabled by NCRIS.