AlphaFold now available in Galaxy
It seems the whole world is talking about AlphaFold, the AI system that predicts a protein’s 3D structure from its amino acid sequence with accuracy comparable to that of laboratory experiments. There was much fanfare last year when DeepMind published the scientific paper and source code describing their innovative system.
In partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI), DeepMind made its predictions for the shape of every single protein in the human body, as well as for the proteins of 20 other important organisms, freely available to the scientific community.
Understanding a protein’s structure helps us to understand its function, but structures have traditionally been determined through slow, laborious experimentation. Painstaking effort over many years has determined the structures of around 100,000 unique proteins, but this represents just a tiny fraction of the billions of known protein sequences. Predicting structures computationally, at a scale that experiments cannot match, now promises to fast-track our understanding of protein structure.
The team at Galaxy Australia saw an opportunity to further democratise access to this useful tool by making AlphaFold 2.0 available in Galaxy. Galaxy Australia provides Australian researchers with access to a rich catalogue of computational resources, which now includes the GPU clusters required to power AlphaFold 2.0. Life scientists can now easily visualise the consequences of DNA variants at the protein level, which can accelerate work in areas such as protein-protein interactions, activation or inhibition studies, and drug design.
It was an ambitious and technically challenging task, made possible through the work of many people around Australia and, indeed, the world. While Galaxy Australia developers and admins laboured away to make the specific hardware, reference data and environment setup work together, Galaxy EU provided the GPU-enabled development machine needed to test the approach. This technical triumph means that AlphaFold 2.0 is now available for installation on all Galaxy services globally, via the Galaxy ToolShed. If you are interested in the technical details, you can visit the development repository.
Thanks to Galaxy Australia, the new Australian AlphaFold Service is now taking amino acid sequence uploads. All the set-up and provisioning of underlying GPU infrastructure are taken care of, so researchers can focus on generating the protein 3D structure itself. The service is currently available to early adopters for testing and benchmarking. If you think you might have a great research application for using AlphaFold in Galaxy Australia, please submit an expression of interest.
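For readers preparing a sequence to upload, a quick local check can catch formatting problems before a job is submitted. The Python sketch below is only an illustration and is not part of Galaxy or AlphaFold: the function names `parse_fasta` and `invalid_residues` are hypothetical, and it assumes the input is a FASTA-formatted text using the 20 standard one-letter amino acid codes. The service itself may accept a different or broader set of inputs.

```python
# Hypothetical pre-upload check; not part of Galaxy or AlphaFold.
# Assumption: sequences use the 20 standard one-letter amino acid codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted characters that are not standard amino acid codes."""
    return sorted(set(sequence) - STANDARD_AMINO_ACIDS)


# Made-up example sequence, split over two lines as FASTA allows.
example = ">example_protein\nMKTAYIAKQR\nQISFVKSHFS\n"
for name, seq in parse_fasta(example):
    print(name, len(seq), invalid_residues(seq))
```

An empty list from `invalid_residues` means every character is a standard residue code; anything else is worth checking by hand before uploading.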
The work forms part of an exciting broader project to expand Galaxy Australia to utilise commercial cloud resources, enabling massive scale-up of the platform and access to specialised resources. Galaxy Australia’s AlphaFold jobs are expected to run on the Azure Cloud soon, thanks to an Australian BioCommons collaboration with BizData and Microsoft Azure.
Stay tuned for upcoming stories explaining how early adopters are using the new resource on Galaxy Australia. If you think the new Australian AlphaFold Service could help answer your research questions, let Galaxy Australia know via this form.