24 November 2022
AlphaFill: filling in the blanks in protein structure prediction
The development of tools like AlphaFold and RoseTTAFold, and more recently of the ESM Metagenomic Atlas and OmegaFold, was a game changer in the field of 3D protein structure prediction. However, for many proteins a pivotal piece of the puzzle was still lacking, as most proteins do not function without specific cofactors. New work by the group of Oncode Investigator Anastassis Perrakis at the Netherlands Cancer Institute literally fills this void and brings AlphaFill into the picture. Their work was published today in Nature Methods.
The function of a protein is highly dependent on its structure. However, correctly predicting 3D protein structures from their one-dimensional (1D) sequence has long been a major scientific challenge. The recent development of AI-based tools has significantly changed this, making predictions far more reliable. “You could say that the tools we have in our toolbox have undergone a revolutionary upgrade with the introduction of Artificial Intelligence”, Tassos Perrakis explains. “But what we soon realized is that proteins need other proteins, or smaller co-factors, to function properly: a zinc finger domain does not have its proper structure without a zinc ion and a kinase will not work without ATP. Such elements were still missing in the newest prediction tools.”
Transplanting & validating
The team at the Netherlands Cancer Institute, co-led by Tassos Perrakis and Robbie Joosten, set out to enrich the models presented in the AlphaFold-EBI database by “transplanting” small molecules from homologous proteins with experimentally determined structures, all of which are available in their PDB-REDO databank. “Many 3D structures of protein domains containing various small molecules, tens of thousands of them, have been experimentally determined. We used these data to transplant these co-factors, ligands, and metal ions to proteins with similar domains in the AlphaFold database. We then validated these predictions by comparing them to 3D protein structures which were experimentally solved and are 100% identical based on sequence”, explains Ida de Vries, a PhD student who focused on the validation of the algorithm. The validation showed that the algorithm was successful. In total, it yielded over 12 million transplants, with validation metrics, on almost 1 million AlphaFold structures.
The complete database, which was built by Maarten Hekkelman, who also performed the “hardcore” programming for the new tool, is available through alphafill.eu. It is a new resource to help the scientific community develop new hypotheses and design targeted experiments. Tassos summarizes: “Our work adds a new element to the upgraded toolbox to predict 3D protein structures. A crucial moment is when it occurred to us that all these amazing AlphaFold structures lack their natural co-factors and ligands. For example, hemoglobin without heme is not functional. A kinase needs ATP and magnesium. Many enzymes depend on metal ions to work. We also know that many colleagues that would be using the AlphaFold database as a resource to understand mechanisms relevant to cancer don’t have a lot of experience in understanding protein structure. They could be misled by the absence of such crucial information from the protein models or would not know how to add it even when they realize it’s missing. That’s why we created the AlphaFill resource to provide better functional context to the amazing structure prediction of the AlphaFold database.”