AlphaFold2, one of the AI programs behind the 2024 Nobel Prize in Chemistry, represents an innovative breakthrough in tackling "The Protein Folding Problem." Its ability to predict protein structures not only offers a new perspective on this long-standing problem but also shows the potential of AI to shape the future of medicine.
By Aryan Boruah
“Finding the native folded state of a protein by a random search among all possible configurations can take an enormously long time” – Levinthal’s Paradox
The 2024 Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John Jumper: Baker for computational protein design, and Hassabis and Jumper, the developers of AlphaFold2, for protein structure prediction. Baker's lab also developed RoseTTAFold, another deep-learning structure predictor. Yes, you heard that right! An AI model sits at the heart of a Nobel Prize-winning achievement. It is fascinating to see how the tide has shifted, as the advent of Large Language Models (e.g. ChatGPT) and advances in Machine Learning have altered the STEM landscape.
One important recent advance is the transformer architecture, introduced in the landmark research paper “Attention Is All You Need.” Google DeepMind built attention mechanisms from this architecture into AlphaFold2, boosting both its performance and the accuracy of its predictions. As the saying goes, “Nothing is perfect on the first try”: AlphaFold2 is the second iteration of the AlphaFold system. AlphaFold2 has solved one of the most significant and long-standing problems in structural biology, “The Protein Folding Problem.”
Proteins are fundamental building blocks of life, functioning as bio-machines essential for biological processes (Branden & Tooze, 1999). Formed by chains of amino acids, proteins must fold into precise three-dimensional (3D) shapes to function correctly. Predicting a protein's 3D structure from its amino acid sequence, termed “the protein folding problem,” has been a scientific challenge for decades (DeepMind, 2021). Understanding this sequence-structure relationship could revolutionize biology and medicine.
The complexity of protein folding is underscored by Levinthal’s Paradox (Levinthal, 1969). Levinthal noted that the number of possible conformations a protein could adopt is astronomically high, ranging from 10⁵⁰ for small proteins to as many as 10³⁰⁰ for large ones. The number 10³⁰⁰ is incomprehensible; amazingly, it is far larger than the estimated number of atoms in the observable universe (around 10⁸⁰). Yet, inside the cell, proteins often fold into their functional states within a few seconds or less. Given this exponentially large number of possibilities, predicting the correct 3D conformation of a protein from its amino acid sequence alone is a major computational challenge!
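The scale of Levinthal's argument can be checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope illustration, not Levinthal's own calculation: it assumes, hypothetically, that every residue can independently adopt 3 conformations and that a protein could sample 10¹³ conformations per second. Working in log10 keeps the numbers from overflowing.

```python
import math

# Illustrative assumptions only (not values taken from the article):
STATES_PER_RESIDUE = 3        # hypothetical conformations available to each residue
SAMPLES_PER_SECOND = 1e13     # hypothetical rate at which conformations are visited
AGE_OF_UNIVERSE_S = 4.35e17   # roughly 13.8 billion years, in seconds


def log10_conformations(n_residues: int, states: int = STATES_PER_RESIDUE) -> float:
    """log10 of states ** n_residues, the size of the conformational space."""
    return n_residues * math.log10(states)


def log10_search_time_s(n_residues: int) -> float:
    """log10 of the seconds needed to visit every conformation exactly once."""
    return log10_conformations(n_residues) - math.log10(SAMPLES_PER_SECOND)


for n in (100, 300):
    log_confs = log10_conformations(n)
    log_t = log10_search_time_s(n)
    ratio = log_t - math.log10(AGE_OF_UNIVERSE_S)
    print(f"{n} residues: ~10^{log_confs:.0f} conformations, "
          f"~10^{log_t:.0f} s to enumerate (~10^{ratio:.0f} times the age of the universe)")
```

Different assumptions about the number of states per residue and the chain length give different exponents, which is why published estimates of the conformational space vary so widely.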
Breakthrough with AlphaFold
DeepMind’s AlphaFold2 system achieved a breakthrough at the 14th CASP (Critical Assessment of Protein Structure Prediction) competition, held in 2020, reaching accuracy levels comparable to experimental methods (DeepMind, 2021). On the most difficult (free-modelling) targets, AlphaFold2 attained a median Global Distance Test (GDT) score of 87, a significant leap from the roughly 58 scored by its predecessor, AlphaFold, in 2018 (CASP, 2021).
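For readers unfamiliar with the metric: in its common GDT_TS form, the score averages the percentage of residue Cα atoms that lie within 1, 2, 4 and 8 Å of their experimentally determined positions, so it runs from 0 to 100. The sketch below is a simplified version that assumes the predicted and experimental structures are already superimposed and that per-residue distances are supplied. The official CASP calculation also searches over many superpositions to maximise the count at each cutoff, so this function returns a value at most equal to the official score.

```python
from typing import Sequence

# Distance cutoffs (angstroms) averaged in GDT_TS
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT_TS from per-residue C-alpha distances under ONE fixed superposition."""
    if not distances:
        raise ValueError("need at least one residue")
    n = len(distances)
    # Fraction of residues within each cutoff
    fractions = [sum(d <= c for d in distances) / n for c in CUTOFFS_ANGSTROM]
    # Average over cutoffs, expressed as a percentage
    return 100.0 * sum(fractions) / len(CUTOFFS_ANGSTROM)


# Hypothetical distances for a toy 5-residue example
print(gdt_ts([0.5, 1.5, 3.0, 6.0, 12.0]))
```

In this toy example, 1, 2, 3 and 4 of the 5 residues fall within the four cutoffs, giving an average of 50%.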
AlphaFold2 models a folded protein as a spatial graph, in which nodes represent amino acid residues and edges connect residues that lie close together in space. Its attention-based neural network is built around two key components: the Evoformer block and the Structure Module.
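The "spatial graph" view can be made concrete with a small sketch: treat each residue as a node and connect two residues when their Cα atoms lie within a distance cutoff. The 8 Å cutoff and the toy coordinates are assumptions chosen for illustration. AlphaFold2 itself does not build an explicit thresholded graph like this; its pair representation holds learned features for every pair of residues.

```python
import numpy as np


def contact_graph(ca_coords: np.ndarray, cutoff: float = 8.0) -> dict[int, list[int]]:
    """Adjacency list linking residues whose C-alpha atoms lie within `cutoff` angstroms."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (n_res, n_res) pairwise distance matrix
    # Keep pairs within the cutoff, excluding each residue's edge to itself
    adjacent = (dist <= cutoff) & ~np.eye(len(ca_coords), dtype=bool)
    return {i: np.flatnonzero(row).tolist() for i, row in enumerate(adjacent)}


# Hypothetical coordinates (angstroms): four residues spaced 3.8 A apart on a line
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]])
print(contact_graph(coords))
```

Here each residue is linked to its neighbours up to two positions away (7.6 Å apart), but residues 0 and 3 (11.4 Å apart) are not connected.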
The Evoformer combines sequence and evolutionary information using two representations. The Multiple Sequence Alignment (MSA) representation captures how each position in the amino acid sequence varies or is conserved across related sequences from different species. The pair representation encodes spatial and contact relationships between pairs of residues, capturing their 3D interactions. The Evoformer exchanges information between these two representations and refines them iteratively, so the structural picture becomes more accurate with each successive block.
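One way the two representations communicate in AlphaFold2 is the "outer product mean", which turns information from the MSA representation into an update for the pair representation. The NumPy sketch below is simplified: the real module also applies layer normalisation and bias terms, and its weights are learned, whereas here the weights are random placeholders and the array sizes are arbitrary.

```python
import numpy as np


def outer_product_mean(msa: np.ndarray, w_a: np.ndarray, w_b: np.ndarray,
                       w_out: np.ndarray) -> np.ndarray:
    """Simplified outer-product-mean update from an MSA representation to a pair representation.

    msa:      (n_seq, n_res, c_m) MSA representation
    w_a, w_b: (c_m, c) projection matrices
    w_out:    (c * c, c_z) output projection
    returns   (n_res, n_res, c_z) update for the pair representation
    """
    a = msa @ w_a  # (n_seq, n_res, c)
    b = msa @ w_b  # (n_seq, n_res, c)
    # For each residue pair (i, j), average the outer product a[s, i] x b[s, j] over sequences s
    outer = np.einsum("sic,sjd->ijcd", a, b) / msa.shape[0]
    n_res = msa.shape[1]
    # Flatten the c x c outer product and project it to the pair channel size
    return outer.reshape(n_res, n_res, -1) @ w_out


rng = np.random.default_rng(0)
n_seq, n_res, c_m, c, c_z = 5, 7, 16, 4, 8  # arbitrary toy sizes
msa = rng.normal(size=(n_seq, n_res, c_m))
update = outer_product_mean(msa, rng.normal(size=(c_m, c)), rng.normal(size=(c_m, c)),
                            rng.normal(size=(c * c, c_z)))
print(update.shape)  # (7, 7, 8)
```

Averaging over sequences is what lets co-variation between two alignment columns, a hint that the residues may be in contact, show up in the pair features for that residue pair.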
The Structure Module takes these refined representations, including the pair representation, and predicts the 3D coordinates of the protein's atoms. Two further elements tie the system together: end-to-end training, which optimises the Evoformer and the Structure Module jointly as a single network, and a loss function that rewards both local accuracy (bond lengths and angles) and global accuracy (the overall fold). By leveraging evolutionary data and iterative refinement, AlphaFold2 delivers highly accurate protein structure predictions.
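AlphaFold2's Structure Module represents each residue as a rigid-body frame, a rotation R plus a translation t, and places atoms by mapping local coordinates into the global frame (x_global = R·x_local + t). The sketch below shows only that final geometric step for three backbone atoms. The local coordinates are approximate, idealised values assumed for illustration, and the frames, which the network would predict, are supplied by hand.

```python
import numpy as np

# Approximate idealised backbone positions (angstroms) in a residue's local frame,
# with C-alpha at the origin. Illustrative values only.
LOCAL_BACKBONE = {
    "N": np.array([-0.525, 1.363, 0.0]),
    "CA": np.array([0.0, 0.0, 0.0]),
    "C": np.array([1.526, 0.0, 0.0]),
}


def place_backbone(rotations: np.ndarray, translations: np.ndarray) -> np.ndarray:
    """Map local backbone atoms into global coordinates for every residue.

    rotations:    (n_res, 3, 3) rotation matrices
    translations: (n_res, 3) translation vectors
    returns       (n_res, 3, 3) global coordinates, atoms ordered [N, CA, C]
    """
    local = np.stack(list(LOCAL_BACKBONE.values()))  # (3 atoms, 3)
    # x_global = R @ x_local + t, applied per residue and per atom
    return np.einsum("rij,aj->rai", rotations, local) + translations[:, None, :]


def rotation_z(theta: float) -> np.ndarray:
    """Rotation by `theta` radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Two residues with hand-picked (hypothetical) frames
rots = np.stack([np.eye(3), rotation_z(np.pi / 2)])
trans = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]])
coords = place_backbone(rots, trans)
```

Because each frame is a rigid motion, bond lengths within a residue are preserved automatically. The network's task is to predict frames that place the residues correctly relative to one another.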
Significance and Applications
AlphaFold2 highlights the potential AI holds to reshape science and technology. Its excellent performance in protein structure prediction, together with the release of more than 200 million predicted protein structures through the AlphaFold Protein Structure Database, is reshaping structural biology. AlphaFold2 and its predicted structures will enable researchers to tackle problems previously thought to be highly challenging in structural biology, drug discovery, and protein design.
Undoubtedly, structural biology is the field most affected by AlphaFold2. While some argue that AlphaFold2 could put structural biologists out of work, it is more likely that AlphaFold2 and its predicted structures will complement and accelerate the experimental methods these researchers use, including X-ray crystallography, cryo-EM, and NMR spectroscopy. By reducing the time and cost of experimental structure determination, AlphaFold2 has the potential to accelerate innovation in biology and medicine.
References
Branden, C. & Tooze, J., 1999. Introduction to Protein Structure. New York: Garland Science.
DeepMind, 2021. AlphaFold: Solving the Protein Folding Problem. [online] Available at: alphafold.deepmind.com
Levinthal, C., 1969. How to Fold Graciously. In: Mössbauer Spectroscopy in Biological Systems: Proceedings of a Meeting Held at Allerton House, Monticello, Illinois.
CASP, 2021. Critical Assessment of Protein Structure Prediction. [online] Available at: predictioncenter.org
Yang, Z., Zeng, X., Zhao, Y. et al., 2023. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduction and Targeted Therapy, 8, 115. Available at: https://doi.org/10.1038/s41392-023-01381-z