AlphaFold

AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure. [1] The program is designed as a deep learning system. [2]

AlphaFold software has had three major versions. A team of researchers that used AlphaFold 1 (2018) placed first in the overall rankings of the 13th Critical Assessment of Structure Prediction (CASP) in December 2018. The program was particularly successful at predicting the most accurate structure for targets rated as the most difficult by the competition organisers, where no existing template structures were available from proteins with a partially similar sequence. A team that used AlphaFold 2 (2020) repeated this placement in the CASP14 competition in November 2020. [3] The team achieved a level of accuracy much higher than any other group. [2] [4] It scored above 90 for around two-thirds of the proteins in CASP's global distance test (GDT), a test that measures how closely a computationally predicted structure matches the experimentally determined structure, with 100 being a complete match within the distance cutoff used for calculating GDT. [2] [5]

AlphaFold 2's results at CASP14 were described as "astounding" [6] and "transformational". [7] Some researchers noted that the accuracy is not high enough for a third of its predictions, and that it does not reveal the mechanism or rules of protein folding for the protein folding problem to be considered solved. [8] [9] Nevertheless, there has been widespread respect for the technical achievement, and analysis suggests that AlphaFold 2 is accurate enough to predict even single-mutation effects. [10] On 15 July 2021 the AlphaFold 2 paper was published in Nature as an advance access publication alongside open source software and a searchable database of species proteomes. [11] [12] [13]

AlphaFold 3 was announced on 8 May 2024. It can predict the structure of complexes created by proteins with DNA, RNA, various ligands, and ions. [14]

Background

Amino-acid chains, known as polypeptides, fold to form a protein.

Proteins consist of chains of amino acids which spontaneously fold, in a process called protein folding, to form the three dimensional (3-D) structures of the proteins. The 3-D structure is crucial to understanding the biological function of the protein.

Protein structures can be determined experimentally through techniques such as X-ray crystallography, cryo-electron microscopy and nuclear magnetic resonance, which are both expensive and time-consuming. [15] Such efforts, using the experimental methods, have identified the structures of about 170,000 proteins over the last 60 years, while there are over 200 million known proteins across all life forms. [5]

Over the years, researchers have applied numerous computational methods to the problem of ab initio protein structure prediction, but their accuracy has not come close to that of experimental techniques except for small, simple proteins, which has limited their value. CASP, which was launched in 1994 to challenge the scientific community to produce their best protein structure predictions, found that by 2016 GDT scores of only about 40 out of 100 could be achieved for the most difficult proteins. [5] AlphaFold started competing in the 2018 CASP using an artificial intelligence (AI) deep learning technique. [15]

Algorithm

DeepMind is known to have trained the program on over 170,000 proteins from a public repository of protein sequences and structures. The program uses a form of attention network, a deep learning technique that focuses on having the AI identify parts of a larger problem, then piece them together to obtain the overall solution. [2] The overall training was conducted using between 100 and 200 GPUs. [2] Training the system on this hardware took "a few weeks", after which the program would take "a matter of days" to converge for each structure. [16]

AlphaFold 1, 2018

AlphaFold 1 (2018) built on work developed by various teams in the 2010s. That work looked at the large databanks of related DNA sequences now available from many different organisms (most without known 3D structures) to find changes at different residues that appeared to be correlated, even though the residues were not consecutive in the main chain. Such correlations suggest that the residues may be close to each other physically, even though not close in the sequence, allowing a contact map to be estimated. Building on recent work prior to 2018, AlphaFold 1 extended this to estimate a probability distribution for how close the residues are likely to be, turning the contact map into a likely distance map. It also used more advanced learning methods than previous approaches to develop the inference. [17] [18]
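
The step from a contact map to a distance map can be illustrated with a small sketch. This is not AlphaFold 1's implementation: the bin centres, the 8 Å contact threshold, the example probabilities, and the function names are all assumptions chosen for illustration. Given a predicted probability distribution over distance bins for a single residue pair, the code derives both a contact probability (the older, coarser quantity) and an expected distance (the richer quantity a distance map can offer).

```python
# Illustrative sketch only: bin centres, threshold, and names are assumptions,
# not values taken from the AlphaFold 1 code.

def contact_probability(bin_centres, probs, threshold=8.0):
    """Sum the probability mass of all bins whose centre is below the threshold (in angstroms)."""
    return sum(p for c, p in zip(bin_centres, probs) if c < threshold)


def expected_distance(bin_centres, probs):
    """Probability-weighted mean distance (in angstroms) for one residue pair."""
    total = sum(probs)
    return sum(c * p for c, p in zip(bin_centres, probs)) / total


# Hypothetical predicted distribution for one residue pair over four distance bins.
bin_centres = [4.0, 6.0, 10.0, 14.0]
probs = [0.1, 0.5, 0.3, 0.1]

p_contact = contact_probability(bin_centres, probs)  # mass in the 4 and 6 angstrom bins
mean_dist = expected_distance(bin_centres, probs)    # weighted mean over all bins
```

A contact map would keep only the single number `p_contact` per residue pair, whereas a distance map retains the whole distribution, from which summaries such as `mean_dist` can be computed.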

AlphaFold 2, 2020

AlphaFold 2 performance, experiments, and architecture
Architectural details of AlphaFold 2

The 2020 version of the program (AlphaFold 2, 2020) is significantly different from the original version that won CASP13 in 2018, according to the team at DeepMind. [20] [21]

The DeepMind team had identified that its previous approach, combining local physics with a guide potential derived from pattern recognition, had a tendency to over-account for interactions between residues that were nearby in the sequence compared to interactions between residues further apart along the chain. As a result, AlphaFold 1 had a tendency to prefer models with slightly more secondary structure (alpha helices and beta sheets) than was the case in reality (a form of overfitting). [22]

The software design used in AlphaFold 1 contained a number of modules, each trained separately, that were used to produce the guide potential that was then combined with the physics-based energy potential. AlphaFold 2 replaced this with a system of sub-networks coupled together into a single differentiable end-to-end model, based entirely on pattern recognition, which was trained as a single integrated structure. [21] [23] Local physics, in the form of energy refinement based on the AMBER model, is applied only as a final refinement step once the neural network prediction has converged, and only slightly adjusts the predicted structure. [22]

A key part of the 2020 system is a pair of modules, believed to be based on a transformer design, which are used to progressively refine a vector of information for each relationship (or "edge" in graph-theory terminology) between one amino acid residue of the protein and another (these relationships are represented by the array shown in green), and between each amino acid position and each of the different sequences in the input sequence alignment (these relationships are represented by the array shown in red). [23] Internally these refinement transformations contain layers that have the effect of bringing relevant data together and filtering out irrelevant data (the "attention mechanism") for these relationships, in a context-dependent way, learnt from training data. These transformations are iterated, the updated information output by one step becoming the input of the next, with the sharpened residue/residue information feeding into the update of the residue/sequence information, and then the improved residue/sequence information feeding into the update of the residue/residue information. [23] As the iteration progresses, according to one report, the "attention algorithm ... mimics the way a person might assemble a jigsaw puzzle: first connecting pieces in small clumps—in this case clusters of amino acids—and then searching for ways to join the clumps in a larger whole." [5]
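
The idea of "bringing relevant data together and filtering out irrelevant data" can be sketched with generic scaled dot-product attention, the building block of transformer designs. The sketch below is a textbook form and is not taken from the AlphaFold 2 code, whose modules are considerably more elaborate. The toy vectors and the function names `softmax` and `attend` are assumptions for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Generic scaled dot-product attention for a single query vector.

    Each key is scored against the query; the scores become weights, and the
    output is the weighted average of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Toy example: the query resembles the first key, is unrelated to the second,
# and is opposite to the third, so the weights decrease in that order.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]

output, weights = attend(query, keys, values)
```

In a trained network the query, key, and value vectors are produced by learnt projections, which is what makes the weighting context-dependent.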

The output of these iterations then informs the final structure prediction module, [23] which also uses transformers, [24] and is itself then iterated. In an example presented by DeepMind, the structure prediction module achieved a correct topology for the target protein on its first iteration, scored as having a GDT_TS of 78, but with a large number (90%) of stereochemical violations – i.e. unphysical bond angles or lengths. With subsequent iterations the number of stereochemical violations fell. By the third iteration the GDT_TS of the prediction was approaching 90, and by the eighth iteration the number of stereochemical violations was approaching zero. [25]

The AlphaFold team stated in November 2020 that they believe AlphaFold can be further developed, with room for further improvements in accuracy. [20] One analysis suggests that the current version of AlphaFold 2 is already accurate enough to predict even single-mutation effects. [10]

The training data was originally restricted to single peptide chains. However, the October 2021 update, named AlphaFold-Multimer, included protein complexes in its training data. DeepMind stated this update succeeded about 70% of the time at accurately predicting protein-protein interactions. [26]

AlphaFold 3, 2024

Announced on 8 May 2024, AlphaFold 3 was co-developed by Google DeepMind and Isomorphic Labs, both subsidiaries of Alphabet. AlphaFold 3 is not limited to proteins, as it can also predict the structure and interactions of DNA, RNA and of some ligands and ions. [27] [14]

AlphaFold 3 introduces the "Pairformer", a deep learning architecture inspired by the transformer and considered similar to, but simpler than, the Evoformer introduced with AlphaFold 2. [28] [29] The raw predictions from the Pairformer module are passed to a diffusion model, which starts with a cloud of atoms and uses these predictions to iteratively progress towards a 3D depiction of the molecular structure. [14]

The AlphaFold server was created to provide free access to AlphaFold 3 for non-commercial research. [30] The ability to predict how proteins interact with molecules typically found in drugs (such as ligands or antibodies) is expected to significantly accelerate drug discovery. Isomorphic Labs declared in May 2024 that it was already using AlphaFold 3 along with other AI models to automate the process of drug discovery. [31]

Competitions

Results achieved for protein prediction by the best reconstructions in the CASP 2018 competition (small circles) and CASP 2020 competition (large circles), compared with results achieved in previous years.
The crimson trend-line shows how a handful of models including AlphaFold 1 achieved a significant step-change in 2018 over the rate of progress that had previously been achieved, particularly in respect of the protein sequences considered the most difficult to predict.
(Qualitative improvement had been made in earlier years, but it is only as changes bring structures within 8 Å of their experimental positions that they start to affect the CASP GDT_TS measure.)
The orange trend-line shows that by 2020 online prediction servers had been able to learn from and match this performance, while the best other groups (green curve) had on average been able to make some improvements on it. However, the black trend curve shows the degree to which AlphaFold 2 had surpassed this again in 2020, across the board.
The detailed spread of data points indicates the degree of consistency or variation achieved by AlphaFold. Outliers represent the handful of sequences for which it did not make such a successful prediction.

CASP13

In December 2018, DeepMind's AlphaFold placed first in the overall rankings of the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP). [32] [33]

The program was particularly successful at predicting the most accurate structure for targets rated as the most difficult by the competition organisers, where no existing template structures were available from proteins with a partially similar sequence. AlphaFold gave the best prediction for 25 out of 43 protein targets in this class, [33] [34] [35] achieving a median score of 58.9 on the CASP's global distance test (GDT) score, ahead of 52.5 and 52.4 by the two next best-placed teams, [36] who were also using deep learning to estimate contact distances. [37] [38] Overall, across all targets, the program achieved a GDT score of 68.5. [39]

In January 2020, implementations and illustrative code of AlphaFold 1 were released open-source on GitHub, [40] [15] but, as stated in the "Read Me" file on that website: "This code can't be used to predict structure of an arbitrary protein sequence. It can be used to predict structure only on the CASP13 dataset (links below). The feature generation code is tightly coupled to our internal infrastructure as well as external tools, hence we are unable to open-source it." Therefore, in essence, the code deposited is not suitable for general use but only for the CASP13 proteins. The company had not announced plans to make their code publicly available as of 5 March 2021.

CASP14

In November 2020, DeepMind's new version, AlphaFold 2, won CASP14. [16] [41] Overall, AlphaFold 2 made the best prediction for 88 out of the 97 targets. [6]

On the competition's preferred global distance test (GDT) measure of accuracy, the program achieved a median score of 92.4 (out of 100), meaning that more than half of its predictions were scored at better than 92.4% for having their atoms in more-or-less the right place, [42] [43] a level of accuracy reported to be comparable to experimental techniques like X-ray crystallography. [20] [7] [39] In 2018 AlphaFold 1 had only reached this level of accuracy in two of all of its predictions. [6] 88% of predictions in the 2020 competition had a GDT_TS score of more than 80. On the group of targets classed as the most difficult, AlphaFold 2 achieved a median score of 87.
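
The GDT_TS score referred to above is conventionally the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of alpha-carbon atoms in the model that lie within that cutoff of their position in the reference structure. The following is a simplified sketch, not the CASP evaluation code: it assumes the two structures have already been superimposed, whereas the full procedure searches over many superpositions to maximise the count at each cutoff. The function name `gdt_ts` and the toy coordinates are assumptions for illustration.

```python
import math


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already superimposed; a full implementation
    would optimise the superposition separately for each cutoff.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(a, b) for a, b in zip(model, reference)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances) for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example: the model atoms deviate from the reference
# by 0.5, 1.5, 3.0 and 9.0 angstroms respectively.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

score = gdt_ts(model, reference)  # (25% + 50% + 75% + 75%) / 4 = 56.25
```

Because the residue that is 9 Å out of place simply fails every cutoff, it lowers the score by a bounded amount rather than dominating it.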

Measured by the root-mean-square deviation (RMSD) of the placement of the alpha-carbon atoms of the protein backbone chain, which tends to be dominated by the performance of the worst-fitted outliers, 88% of AlphaFold 2's predictions had an RMS deviation of less than 4 Å for the set of overlapped C-alpha atoms. [6] 76% of predictions achieved better than 3 Å, and 46% had a C-alpha atom RMS accuracy better than 2 Å, [6] with a median RMS deviation in its predictions of 2.1 Å for a set of overlapped CA atoms. [6] AlphaFold 2 also achieved an accuracy in modelling surface side chains described as "really really extraordinary".
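
The remark that RMSD tends to be dominated by the worst-fitted outliers follows directly from its definition: deviations are squared before they are averaged. The sketch below computes the RMSD for two sets of alpha-carbon coordinates that are assumed to be already superimposed. The function name `ca_rmsd` and the toy coordinates are assumptions for illustration, and no optimal superposition step is performed.

```python
import math


def ca_rmsd(model, reference):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    C-alpha coordinates, assumed to be already superimposed."""
    if len(model) != len(reference) or not model:
        raise ValueError("structures must be non-empty and of equal length")
    squared = [math.dist(a, b) ** 2 for a, b in zip(model, reference)]
    return math.sqrt(sum(squared) / len(squared))


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

# Every atom is 0.5 angstroms from its reference position.
tight = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0), (11.4, 0.5, 0.0)]

# Same as above, except the last atom is 10 angstroms out of place.
with_outlier = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0), (11.4, 10.0, 0.0)]

rmsd_tight = ca_rmsd(tight, reference)           # 0.5
rmsd_outlier = ca_rmsd(with_outlier, reference)  # just over 5: one atom dominates
```

A single misplaced residue raises the RMSD roughly tenfold here, even though three of the four atoms are unchanged.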

To additionally verify AlphaFold 2, the conference organisers approached four leading experimental groups for structures they were finding particularly challenging and had been unable to determine. In all four cases the three-dimensional models produced by AlphaFold 2 were sufficiently accurate to determine structures of these proteins by molecular replacement. These included target T1100 (Af1503), a small membrane protein studied by experimentalists for ten years. [5]

Of the three structures that AlphaFold 2 had the least success in predicting, two had been obtained by protein NMR methods, which define protein structure directly in aqueous solution, whereas AlphaFold was mostly trained on protein structures in crystals. The third exists in nature as a multidomain complex consisting of 52 identical copies of the same domain, a situation AlphaFold was not programmed to consider. For all targets with a single domain, excluding only one very large protein and the two structures determined by NMR, AlphaFold 2 achieved a GDT_TS score of over 80.

CASP15

In 2022 DeepMind did not enter CASP15, but most of the entrants used AlphaFold or tools incorporating AlphaFold. [44]

Reception

AlphaFold 2 scoring more than 90 in CASP's global distance test (GDT) is considered a significant achievement in computational biology [5] and great progress towards a decades-old grand challenge of biology. [7] Nobel Prize winner and structural biologist Venki Ramakrishnan called the result "a stunning advance on the protein folding problem", [5] adding that "It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research." [16]

Propelled by press releases from CASP and DeepMind, [45] [16] AlphaFold 2's success received wide media attention. [46] As well as news pieces in the specialist science press, such as Nature, [7] Science, [5] MIT Technology Review, [2] and New Scientist, [47] [48] the story was widely covered by major national newspapers. [49] [50] [51] [52] A frequent theme was that the ability to predict protein structures accurately from the constituent amino acid sequence is expected to have a wide variety of benefits in the life sciences, including accelerating advanced drug discovery and enabling better understanding of diseases. [7] [53] Some have noted that even a perfect answer to the protein prediction problem would still leave questions about the protein folding problem, that is, understanding in detail how the folding process actually occurs in nature (and how proteins can sometimes also misfold). [54]

In 2023, Demis Hassabis and John Jumper won the Breakthrough Prize in Life Sciences [55] as well as the Albert Lasker Award for Basic Medical Research for their management of the AlphaFold project. [56]

Source code

DeepMind has provided open access to the source code of several AlphaFold versions (excluding AlphaFold 3) after requests from the scientific community. [57] [58] [59]

Database of protein models generated by AlphaFold

AlphaFold Protein Structure Database
Content
Data types captured: protein structure prediction
Organisms: all UniProt proteomes
Contact
Research center: EMBL-EBI
Primary citation: [11]
Access
Website: https://www.alphafold.ebi.ac.uk/
Download URL: yes
Tools
Web: yes
Miscellaneous
License: CC-BY 4.0
Curation policy: automatic

The AlphaFold Protein Structure Database was launched on July 22, 2021, as a joint effort between DeepMind and EMBL-EBI. At launch the database contained AlphaFold-predicted models of protein structures of nearly the full UniProt proteome of humans and 20 model organisms, amounting to over 365,000 proteins. The database does not include proteins with fewer than 16 or more than 2700 amino acid residues, [60] but for humans they are available in the whole batch file. [61] The team planned to add more sequences to the collection, the initial goal (as of the beginning of 2022) being to cover most of the UniRef90 set of more than 100 million proteins. As of May 15, 2022, 992,316 predictions were available. [62]

In July 2021, UniProt-KB and InterPro [63] were updated to show AlphaFold predictions when available. [64]

On July 28, 2022, the team uploaded to the database the structures of around 200 million proteins from 1 million species, covering nearly every known protein on the planet. [65]

Limitations

The AlphaFold DB uses a monomeric model similar to the CASP14 version. As a result, many of the same limitations are expected: [66]

Applications

SARS-CoV-2

AlphaFold has been used to predict structures of proteins of SARS-CoV-2, the causative agent of COVID-19. The structures of these proteins were pending experimental determination in early 2020. [73] [7] Results were examined by scientists at the Francis Crick Institute in the United Kingdom before release into the larger research community. The team also confirmed accurate prediction against the experimentally determined SARS-CoV-2 spike protein that was shared in the Protein Data Bank, an international open-access database, before releasing the computationally determined structures of the under-studied protein molecules. [74] The team acknowledged that although these protein structures might not be the subject of ongoing therapeutic research efforts, they will add to the community's understanding of the SARS-CoV-2 virus. [74] Specifically, AlphaFold 2's prediction of the structure of the ORF3a protein was very similar to the structure determined by researchers at University of California, Berkeley using cryo-electron microscopy. This protein is believed to assist the virus in breaking out of the host cell once it replicates, and to play a role in triggering the inflammatory response to the infection. [75]

Alternative methods

Alternatives to AlphaFold have emerged, such as ESMFold, OmegaFold, RoseTTAFold, IntFOLD, and RaptorX. [76]

Related Research Articles

<span class="mw-page-title-main">Protein secondary structure</span> General three-dimensional form of local segments of proteins

Protein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically spontaneously form as an intermediate before the protein folds into its three dimensional tertiary structure.

<span class="mw-page-title-main">Protein tertiary structure</span> Three dimensional shape of a protein

Protein tertiary structure is the three-dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains and the backbone may interact and bond in a number of ways. The interactions and bonds of side chains within a particular protein determine its tertiary structure. The protein tertiary structure is defined by its atomic coordinates. These coordinates may refer either to a protein domain or to the entire tertiary structure. A number of these structures may bind to each other, forming a quaternary structure.

<span class="mw-page-title-main">Protein folding</span> Change of a linear protein chain to a 3D structure

Protein folding is the physical process by which a protein, after synthesis by a ribosome as a linear chain of amino acids, changes from an unstable random coil into a more ordered three-dimensional structure. This structure permits the protein to become biologically functional.

<span class="mw-page-title-main">Protein structure prediction</span> Type of biological prediction

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; it is important in medicine and biotechnology.

<span class="mw-page-title-main">CASP</span> Protein structure prediction challenge

Critical Assessment of Structure Prediction (CASP), sometimes called Critical Assessment of Protein Structure Prediction, is a community-wide, worldwide experiment for protein structure prediction taking place every two years since 1994. CASP provides research groups with an opportunity to objectively test their structure prediction methods and delivers an independent assessment of the state of the art in protein structure modeling to the research community and software users. Even though the primary goal of CASP is to help advance the methods of identifying protein three-dimensional structure from its amino acid sequence many view the experiment more as a “world championship” in this field of science. More than 100 research groups from all over the world participate in CASP on a regular basis and it is not uncommon for entire groups to suspend their other research for months while they focus on getting their servers ready for the experiment and on performing the detailed predictions.

<span class="mw-page-title-main">Rosetta@home</span> BOINC based volunteer computing project researching protein folding

Rosetta@home is a volunteer computing project researching protein structure prediction on the Berkeley Open Infrastructure for Network Computing (BOINC) platform, run by the Baker lab. Rosetta@home aims to predict protein–protein docking and design new proteins with the help of about fifty-five thousand active volunteered computers processing at over 487,946 GigaFLOPS on average as of September 19, 2020. Foldit, a Rosetta@home videogame, aims to reach these goals with a crowdsourcing approach. Though much of the project is oriented toward basic research to improve the accuracy and robustness of proteomics methods, Rosetta@home also does applied research on malaria, Alzheimer's disease, and other pathologies.

<span class="mw-page-title-main">Demis Hassabis</span> British entrepreneur and artificial intelligence researcher (born 1976)

Sir Demis Hassabis is a British computer scientist, artificial intelligence researcher and entrepreneur. In his early career he was a video game AI programmer and designer, and an expert board games player. He is the chief executive officer and co-founder of DeepMind and Isomorphic Labs, and a UK Government AI Advisor.

<span class="mw-page-title-main">Homology modeling</span> Method of protein structure prediction using other known proteins

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. It has been seen that protein structures are more conserved than protein sequences amongst homologues, but sequences falling below a 20% sequence identity can have very different structure.

The global distance test (GDT), also written as GDT_TS to represent "total score", is a measure of similarity between two protein structures with known amino acid correspondences but different tertiary structures. It is most commonly used to compare the results of protein structure prediction to the experimentally determined structure as measured by X-ray crystallography, protein NMR, or, increasingly, cryoelectron microscopy. The metric was developed by Adam Zemla at Lawrence Livermore National Laboratory and originally implemented in the Local-Global Alignment (LGA) program. It is intended as a more accurate measurement than the common root-mean-square deviation (RMSD) metric - which is sensitive to outlier regions created, for example, by poor modeling of individual loop regions in a structure that is otherwise reasonably accurate. The conventional GDT_TS score is computed over the alpha carbon atoms and is reported as a percentage, ranging from 0 to 100. In general, the higher the GDT_TS score, the more closely a model approximates a given reference structure.

<span class="mw-page-title-main">David Baker (biochemist)</span> American biochemist and computational biologist

David Baker is an American biochemist and computational biologist who has pioneered methods to predict and design the three-dimensional structures of proteins. He is the Henrietta and Aubrey Davis Endowed Professor in Biochemistry and an adjunct professor of genome sciences, bioengineering, chemical engineering, computer science, and physics at the University of Washington. He serves as the director of the Rosetta Commons, a consortium of labs and researchers that develop biomolecular structure prediction and design software. The problem of protein structure prediction to which Baker has contributed significantly has now been largely solved by DeepMind using artificial intelligence. Baker is a Howard Hughes Medical Institute investigator and a member of the United States National Academy of Sciences. He is also the director of the University of Washington's Institute for Protein Design.

In computational biology, de novo protein structure prediction refers to an algorithmic process by which protein tertiary structure is predicted from its amino acid primary sequence. The problem itself has occupied leading scientists for decades while still remaining unsolved. According to Science, the problem remains one of the top 125 outstanding issues in modern science. At present, some of the most successful methods have a reasonable probability of predicting the folds of small, single-domain proteins within 1.5 angstroms over the entire structure.

Phyre and Phyre2 are free web-based services for protein structure prediction. Phyre is among the most popular methods for protein structure prediction having been cited over 1500 times. Like other remote homology recognition techniques, it is able to regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods. Its development is funded by the Biotechnology and Biological Sciences Research Council.

RaptorX is a software and web server for protein structure and function prediction that is free for non-commercial use. RaptorX is among the most popular methods for protein structure prediction. Like other remote homology recognition/protein threading techniques, RaptorX is able to regularly generate reliable protein models when the widely used PSI-BLAST cannot. However, RaptorX is also significantly different from those profile-based methods in that RaptorX excels at modeling of protein sequences without a large number of sequence homologs by exploiting structure information. RaptorX Server has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods.

<span class="mw-page-title-main">I-TASSER</span>

I-TASSER is a bioinformatics method for predicting three-dimensional structure model of protein molecules from amino acid sequences. It detects structure templates from the Protein Data Bank by a technique called fold recognition. The full-length structure models are constructed by reassembling structural fragments from threading templates using replica exchange Monte Carlo simulations. I-TASSER is one of the most successful protein structure prediction methods in the community-wide CASP experiments.

Jianlin (Jack) Cheng is the William and Nancy Thompson Missouri Distinguished Professor in the Electrical Engineering and Computer Science (EECS) Department at the University of Missouri, Columbia. He earned his PhD from the University of California-Irvine in 2006, his MS degree from Utah State University in 2001, and his BS degree from Huazhong University of Science and Technology in 1994.

Google DeepMind: Artificial intelligence division

DeepMind Technologies Limited, doing business as Google DeepMind, is a British-American artificial intelligence research laboratory which serves as a subsidiary of Google. Founded in the UK in 2010, it was acquired by Google in 2014 and merged with Google AI's Google Brain division to become Google DeepMind in April 2023. The company is based in London, with research centres in Canada, France, Germany, and the United States.

Molecular Operating Environment (MOE) is a drug discovery software platform that integrates visualization, modeling and simulations, as well as methodology development, in one package. MOE scientific applications are used by biologists, medicinal chemists and computational chemists in pharmaceutical, biotechnology and academic research. MOE runs on Windows, Linux, Unix, and macOS. Main application areas in MOE include structure-based design, fragment-based design, ligand-based design, pharmacophore discovery, medicinal chemistry applications, biologics applications, structural biology and bioinformatics, protein and antibody modeling, molecular modeling and simulations, virtual screening, and cheminformatics and QSAR. The Scientific Vector Language (SVL) is the built-in command, scripting and application development language of MOE.

John Michael Jumper is an American senior research scientist at DeepMind Technologies. Jumper and his colleagues created AlphaFold, an artificial intelligence (AI) model to predict protein structures from their amino acid sequence with high accuracy. Jumper has stated that the AlphaFold team plans to release 100 million protein structures. The scientific journal Nature included Jumper as one of the ten "people who mattered" in science in their annual listing of Nature's 10 in 2021.

Nir1 or membrane-associated phosphatidylinositol transfer protein 3 (PITPNM3) is a mammalian protein that localizes to endoplasmic reticulum (ER) and plasma membrane (PM) membrane contact sites (MCS) and aids the transfer of phosphatidylinositol between these two membranes, potentially by recruiting additional proteins to the ER-PM MCS. It is encoded by the gene PITPNM3.

Predicted Aligned Error: Predicted output by AlphaFold indicating expected position error of protein structures

The Predicted Aligned Error (PAE) is a quantitative output produced by AlphaFold, a protein structure prediction system developed by DeepMind. For each pair of residues, PAE estimates the expected positional error of one residue when the predicted and true structures are aligned on the other. This measurement helps scientists assess confidence in the relative positions and orientations of different parts of the predicted protein model.
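As a rough illustration of how PAE can be used to judge the relative placement of two parts of a model, the sketch below averages the PAE values between two groups of residues. It assumes the PAE is available as a square matrix of values in ångströms, one per ordered residue pair. The function name `mean_pae_between`, the toy matrix, and the residue groupings are hypothetical and are not part of AlphaFold's own code.

```python
from typing import Sequence


def mean_pae_between(
    pae: Sequence[Sequence[float]],
    group_a: Sequence[int],
    group_b: Sequence[int],
) -> float:
    """Average PAE over all residue pairs spanning two groups.

    Both directions (a, b) and (b, a) are included, because a PAE
    matrix is generally not symmetric.
    """
    values = [pae[i][j] for i in group_a for j in group_b]
    values += [pae[j][i] for i in group_a for j in group_b]
    if not values:
        raise ValueError("both residue groups must be non-empty")
    return sum(values) / len(values)


# Toy 4-residue matrix (made-up numbers): residues 0-1 and 2-3 form two
# groups with low error inside each group and high error between them.
toy_pae = [
    [0.5, 1.0, 20.0, 20.0],
    [1.0, 0.5, 20.0, 20.0],
    [20.0, 20.0, 0.5, 1.0],
    [20.0, 20.0, 1.0, 0.5],
]

between = mean_pae_between(toy_pae, [0, 1], [2, 3])
within = mean_pae_between(toy_pae, [0], [1])
print(f"between groups: {between:.1f} A, within group: {within:.1f} A")
```

A high average between two groups alongside low values within each group would suggest that each part is predicted confidently on its own but that their relative arrangement is uncertain.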

References

  1. "AlphaFold". Deepmind. Retrieved 30 November 2020.
  2. "DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology". MIT Technology Review. Retrieved 2020-11-30.
  3. Shead, Sam (2020-11-30). "DeepMind solves 50-year-old 'grand challenge' with protein folding A.I." CNBC. Retrieved 2020-11-30.
  4. Stoddart, Charlotte (1 March 2022). "Structural biology: How proteins got their close-up". Knowable Magazine. doi:10.1146/knowable-022822-1. S2CID 247206999. Retrieved 25 March 2022.
  5. Robert F. Service, 'The game has changed.' AI triumphs at solving protein structures, Science, 30 November 2020
  6. Mohammed AlQuraishi, CASP14 scores just came out and they're astounding, Twitter, 30 November 2020.
  7. Callaway, Ewen (2020-11-30). "'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures". Nature. 588 (7837): 203–204. Bibcode:2020Natur.588..203C. doi:10.1038/d41586-020-03348-4. PMID 33257889. S2CID 227243204.
  8. Stephen Curry, No, DeepMind has not solved protein folding, Reciprocal Space (blog), 2 December 2020
  9. Ball, Philip (9 December 2020). "Behind the screens of AlphaFold". Chemistry World.
  10. McBride, John M.; Polev, Konstantin; Abdirasulov, Amirbek; Reinharz, Vladimir; Grzybowski, Bartosz A.; Tlusty, Tsvi (2023-11-20). "AlphaFold2 Can Predict Single-Mutation Effects". Physical Review Letters. 131 (21): 218401. arXiv:2204.06860. Bibcode:2023PhRvL.131u8401M. doi:10.1103/PhysRevLett.131.218401. ISSN 0031-9007. PMID 38072605.
  11. Jumper, John; Evans, Richard; Pritzel, Alexander; Green, Tim; Figurnov, Michael; Ronneberger, Olaf; Tunyasuvunakool, Kathryn; Bates, Russ; Žídek, Augustin; Potapenko, Anna; Bridgland, Alex; Meyer, Clemens; Kohl, Simon A A; Ballard, Andrew J; Cowie, Andrew; Romera-Paredes, Bernardino; Nikolov, Stanislav; Jain, Rishub; Adler, Jonas; Back, Trevor; Petersen, Stig; Reiman, David; Clancy, Ellen; Zielinski, Michal; Steinegger, Martin; Pacholska, Michalina; Berghammer, Tamas; Bodenstein, Sebastian; Silver, David; Vinyals, Oriol; Senior, Andrew W; Kavukcuoglu, Koray; Kohli, Pushmeet; Hassabis, Demis (2021-07-15). "Highly accurate protein structure prediction with AlphaFold". Nature. 596 (7873): 583–589. Bibcode:2021Natur.596..583J. doi:10.1038/s41586-021-03819-2. PMC 8371605. PMID 34265844.
  12. "GitHub - deepmind/alphafold: Open source code for AlphaFold". GitHub. Retrieved 2021-07-24.
  13. "AlphaFold Protein Structure Database". alphafold.ebi.ac.uk. Retrieved 2021-07-24.
  14. "AlphaFold 3 predicts the structure and interactions of all of life's molecules". Google. 2024-05-08. Retrieved 2024-05-09.
  15. "AlphaFold: Using AI for scientific discovery". Deepmind. 15 January 2020. Retrieved 2020-11-30.
  16. "AlphaFold: a solution to a 50-year-old grand challenge in biology". Deepmind. 30 November 2020. Retrieved 30 November 2020.
  17. Mohammed AlQuraishi (May 2019), AlphaFold at CASP13, Bioinformatics, 35(22), 4862–4865 doi:10.1093/bioinformatics/btz422. See also Mohammed AlQuraishi (December 9, 2018), AlphaFold @ CASP13: "What just happened?" (blog post).
    Mohammed AlQuraishi (15 January 2020), A watershed moment for protein structure prediction, Nature 577, 627–628 doi:10.1038/d41586-019-03951-0
  18. AlphaFold: Machine learning for protein structure prediction, Foldit, 31 January 2020
  19. Jumper, John; et al. (August 2021). "Highly accurate protein structure prediction with AlphaFold". Nature. 596 (7873): 583–589. Bibcode:2021Natur.596..583J. doi:10.1038/s41586-021-03819-2. ISSN 1476-4687. PMC 8371605. PMID 34265844.
  20. "DeepMind is answering one of biology's biggest challenges". The Economist. 2020-11-30. ISSN 0013-0613. Retrieved 2020-11-30.
  21. Jeremy Kahn, Lessons from DeepMind's breakthrough in protein-folding A.I., Fortune, 1 December 2020
  22. John Jumper et al., conference abstract (December 2020)
  23. See block diagram. Also John Jumper et al. (1 December 2020), AlphaFold 2 presentation, slide 10
  24. The structure module is stated to use a "3-d equivariant transformer architecture" (John Jumper et al. (1 December 2020), AlphaFold 2 presentation, slide 12).
    One design for a transformer network with SE(3)-equivariance was proposed in Fabian Fuchs et al., SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks, NeurIPS 2020; also website. It is not known how similar this may or may not be to what was used in AlphaFold.
    See also the blog post by AlQuraishi on this, or the more detailed post by Fabian Fuchs
  25. John Jumper et al. (1 December 2020), AlphaFold 2 presentation, slides 12 to 20
  26. Callaway, Ewen (13 April 2022). "What's next for AlphaFold and the AI protein-folding revolution". Nature. 604 (7905): 234–238. Bibcode:2022Natur.604..234C. doi:10.1038/d41586-022-00997-5. PMID 35418629. S2CID 248156195.
  27. Metz, Cade (2024-05-08). "Google Unveils A.I. for Predicting Behavior of Human Molecules". The New York Times. ISSN 0362-4331. Retrieved 2024-05-09.
  28. Abramson, Josh; Adler, Jonas; Dunger, Jack; Evans, Richard; Green, Tim; Pritzel, Alexander; Ronneberger, Olaf; Willmore, Lindsay; Ballard, Andrew J.; Bambrick, Joshua; Bodenstein, Sebastian W.; Evans, David A.; Hung, Chia-Chun; O’Neill, Michael; Reiman, David (2024-05-08). "Accurate structure prediction of biomolecular interactions with AlphaFold 3". Nature: 1–3. doi:10.1038/s41586-024-07487-w. ISSN 1476-4687.
  29. Accurate structure prediction of biomolecular interactions with AlphaFold 3, pdf of preprint of the article in Nature.
  30. A non-commercial server of AlphaFold-3
  31. Thomason, James (May 8, 2024). "Google's AlphaFold 3 AI predicts the very building blocks of life". VentureBeat.
  32. Group performance based on combined z-scores, CASP 13, December 2018. (AlphaFold = Team 043: A7D)
  33. Sample, Ian (2 December 2018). "Google's DeepMind predicts 3D shapes of proteins". The Guardian. Retrieved 30 November 2020.
  34. "AlphaFold: Using AI for scientific discovery". Deepmind. Retrieved 30 November 2020.
  35. Singh, Arunima (2020). "Deep learning 3D structures". Nature Methods. 17 (3): 249. doi:10.1038/s41592-020-0779-y. ISSN 1548-7105. PMID 32132733. S2CID 212403708.
  36. See CASP 13 data tables for 043 A7D, 322 Zhang, and 089 MULTICOM
  37. Wei Zheng et al., Deep-learning contact-map guided protein structure prediction in CASP13, Proteins: Structure, Function, and Bioinformatics, 87(12) 1149–1164 doi:10.1002/prot.25792; and slides
  38. Hou, Jie; Wu, Tianqi; Cao, Renzhi; Cheng, Jianlin (2019-04-25). "Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13". Proteins: Structure, Function, and Bioinformatics. 87 (12). Wiley: 1165–1178. bioRxiv 10.1101/552422. doi:10.1002/prot.25697. ISSN 0887-3585. PMC 6800999. PMID 30985027.
  39. "DeepMind Breakthrough Helps to Solve How Diseases Invade Cells". Bloomberg.com. 2020-11-30. Retrieved 2020-11-30.
  40. "deepmind/deepmind-research". GitHub. Retrieved 2020-11-30.
  41. "DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology". MIT Technology Review. Retrieved 30 November 2020.
  42. For the GDT_TS measure used, each atom in the prediction scores a quarter of a point if it is within 8 Å (0.80 nm) of the experimental position; half a point if it is within 4 Å, three-quarters of a point if it is within 2 Å, and a whole point if it is within 1 Å.
  43. To achieve a GDT_TS score of 92.5, mathematically at least 70% of the structure must be accurate to within 1 Å, and at least 85% must be accurate to within 2 Å.
  44. Callaway, Ewen (2022-12-13). "After AlphaFold: protein-folding contest seeks next big breakthrough". Nature. 613 (7942): 13–14. doi:10.1038/d41586-022-04438-1. PMID 36513827. S2CID 254660427.
  45. Artificial intelligence solution to a 50-year-old science challenge could 'revolutionise' medical research (press release), CASP organising committee, 30 November 2020
  46. Brigitte Nerlich, Protein folding and science communication: Between hype and humility, University of Nottingham blog, 4 December 2020
  47. Michael Le Page, DeepMind's AI biologist can decipher secrets of the machinery of life, New Scientist, 30 November 2020
  48. The predictions of DeepMind's latest AI could revolutionise medicine, New Scientist, 2 December 2020
  49. Tom Whipple, Deepmind finds biology's 'holy grail' with answer to protein problem, The Times (online), 30 November 2020.
    In all, science editor Tom Whipple wrote six articles on the subject for The Times on the day the news broke. (thread).
  50. Cade Metz, London A.I. Lab Claims Breakthrough That Could Accelerate Drug Discovery, New York Times, 30 November 2020
  51. Ian Sample, DeepMind AI cracks 50-year-old problem of protein folding, The Guardian, 30 November 2020
  52. Lizzie Roberts, 'Once in a generation advance' as Google AI researchers crack 50-year-old biological challenge. Daily Telegraph, 30 November 2020
  53. Tim Hubbard, The secret of life, part 2: the solution of the protein folding problem., medium.com, 30 November 2020
  54. e.g. Greg Bowman, Protein folding and related problems remain unsolved despite AlphaFold's advance, Folding@home blog, 8 December 2020
  55. Knapp, Alex. "2023 Breakthrough Prizes Announced: Deepmind's Protein Folders Awarded $3 Million". Forbes. Retrieved 2024-05-09.
  56. Sample, Ian (2023-09-21). "Team behind AI program AlphaFold win Lasker science prize". The Guardian. ISSN 0261-3077. Retrieved 2024-05-09.
  57. Domínguez, Nuño (2020-12-02). "La inteligencia artificial arrasa en uno de los problemas más importantes de la biología". El País (in Spanish). Retrieved 2024-05-12.
  58. Briggs, David (2020-12-04). "If Google's Alphafold2 really has solved the protein folding problem, they need to show their working". The Skeptic. Retrieved 2024-05-12.
  59. Demis Hassabis, "Brief update on some exciting progress on #AlphaFold!" (tweet), via twitter, 18 June 2021
  60. "AlphaFold Protein Structure Database". alphafold.ebi.ac.uk. Retrieved 2021-07-29.
  61. "AlphaFold Protein Structure Database". alphafold.ebi.ac.uk. Retrieved 27 July 2021.
  62. "AlphaFold Protein Structure Database". www.alphafold.ebi.ac.uk.
  63. InterPro (22 July 2021). "Alphafold Structure Predictions Available In Interpro". proteinswebteam.github.io. Retrieved 2021-07-29.
  64. "Putting the power of AlphaFold into the world's hands". Deepmind. 22 July 2022.
  65. Callaway, Ewen (2022-07-28). "'The entire protein universe': AI predicts shape of nearly every known protein". Nature. 608 (7921): 15–16. Bibcode:2022Natur.608...15C. doi:10.1038/d41586-022-02083-2. PMID 35902752. S2CID 251159714.
  66. "What use cases does AlphaFold not support?". AlphaFold Protein Structure Database.
  67. AlphaFold heralds a data-driven revolution in biology and medicine, by Janet M. Thornton, Roman A. Laskowski & Neera Borkakoti, Nature Medicine, volume 27, pages 1666–1669, 12 October 2021
  68. "DeepMind's latest AI breakthrough could turbocharge drug discovery". Fast Company. ISSN 1085-9241. Retrieved 2023-01-24.
  69. Bagdonas, Haroldas; Fogarty, Carl A.; Fadda, Elisa; Agirre, Jon (2021-10-29). "The case for post-predictional modifications in the AlphaFold Protein Structure Database" (PDF). Nature Structural & Molecular Biology. 28 (11): 869–870. doi:10.1038/s41594-021-00680-9. ISSN 1545-9985. PMID 34716446. S2CID 240228913.
  70. An, Hyun Joo; Froehlich, John W; Lebrilla, Carlito B (2009-10-01). "Determination of glycosylation sites and site-specific heterogeneity in glycoproteins". Current Opinion in Chemical Biology. Analytical Techniques/Mechanisms. 13 (4): 421–426. doi:10.1016/j.cbpa.2009.07.022. ISSN 1367-5931. PMC 2749913. PMID 19700364.
  71. Hekkelman, Maarten L.; de Vries, Ida; Joosten, Robbie P.; Perrakis, Anastassis (February 2023). "AlphaFill: enriching AlphaFold models with ligands and cofactors". Nature Methods. 20 (2): 205–213. doi:10.1038/s41592-022-01685-y. PMC 9911346. PMID 36424442.
  72. Dabrowski-Tumanski, Pawel; Stasiak, Andrzej (7 November 2023). "AlphaFold Blindness to Topological Barriers Affects Its Ability to Correctly Predict Proteins' Topology". Molecules. 28 (22): 7462. doi:10.3390/molecules28227462. PMC 10672856. PMID 38005184.
  73. "AI Can Help Scientists Find a Covid-19 Vaccine". Wired. ISSN 1059-1028. Retrieved 2020-12-01.
  74. "Computational predictions of protein structures associated with COVID-19". Deepmind. 4 August 2020. Retrieved 2020-12-01.
  75. "How DeepMind's new protein-folding A.I. is already helping to combat the coronavirus pandemic". Fortune. Retrieved 2020-12-01.
  76. Naik, Amit Raja (2022-08-24). "Protein Wars: It's ESMFold vs AlphaFold". Analytics India Magazine. Retrieved 2024-05-05.
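
The GDT_TS scoring rule described in notes 42 and 43 can be sketched in a few lines. This is a simplified illustration, not the scoring code used by CASP: it assumes the predicted and experimental structures have already been superposed and that one distance per scored position (in ångströms) is supplied, whereas the real measure also searches over superpositions. The function name `gdt_ts` and the example distances are hypothetical.

```python
from typing import Sequence

# Distance cutoffs in angstroms, as listed in note 42.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float]) -> float:
    """Return a GDT_TS-style score (0 to 100) from per-position distances.

    Each position earns a quarter of a point for every cutoff it falls
    within, so a position within 1 A earns a whole point.
    Assumes the structures are already superposed (a simplification).
    """
    if not distances:
        raise ValueError("need at least one distance")
    # Count, over all cutoffs, how many positions fall within each one.
    hits = sum(1 for c in CUTOFFS for d in distances if d <= c)
    return 100.0 * hits / (len(CUTOFFS) * len(distances))


# Made-up example: 70% of positions within 1 A, the rest within 2 A.
example = [0.5] * 7 + [1.5] * 3
print(gdt_ts(example))
```

With 70% of positions within 1 Å and all of them within 2 Å, the four fractions are 70, 100, 100 and 100, whose average is 92.5. This is the boundary case behind the lower bounds stated in note 43.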

Further reading