Foldit

Developer(s): University of Washington, Center for Game Science, [1] Department of Biochemistry [2]
Initial release: May 8, 2008
Operating system: Cross-platform (Windows, macOS, Linux)
Size: ≈434 MB
Available in: 13 languages (Czech, Dutch, English, French, German, Hebrew, Indonesian, Italian, Polish, Romanian, Russian, Spanish, Scientist)
Type: Puzzle video game, protein folding
License: Proprietary freeware for academic and non-profit use
Website: fold.it

Foldit is an online puzzle video game about protein folding. It is part of an experimental research project developed by the University of Washington, Center for Game Science, in collaboration with the UW Department of Biochemistry. The objective of Foldit is to fold the structures of selected proteins as perfectly as possible, using tools provided in the game. The highest-scoring solutions are analyzed by researchers, who determine whether there is a native structural configuration (native state) that can be applied to relevant proteins in the real world. Scientists can then use these solutions to target and eradicate diseases and create biological innovations. A 2010 paper in the scientific journal Nature credited Foldit's 57,000 players with providing useful results that matched or outperformed algorithmically computed solutions. [3] [4]

History

Rosetta

Prof. David Baker, a protein research scientist at the University of Washington, founded the Foldit project. Seth Cooper was the lead game designer. Before starting the project, Baker and his laboratory coworkers relied on another research project, named Rosetta, [5] to predict the native structures of various proteins using computational protein structure prediction algorithms. Rosetta was eventually extended to use the power of distributed computing: the Rosetta@home program was made available for public download and displayed its protein-folding progress as a screensaver. Its results were sent to a central server for verification. [6]

Some Rosetta@home users became frustrated when they saw ways to solve protein structures, but could not interact with the program. Hoping that humans could improve the computers' attempts to solve protein structures, Baker approached David Salesin and Zoran Popović, computer science professors at the same university, to help conceptualize and build an interactive program, a video game, that would appeal to the public and help efforts to find native protein structures. [7] [8] [9]

Foldit

Many of the same people who created Rosetta@home worked on Foldit. The public beta version was released in May 2008, [10] and the game had 240,000 registered players as of January 2012. [11]

Since 2008, Foldit has participated in Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments, submitting its best solutions to targets based on unknown protein structures. CASP is an international program to assess methods of protein structure prediction and identify those that are most productive.

Goals

Protein structure prediction is important in several fields of science, including bioinformatics, molecular biology, and medicine. Identifying natural proteins' structural configurations enables scientists to understand them better. This can lead to creating novel proteins by design, advances in treating disease, and solutions for other real-world problems such as invasive species, waste, and pollution.

The process by which living beings create the primary structure of proteins, protein biosynthesis, is reasonably well understood, as is the means by which proteins are encoded in DNA. However, determining how a given protein's primary structure becomes a functioning three-dimensional structure, that is, how the molecule folds, is more difficult. The general process is understood, but predicting a protein's eventual, functioning structure is computationally demanding. [12] [13]

Methods

Like Rosetta@home, Foldit is a means to discover native protein structures faster through distributed computing. However, Foldit places greater emphasis on community collaboration through its forums, where users can work together on certain folds. [3] Its crowdsourced approach also places greater emphasis on the user. [6] Together, Foldit's virtual interaction and gamification are intended to create an environment that can advance protein folding research.

Virtual interaction

Foldit attempts to apply the human brain's three-dimensional pattern matching and spatial reasoning abilities to help solve the problem of protein structure prediction. As of 2016, puzzles are based on well-understood proteins. By analyzing how humans intuitively approach these puzzles, researchers hope to improve the algorithms used by protein-folding software. [14]

Foldit includes a series of tutorials where users manipulate simple protein-like structures and a periodically updated set of puzzles based on real proteins. It shows a graphical representation of each protein which users can manipulate using a set of tools.

Gamification

Foldit's developers wanted to attract as many people as possible to the cause of protein folding. So, rather than only building a useful science tool, they used gamification (the inclusion of gaming elements) to make Foldit appealing and engaging to the general public.

As a protein structure is modified, a score is calculated based on how well-folded the protein is, and a list of high scores for each puzzle is maintained. Foldit users may create and join groups, and members of groups can share puzzle solutions. Groups have been found to be useful in training new players. A separate list of group high scores is maintained.
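The two high-score lists described above can be sketched as follows. This is a minimal illustration only: the class and method names are hypothetical, the rule that a group's score equals the best score among its members is an assumption, and how Foldit computes a score from a fold is not modeled here.

```python
from collections import defaultdict


class PuzzleLeaderboard:
    """Best scores per player and per group for one puzzle (hypothetical model)."""

    def __init__(self):
        self.best_by_player = {}  # player name -> highest score seen
        self.group_of = {}        # player name -> group name

    def join_group(self, player, group):
        self.group_of[player] = group

    def submit(self, player, score):
        # Keep only each player's highest score.
        if score > self.best_by_player.get(player, float("-inf")):
            self.best_by_player[player] = score

    def player_ranking(self):
        return sorted(self.best_by_player.items(),
                      key=lambda kv: kv[1], reverse=True)

    def group_ranking(self):
        # Assumption: a group's score is the best score among its members.
        best = defaultdict(lambda: float("-inf"))
        for player, score in self.best_by_player.items():
            group = self.group_of.get(player)
            if group is not None:
                best[group] = max(best[group], score)
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


board = PuzzleLeaderboard()
board.join_group("alice", "team_a")
board.join_group("bob", "team_b")
board.submit("alice", 9100.0)
board.submit("bob", 9350.5)
board.submit("alice", 9400.0)   # alice improves her fold
board.submit("carol", 8000.0)   # carol plays without a group
print(board.player_ranking())
print(board.group_ranking())
```

Keeping the group list derived from the player list, rather than stored separately, means a single submission updates both rankings consistently.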

Accomplishments

Results from Foldit have been included in a number of scientific publications.

Foldit players have been cited collectively as "Foldit players" or "Players, F." in some cases. Individual players have also been listed as authors on at least one paper, and on four related Protein Data Bank depositions.

Future development

Foldit's toolbox is mainly for the design of protein molecules. The game's creator announced a plan to add, by 2013, the chemical building blocks of organic subcomponents to enable players to design small molecules. [24] The small-molecule design system, termed Drugit, was tested on the von Hippel-Lindau tumor suppressor (VHL). Results of the VHL experiment were presented in a March 2023 preprint paper [25] and at an August 2023 American Chemical Society conference session. [26]

Related Research Articles

Bioinformatics: Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

Protein folding: Change of a linear protein chain to a 3D structure

Protein folding is the physical process where a protein chain is translated into its native three-dimensional structure, typically a "folded" conformation, by which the protein becomes biologically functional. Via an expeditious and reproducible process, a polypeptide folds into its characteristic three-dimensional structure from a random coil. Each protein exists first as an unfolded polypeptide or random coil after being translated from a sequence of mRNA into a linear chain of amino acids. At this stage, the polypeptide lacks any stable three-dimensional structure. As the polypeptide chain is being synthesized by a ribosome, the linear chain begins to fold into its three-dimensional structure.

Structural genomics

Structural genomics seeks to describe the 3-dimensional structure of every protein encoded by a given genome. This genome-based approach allows for a high-throughput method of structure determination by a combination of experimental and modeling approaches. The principal difference between structural genomics and traditional structural prediction is that structural genomics attempts to determine the structure of every protein encoded by the genome, rather than focusing on one particular protein. With full-genome sequences available, structure prediction can be done more quickly through a combination of experimental and modeling approaches, especially because the availability of a large number of sequenced genomes and previously solved protein structures allows scientists to model protein structure on the structures of previously solved homologs.

Protein structure prediction: Type of biological prediction

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; and it is important in medicine and biotechnology.

Levinthal's paradox is a thought experiment in the field of computational protein structure prediction where an algorithmic search for a minimum energy configuration is vastly slower than the actual process by which stable configurations are reached in protein folding. In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. An estimate of 10^300 was made in one of his papers (often incorrectly cited as the 1968 paper). For example, a polypeptide of 100 residues will have 99 peptide bonds, and therefore 198 different phi and psi bond angles. If each of these bond angles can be in one of three stable conformations, the protein may misfold into a maximum of 3^198 different conformations (including any possible folding redundancy). Therefore, if a protein were to attain its correctly folded configuration by sequentially sampling all the possible conformations, it would require a time longer than the age of the universe to arrive at its correct native conformation. This is true even if conformations are sampled at rapid (nanosecond or picosecond) rates. The "paradox" is that most small proteins fold spontaneously on a millisecond or even microsecond time scale. The solution to this paradox has been established by computational approaches to protein structure prediction.
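The arithmetic in the 100-residue example can be checked with a short script. This is a rough sketch rather than a model of real folding: the sampling rate of one conformation per picosecond and the figure of about 13.8 billion years for the age of the universe are assumptions chosen for illustration, and the function name is hypothetical.

```python
# Rough check of the Levinthal estimate for a 100-residue polypeptide.
# Assumptions (not from the text above): one conformation sampled per
# picosecond, and a universe age of about 13.8 billion years.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9


def levinthal_estimate(residues, states_per_angle=3, samples_per_second=1e12):
    """Return (bond_angles, conformations, years_to_sample_all)."""
    peptide_bonds = residues - 1
    bond_angles = 2 * peptide_bonds                  # one phi and one psi per bond
    conformations = states_per_angle ** bond_angles  # exact Python integer
    years = conformations / samples_per_second / SECONDS_PER_YEAR
    return bond_angles, conformations, years


angles, conformations, years = levinthal_estimate(100)
print(f"bond angles: {angles}")
print(f"conformations: on the order of 10^{len(str(conformations)) - 1}")
print(f"years to sample all conformations: {years:.2e}")
print(f"multiples of the age of the universe: {years / AGE_OF_UNIVERSE_YEARS:.2e}")
```

Even at the picosecond rate, the exhaustive search exceeds the assumed age of the universe by many orders of magnitude, which is the point of the thought experiment.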

Protein design is the rational design of new protein molecules to design novel activity, behavior, or purpose, and to advance basic understanding of protein function. Proteins can be designed from scratch or by making calculated variants of a known protein structure and its sequence. Rational protein design approaches make protein-sequence predictions that will fold to specific structures. These predicted sequences can then be validated experimentally through methods such as peptide synthesis, site-directed mutagenesis, or artificial gene synthesis.

In molecular biology, protein threading, also known as fold recognition, is a method of protein modeling which is used to model those proteins which have the same fold as proteins of known structures, but do not have homologous proteins with known structure. It differs from the homology modeling method of structure prediction as it is used for proteins which do not have their homologous protein structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for those proteins which do. Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein which one wishes to model.

Rosetta@home: BOINC based volunteer computing project researching protein folding

Rosetta@home is a volunteer computing project researching protein structure prediction on the Berkeley Open Infrastructure for Network Computing (BOINC) platform, run by the Baker lab. Rosetta@home aims to predict protein–protein docking and design new proteins with the help of about fifty-five thousand active volunteered computers processing at over 487,946 GigaFLOPS on average as of September 19, 2020. Foldit, a Rosetta@home videogame, aims to reach these goals with a crowdsourcing approach. Though much of the project is oriented toward basic research to improve the accuracy and robustness of proteomics methods, Rosetta@home also does applied research on malaria, Alzheimer's disease, and other pathologies.

TIM barrel: Protein fold

The TIM barrel, also known as an alpha/beta barrel, is a conserved protein fold consisting of eight alpha helices (α-helices) and eight parallel beta strands (β-strands) that alternate along the peptide backbone. The structure is named after triose-phosphate isomerase, a conserved metabolic enzyme. TIM barrels are ubiquitous, with approximately 10% of all enzymes adopting this fold. Further, five of seven enzyme commission (EC) enzyme classes include TIM barrel proteins. The TIM barrel fold is evolutionarily ancient, with many of its members possessing little similarity today, instead falling within the twilight zone of sequence similarity.

David Baker (biochemist)

David Baker is an American biochemist and computational biologist who has pioneered methods to predict and design the three-dimensional structures of proteins. He is the Henrietta and Aubrey Davis Endowed Professor in Biochemistry and an adjunct professor of genome sciences, bioengineering, chemical engineering, computer science, and physics at the University of Washington. He serves as the director of the Rosetta Commons, a consortium of labs and researchers that develop biomolecular structure prediction and design software. The problem of protein structure prediction to which Baker has contributed significantly has now been largely solved by DeepMind using artificial intelligence. Baker is a Howard Hughes Medical Institute investigator and a member of the United States National Academy of Sciences. He is also the director of the University of Washington's Institute for Protein Design.

In computational biology, de novo protein structure prediction refers to an algorithmic process by which protein tertiary structure is predicted from its amino acid primary sequence. The problem itself has occupied leading scientists for decades while still remaining unsolved. According to Science, the problem remains one of the top 125 outstanding issues in modern science. At present, some of the most successful methods have a reasonable probability of predicting the folds of small, single-domain proteins within 1.5 angstroms over the entire structure.
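A figure such as "within 1.5 angstroms" is commonly a root-mean-square deviation (RMSD) between predicted and experimentally determined atom positions; the text above does not name the metric, so treating it as RMSD is an assumption. The sketch below computes RMSD for two coordinate lists that are assumed to be already superimposed and to list the same atoms in the same order. The function name and the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed; no alignment is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in angstroms: every atom is displaced by 1 along z.
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
value = rmsd(predicted, reference)
print(f"RMSD: {value:.2f} angstroms, within 1.5: {value <= 1.5}")
```

In practice, structures are first optimally superimposed before the deviation is measured; that step is left out here to keep the example short.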

EteRNA: 2010 browser-based video game

Eterna is a browser-based "game with a purpose", developed by scientists at Carnegie Mellon University and Stanford University, that engages users to solve puzzles related to the folding of RNA molecules. The project is supported by the Bill and Melinda Gates Foundation, Stanford University, and the National Institutes of Health. Prior funders include the National Science Foundation.

CS-ROSETTA is a framework for structure calculation of biological macromolecules on the basis of conformational information from NMR, which is built on top of the biomolecular modeling and design software called ROSETTA. The name CS-ROSETTA for this branch of ROSETTA stems from its origin in combining NMR chemical shift (CS) data with ROSETTA structure prediction protocols. The software package was later extended to include additional NMR conformational parameters, such as Residual Dipolar Couplings (RDC), NOE distance restraints, pseudocontact chemical shifts (PCS) and restraints derived from homologous proteins. This software can be used together with other molecular modeling protocols, such as docking to model protein oligomers. In addition, CS-ROSETTA can be combined with chemical shift resonance assignment algorithms to create a fully automated NMR structure determination pipeline. The CS-ROSETTA software is freely available for academic use and can be licensed for commercial use. A software manual and tutorials are provided on the supporting website https://csrosetta.chemistry.ucsc.edu/.

CS23D

CS23D is a web server to generate 3D structural models from NMR chemical shifts. CS23D combines maximal fragment assembly with chemical shift threading, de novo structure generation, chemical shift-based torsion angle prediction, and chemical shift refinement. CS23D makes use of RefDB and ShiftX.

I-TASSER

I-TASSER is a bioinformatics method for predicting three-dimensional structure models of protein molecules from amino acid sequences. It detects structure templates from the Protein Data Bank by a technique called fold recognition. The full-length structure models are constructed by reassembling structural fragments from threading templates using replica exchange Monte Carlo simulations. I-TASSER is one of the most successful protein structure prediction methods in the community-wide CASP experiments.

A neutral network is a set of genes all related by point mutations that have equivalent function or fitness. Each node represents a gene sequence and each line represents the mutation connecting two sequences. Neutral networks can be thought of as high, flat plateaus in a fitness landscape. During neutral evolution, genes can randomly move through neutral networks and traverse regions of sequence space which may have consequences for robustness and evolvability.
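The node-and-line picture above maps directly onto a graph. The following minimal sketch assumes the input is a list of equal-length sequences already known to have equivalent function or fitness, and links any two that differ by exactly one point mutation. The sequences and function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def neutral_network(sequences):
    """Adjacency map linking sequences that differ by one point mutation."""
    graph = {s: set() for s in sequences}
    for a, b in combinations(sequences, 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def reachable(graph, start):
    """Sequences reachable from start through single-mutation steps."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return seen


# Hypothetical sequences assumed to share the same fitness.
neutral = ["ACGT", "ACGA", "ACCA", "TTTT"]
graph = neutral_network(neutral)
print(sorted(reachable(graph, "ACGT")))
```

Here "TTTT" is isolated because it is more than one mutation away from every other sequence, so a gene drifting neutrally from "ACGT" cannot reach it without passing through a sequence outside the set.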

AlphaFold: Artificial intelligence program by DeepMind

AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure. The program is designed as a deep learning system.

Backbone-dependent rotamer library: Collection of data on conformations of a given protein's amino acid side chains

In biochemistry, a backbone-dependent rotamer library provides the frequencies, mean dihedral angles, and standard deviations of the discrete conformations of the amino acid side chains in proteins as a function of the backbone dihedral angles φ and ψ of the Ramachandran map. By contrast, backbone-independent rotamer libraries express the frequencies and mean dihedral angles for all side chains in proteins, regardless of the backbone conformation of each residue type. Backbone-dependent rotamer libraries have been shown to have significant advantages over backbone-independent rotamer libraries, principally when used as an energy term, by speeding up search times of side-chain packing algorithms used in protein structure prediction and protein design.

John Michael Jumper is a senior research scientist at DeepMind Technologies. Jumper and his colleagues created AlphaFold, an artificial intelligence (AI) model to predict protein structures from their amino acid sequence with high accuracy. Jumper has stated that the AlphaFold team plans to release 100 million protein structures. The scientific journal Nature included Jumper as one of the ten "people who mattered" in science in their annual listing of Nature's 10 in 2021.

References

  1. University of Washington, Center for Game Science
  2. University of Washington, Department of Biochemistry
  3. Markoff J (10 August 2010). "In a Video Game, Tackling the Complexities of Protein Folding". The New York Times. Retrieved 12 February 2013.
  4. Cooper S, Khatib F, Treuille A, Barbero J, Lee J, Beenen M, et al. (August 2010). "Predicting protein structures with a multiplayer online game". Nature. 466 (7307): 756–60. Bibcode:2010Natur.466..756C. doi:10.1038/nature09304. PMC 2956414. PMID 20686574.
  5. "Rosetta Commons: The hub for Rosetta modeling software". RosettaCommons.org. RosettaCommons.org. Retrieved 17 November 2015.
  6. Howard Hughes Medical Institute. "Protein-folding game taps power of worldwide audience to solve difficult puzzles". Eurekalert!, August 4, 2010.
  7. Bourzac K (2008-05-08). "Biologists Enlist Online Gamers". Technology Review. Retrieved 17 November 2016.
  8. Bohannon J (2009-04-20). "Gamers Unravel the Secret Life of Protein". Wired. Retrieved 17 November 2016.
  9. "Zoran Popović". washington.edu.
  10. Hickey, Hannah. "Computer game's high score could earn the Nobel Prize in medicine" University of Washington, May 8, 2008
  11. Marshall J (January 22, 2012). "Online Gamers Achieve First Crowd-Sourced Redesign of Protein". Scientific American. Retrieved February 22, 2012.
  12. Haspel N, Tsai CJ, Wolfson H, Nussinov R (June 2003). "Reducing the computational complexity of protein folding via fragment folding and assembly". Protein Science. 12 (6): 1177–87. doi:10.1110/ps.0232903. PMC 2323902. PMID 12761388.
  13. Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, et al. (July 2017). "Global analysis of protein folding using massively parallel design, synthesis, and testing". Science. 357 (6347): 168–175. Bibcode:2017Sci...357..168R. doi:10.1126/science.aan0693. PMC 5568797. PMID 28706065.
  14. Horowitz S, Koepnick B, Martin R, Tymieniecki A, Winburn AA, Cooper S, et al. (September 2016). "Determining crystal structures through crowdsourcing and coursework". Nature Communications. 7: 12549. Bibcode:2016NatCo...712549H. doi:10.1038/ncomms12549. PMC 5028414. PMID 27633552.
  15. Khatib F, Cooper S, Tyka MD, Xu K, Makedon I, Popovic Z, et al. (November 2011). "Algorithm discovery by protein folding game players". Proceedings of the National Academy of Sciences of the United States of America. 108 (47): 18949–53. doi:10.1073/pnas.1115898108. PMC 3223433. PMID 22065763.
  16. Khatib F, DiMaio F, Cooper S, Kazmierczyk M, Gilski M, Krzywda S, et al. (September 2011). "Crystal structure of a monomeric retroviral protease solved by protein folding game players". Nature Structural & Molecular Biology. 18 (10): 1175–7. doi:10.1038/nsmb.2119. PMC 3705907. PMID 21926992.
  17. Gilski M, Kazmierczyk M, Krzywda S, Zábranská H, Cooper S, Popović Z, et al. (November 2011). "High-resolution structure of a retroviral protease folded as a monomer". Acta Crystallographica. Section D, Biological Crystallography. 67 (Pt 11): 907–14. doi:10.1107/S0907444911035943. PMC 3211970. PMID 22101816.
  18. Praetorius D (2011-09-19). "Gamers Decode AIDS Protein That Stumped Researchers For 15 Years In Just 3 Weeks". The Huffington Post. Retrieved 17 November 2016.
  19. Eiben CB, Siegel JB, Bale JB, Cooper S, Khatib F, Shen BW, et al. (January 2012). "Increased Diels-Alderase activity through backbone remodeling guided by Foldit players". Nature Biotechnology. 30 (2): 190–2. doi:10.1038/nbt.2109. PMC 3566767. PMID 22267011.
  20. Horowitz S, Koepnick B, Martin R, Tymieniecki A, Winburn AA, Cooper S, et al. (September 2016). "Determining crystal structures through crowdsourcing and coursework". Nature Communications. 7 (1): 12549. Bibcode:2016NatCo...712549H. doi:10.1038/ncomms12549. PMC 5028414. PMID 27633552.
  21. Keasar C, McGuffin LJ, Wallner B, Chopra G, Adhikari B, Bhattacharya D, et al. (July 2018). "An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12". Scientific Reports. 8 (1): 9939. Bibcode:2018NatSR...8.9939K. doi:10.1038/s41598-018-26812-8. PMC 6028396. PMID 29967418.
  22. Koepnick B, Flatten J, Husain T, Ford A, Silva DA, Bick MJ, et al. (June 2019). "De novo protein design by citizen scientists". Nature. 570 (7761): 390–394. Bibcode:2019Natur.570..390K. doi:10.1038/s41586-019-1274-4. PMC 6701466. PMID 31168091.
  23. Khatib F, Desfosses A, Koepnick B, Flatten J, Popović Z, Baker D, et al. (November 2019). "Building de novo cryo-electron microscopy structures collaboratively with citizen scientists". PLOS Biology. 17 (11): e3000472. doi:10.1371/journal.pbio.3000472. PMC 6850521. PMID 31714936.
  24. Hersher R (April 13, 2012). "FoldIt game's next play: crowdsourcing better drug design". nature.com. Retrieved April 16, 2012.
  25. Moretti, Rocco (March 15, 2023). "VHL puzzle series paper preprint released!". fold.it.
  26. "Drugit: Crowd-sourcing molecular design of non-peptidic VHL binders".