Developer(s) | Samudrala Computational Biology Group, [1] University of Washington |
---|---|
Initial release | May 12, 2008 |
Platform | Cross-platform |
Type | Distributed computing |
Website | protinfo.compbio.washington.edu/rice |
Nutritious Rice for the World [2] is a World Community Grid research project in the field of agronomy led by the Samudrala Computational Biology Research Group [1] at the University of Washington. It was launched on May 12, 2008. [3] [4] The objective of this project is to predict the structure of proteins of major strains of rice. The intent is to help farmers breed better rice strains with higher crop yields, greater disease and pest resistance, and a full range of bioavailable nutrients that can benefit people around the world, especially in regions where malnutrition is a critical concern.
Determining the structure of proteins is an extremely difficult and expensive process. Though it is possible to computationally predict a protein's structure from its corresponding DNA sequence, there are thousands of distinct proteins found in rice. This presents a computational challenge that a single computer cannot solve within a reasonable timeframe. [5]
Once the entire rice genome had been sequenced, the effort shifted to identifying genes that are involved in increased yield, disease resistance and nutritional value. This problem is made more difficult because very few cereal plants have been sequenced, and therefore many of the rice genes do not resemble any genes of known function. The Computational Biology Research Group at the University of Washington developed the Protinfo [6] software, which can predict protein structures at a fraction of the cost and time of experimental structure determination.
Protinfo is being used to create three-dimensional models of the tens of thousands of rice proteins. These models are then used to predict the function of each protein and to understand the role of the gene that encodes it. The models, and any analysis resulting from examining them, will be housed at the Bioverse database and webserver, which is a comprehensive framework to relate molecules such as proteins and DNA to an organism's pathways and systems.
Volunteers' computers on World Community Grid will run the Protinfo software to create models of all proteins encoded by the rice genome whose structure can be predicted reliably. These models will be analyzed to choose the best ones. From the resulting structures, prediction tools will determine the function of each protein and the role of the gene that encodes it. Using the power of Protinfo, World Community Grid will initially examine over 10,000 genes, and produce 100,000 models per gene.
Eventually, the structures of 30,000 to 60,000 proteins will be studied. [7] Generating one billion models on the 320-CPU cluster at the Computational Biology Research Group was anticipated to take about 30 years; using World Community Grid, working at 167 TFLOPS, it took only about two years. [8] Distributed computing was suspended in April 2010 while in-house analysis of the results continues. It will resume when funding is secured for further phases.
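The figures quoted above can be checked with a little arithmetic. The sketch below is illustrative only: it uses the approximate numbers from the text (10,000 genes, 100,000 models per gene, a 320-CPU cluster, about 30 years versus about two years), and the function and variable names are hypothetical. The "equivalent CPUs" value assumes the grid behaves like a larger copy of the same cluster, which is a simplification rather than a measured result.

```python
# Approximate figures quoted in the article; none of these are exact measurements.
GENES = 10_000             # genes examined initially
MODELS_PER_GENE = 100_000  # models produced per gene
CLUSTER_CPUS = 320         # CPUs in the in-house cluster
CLUSTER_YEARS = 30         # anticipated runtime on the cluster (about)
GRID_YEARS = 2             # runtime on World Community Grid (about)


def workload_summary(genes, models_per_gene, cluster_cpus, cluster_years, grid_years):
    """Return back-of-the-envelope workload figures (hypothetical helper)."""
    total_models = genes * models_per_gene
    cpu_years = cluster_cpus * cluster_years   # total work in cluster CPU-years
    speedup = cluster_years / grid_years       # how much sooner the grid finished
    # Assumption: the grid acts like a scaled-up copy of the cluster.
    equivalent_cpus = cluster_cpus * speedup
    return {
        "total_models": total_models,
        "cpu_years": cpu_years,
        "speedup": speedup,
        "equivalent_cpus": equivalent_cpus,
    }


summary = workload_summary(GENES, MODELS_PER_GENE, CLUSTER_CPUS, CLUSTER_YEARS, GRID_YEARS)
for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions, the initial workload comes to one billion models, matching the figure in the text, and the grid finished roughly 15 times sooner than the cluster was expected to.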
The resulting knowledge base will hopefully lead to the development of improved hybrids of rice strains with higher yield, greater disease and pest resistance, and a full range of bioavailable nutrients. This knowledge can also be extended to other food crops such as wheat and maize. [9]
The project has minimum system requirements [10] that any computer performing calculations for the project must meet.