The Protein Structure Initiative (PSI) was a U.S.-based project that aimed to accelerate discovery in structural genomics and to contribute to the understanding of biological function. [1] Funded by the U.S. National Institute of General Medical Sciences (NIGMS) between 2000 and 2015, it sought to reduce the cost and time required to determine three-dimensional protein structures and to develop techniques for solving challenging problems in structural biology, including the structures of membrane proteins. Over a dozen research centers were supported by the PSI to build and maintain high-throughput structural genomics pipelines, develop computational methods for protein structure prediction, organize and disseminate the information generated by the PSI, and apply high-throughput structure determination to a broad range of important biological and biomedical problems.
The project was organized into three phases. The first phase of the Protein Structure Initiative (PSI-1) ran from 2000 to 2005 and was dedicated to demonstrating the feasibility of high-throughput structure determination, solving unique protein structures, and preparing for a subsequent production phase. [2] The second phase, PSI-2, focused on implementing the high-throughput structure determination methods developed in PSI-1, as well as on homology modeling and on addressing bottlenecks such as modeling membrane proteins. [3] The third phase, PSI:Biology, began in 2010 and consisted of networks of investigators applying high-throughput structure determination to a broad range of biological and biomedical problems. [4] The PSI program ended on July 1, 2015, [5] although some former PSI centers continued structure determination with support from other funding mechanisms.
The first phase of the Protein Structure Initiative (PSI-1) lasted from June 2000 until September 2005 and had a budget of $270 million, funded primarily by NIGMS with support from the National Institute of Allergy and Infectious Diseases. [2] PSI-1 saw the establishment of nine pilot centers focusing on structural genomics studies of a range of organisms, including Arabidopsis thaliana, Caenorhabditis elegans and Mycobacterium tuberculosis. [2] During this five-year period over 1,100 protein structures were determined, over 700 of which were classified as "unique" because they had less than 30% sequence similarity to other known protein structures. [2]
The primary goal of PSI-1, developing methods to streamline the structure determination process, resulted in an array of technical advances. Several methods developed during PSI-1 enhanced the expression of recombinant proteins in systems such as Escherichia coli, Pichia pastoris and insect cell lines. New streamlined approaches to cloning, expression and protein purification were also introduced, in which robotics and software platforms were integrated into the protein production pipeline to reduce the labor required, increase speed, and lower costs. [6]
The second phase of the Protein Structure Initiative (PSI-2) lasted from July 2005 to June 2010. Its goal was to use the methods introduced in PSI-1 to determine the structures of a large number of proteins and to continue streamlining the structural genomics pipeline. PSI-2 had a five-year budget of $325 million provided by NIGMS with support from the National Center for Research Resources. By the end of this phase, the Protein Structure Initiative had solved over 4,800 protein structures, over 4,100 of which were unique. [7]
The number of sponsored research centers grew to 14 during PSI-2. Four centers were selected as large-scale centers, with a mandate to devote 15% of their effort to targets nominated by the broader research community, 15% to targets of biomedical relevance, and 70% to broad structural coverage; these were the Joint Center for Structural Genomics (JCSG), the Midwest Center for Structural Genomics (MCSG), the Northeast Structural Genomics Consortium (NESG), and the New York SGX Research Center for Structural Genomics (NYSGXRC). The new centers participating in PSI-2 included specialized centers: the Accelerated Technologies Center for Gene to 3D Structure (ATCG3D), the Center for Eukaryotic Structural Genomics (CESG), the Center for High-Throughput Structural Biology (CHTSB; a branch of the Structural Genomics of Pathogenic Protozoa Consortium, taking that institution's place), the Center for Structures of Membrane Proteins (CSMP), and the New York Consortium on Membrane Protein Structure (NYCOMPS). Two homology modeling centers, the Joint Center for Molecular Modeling (JCMM) and New Methods for High-Resolution Comparative Modeling (NMHRCM), were also added, as well as two resource centers, the PSI Materials Repository (PSI-MR) and the PSI Structural Biology Knowledgebase (SBKB). [8] The TB Structural Genomics Consortium was removed from the roster of supported research centers in the transition from PSI-1 to PSI-2. [2]
Originally launched in February 2008, the SBKB is a free resource that offers protein sequence and keyword searches, as well as modules describing target selection, experimental protocols, structure models, functional annotation, metrics on overall progress, and updates on structure determination technology. Like the PDB, it is directed by Dr. Helen M. Berman and hosted at Rutgers University.
The PSI Materials Repository, established in 2006 at the Harvard Institute of Proteomics, stores and ships PSI-generated plasmid clones. [9] Clones are sequence-verified, annotated and stored in the DNASU Plasmid Repository, [10] currently located at the Biodesign Institute at Arizona State University. As of September 2011, over 50,000 PSI-generated plasmid clones and empty vectors were available for request through DNASU, in addition to over 147,000 clones from non-PSI sources. Plasmids are distributed to researchers worldwide. Now called the PSI:Biology Materials Repository, this resource has a five-year budget of $5.4 million and is directed by Dr. Joshua LaBaer, [11] who moved to Arizona State University in mid-2009, taking the repository with him.
The third phase of the PSI was called PSI:Biology, a name intended to reflect the emphasis on the biological relevance of the work. [4] During this phase, highly organized networks of investigators applied the high-throughput structure determination paradigm developed during the earlier phases of the PSI to a broad range of important biological and biomedical problems. The network included centers for high-throughput structure determination, centers for membrane protein structure determination, consortia for high-throughput-enabled structural biology partnerships, the SBKB and the PSI-MR. In September 2013, the NIH announced that the PSI would not be renewed after its third phase ended in 2015.
As of January 2006, about two thirds of worldwide structural genomics (SG) output came from PSI centers. [12] Over 20% of these PSI contributions represented new Pfam families, compared with a non-SG average of 5%. [12] Pfam families are groups of related protein domains and sequences, defined by multiple sequence alignments across sequenced genomes. PSI centers avoided targeting homologs of proteins with known structures by using sequence comparison tools such as BLAST and PSI-BLAST. [12] Similarly, the PSI discovered proportionally more new SCOP folds and superfamilies than non-SG efforts: in 2006, 16% of structures solved by the PSI represented new SCOP folds and superfamilies, while the non-SG average was 4%. [12] Solving such novel structures increases coverage of protein fold space, one of the PSI's main goals. [1] Determining the structure of a novel protein allows homology modeling to predict more accurately the folds of other proteins in the same structural family.
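The "unique" criterion used for PSI structures (less than 30% sequence similarity to any known structure) and the homolog filtering done with BLAST and PSI-BLAST can be illustrated with a small sketch. The Python below is a hypothetical illustration, not the PSI centers' actual target-selection code: it assumes that pairwise alignments against known structures have already been produced (for example by BLAST), uses percent identity over gap-free aligned columns as a stand-in for "sequence similarity", and applies the 30% cut-off described earlier. The function names `percent_identity` and `is_novel_target` are invented for this example.

```python
from __future__ import annotations

from typing import Iterable


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-').

    Assumption: this is one common convention; others divide by the full
    alignment length or by the length of the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contribute a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def is_novel_target(alignments: Iterable[tuple[str, str]], threshold: float = 30.0) -> bool:
    """Hypothetical filter: a target counts as novel if its best identity to
    every known structure falls below the threshold (30% by default)."""
    best = max((percent_identity(a, b) for a, b in alignments), default=0.0)
    return best < threshold


if __name__ == "__main__":
    # Toy alignments of one target against two known structures.
    hits = [("MKV-LAT", "MRVALST"), ("MKVLAT", "QEHWRS")]
    print(is_novel_target(hits))  # False: the first hit is about 67% identical
```

Because the best hit here shares about 67% identity with a known structure, this toy target would be skipped; a target whose every hit stays below 30% would be kept as a candidate for a new family or fold.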
While most of the structures solved by the four large-scale PSI centers lacked functional annotation, many of the other PSI centers determined structures of proteins with known biological function. The TB Structural Genomics Consortium, for example, focused exclusively on functionally characterized proteins. During its term in PSI-1, it deposited structures of over 70 unique proteins from Mycobacterium tuberculosis, which represented more than 35% of all unique M. tuberculosis structures solved through 2007. [13] In keeping with its biomedical theme of increasing structural coverage of the phosphatome, the NYSGXRC determined structures for about 10% of all human phosphatases. [14]
The PSI consortia provided the overwhelming majority of targets for the Critical Assessment of Techniques for Protein Structure Prediction (CASP), a community-wide experiment, held every two years, that assesses the state of and progress in protein structure prediction. [15] [16] [17]
A major goal of the PSI:Biology phase was to use the high-throughput methods developed during the initiative's first decade to generate protein structures for functional studies, broadening the PSI's biomedical impact. It was also expected to advance knowledge and understanding of membrane proteins.[citation needed]
The PSI received notable criticism from the structural biology community. Among the charges was that the main product of the PSI, PDB files of proteins' atomic coordinates as determined by X-ray crystallography or NMR spectroscopy, was not useful enough to biologists to justify the project's $764 million cost. [18] [19] Critics argued that the money spent on the PSI could otherwise have funded what they considered worthier causes:
The $60 million a year in public money that is being spent – I would say, wasted – on the PSI is enough to fund approximately 100–200 individual investigator-initiated research grants. These hypothesis-driven proposals are the lifeblood of the scientific enterprise, and as I have discussed recently in other columns, they are being sucked dry by, among other things, an increasing trend to fund large initiatives at their expense. That $60 million a year would raise the payline at a typical NIH institute by about 6 percentile points, enough to make a huge difference to peer review and to the continuance of a lot of important science. [19]
— Gregory Petsko, PhD
A short response to this was published: [20]
In conclusion, it should be kept in mind that scientific research, and the cutting-edge technologies that both drive and are driven by it, are constantly and rapidly evolving. Some of Petsko’s criticisms are constructive, and should be noted by policy-makers. But one should not throw the baby out with the bathwater, rather tune the scope and objectives of the PSI to the needs of the life-science community as a whole, much in the spirit of SPINE, the SGC and other European structural genomics/proteomics projects. [21] If such a constructive approach is adopted, we feel confident that the structural data provided by the PSI and its cousins will serve as no less valuable a resource than genome sequences.
In October 2008 the NIGMS hosted a meeting on the future of structural genomics efforts and invited speakers from the PSI Advisory Committee, members of the NIGMS Advisory Council, and interested scientists who had no previous involvement with the PSI. Representatives of other genomics, proteomics, and structural genomics initiatives, as well as scientists from academia, government, and industry, were also included. Based on this meeting and the subsequent recommendations of the PSI Advisory Committee, [22] [23] a concept-clearance document was released in January 2009 describing what a third phase of the PSI might entail. Most notable was a strong emphasis on partnerships and collaborations to ensure that the majority of PSI research focused on proteins of interest to the broader research community, as well as on efforts to make PSI products more accessible to that community. [24]
The deadline for PSI:Biology grant applications was October 29, 2009.