Top7 is an artificial protein, classified as a de novo protein. This means that it was designed from scratch, rather than derived from a naturally occurring protein, to have a specific structure and functional properties. [1]
Top7 was designed by Brian Kuhlman and Gautam Dantas in David Baker's laboratory at the University of Washington. [2] The design used a general computational method that iterated between sequence design and structure prediction. The goal was to develop a 93-residue α/β protein with a new sequence and a new topology, that is, a new arrangement of its structural elements. [2]
The resulting sequence of residues is:
DIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELKDYIKKQGAKRVRISITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQLE
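The sequence above can be checked against the stated length with a few lines of code. The following is a minimal sketch in Python; the function name `summarize_sequence` and the constant names are illustrative choices, not part of any published Top7 tooling.

```python
from collections import Counter

# Sequence copied from the text above (one-letter amino acid codes).
TOP7_SEQUENCE = (
    "DIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELKDYIKKQGAKRVRISI"
    "TARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQLE"
)

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def summarize_sequence(sequence):
    """Return (length, residue counts) after checking the alphabet."""
    unknown = set(sequence) - STANDARD_AMINO_ACIDS
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return len(sequence), Counter(sequence)


length, composition = summarize_sequence(TOP7_SEQUENCE)
print("residues:", length)
print("most common:", composition.most_common(5))
```

The reported length should agree with the 93 residues given in the text; the composition count is only a descriptive summary and says nothing about structure.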
Due to the de novo design, Top7 possesses a unique three-dimensional structure. The protein is described as a 93-residue α/β protein, which means that its secondary structure contains both alpha helices (α) and beta sheets (β). Overall, the structure consists of two alpha helices packed on a five-stranded anti-parallel beta sheet. The combination of alpha helices and beta sheets is common in protein structures, and it contributes to the overall stability and functionality of the protein.
In order to achieve a target structure, researchers first developed a two-dimensional diagram and utilized it to determine the constraints that allowed them to construct the three-dimensional model of Top7. Determination of the high-resolution X-ray structure of the experimentally expressed and purified protein revealed that the structure (PDB: 1QYS) was indeed very similar (1.2 Å RMSD) to the computer-designed model.
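The RMSD (root-mean-square deviation) quoted above measures the average distance between corresponding atoms in two structures. The following minimal Python sketch shows the calculation under simplifying assumptions: the two coordinate sets are already superposed and paired atom by atom, so no optimal alignment step is performed. The function name `rmsd` and the toy coordinates are hypothetical and are not taken from the 1QYS entry.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superposed and that atoms are
    paired by index; no alignment is carried out here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates in angstroms: every atom is shifted by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(model, crystal))
```

In practice a comparison between a design model and a crystal structure would first superpose the two structures, which this sketch deliberately leaves out.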
Researchers used a variety of biophysical methods to characterize Top7. Gel filtration chromatography showed that Top7 is monomeric and highly soluble. The protein was also found to unfold cooperatively as temperature increases and to display cold denaturation. Crystallization trials and nuclear magnetic resonance measurements showed negligible differences from the design, indicating that the design model was very similar to the true structure. [2] Structure-based models were used to further study the folding characteristics of Top7. [3] [4]
Through these analyses, it was determined that the Top7 protein is extremely stable. [2]
Top7 exhibits non-cooperative folding behavior. [5] Many naturally occurring proteins fold cooperatively, meaning that the whole structure folds in a coordinated process. In contrast, the folding of Top7 does not follow a smooth, single-phase process. Its non-cooperative character may be linked to its designed sequence, which promotes the formation of an independently folded C-terminal intermediate. Studies of mutations in the C-terminal and N-terminal regions of the amino acid sequence in a base model suggest that a Top7 sequence capable of cooperative folding probably exists. [3]
The creation of the de novo protein Top7 showcases the capability of computational methods in creating proteins with specific three-dimensional structures. This has broad implications for advancing the field of computational protein design and provides a platform for the creation of novel biomolecules with desired properties. [2] The stability and folding characteristics of Top7 provide insights into the relationship between sequence, structure, and folding cooperativity. Understanding these principles can contribute to the development of more stable and functional proteins not derived from natural evolution. [6]
Top7 was featured as the RCSB Protein Data Bank's 'Molecule of the Month' in October 2005, and a superposition of the respective cores (residues 60-79) of its predicted and X-ray crystal structures is featured in the Rosetta@home logo. [7]
An alpha helix is a sequence of amino acids in a protein that is twisted into a coil.
The beta sheet is a common motif of the regular protein secondary structure. Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone hydrogen bonds, forming a generally twisted, pleated sheet. A β-strand is a stretch of polypeptide chain typically 3 to 10 amino acids long with backbone in an extended conformation. The supramolecular association of β-sheets has been implicated in the formation of the fibrils and protein aggregates observed in amyloidosis, Alzheimer's disease and other proteinopathies.
Protein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically form spontaneously as an intermediate before the protein folds into its three-dimensional tertiary structure.
Protein folding is the physical process by which a protein, after synthesis by a ribosome as a linear chain of amino acids, changes from an unstable random coil into a more ordered three-dimensional structure. This structure permits the protein to become biologically functional.
A transmembrane protein is a type of integral membrane protein that spans the entirety of the cell membrane. Many transmembrane proteins function as gateways to permit the transport of specific substances across the membrane. They frequently undergo significant conformational changes to move a substance through the membrane. They are usually highly hydrophobic, and they aggregate and precipitate in water. They require detergents or nonpolar solvents for extraction, although some of them (beta-barrels) can also be extracted using denaturing agents.
Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design.
Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.
Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.
Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, which indicates a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions, such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, cryo-electron microscopy (cryo-EM) and dual polarisation interferometry, to determine the structure of proteins.
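The bookkeeping in the paragraph above can be illustrated with a short calculation. The following Python sketch assumes a single linear, unbranched chain, so that joining n residues takes n − 1 condensation reactions, each forming one peptide bond and releasing one water molecule. The function name `describe_chain` and the constant name are hypothetical; the 30-residue cutoff is the convention quoted in the text, not a strict rule.

```python
# Convention quoted in the text: chains under 30 residues are often called peptides.
PEPTIDE_LENGTH_CUTOFF = 30


def describe_chain(n_residues):
    """Count peptide bonds and released waters for a linear chain."""
    if n_residues < 1:
        raise ValueError("a chain needs at least one residue")
    bonds = n_residues - 1  # one condensation reaction per bond formed
    label = "peptide" if n_residues < PEPTIDE_LENGTH_CUTOFF else "protein"
    return {"peptide_bonds": bonds, "waters_released": bonds, "label": label}


# Example: a 93-residue chain, the length given for Top7 in this article.
print(describe_chain(93))
```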
Protein design is the rational design of new protein molecules to achieve novel activity, behavior, or purpose, and to advance basic understanding of protein function. Proteins can be designed from scratch or by making calculated variants of a known protein structure and its sequence. Rational protein design approaches predict protein sequences that will fold to specific structures. These predicted sequences can then be validated experimentally through methods such as peptide synthesis, site-directed mutagenesis, or artificial gene synthesis.
In chemistry, a foldamer is a discrete chain molecule (oligomer) that folds into a conformationally ordered state in solution. They are artificial molecules that mimic the ability of proteins, nucleic acids, and polysaccharides to fold into well-defined conformations, such as α-helices and β-sheets. The structure of a foldamer is stabilized by noncovalent interactions between nonadjacent monomers. Foldamers are studied with the main goal of designing large molecules with predictable structures. The study of foldamers is related to the themes of molecular self-assembly, molecular recognition, and host–guest chemistry.
Rosetta@home is a volunteer computing project researching protein structure prediction on the Berkeley Open Infrastructure for Network Computing (BOINC) platform, run by the Baker lab. Rosetta@home aims to predict protein–protein docking and design new proteins with the help of about fifty-five thousand active volunteered computers processing at over 487,946 GigaFLOPS on average as of September 19, 2020. Foldit, a Rosetta@home videogame, aims to reach these goals with a crowdsourcing approach. Though much of the project is oriented toward basic research to improve the accuracy and robustness of proteomics methods, Rosetta@home also does applied research on malaria, Alzheimer's disease, and other pathologies.
A turn is an element of secondary structure in proteins where the polypeptide chain reverses its overall direction.
A 3₁₀ helix is a type of secondary structure found in proteins and polypeptides. Of the numerous protein secondary structures present, the 3₁₀-helix is the fourth most common type observed, following α-helices, β-sheets and reverse turns. 3₁₀-helices constitute nearly 10–15% of all helices in protein secondary structures, and are typically observed as extensions of α-helices found at either their N- or C-termini. Because of the α-helices' tendency to consistently fold and unfold, it has been proposed that the 3₁₀-helix serves as an intermediary conformation of sorts, and provides insight into the initiation of α-helix folding.
The TIM barrel, also known as an alpha/beta barrel, is a conserved protein fold consisting of eight alpha helices (α-helices) and eight parallel beta strands (β-strands) that alternate along the peptide backbone. The structure is named after triose-phosphate isomerase, a conserved metabolic enzyme. TIM barrels are ubiquitous, with approximately 10% of all enzymes adopting this fold. Further, five of seven enzyme commission (EC) enzyme classes include TIM barrel proteins. The TIM barrel fold is evolutionarily ancient, with many of its members possessing little similarity today, instead falling within the twilight zone of sequence similarity.
A helix bundle is a small protein fold composed of several alpha helices that are usually nearly parallel or antiparallel to each other.
In computational biology, de novo protein structure prediction refers to an algorithmic process by which protein tertiary structure is predicted from its amino acid primary sequence. The problem itself has occupied leading scientists for decades while still remaining unsolved. According to Science, the problem remains one of the top 125 outstanding issues in modern science. At present, some of the most successful methods have a reasonable probability of predicting the folds of small, single-domain proteins within 1.5 angstroms over the entire structure.
In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from about 50 to 250 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.
Carboxypeptidase A2 is an enzyme that in humans is encoded by the CPA2 gene.
In molecular biology, protein fold classes are broad categories of protein tertiary structure topology. They describe groups of proteins that share similar amino acid and secondary structure proportions. Each class contains multiple, independent protein superfamilies.