In molecular biology, protein fold classes are broad categories of protein tertiary structure topology. They describe groups of proteins that share similar amino acid and secondary structure proportions. Each class contains multiple, independent protein superfamilies (i.e. are not necessarily evolutionarily related to one another). [1] [2] [3]
Four large classes of protein that are generally agreed upon by the two main structure classification databases (SCOP and CATH).
All-α proteins are a class of structural domains in which the secondary structure is composed entirely of α-helices, with the possible exception of a few isolated β-sheets on the periphery.
Common examples include the bromodomain, the globin fold and the s.
All-β proteins are a class of structural domains in which the secondary structure is composed entirely of β-sheets, with the possible exception of a few isolated α-helices on the periphery.
Common examples include the SH3 domain, the beta-propeller domain, the immunoglobulin fold and B3 DNA binding domain.
α+β proteins are a class of structural domains in which the secondary structure is composed of α-helices and β-strands that occur separately along the backbone. The β-strands are therefore mostly antiparallel. [4]
Common examples include the ferredoxin fold, ribonuclease A, and the SH2 domain.
α/β proteins are a class of structural domains in which the secondary structure is composed of alternating α-helices and β-strands along the backbone. The β-strands are therefore mostly parallel. [4]
Common examples include the flavodoxin fold, the TIM barrel and leucine-rich-repeat (LRR) proteins such as ribonuclease inhibitor.
Membrane proteins interact with biological membranes either by inserting into it, or being tethered via a covalently attached lipid. They are one of the common types of protein along with soluble globular proteins, fibrous proteins, and disordered proteins. [5] They are targets of over 50% of all modern medicinal drugs. [6] It is estimated that 20–30% of all genes in most genomes encode membrane proteins. [7]
Intrinsically disordered proteins lack a fixed or ordered three-dimensional structure. [8] [9] [10] IDPs cover a spectrum of states from fully unstructured to partially structured and include random coils, (pre-)molten globules, and large multi-domain proteins connected by flexible linkers. They constitute one of the main types of protein (alongside globular, fibrous and membrane proteins). [5]
Coiled coil proteins form long, insoluble fibers involved in the extracellular matrix. There are many scleroprotein superfamilies including keratin, collagen, elastin, and fibroin. The roles of such proteins include protection and support, forming connective tissue, tendons, bone matrices, and muscle fiber.
Small proteins typically have a tertiary structure that is maintained by disulphide bridges (cysteine-rich proteins), metal ligands (metal-binding proteins), and or cofactors such as heme.
Numerous protein structures are the result of rational design and do not exist in nature. Proteins can be designed from scratch (de novo design) or by making calculated variations on a known protein structure and its sequence (known as protein redesign). Rational protein design approaches make protein-sequence predictions that will fold to specific structures. These predicted sequences can then be validated experimentally through methods such as peptide synthesis, site-directed mutagenesis, or Artificial gene synthesis.
The beta sheet is a common motif of the regular protein secondary structure. Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone hydrogen bonds, forming a generally twisted, pleated sheet. A β-strand is a stretch of polypeptide chain typically 3 to 10 amino acids long with backbone in an extended conformation. The supramolecular association of β-sheets has been implicated in the formation of the fibrils and protein aggregates observed in amyloidosis, Alzheimer's disease and other proteinopathies.
Protein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically spontaneously form as an intermediate before the protein folds into its three dimensional tertiary structure.
Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; it is important in medicine and biotechnology.
In a chain-like biological molecule, such as a protein or nucleic acid, a structural motif is a common three-dimensional structure that appears in a variety of different, evolutionarily unrelated molecules. A structural motif does not have to be associated with a sequence motif; it can be represented by different and completely unrelated sequences in different proteins or RNA.
The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationship between proteins. Proteins with the same shapes but having little sequence or functional similarity are placed in different superfamilies, and are assumed to have only a very distant common ancestor. Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.
The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones, and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource, however there are also many areas in which the detailed classification differs greatly.
Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, which indicates a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions, such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, cryo-electron microscopy (cryo-EM) and dual polarisation interferometry, to determine the structure of proteins.
A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence or have a general affinity to DNA. Some DNA-binding domains may also include nucleic acids in their folded structure.
InterPro is a database of protein families, protein domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them.
In protein structures, a beta barrel(β barrel) is a beta sheet composed of tandem repeats that twists and coils to form a closed toroidal structure in which the first strand is bonded to the last strand. Beta-strands in many beta-barrels are arranged in an antiparallel fashion. Beta barrel structures are named for resemblance to the barrels used to contain liquids. Most of them are water-soluble outer membrane proteins and frequently bind hydrophobic ligands in the barrel center, as in lipocalins. Others span cell membranes and are commonly found in porins. Porin-like barrel structures are encoded by as many as 2–3% of the genes in Gram-negative bacteria. It has been shown that more than 600 proteins with various function such as oxidase, dismutase, and amylase contain the beta barrel structure.
A beta helix is a tandem protein repeat structure formed by the association of parallel beta sheet in a helical pattern with either two or three faces. The beta helix is a type of solenoid protein domain. The structure is stabilized by inter-strand hydrogen bonds, protein-protein interactions, and sometimes bound metal ions. Both left- and right-handed beta helices have been identified. These structures are distinct from jelly-roll folds, a different protein structure sometimes known as a "double-stranded beta helix".
The TIM barrel, also known as an alpha/beta barrel, is a conserved protein fold consisting of eight alpha helices (α-helices) and eight parallel beta strands (β-strands) that alternate along the peptide backbone. The structure is named after triose-phosphate isomerase, a conserved metabolic enzyme. TIM barrels are ubiquitous, with approximately 10% of all enzymes adopting this fold. Further, five of seven enzyme commission (EC) enzyme classes include TIM barrel proteins. The TIM barrel fold is evolutionarily ancient, with many of its members possessing little similarity today, instead falling within the twilight zone of sequence similarity.
In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from between about 50 amino acids up to 250 amino acids in length. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.
Protein subfamily is a level of protein classification, based on their close evolutionary relationship. It is below the larger levels of protein superfamily and protein family.
SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common Ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.
David Tudor Jones is a Professor of Bioinformatics, and Head of Bioinformatics Group in the University College London. He is also the director in Bloomsbury Center for Bioinformatics, which is a joint Research Centre between UCL and Birkbeck, University of London and which also provides bioinformatics training and support services to biomedical researchers. In 2013, he is a member of editorial boards for PLoS ONE, BioData Mining, Advanced Bioinformatics, Chemical Biology & Drug Design, and Protein: Structure, Function and Bioinformatics.
A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred. Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident. Sequence homology can then be deduced even if not apparent. Superfamilies typically contain several protein families which show sequence similarity within each family. The term protein clan is commonly used for protease and glycosyl hydrolases superfamilies based on the MEROPS and CAZy classification systems.
Membranome database provides structural and functional information about more than 6000 single-pass (bitopic) transmembrane proteins from Homo sapiens, Arabidopsis thaliana, Dictyostelium discoideum, Saccharomyces cerevisiae, Escherichia coli and Methanocaldococcus jannaschii. Bitopic membrane proteins consist of a single transmembrane alpha-helix connecting water-soluble domains of the protein situated at the opposite sides of a biological membrane. These proteins are frequently involved in the signal transduction and communication between cells in multicellular organisms.
IntFOLD is fully automated, integrated pipeline for prediction of 3D structure and function from amino acid sequences. The pipeline is wrapped up and deployed as a Web Server. The core of the server method is quality assessment using built-in accuracy self-estimates (ASE) which improves performance prediction of 3D model using ModFOLD.