The topology of a transmembrane protein refers to the locations of the N- and C-termini of the membrane-spanning polypeptide chain with respect to the inner or outer side of the biological membrane occupied by the protein. [1]
Several databases provide experimentally determined topologies of membrane proteins. They include UniProt, TOPDB, [3] [4] [5] OPM, and ExTopoDB. [6] [7] There is also a database of domains that are conservatively located on a particular side of membranes, TOPDOM. [8]
Several computational methods have been developed, with limited success, for predicting transmembrane alpha-helices and their topology. Pioneering methods exploited the fact that membrane-spanning regions contain more hydrophobic residues than other parts of the protein; however, applying different hydrophobicity scales altered the prediction results. Later, several statistical methods were developed to improve topology prediction, and a special alignment method was introduced. [9] According to the positive-inside rule, [10] cytosolic loops near the lipid bilayer contain more positively charged amino acids. Applying this rule resulted in the first topology prediction methods. There is also a negative-outside rule in transmembrane alpha-helices from single-pass proteins, although negatively charged residues are rarer than positively charged residues in transmembrane segments of proteins. [11] As more structures were determined, machine learning algorithms appeared. Supervised learning methods are trained on a set of experimentally determined structures; however, these methods depend strongly on the training set. [12] [13] [14] [15] Unsupervised learning methods are based on the principle that topology depends on the maximum divergence of the amino acid distributions in different structural parts. [16] [17] It was also shown that locking a segment location based on prior knowledge about the structure improves prediction accuracy. [18] This feature has been added to some of the existing prediction methods. [17] [14] The most recent methods use consensus prediction (i.e. they combine several algorithms to determine the final topology) [19] and automatically incorporate previously determined experimental information. [20] The HTP database [21] [22] provides a collection of computationally predicted topologies for human transmembrane proteins.
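The hydrophobicity-based approach and the positive-inside rule described above can be illustrated with a minimal Python sketch. It is not the algorithm of any published predictor. It assumes the Kyte-Doolittle hydropathy scale, a 19-residue sliding window, a cutoff of 1.6, and a 15-residue flank for counting lysine and arginine; the function names and all of these parameters are illustrative choices. Swapping in a different scale changes the result, which is the sensitivity noted above.

```python
# Kyte-Doolittle hydropathy values (one common scale; others give different results).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def find_tm_segments(seq, window=19, threshold=1.6):
    """Half-open (start, end) runs of window centres whose mean exceeds threshold.

    Segment boundaries are approximate: they mark where the smoothed
    profile is high, not the true ends of a helix.
    """
    half = window // 2
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean <= threshold:
            continue
        centre = i + half
        if segments and centre == segments[-1][1]:
            # Extend the current run by one residue.
            segments[-1] = (segments[-1][0], centre + 1)
        else:
            segments.append((centre, centre + 1))
    return segments


def n_terminus_side(seq, segments, flank=15):
    """Apply the positive-inside rule to guess where the N-terminus lies.

    Loops alternate between the two sides of the membrane, so K/R counts
    are pooled over even-numbered and odd-numbered loops separately.
    """
    if not segments:
        return None  # no predicted membrane span, so no topology
    bounds = [0] + [pos for seg in segments for pos in seg] + [len(seq)]
    loops = [seq[bounds[k]:bounds[k + 1]] for k in range(0, len(bounds), 2)]
    charge = [0, 0]
    for idx, loop in enumerate(loops):
        # Keep only the residues close to the membrane.
        if idx == 0:
            near = loop[-flank:]
        elif idx == len(loops) - 1:
            near = loop[:flank]
        elif len(loop) <= 2 * flank:
            near = loop
        else:
            near = loop[:flank] + loop[-flank:]
        charge[idx % 2] += sum(near.count(aa) for aa in "KR")
    if charge[0] == charge[1]:
        return "undetermined"
    return "cytosolic" if charge[0] > charge[1] else "non-cytosolic"


if __name__ == "__main__":
    # Synthetic toy sequence: basic N-terminal tail, poly-leucine span, acidic tail.
    toy = "MKRKRSKK" + "L" * 21 + "DESGDTEQ"
    spans = find_tm_segments(toy)
    print(spans, n_terminus_side(toy, spans))
```

The toy sequence is synthetic and chosen only to make the two signals obvious. Real predictors combine these signals with statistical or machine-learned models rather than fixed cutoffs.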
Discrimination of signal peptides and transmembrane segments is an additional problem in topology prediction, and different methods have addressed it with limited success. [23] Both signal peptides and transmembrane segments contain hydrophobic regions which form α-helices. This causes cross-prediction between them, which is a weakness of many transmembrane topology predictors. By predicting signal peptides and transmembrane helices simultaneously (Phobius [14] ), the errors caused by cross-prediction are reduced and the performance is substantially increased. Another feature used to increase prediction accuracy is homology (PolyPhobius).
It is also possible to predict beta-barrel membrane proteins' topology. [24] [25]
An alpha helix is a sequence of amino acids in a protein that are twisted into a coil.
Protein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically form spontaneously as an intermediate before the protein folds into its three-dimensional tertiary structure.
A transmembrane domain (TMD) is a membrane-spanning protein domain. TMDs may consist of one or several alpha-helices or a transmembrane beta barrel. Because the interior of the lipid bilayer is hydrophobic, the amino acid residues in TMDs are often hydrophobic, although proteins such as membrane pumps and ion channels can contain polar residues. TMDs vary greatly in size and hydrophobicity; they may adopt organelle-specific properties.
Membrane proteins are common proteins that are part of, or interact with, biological membranes. Membrane proteins fall into several broad categories depending on their location. Integral membrane proteins are a permanent part of a cell membrane and can either penetrate the membrane (transmembrane) or associate with one or the other side of a membrane. Peripheral membrane proteins are transiently associated with the cell membrane.
A transmembrane protein (TP) is a type of integral membrane protein that spans the entirety of the cell membrane. Many transmembrane proteins function as gateways to permit the transport of specific substances across the membrane. They frequently undergo significant conformational changes to move a substance through the membrane. They are usually highly hydrophobic and aggregate and precipitate in water. They require detergents or nonpolar solvents for extraction, although some of them (beta-barrels) can also be extracted using denaturing agents.
Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; and it is important in medicine and biotechnology.
A signal peptide is a short peptide present at the N-terminus of most newly synthesized proteins that are destined for the secretory pathway. These proteins include those that reside inside certain organelles, are secreted from the cell, or are inserted into most cellular membranes. Although most type I membrane-bound proteins have signal peptides, the majority of type II and multi-spanning membrane-bound proteins are targeted to the secretory pathway by their first transmembrane domain, which biochemically resembles a signal sequence except that it is not cleaved. Signal peptides are a kind of target peptide.
A protein contact map represents the distance between all possible amino acid residue pairs of a three-dimensional protein structure using a binary two-dimensional matrix. For two residues i and j, the (i, j) element of the matrix is 1 if the two residues are closer than a predetermined threshold, and 0 otherwise. Various contact definitions have been proposed: the distance between Cα atoms with a threshold of 6-12 Å; the distance between Cβ atoms with a threshold of 6-12 Å; and the distance between the side-chain centers of mass.
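The definition above maps directly onto a few lines of Python. The sketch below assumes one representative atom per residue (for example Cα) given as (x, y, z) coordinates in Å, and an 8 Å cutoff chosen from within the 6-12 Å range mentioned above. The function name and the input format are illustrative assumptions.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact matrix from per-residue coordinates.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    Element [i][j] is 1 if residues i and j are closer than threshold, else 0.
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) < threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three made-up positions: two neighbours 3.8 Å apart and one distant residue.
    toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for row in contact_map(toy_coords):
        print(row)
```

The matrix is symmetric and its diagonal is always 1, because every residue is at distance zero from itself. Some analyses ignore contacts between residues that are close in sequence, which this sketch does not do.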
SOSUI is a free online tool that predicts a part of the secondary structure of proteins from a given amino acid sequence (AAS). The main objective is to determine whether the protein in question is a soluble or a transmembrane protein.
Orientations of Proteins in Membranes (OPM) database provides spatial positions of membrane protein structures with respect to the lipid bilayer. Positions of the proteins are calculated using an implicit solvation model of the lipid bilayer. The results of calculations were verified against experimental studies of spatial arrangement of transmembrane and peripheral proteins in membranes.
Professor Nils Gunnar Hansson von Heijne, born 10 June 1951 in Gothenburg, is a Swedish scientist working on signal peptides, membrane proteins and bioinformatics at the Stockholm Center for Biomembrane Research at Stockholm University.
KIAA0090 is a human gene coding for a protein of unknown function. KIAA0090 has two aliases, OTTHUMP00000002581 and RP1-43E13.1. The gene codes for multiple transcript variants which can localize to different subcellular compartments. KIAA0090 interacts with multiple effector proteins. KIAA0090 contains a conserved COG1520 WD40-like repeat domain, which is thought to mediate these interactions.
David Tudor Jones is a Professor of Bioinformatics and Head of the Bioinformatics Group at University College London. He is also the director of the Bloomsbury Center for Bioinformatics, a joint research centre between UCL and Birkbeck, University of London, which also provides bioinformatics training and support services to biomedical researchers. As of 2013, he was a member of the editorial boards of PLoS ONE, BioData Mining, Advanced Bioinformatics, Chemical Biology & Drug Design, and Proteins: Structure, Function, and Bioinformatics.
In molecular biology, protein fold classes are broad categories of protein tertiary structure topology. They describe groups of proteins that share similar amino acid and secondary structure proportions. Each class contains multiple, independent protein superfamilies.
A target peptide is a short peptide chain that directs the transport of a protein to a specific region in the cell, including the nucleus, mitochondria, endoplasmic reticulum (ER), chloroplast, apoplast, peroxisome and plasma membrane. Some target peptides are cleaved from the protein by signal peptidases after the proteins are transported.
Solute carrier family 46 member 3 (SLC46A3) is a protein that in humans is encoded by the SLC46A3 gene. Also referred to as FKSG16, the protein belongs to the major facilitator superfamily (MFS) and SLC46A family. Most commonly found in the plasma membrane and endoplasmic reticulum (ER), SLC46A3 is a multi-pass membrane protein with 11 α-helical transmembrane domains. It is mainly involved in the transport of small molecules across the membrane through the substrate translocation pores featured in the MFS domain. The protein is associated with breast and prostate cancer, hepatocellular carcinoma (HCC), papilloma, glioma, obesity, and SARS-CoV. Based on the differential expression of SLC46A3 in antibody-drug conjugate (ADC)-resistant cells and certain cancer cells, current research is focused on the potential of SLC46A3 as a prognostic biomarker and therapeutic target for cancer. While protein abundance is relatively low in humans, high expression has been detected particularly in the liver, small intestine, and kidney.
TMEM106C is a gene that encodes the transmembrane protein 106C (TMEM106C) in Homo sapiens. It has been found to be overexpressed in cancer cells and is also related to distal arthrogryposis, a condition of stiff joints and irregular muscle development. The TMEM106C gene contains a domain of unknown function, DUF1356, that spans most of the protein. Transmembrane protein 106C also goes by the aliases MGC5576, MGC111210, and LOC79022.
Stephen H. White is an American biophysicist, academic, and author. He is a Professor Emeritus of Physiology and Biophysics at the University of California, Irvine.
Small integral membrane protein 14, also known as SMIM14 or C4orf34, is a protein encoded on chromosome 4 of the human genome by the SMIM14 gene. SMIM14 has at least 298 orthologs mainly found in jawed vertebrates and no paralogs. SMIM14 is classified as a type I transmembrane protein. While this protein is not well understood by the scientific community, the transmembrane domain of SMIM14 may be involved in ER retention.