Trefoil knot fold

A deep trefoil knot in a Thermus thermophilus RNA methyltransferase domain (PDB ID 1IPA). The knotted C-terminus of the protein is shown in blue.

The trefoil knot fold is a protein fold in which the protein backbone is twisted into a trefoil knot shape. "Shallow" knots, in which the tail of the polypeptide chain passes through a loop by only a few residues, are uncommon, and "deep" knots, in which many residues pass through the loop, are extremely rare. Deep trefoil knots have been found in the SPOUT superfamily, [1] including methyltransferase proteins involved in posttranscriptional RNA modification in all three domains of life. These include proteins from the bacterium Thermus thermophilus [2] [3] as well as from archaea and eukaryota. [4]

In many cases the trefoil knot is part of the active site or a ligand-binding site and is critical to the activity of the enzyme in which it appears. Before the discovery of the first knotted protein, it was believed that protein folding could not efficiently produce deep knots in protein backbones. Studies of the folding kinetics of a dimeric protein from Haemophilus influenzae have revealed that the folding of trefoil knot proteins may depend on proline isomerization. [5] Computational algorithms have been developed to identify knotted protein structures, both to canvass the Protein Data Bank for previously undetected natural knots and to identify knots in protein structure predictions, where, given the rarity of knots in known proteins, they are unlikely to accurately reproduce the native-state structure. [6]

Knottins are small, diverse and stable proteins with important drug design potential. They can be classified into 30 families, which cover a wide range of sequences (1621 sequenced), three-dimensional structures (155 solved) and functions (> 10). Inter-knottin similarity lies mainly between 20% and 40% sequence identity and 1.5 to 4 Å backbone deviation, although all knottins share a tightly knotted disulfide core. This variability is likely to arise from the highly diverse loops that connect the successive knotted cysteines. The prediction of structural models for all knottin sequences would open new directions for the analysis of interaction sites and provide a better understanding of the structural and functional organization of proteins sharing this scaffold. [7]
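To illustrate how computational knot detection can work in principle, the sketch below applies a chain-smoothing procedure often called KMT reduction (after Koniaris, Muthukumar and Taylor) to a backbone trace. It is not claimed to be the method of reference [6]. A point is removed whenever the triangle it forms with its two neighbours is not crossed by any other segment of the chain; an unknotted chain typically collapses to its two termini, whereas a deeply knotted one cannot. The function names, the helix, and the synthetic open trefoil with long tails are all assumptions introduced for this example; a real analysis would read C-alpha coordinates from a structure file instead.

```python
import math

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def segment_hits_triangle(p, q, a, b, c, eps=1e-12):
    """True if segment p-q intersects triangle a-b-c (Moller-Trumbore test)."""
    d = sub(q, p)
    e1, e2 = sub(b, a), sub(c, a)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:
        return False  # parallel segment or degenerate triangle: treated as a miss
    s = sub(p, a)
    u = dot(s, h) / det
    if u < 0.0 or u > 1.0:
        return False
    k = cross(s, e1)
    v = dot(d, k) / det
    if v < 0.0 or u + v > 1.0:
        return False
    t = dot(e2, k) / det
    return 0.0 <= t <= 1.0  # intersection must lie within the segment

def kmt_reduce(points):
    """Repeatedly delete points whose triangle is not pierced by the chain."""
    pts = [tuple(map(float, p)) for p in points]
    changed = True
    while changed:
        changed = False
        i = 1  # the two termini (first and last point) are never removed
        while i < len(pts) - 1:
            a, b, c = pts[i - 1], pts[i], pts[i + 1]
            # Segments that share a vertex with the triangle are skipped.
            neighbours = (i - 2, i - 1, i, i + 1)
            blocked = any(
                segment_hits_triangle(pts[j], pts[j + 1], a, b, c)
                for j in range(len(pts) - 1) if j not in neighbours
            )
            if blocked:
                i += 1
            else:
                del pts[i]
                changed = True
    return pts

def is_knotted(points):
    return len(kmt_reduce(points)) > 2

def helix(n=40):
    """Hypothetical unknotted test chain: two turns of a helix."""
    return [(math.cos(t), math.sin(t), 0.3 * t)
            for t in (4 * math.pi * k / (n - 1) for k in range(n))]

def open_trefoil(n=60, gap=0.2, tail=10.0):
    """Hypothetical deep knot: a trefoil curve cut open, with long radial tails."""
    start = math.pi / 3 + gap
    span = 2 * math.pi - 2 * gap
    core = []
    for k in range(n):
        t = start + span * k / (n - 1)
        core.append((math.sin(t) + 2 * math.sin(2 * t),
                     math.cos(t) - 2 * math.cos(2 * t),
                     -math.sin(3 * t)))
    first = tuple(tail * x for x in core[0])
    last = tuple(tail * x for x in core[-1])
    return [first] + core + [last]

if __name__ == "__main__":
    print(is_knotted(helix()))         # False
    print(is_knotted(open_trefoil()))  # True
```

The long tails matter: with termini close to the knotted core, triangle moves can slip a loop over a chain end, which is one reason shallow knots are harder to classify reliably than deep ones. Near-coplanar geometry is treated as a miss here for simplicity, so production tools generally add more careful handling of degenerate cases.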

Trefoil domain

Trefoil (P-type) domain
Structure of pancreatic spasmolytic polypeptide. [8]
Identifiers
Symbol: Trefoil
Pfam: PF00088
InterPro: IPR000519
SMART: SM00018
PROSITE: PDOC00024
SCOP2: 1psp / SCOPe / SUPFAM
CDD: cd00111
PDB structures (entry chain:residues): 1e9t A:31-72; 1pe3 2:31-72; 1ps2 :30-71; 1hi7 B:30-71; 2psp A:28-70; 1psp B:28-70; 1pcp :28-70

The trefoil (P-type) domain is a cysteine-rich domain of approximately forty-five amino acid residues found in some extracellular eukaryotic proteins. [9] [10] [11] [12] It is known as the 'P', 'trefoil' or 'TFF' domain, and contains six cysteines linked by three disulphide bonds with connectivity 1–5, 2–4, 3–6.
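The 1–5, 2–4, 3–6 connectivity can be made concrete with a small sketch that numbers the cysteines of a domain sequence in order and reports which residue positions would be bridged. The function name and the toy sequence are hypothetical and chosen only for illustration; the toy sequence is not a real trefoil domain.

```python
# Disulphide connectivity of the trefoil (P-type) domain, given as
# pairs of cysteine ordinals (first cysteine in the domain = 1).
TREFOIL_CONNECTIVITY = ((1, 5), (2, 4), (3, 6))

def disulfide_pairs(sequence, connectivity=TREFOIL_CONNECTIVITY):
    """Map cysteine ordinals to 1-based residue positions in `sequence`."""
    cys_positions = [i + 1 for i, aa in enumerate(sequence) if aa == "C"]
    if len(cys_positions) != 6:
        raise ValueError("expected exactly six cysteines in the domain")
    return [(cys_positions[a - 1], cys_positions[b - 1])
            for a, b in connectivity]

if __name__ == "__main__":
    toy = "AACAAACAACAAACACAAAC"  # hypothetical sequence, cysteines at 3, 7, 10, 14, 16, 20
    print(disulfide_pairs(toy))   # [(3, 16), (7, 14), (10, 20)]
```

Because the first cysteine pairs with the fifth, the second with the fourth, and the third with the sixth, the bonds cross one another along the sequence rather than forming nested or sequential pairs.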

The domain has been found in a variety of extracellular eukaryotic proteins, [9] [11] [12] including protein pS2 (TFF1), a protein secreted by the stomach mucosa; spasmolytic polypeptide (SP) (TFF2), a protein of about 115 residues that inhibits gastrointestinal motility and gastric acid secretion; intestinal trefoil factor (ITF) (TFF3); Xenopus laevis stomach proteins xP1 and xP4; Xenopus integumentary mucins A.1 (preprospasmolysin) and C.1, proteins which may be involved in defense against microbial infections by protecting the epithelia from the external environment; Xenopus skin protein xp2 (or APEG); zona pellucida sperm-binding protein B (ZP-B); intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10), a vertebrate membrane-bound, multifunctional enzyme complex which hydrolyzes sucrose, maltose and isomaltose; and lysosomal alpha-glucosidase (EC 3.2.1.20).

Examples

Human genes encoding proteins containing the trefoil domain include:

History

A web server, pKNOT, was made available to detect knots in proteins and to provide information on knotted proteins in the Protein Data Bank. [13]

References

  1. Zarembinski, Thomas I.; Kim, Youngchang; Peterson, Kelly; Christendat, Dinesh; Dharamsi, Akil; Arrowsmith, Cheryl H.; Edwards, Aled M.; Joachimiak, Andrzej (2002). "Deep trefoil knot implicated in RNA binding found in an archaebacterial protein". Proteins: Structure, Function, and Bioinformatics. 50 (2): 177–183. doi:10.1002/prot.10311. PMC 2792022. PMID 12486711.
  2. Nureki, Osamu; Shirouzu, Mikako; Hashimoto, Kyoko; Ishitani, Ryuichiro; Terada, Takaho; Tamakoshi, Masatada; Oshima, Tairo; Chijimatsu, Masao; Takio, Koji; Vassylyev, Dmitry G.; Shibata, Takehiko; Inoue, Yorinao; Kuramitsu, Seiki; Yokoyama, Shigeyuki (2002). "An enzyme with a deep trefoil knot for the active-site architecture". Acta Crystallographica Section D Biological Crystallography. 58 (7): 1129–1137. doi:10.1107/s0907444902006601. PMID 12077432.
  3. Nureki, Osamu; Watanabe, Kazunori; Fukai, Shuya; Ishii, Ryohei; Endo, Yaeta; Hori, Hiroyuki; Yokoyama, Shigeyuki (2004). "Deep Knot Structure for Construction of Active Site and Cofactor Binding Site of tRNA Modification Enzyme". Structure. 12 (4): 593–602. doi:10.1016/j.str.2004.03.003. PMID 15062082.
  4. Leulliot, N.; Bohnsack, M. T.; Graille, M.; Tollervey, D.; Van Tilbeurgh, H. (2007). "The yeast ribosome synthesis factor Emg1 is a novel member of the superfamily of alpha/beta knot fold methyltransferases". Nucleic Acids Research. 36 (2): 629–639. doi:10.1093/nar/gkm1074. PMC 2241868. PMID 18063569.
  5. Mallam, Anna L.; Jackson, Sophie E. (2006). "Probing Nature's Knots: The Folding Pathway of a Knotted Homodimeric Protein". Journal of Molecular Biology. 359 (5): 1420–1436. doi:10.1016/j.jmb.2006.04.032. PMID 16787779.
  6. Khatib, Firas; Weirauch, Matthew T.; Rohl, Carol A. (2006). "Rapid knot detection and application to protein structure prediction". Bioinformatics. 22 (14): e252–e259. doi:10.1093/bioinformatics/btl236. PMID 16873480.
  7. Gracy, Jérôme; Chiche, Laurent (2010). "Optimizing structural modeling for a specific protein scaffold: Knottins or inhibitor cystine knots". BMC Bioinformatics. 11: 535. doi:10.1186/1471-2105-11-535. PMC 2984590. PMID 21029427.
  8. Gajhede M, Petersen TN, Henriksen A, et al. (December 1993). "Pancreatic spasmolytic polypeptide: first three-dimensional structure of a member of the mammalian trefoil family of peptides". Structure. 1 (4): 253–62. doi:10.1016/0969-2126(93)90014-8. PMID 8081739.
  9. Otto B, Wright N (1994). "Trefoil peptides. Coming up clover". Current Biology. 4 (9): 835–838. doi:10.1016/S0960-9822(00)00186-X. PMID 7820556. S2CID 11245174.
  10. Thim L, Wright NA, Hoffmann W, Otto WR, Rio MC (1997). "Rolling in the clover: trefoil factor family (TFF)-domain peptides, cell migration and cancer". FEBS Letters. 408 (2): 121–123. doi:10.1016/S0014-5793(97)00424-9. PMID 9187350. S2CID 26946754.
  11. Bork P (1993). "A trefoil domain in the major rabbit zona pellucida protein". Protein Science. 2 (4): 669–670. doi:10.1002/pro.5560020417. PMC 2142363. PMID 8518738.
  12. Hoffmann W, Hauser F (1993). "The P-domain or trefoil motif: a role in renewal and pathology of mucous epithelia?". Trends in Biochemical Sciences. 18 (7): 239–243. doi:10.1016/0968-0004(93)90170-R. PMID 8267796.
  13. Lai, Y.-L.; Yen, S.-C.; Yu, S.-H.; Hwang, J.-K. (2007). "pKNOT: The protein KNOT web server". Nucleic Acids Research. 35 (Web Server issue): W420–W424. doi:10.1093/nar/gkm304. PMC 1933195. PMID 17526524.

Bibliography

This article incorporates text from the public domain Pfam and InterPro: IPR000519