Small protein

Small proteins are a diverse fold class of proteins, usually fewer than 100 amino acids long. [1] [2] [3] Their tertiary structure is usually maintained by disulphide bridges, [4] metal ligands, [5] and/or cofactors such as heme. Some small proteins serve important regulatory functions by interacting directly with certain enzymes, which also makes them interesting tools for biotechnological applications in microorganisms. [6]

Identification of small proteins

For a long time, the small size of these proteins limited their identification and characterization. However, the growing number of examples of functional small proteins has driven the development of methods for identifying them.

For larger open reading frames (ORFs), computational identification is based solely on their long, uninterrupted coding potential. Short ORFs, however, also arise frequently by chance, so computational searches for small proteins take into account multiple parameters, such as the presence of a ribosome binding site and amino acid conservation. [7] Available RNA sequencing or mass spectrometry data sets are also incorporated into computational predictions. [8] [9]
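
The contrast between length-based and multi-parameter searches can be illustrated with a toy forward-strand ORF scan in Python. This is a hypothetical sketch, not the method of any cited annotation pipeline: the function names, the 10- and 100-codon cut-offs, the use of the AGGAGG Shine–Dalgarno consensus as the ribosome binding site motif, and the 20-nucleotide upstream window are illustrative assumptions, and amino acid conservation is not modelled.

```python
# Hypothetical sketch: toy forward-strand ORF scan. Cut-offs, the AGGAGG
# motif and the 20-nt upstream window are illustrative assumptions.

STARTS = {"ATG", "GTG", "TTG"}  # common bacterial start codons
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons):
    """Yield (start, end, n_codons) for ORFs with >= min_codons sense codons.

    `end` is exclusive and includes the stop codon; in each frame only the
    first start codon after the previous stop is used.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    yield start, i + 3, n_codons
                start = None


def has_rbs(seq, start, motif="AGGAGG", upstream=20):
    """True if a Shine-Dalgarno-like motif lies within `upstream` nt of start."""
    return motif in seq[max(0, start - upstream):start].upper()


def call_orfs(seq, small_min=10, long_min=100):
    """Accept long ORFs on length alone; require RBS support for small ORFs."""
    calls = []
    for start, end, n in find_orfs(seq, small_min):
        if n >= long_min:
            calls.append((start, end, "long: length alone"))
        elif has_rbs(seq, start):
            calls.append((start, end, "small: RBS-supported"))
    return sorted(calls)


if __name__ == "__main__":
    small = "TTTAGGAGGTTTTT" + "ATG" + "GCT" * 15 + "TAA"
    print(call_orfs(small))  # [(14, 65, 'small: RBS-supported')]
```

Long ORFs pass on length alone, while ORFs between the two cut-offs are kept only with upstream motif support; a fuller pipeline would also scan the reverse strand and add conservation or expression evidence.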

Ribosome profiling (Ribo-seq, or ribosome footprinting) is used extensively to identify small proteins. It uses next-generation sequencing and targets only the mRNA fragments protected by bound ribosomes. Because a ribosome bound to an mRNA suggests that the transcript is being actively translated, the method can identify even very small ORFs. [10]
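
One common signal for deciding whether Ribo-seq reads indicate translation is three-nucleotide periodicity: footprints on a translated ORF tend to fall in that ORF's reading frame. The Python sketch below assumes reads have already been aligned and each assigned a P-site coordinate; the function names and the thresholds (`min_reads`, `min_in_frame`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def orf_footprint_stats(psites, start, end):
    """Summarize footprints whose P-site falls within the ORF [start, end).

    `psites` are 0-based transcript coordinates of assigned P-sites
    (hypothetical input; P-site offset assignment is assumed done upstream).
    """
    inside = [p for p in psites if start <= p < end]
    n = len(inside)
    frames = Counter((p - start) % 3 for p in inside)  # 0 = the ORF's own frame
    return {
        "reads": n,
        "density": n / ((end - start) / 3),  # footprints per codon
        "in_frame": frames[0] / n if n else 0.0,
    }


def looks_translated(psites, start, end, min_reads=10, min_in_frame=0.6):
    """Toy call: enough footprints, most of them in the ORF's reading frame."""
    stats = orf_footprint_stats(psites, start, end)
    return stats["reads"] >= min_reads and stats["in_frame"] >= min_in_frame


if __name__ == "__main__":
    in_frame_reads = list(range(100, 160, 3))  # 20 footprints, all in frame 0
    print(looks_translated(in_frame_reads, 100, 160))  # True
```

The in-frame fraction helps separate translation from background reads that show no frame preference, which a simple read-count threshold cannot do.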

Mass spectrometry is the best method so far for identifying small proteins, but their size again poses a barrier. However, several adjustments to the workflow can improve detection and data quality. [11]
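
One reason size is a barrier can be illustrated with an in silico digest. Bottom-up proteomics commonly cleaves proteins with trypsin, and a short protein yields only a handful of peptides in a length range suited to detection. The Python sketch uses the simplified rule that trypsin cuts after K or R unless the next residue is P; the function names, the 7 to 30 residue window and the example sequence are hypothetical.

```python
import re


def tryptic_peptides(protein):
    """Simplified trypsin rule: cleave after K or R unless followed by P."""
    # Zero-width split (supported since Python 3.7); drop empty trailing piece.
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def detectable_peptides(protein, min_len=7, max_len=30):
    """Keep peptides inside an illustrative, assumed detectable length window."""
    return [p for p in tryptic_peptides(protein) if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    toy = "MSKLAGEVIRQAPTSKDLLFGNAERWHLTK"  # hypothetical 30-residue protein
    print(detectable_peptides(toy))  # ['LAGEVIR', 'DLLFGNAER']
```

Only two of the five peptides from this toy 30-residue protein fall inside the window, so each small protein offers few chances to be identified.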

References

  1. Kihara D, Skolnick J (December 2003). "The PDB is a covering set of small protein structures". Journal of Molecular Biology. 334 (4): 793–802. CiteSeerX 10.1.1.333.477. doi:10.1016/j.jmb.2003.10.027. PMID 14636603.
  2. Su M, Ling Y, Yu J, Wu J, Xiao J (December 2013). "Small proteins: untapped area of potential biological importance". Frontiers in Genetics. 4: 286. doi:10.3389/fgene.2013.00286. PMC 3864261. PMID 24379829.
  3. Storz G, Wolf YI, Ramamurthi KS (2 June 2014). "Small proteins can no longer be ignored". Annual Review of Biochemistry. 83 (1): 753–777. doi:10.1146/annurev-biochem-070611-102400. PMC 4166647. PMID 24606146.
  4. Cheek S, Krishna SS, Grishin NV (May 2006). "Structural classification of small, disulfide-rich protein domains". Journal of Molecular Biology. 359 (1): 215–237. doi:10.1016/j.jmb.2006.03.017. PMID 16618491.
  5. Berg JM (April 1990). "Zinc fingers and other metal-binding domains. Elements for interactions between macromolecules". The Journal of Biological Chemistry. 265 (12): 6513–6516. doi:10.1016/S0021-9258(19)39172-0. PMID 2108957.
  6. Brandenburg F, Klähn S (2020). "Small but smart: On the diverse role of small proteins in the regulation of cyanobacterial metabolism". Life. 10 (12): 322. Bibcode:2020Life...10..322B. doi:10.3390/life10120322. PMC 7760959. PMID 33271798.
  7. Richardson EJ, Watson M (9 March 2012). "The automatic annotation of bacterial genomes". Briefings in Bioinformatics. 14 (1): 1–12. doi:10.1093/bib/bbs007. ISSN 1467-5463. PMC 3548604. PMID 22408191.
  8. Sberro H, Fremin BJ, Zlitni S, Edfors F, Greenfield N, Snyder MP, Pavlopoulos GA, Kyrpides NC, Bhatt AS (22 August 2019). "Large-Scale Analyses of Human Microbiomes Reveal Thousands of Small, Novel Genes". Cell. 178 (5): 1245–1259.e14. doi:10.1016/j.cell.2019.07.016. ISSN 0092-8674. PMC 6764417. PMID 31402174.
  9. Miravet-Verde S, Ferrar T, Espadas-García G, Mazzolini R, Gharrab A, Sabido E, Serrano L, Lluch-Senar M (2019). "Unraveling the hidden universe of small proteins in bacterial genomes". Molecular Systems Biology. 15 (2): e8290. doi:10.15252/msb.20188290. ISSN 1744-4292. PMC 6385055. PMID 30796087.
  10. Ingolia NT, Brar GA, Rouskin S, McGeachy AM, Weissman JS (2012). "The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments". Nature Protocols. 7 (8): 1534–1550. doi:10.1038/nprot.2012.086. ISSN 1750-2799. PMC 3535016. PMID 22836135.
  11. Ahrens CH, Wade JT, Champion MM, Langer JD (18 January 2022). Henkin TM (ed.). "A Practical Guide to Small Protein Discovery and Characterization Using Mass Spectrometry". Journal of Bacteriology. 204 (1): e00353-21. doi:10.1128/jb.00353-21. ISSN 0021-9193. PMC 8765459. PMID 34748388.