ORFeome


In molecular genetics, an ORFeome is the complete set of open reading frames (ORFs) in a genome. The term may also refer to a collection of cloned ORFs.[1] ORFs correspond to the protein-coding sequences (CDS) of genes. They can be located in genome sequences with gene-prediction programs such as GENSCAN and then amplified by PCR. This is relatively straightforward in bacteria, but considerably harder in eukaryotic genomes, where coding sequences are split into exons separated by introns and a single gene may produce several splice variants.
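As a rough illustration of why ORF detection is comparatively easy in intron-free genomes, the Python sketch below scans all six reading frames (three per strand) for stretches that begin with an ATG start codon and end at the first in-frame stop codon (TAA, TAG or TGA). It is not how GENSCAN works; GENSCAN uses a probabilistic model of eukaryotic gene structure. The function names `find_orfs` and `reverse_complement` and the default minimum length of 100 codons are illustrative assumptions, and the scan ignores the alternative start codons (such as GTG and TTG) that some bacterial genes use.

```python
"""Naive six-frame ORF scan: a sketch for intron-free sequences, not GENSCAN."""

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Find ATG...stop ORFs in all six reading frames (hypothetical helper).

    Returns (strand, start, end) tuples with 0-based, end-exclusive
    coordinates on the forward strand; the stop codon is included and
    counted toward min_codons (an arbitrary, illustrative cutoff).
    Within a frame only the first ATG after the previous stop is used,
    so nested in-frame ATGs do not yield extra, shorter ORFs.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step through the codons of this frame; i + 3 never exceeds n.
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == START_CODON:
                        start = i
                elif codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:
                            # Map reverse-strand coordinates back to the forward strand.
                            orfs.append((strand, n - end, n - start))
                    start = None
    return sorted(orfs, key=lambda orf: orf[1])


if __name__ == "__main__":
    # Toy sequence with one 3-codon ORF (ATG AAA TAG) in forward frame 2.
    print(find_orfs("CCATGAAATAGCC", min_codons=3))  # prints [('+', 2, 11)]
```

Because the sketch assumes that each coding sequence runs uninterrupted from start codon to stop codon, it is a reasonable first pass for most bacterial genes but would fragment or miss eukaryotic genes that contain introns, which is the difficulty described above.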


Use in research

The use of complete ORFeomes reflects a broader trend in biology toward genome-scale, or "omics", approaches. ORFeomes have been used to study protein–protein interactions,[2][3] to build protein microarrays, and to identify antigens,[4] among other applications.

Cloned ORFeomes

Complete ORF sets have been cloned for a number of organisms, including Brucella melitensis,[5] Chlamydia pneumoniae,[6] Escherichia coli,[7] Neisseria gonorrhoeae,[8] Pseudomonas aeruginosa,[9] Schizosaccharomyces pombe, Staphylococcus aureus[10] and human herpesviruses.[11]

A partial human ORFeome has also been produced.[12][13]

References

1. Ohara O (2009). "ORFeome Cloning". Reverse Chemical Genetics. Methods in Molecular Biology. Vol. 577. pp. 3–9. doi:10.1007/978-1-60761-232-2_1. ISBN 978-1-60761-231-5. PMID 19718504.
2. Titz B, Rajagopala SV, Goll J, Häuser R, McKevitt MT, Palzkill T, Uetz P (2008). Hall N (ed.). "The binary protein interactome of Treponema pallidum--the syphilis spirochete". PLOS ONE. 3 (5): e2292. Bibcode:2008PLoSO...3.2292T. doi:10.1371/journal.pone.0002292. PMC 2386257. PMID 18509523.
3. Uetz P, Rajagopala SV, Dong YA, Haas J (Oct 2004). "From ORFeomes to protein interaction maps in viruses". Genome Research. 14 (10B): 2029–33. doi:10.1101/gr.2583304. PMID 15489322.
4. McKevitt M, Brinkman MB, McLoughlin M, Perez C, Howell JK, Weinstock GM, Norris SJ, Palzkill T (Jul 2005). "Genome scale identification of Treponema pallidum antigens". Infection and Immunity. 73 (7): 4445–4450. doi:10.1128/IAI.73.7.4445-4450.2005. ISSN 0019-9567. PMC 1168556. PMID 15972547.
5. Viadas C, Rodríguez MC, García-Lobo JM, Sangari FJ, López-Goñi I (Oct 2009). "Construction and evaluation of an ORFeome-based Brucella whole-genome DNA microarray". Microbial Pathogenesis. 47 (4): 189–95. doi:10.1016/j.micpath.2009.06.002. PMID 19524659.
6. Maier CJ, Maier RH, Virok DP, Maass M, Hintner H, Bauer JW, Onder K (2012). "Construction of a highly flexible and comprehensive gene collection representing the ORFeome of the human pathogen Chlamydia pneumoniae". BMC Genomics. 13: 632. doi:10.1186/1471-2164-13-632. PMC 3534531. PMID 23157390.
7. Rajagopala SV, Yamamoto N, Zweifel AE, Nakamichi T, Huang HK, Mendez-Rios JD, Franca-Koh J, Boorgula MP, Fujita K, Suzuki K, Hu JC, Wanner BL, Mori H, Uetz P (2010). "The Escherichia coli K-12 ORFeome: a resource for comparative molecular microbiology". BMC Genomics. 11: 470. doi:10.1186/1471-2164-11-470. PMC 3091666. PMID 20701780.
8. Brettin T, Altherr MR, Du Y, Mason RM, Friedrich A, Potter L, Langford C, Keller TJ, Jens J, Howie H, Weyand NJ, Clary S, Prichard K, Wachocki S, Sodergren E, Dillard JP, Weinstock G, So M, Arvidson CG (2005). "Expression capable library for studies of Neisseria gonorrhoeae, version 1.0". BMC Microbiology. 5: 50. doi:10.1186/1471-2180-5-50. PMC 1236931. PMID 16137322.
9. Labaer J, Qiu Q, Anumanthan A, Mar W, Zuo D, Murthy TV, Taycher H, Halleck A, Hainsworth E, Lory S, Brizuela L (Oct 2004). "The Pseudomonas aeruginosa PA01 gene collection". Genome Research. 14 (10B): 2190–200. doi:10.1101/gr.2482804. PMC 528936. PMID 15489342.
10. Brandner CJ, Maier RH, Henderson DS, Hintner H, Bauer JW, Onder K (2008). "The ORFeome of Staphylococcus aureus v 1.1". BMC Genomics. 9: 321. doi:10.1186/1471-2164-9-321. PMC 2474624. PMID 18605992.
11. Fossum E, Friedel CC, Rajagopala SV, Titz B, Baiker A, Schmidt T, Kraus T, Stellberger T, Rutenberg C, Suthram S, Bandyopadhyay S, Rose D, von Brunn A, Uhlmann M, Zeretzke C, Dong YA, Boulet H, Koegl M, Bailer SM, Koszinowski U, Ideker T, Uetz P, Zimmer R, Haas J (Sep 2009). Sun R (ed.). "Evolutionarily conserved herpesviral protein interaction networks". PLOS Pathogens. 5 (9): e1000570. doi:10.1371/journal.ppat.1000570. PMC 2731838. PMID 19730696.
12. Lamesch P, Li N, Milstein S, Fan C, Hao T, Szabo G, Hu Z, Venkatesan K, Bethel G, Martin P, Rogers J, Lawlor S, McLaren S, Dricot A, Borick H, Cusick ME, Vandenhaute J, Dunham I, Hill DE, Vidal M (Mar 2007). "hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes". Genomics. 89 (3): 307–15. doi:10.1016/j.ygeno.2006.11.012. PMC 4647941. PMID 17207965.
13. "Human ORFeome 2011 Release". http://horfdb.dfci.harvard.edu/