Orphan gene

Orphan genes, ORFans, [1] [2] or taxonomically restricted genes (TRGs) [3] are genes that lack a detectable homologue outside of a given species or lineage. [2] Most genes have known homologues. Two genes are homologous when they share an evolutionary history, and the study of groups of homologous genes allows for an understanding of their evolutionary history and divergence. Common mechanisms that have been uncovered as sources for new genes through studies of homologues include gene duplication, exon shuffling, gene fusion and fission, etc. [4] [5] Studying the origins of a gene becomes more difficult when there is no evident homologue. [6] The discovery that about 10% or more of the genes of the average microbial species are orphan genes raises questions about the evolutionary origins of different species as well as how to study and uncover the evolutionary origins of orphan genes.

In some cases, a gene can be classified as an orphan gene due to undersampling of the existing genome space. While it is possible that homologues exist for a given gene, that gene will still be classified as an orphan if the organisms harbouring homologues have not yet been discovered and had their genomes sequenced and properly annotated. For example, one study of orphan genes across 119 archaeal and bacterial genomes found that at least 56% were recently acquired from integrative elements (or mobile genetic elements) from non-cellular sources such as viruses and plasmids that remain to be explored and characterized, and another 7% arose through horizontal gene transfer from distant cellular sources (with an unknown proportion of the remaining 37% potentially coming from still unknown families of integrative elements). [7] In other cases, limitations in computational methods for detecting homologues may result in missed homologous sequences and thus classification of a gene as an orphan. Homology detection failure appears to account for the majority, but not all, of orphan genes. [8] In other cases, homology between genes may go undetected due to rapid evolution and divergence of one or both of these genes from each other to the point where they do not meet the criteria used to classify genes as evidently homologous by computational methods. One analysis suggests that divergence accounts for a third of orphan gene identifications in eukaryotes. [9] When homologous genes exist but are simply undetected, the emergence of these orphan genes can be explained by well-characterized phenomena such as genomic recombination, exon shuffling, gene duplication and divergence, etc. Orphan genes may also simply lack true homologues and in such cases have an independent origin via de novo gene birth, which tends to be a more recent event. [2] These processes may act at different rates in insects, primates, and plants. [10] Despite their relatively recent origin, orphan genes may encode functionally important proteins. [11] [12] Characteristics of orphan genes include AT richness, relatively recent origins, taxonomic restriction to a single genome, elevated evolution rates, and shorter sequences. [13]

Some approaches characterize all microbial genes as part of one of two classes of genes. One class is characterized by conservation or partial conservation across lineages, whereas the other (represented by orphan genes) is characterized by evolutionarily instantaneous rates of gene turnover/replacement with a negligible effect on fitness when such genes are either gained or lost. These orphan genes primarily derive from mobile genetic elements and tend to be 'passively selfish', often devoid of cellular functions (which is why they experience little selective pressure in their gain or loss from genomes) but persist in the biosphere due to their transient movement across genomes. [14] [15]

Evolution

Orphan genes evolve more rapidly than other genes, following their emergence via de novo gene birth, or horizontal gene transfer. However, there is also a tendency for rapidly evolving genes to be incorrectly classed as orphans due to undetectable homology. [8]

History

Orphan genes were first discovered when the yeast genome-sequencing project began in 1996. [2] Orphan genes accounted for an estimated 26% of the yeast genome, but it was believed that these genes could be classified with homologues when more genomes were sequenced. [3] At the time, gene duplication was considered the only serious model of gene evolution [2] [4] [16] and there were few sequenced genomes for comparison, so a lack of detectable homologues was thought to be most likely due to a lack of sequencing data and not due to a true lack of homology. [3] However, orphan genes continued to persist as the quantity of sequenced genomes grew, [3] [17] eventually leading to the conclusion that orphan genes are ubiquitous across genomes. [2] Estimates of the percentage of genes that are orphans vary enormously between species and between studies; 10-30% is a commonly cited figure. [3]

The study of orphan genes emerged largely after the turn of the century. In 2003, a study of Caenorhabditis briggsae and related species compared over 2000 genes. [3] The authors proposed that these genes must be evolving too quickly to be detected and are consequently sites of very rapid evolution. [3] In 2005, Wilson examined 122 bacterial species to assess whether the large number of orphan genes in many species was legitimate. [17] The study concluded that these orphan genes were legitimate and played a role in bacterial adaptation. The definition of taxonomically-restricted genes was introduced into the literature to make orphan genes seem less "mysterious." [17]

In 2008, a yeast protein of established functionality, BSC4, was found to have evolved de novo from non-coding sequences whose homology was still detectable in sister species. [18]

In 2009, an orphan gene was discovered to regulate an internal biological network: the orphan gene QQS from Arabidopsis thaliana modifies plant composition. [19] The QQS orphan protein interacts with a conserved transcription factor; these data explain the compositional changes (increased protein) that are induced when QQS is engineered into diverse species. [20] In 2011, a comprehensive genome-wide study of the extent and evolutionary origins of orphan genes in plants was conducted in the model plant Arabidopsis thaliana. [21]

Identification

Genes can be tentatively classified as orphans if no orthologous proteins can be found in nearby species. [10]

One method used to estimate nucleotide or protein sequence similarity indicative of homology (i.e. similarity due to common origin) is the Basic Local Alignment Search Tool (BLAST). BLAST allows query sequences to be rapidly searched against large sequence databases. [22] [23] Simulations suggest that under certain conditions BLAST is suitable for detecting distant relatives of a gene. [24] However, genes that are short and evolve rapidly can easily be missed by BLAST. [25]
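As an illustration of how BLAST results are typically turned into a tentative orphan call, the following is a minimal sketch rather than any published pipeline. It assumes BLAST tabular output (the 12-column `-outfmt 6` layout), a hypothetical naming scheme in which every sequence ID carries a species prefix such as `spA|g1`, and an arbitrary E-value cutoff of 1e-3; the sample hits are invented for the example. A query is kept as an orphan candidate when it has no sufficiently significant hit outside the focal species.

```python
import csv
import io

# Invented example data in BLAST tabular format (-outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# Assumption: sequence IDs look like "<species>|<gene>".
BLAST_TSV = """\
spA|g1\tspA|g1\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600
spA|g1\tspB|g7\t62.5\t280\t105\t2\t5\t284\t3\t280\t1e-40\t150
spA|g2\tspA|g2\t100.0\t90\t0\t0\t1\t90\t1\t90\t1e-50\t180
spA|g2\tspC|g3\t31.0\t40\t27\t1\t10\t49\t12\t51\t2.5\t22
spA|g3\tspA|g3\t100.0\t120\t0\t0\t1\t120\t1\t120\t1e-70\t240
"""


def species_of(seq_id):
    """Return the species prefix of an ID such as 'spA|g1' (hypothetical scheme)."""
    return seq_id.split("|", 1)[0]


def orphan_candidates(blast_tsv, focal_species, all_queries, max_evalue=1e-3):
    """Return queries with no significant hit outside the focal species."""
    with_homologue = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        # Self hits and hits within the focal species do not count as
        # evidence of a homologue elsewhere.
        if species_of(subject) != focal_species and evalue <= max_evalue:
            with_homologue.add(query)
    return sorted(set(all_queries) - with_homologue)


queries = ["spA|g1", "spA|g2", "spA|g3"]
candidates = orphan_candidates(BLAST_TSV, "spA", queries)
print(candidates)
```

In this toy input, `spA|g2` stays a candidate because its only foreign hit is weak; this is exactly the situation in which a short or fast-evolving gene can be labelled an orphan through detection failure rather than a true absence of homologues.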

The systematic detection of homology to annotate orphan genes is called phylostratigraphy. [26] Phylostratigraphy generates a phylogenetic tree in which the homology is calculated between all genes of a focal species and the genes of other species. The earliest common ancestor for a gene determines the age, or phylostratum, of the gene. The term "orphan" is sometimes used only for the youngest phylostratum containing only a single species, but when interpreted broadly as a taxonomically-restricted gene, it can refer to all but the oldest phylostratum, with the gene orphaned within a larger clade.
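The age assignment described above can be sketched in a few lines. This is a simplified illustration, not the procedure of any particular phylostratigraphy tool: the lineage, the species names and the hit sets are all invented, and homology detection itself is assumed to have been done already. Each gene is assigned to the oldest clade of the focal lineage that contains a species with a detected homologue; a gene with no hits falls into the youngest stratum.

```python
# Hypothetical lineage of the focal species, ordered from the oldest
# (most inclusive) clade to the youngest (the species itself).
LINEAGE = ["cellular organisms", "Eukaryota", "Metazoa",
           "Drosophila", "D. melanogaster"]

# Hypothetical lookup: for each comparison species, the index in LINEAGE
# of the youngest clade it shares with the focal species.
SHARED_CLADE = {
    "E. coli": 0,
    "S. cerevisiae": 1,
    "H. sapiens": 2,
    "D. simulans": 3,
}


def phylostratum(species_with_hits):
    """Return the LINEAGE index of the oldest clade with a detected homologue.

    A gene with no hits in any other species is placed in the youngest
    stratum (the focal species alone).
    """
    youngest = len(LINEAGE) - 1
    return min((SHARED_CLADE[s] for s in species_with_hits), default=youngest)


# Invented homology results: gene -> species in which a homologue was found.
HITS = {
    "geneA": {"E. coli", "H. sapiens"},
    "geneB": {"H. sapiens", "D. simulans"},
    "geneC": {"D. simulans"},
    "geneD": set(),
}

strata = {gene: LINEAGE[phylostratum(hits)] for gene, hits in HITS.items()}
for gene, stratum in strata.items():
    print(gene, "->", stratum)
```

Under the narrow usage of "orphan", only `geneD` qualifies; under the broader taxonomically-restricted reading, `geneB` and `geneC` are also restricted, to Metazoa and Drosophila respectively.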

Homology detection failure accounts for a majority of classified orphan genes. [8] Some scientists have attempted to recover some homology by using more sensitive methods, such as remote homology detection. In one study, remote homology detection techniques were used to demonstrate that a sizable fraction of orphan genes (over 15%) still exhibited remote homology despite being missed by conventional homology detection techniques, and that their functions were often related to the functions of nearby genes at genomic loci. [27]

Many DNA annotation methods rely on homology, causing orphan genes to be missed. Convolutional neural networks (CNNs) have been trained on known gene families, and then applied to predict orphan genes in unannotated sequences. Tools include DeepGene and DeepBind, which predict the sequence specificities of DNA- and RNA-binding proteins.

Case studies on identifying orphan genes in various organisms often reveal both challenges and breakthroughs in genetic research. For example, a study on the Drosophila melanogaster fruit fly identified orphan genes that are essential for survival, showing that some of these genes could have species-specific functions. Another case in rice plants highlighted how orphan genes might contribute to unique agricultural traits and stress responses. These studies emphasize the complexity of evolutionary biology and the potential of orphan genes in understanding species adaptation and innovation, helping scientists to tackle the challenges of gene annotation and function prediction.

In the scientific community, there's ongoing debate regarding the criteria for classifying a gene as an orphan. Some researchers argue for strict definitions based on the complete absence of homologous sequences in any other species, while others propose a more relaxed approach that allows for distant or weak similarities. This debate is fueled by advancements in genomic technologies and bioinformatics, which can detect ever more subtle genetic relationships. Determining these criteria impacts not only the classification but also evolutionary studies and the understanding of gene function in biological processes. These discussions are crucial for refining our understanding of genomic uniqueness and evolutionary biology.

Sources

Orphan genes arise from multiple sources, predominantly through de novo origination, duplication and rapid divergence, and horizontal gene transfer. [2]

De novo gene birth

Novel orphan genes continually arise de novo from non-coding sequences. [28] These novel genes may be sufficiently beneficial to be swept to fixation by selection or, more likely, they will fade back into the non-genic background. The latter outcome is supported by research in Drosophila showing that young genes are more likely to go extinct. [29]

De novo genes were once thought to be a near impossibility due to the complex and potentially fragile intricacies of creating and maintaining functional polypeptides, [16] but research from the past 10 years or so has found multiple examples of de novo genes, some of which are associated with important biological processes, particularly testes function in animals. De novo genes were also found in fungi and plants. [18] [30] [31] [5] [32] [33] [11] [34]

For young orphan genes, it is sometimes possible to find homologous non-coding DNA sequences in sister taxa, which is generally accepted as strong evidence of de novo origin. However, the contribution of de novo origination to taxonomically-restricted genes of older origin, particularly in relation to the traditional gene duplication theory of gene evolution, remains contested. [35] [36] Logistically, de novo origination is much easier for RNA genes than protein-coding ones and Nathan H. Lents and colleagues recently reported the existence of several young microRNA genes on human chromosome 21. [37]

Duplication and divergence

The duplication and divergence model for orphan genes involves a new gene being created from some duplication or divergence event and undergoing a period of rapid evolution where all detectable similarity to the originally duplicated gene is lost. [2] While this explanation is consistent with current understandings of duplication mechanisms, [2] the number of mutations needed to lose detectable similarity is large enough that such loss is expected to be a rare event, [2] [24] and the evolutionary mechanism by which a gene duplicate could be sequestered and diverge so rapidly remains unclear. [2] [38]

Horizontal gene transfer

Another explanation for how orphan genes arise is through a duplication mechanism called horizontal gene transfer, where the original duplicated gene derives from a separate, unknown lineage. [2] This explanation for the origin of orphan genes is especially relevant in bacteria and archaea, where horizontal gene transfer is common.

Protein characteristics

Orphan genes tend to be very short (~6 times shorter than mature genes), and some are weakly expressed, tissue specific and simpler in codon usage and amino acid composition. [39] Orphan genes tend to encode more intrinsically disordered proteins, [40] [41] [42] although some structure has been found in one of the best characterized orphan genes. [43] Of the tens of thousands of enzymes of primary or specialized metabolism that have been characterized to date, none are orphans, or even of restricted lineage; apparently, catalysis requires hundreds of millions of years of evolution. [39]

Biological functions

Orphan genes, which have no detectable homologs in other species, represent a fascinating area of study in genomics. Their evolutionary role and biological significance remain subjects of ongoing research and debate.

Emergence and Controversy

Some scientists propose that many orphan genes may not play a direct evolutionary role. They argue that genomes contain non-functional open reading frames (ORFs) which might produce spurious polypeptides not maintained by natural selection. Such genes are likely to be unique to a species because they do not undergo conservation across species, hence are categorized as orphan genes. [44]

The diagram categorizes key aspects of orphan gene (OG) study into distinct sections, outlining challenges like lack of sequence similarity and identifiable motifs. It discusses strategies for OG research, such as developing web interfaces and using CRISPR for high-throughput knockouts. The functions of OGs, including their role in stress responses and species-specific traits, are emphasized. Research methods range from selection screening to interaction screens, and future directions aim to explore OGs' roles in intraspecific differences and evolutionary processes.

Functional Significance Through Research

Contrary to the view that they are evolutionary noise, emerging studies have illustrated the functional importance of orphan genes:

These examples not only confirm the functionality of some orphan genes but also suggest their potential involvement in the emergence of novel phenotypes, thereby contributing to species-specific adaptations.

Implications

Orphan genes have garnered interest across multiple scientific disciplines such as evolutionary biology and medicine, due to their nature and potential implications. [46]

In evolutionary biology, orphan genes diverge from traditional models of gene evolution and provide valuable insights into the process of de novo gene origination and lineage-specific adaptation. The term "de novo gene" specifically denotes the emergence of a functional gene without ancestral genetic material, whether as a protein-coding gene or a functional RNA molecule. [47] This understanding of de novo genes, coupled with the study of orphan genes, enriches Charles Darwin's traditional model of evolution, also called Darwinism or Darwinian theory, by revealing additional mechanisms through which genetic diversity and adaptation can occur. By clarifying that de novo genes can arise from non-genic sequences and contribute to lineage-specific adaptation, this research expands our understanding of the creative forces of evolution, adding depth and complexity to Darwin's foundational principles.

The following table displays different OGs identified in multiple hosts with their functions. Three orphan genes, Gpr49, KIR2DS3, and C19orf12, were found in humans. Source: https://pubmed.ncbi.nlm.nih.gov/37367481/

In medicine, orphan genes represent a rich yet relatively unexplored resource that holds promise for understanding human health and addressing disease. These genes, which lack detectable homologs in other lineages, offer unique opportunities for biomedical research. [46] By elucidating the functions and regulatory mechanisms of orphan genes, researchers can gain insights into various aspects of human health. Orphan genes may play crucial roles in diseases that are poorly understood or have unknown genetic origins. Studying these genes can uncover novel disease mechanisms and therapeutic targets, paving the way for the development of innovative treatment strategies. For example, the orphan gene Gpr49, identified in humans, is a potential novel therapeutic target in combating hepatocellular carcinoma, the predominant form of liver cancer. [46] Furthermore, the gene C19orf12 is implicated in the manifestation of a particular clinical subtype of neurodegeneration characterized by brain iron accumulation. [46] An excerpt from a table listing various orphan genes across diverse species along with their respective functions is shown. [46]

Orphan genes have the potential to serve as biomarkers for disease diagnosis, prognosis, and treatment response. Their lineage-specific nature and expression patterns may provide valuable information for personalized medicine approaches, enabling more accurate and targeted interventions for individuals affected by various diseases. Thus, harnessing the potential of orphan genes in understanding human health has significant implications for advancing biomedical research and improving clinical outcomes.

References

  1. Fischer D, Eisenberg D (September 1999). "Finding families for genomic ORFans". Bioinformatics. 15 (9): 759–762. doi: 10.1093/bioinformatics/15.9.759 . PMID   10498776.
  2. 1 2 3 4 5 6 7 8 9 10 11 12 Tautz D, Domazet-Lošo T (August 2011). "The evolutionary origin of orphan genes". Nature Reviews. Genetics. 12 (10): 692–702. doi:10.1038/nrg3053. PMID   21878963. S2CID   31738556.
  3. 1 2 3 4 5 6 7 Khalturin K, Hemmrich G, Fraune S, Augustin R, Bosch TC (September 2009). "More than just orphans: are taxonomically-restricted genes important in evolution?". Trends in Genetics. 25 (9): 404–413. doi:10.1016/j.tig.2009.07.006. PMID   19716618.
  4. 1 2 Ohno S (11 December 2013). Evolution by Gene Duplication. Springer Science & Business Media. ISBN   978-3-642-86659-3.
  5. 1 2 Zhou Q, Zhang G, Zhang Y, Xu S, Zhao R, Zhan Z, et al. (September 2008). "On the origin of new genes in Drosophila". Genome Research. 18 (9): 1446–1455. doi:10.1101/gr.076588.108. PMC   2527705 . PMID   18550802.
  6. Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, Estivill X, Albà MM (March 2009). "Origin of primate orphan genes: a comparative genomics approach". Molecular Biology and Evolution. 26 (3): 603–612. doi: 10.1093/molbev/msn281 . PMID   19064677.
  7. Cortez, Diego; Forterre, Patrick; Gribaldo, Simonetta (2009). "A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes". Genome Biology. 10 (6): R65. doi: 10.1186/gb-2009-10-6-r65 . ISSN   1465-6906. PMC   2718499 . PMID   19531232.
  8. Weisman CM, Murray AW, Eddy SR (November 2020). "Many, but not all, lineage-specific genes can be explained by homology detection failure". PLOS Biology. 18 (11): e3000862. doi: 10.1371/journal.pbio.3000862 . PMC   7660931 . PMID   33137085.
  9. Vakirlis N, Carvunis AR, McLysaght A (February 2020). "Synteny-based analyses indicate that sequence divergence is not the main source of orphan genes". eLife. 9. doi: 10.7554/eLife.53500 . PMC   7028367 . PMID   32066524.
  10. Wissler L, Gadau J, Simola DF, Helmkampf M, Bornberg-Bauer E (2013). "Mechanisms and dynamics of orphan gene emergence in insect genomes". Genome Biology and Evolution. 5 (2): 439–455. doi:10.1093/gbe/evt009. PMC   3590893 . PMID   23348040.
  11. Reinhardt JA, Wanjiru BM, Brant AT, Saelao P, Begun DJ, Jones CD (17 October 2013). "De novo ORFs in Drosophila are important to organismal fitness and evolved rapidly from previously non-coding sequences". PLOS Genetics. 9 (10): e1003860. doi: 10.1371/journal.pgen.1003860 . PMC   3798262 . PMID   24146629.
  12. Suenaga Y, Islam SM, Alagu J, Kaneko Y, Kato M, Tanaka Y, et al. (January 2014). "NCYM, a Cis-antisense gene of MYCN, encodes a de novo evolved protein that inhibits GSK3β resulting in the stabilization of MYCN in human neuroblastomas". PLOS Genetics. 10 (1): e1003996. doi: 10.1371/journal.pgen.1003996 . PMC   3879166 . PMID   24391509.
  13. Yu G, Stoltzfus A (2012). "Population diversity of ORFan genes in Escherichia coli". Genome Biology and Evolution. 4 (11): 1176–87. doi:10.1093/gbe/evs081. PMC   3514957 . PMID   23034216.
  14. Wolf YI, Makarova KS, Lobkovsky AE, Koonin EV (November 2016). "Two fundamentally different classes of microbial genes". Nature Microbiology. 2 (3): 16208. doi:10.1038/nmicrobiol.2016.208. PMID   27819663. S2CID   21799266.
  15. Koonin EV, Makarova KS, Wolf YI (July 2021). "Evolution of Microbial Genomics: Conceptual Shifts over a Quarter Century". Trends in Microbiology. 29 (7): 582–592. doi:10.1016/j.tim.2021.01.005. PMC   9404256 . PMID   33541841. S2CID   231820647.
  16. Jacob F (June 1977). "Evolution and tinkering". Science. 196 (4295): 1161–1166. Bibcode:1977Sci...196.1161J. doi:10.1126/science.860134. PMID   860134.
  17. Wilson GA, Bertrand N, Patel Y, Hughes JB, Feil EJ, Field D (August 2005). "Orphans as taxonomically restricted and ecologically important genes". Microbiology. 151 (Pt 8): 2499–2501. doi: 10.1099/mic.0.28146-0 . PMID   16079329.
  18. Cai J, Zhao R, Jiang H, Wang W (May 2008). "De novo origination of a new protein-coding gene in Saccharomyces cerevisiae". Genetics. 179 (1): 487–496. doi:10.1534/genetics.107.084491. PMC   2390625 . PMID   18493065.
  19. Li L, Foster CM, Gan Q, Nettleton D, James MG, Myers AM, Wurtele ES (May 2009). "Identification of the novel protein QQS as a component of the starch metabolic network in Arabidopsis leaves". The Plant Journal. 58 (3): 485–498. doi: 10.1111/j.1365-313X.2009.03793.x . PMID   19154206.
  20. Li L, Zheng W, Zhu Y, Ye H, Tang B, Arendsee ZW, et al. (November 2015). "QQS orphan gene regulates carbon and nitrogen partitioning across species via NF-YC interactions". Proceedings of the National Academy of Sciences of the United States of America. 112 (47): 14734–14739. Bibcode:2015PNAS..11214734L. doi: 10.1073/pnas.1514670112 . PMC   4664325 . PMID   26554020.
  21. Donoghue MT, Keshavaiah C, Swamidatta SH, Spillane C (February 2011). "Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana". BMC Evolutionary Biology. 11 (1): 47. Bibcode:2011BMCEE..11...47D. doi: 10.1186/1471-2148-11-47 . PMC   3049755 . PMID   21332978.
  22. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (September 1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research. 25 (17): 3389–3402. doi:10.1093/nar/25.17.3389. PMC   146917 . PMID   9254694.
  23. "NCBI BLAST homepage". National Center for Biotechnology Information. National Institutes of Health, U.S. Department of Health and Human Services.
  24. Albà MM, Castresana J (April 2007). "On homology searches by protein Blast and the characterization of the age of genes". BMC Evolutionary Biology. 7 (1): 53. Bibcode:2007BMCEE...7...53A. doi: 10.1186/1471-2148-7-53 . PMC   1855329 . PMID   17408474.
  25. Moyers BA, Zhang J (January 2015). "Phylostratigraphic bias creates spurious patterns of genome evolution". Molecular Biology and Evolution. 32 (1): 258–267. doi:10.1093/molbev/msu286. PMC   4271527 . PMID   25312911.
  26. Domazet-Loso T, Brajković J, Tautz D (November 2007). "A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages". Trends in Genetics. 23 (11): 533–539. doi:10.1016/j.tig.2007.08.014. PMID   18029048.
  27. Lobb B, Kurtz DA, Moreno-Hagelsieb G, Doxey AC (2015). "Remote homology and the functions of metagenomic dark matter". Frontiers in Genetics. 6: 234. doi: 10.3389/fgene.2015.00234 . PMC   4508852 . PMID   26257768.
  28. McLysaght A, Guerzoni D (September 2015). "New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation". Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 370 (1678): 20140332. doi:10.1098/rstb.2014.0332. PMC   4571571 . PMID   26323763.
  29. Palmieri N, Kosiol C, Schlötterer C (February 2014). "The life cycle of Drosophila orphan genes". eLife. 3: e01311. arXiv: 1401.4956 . doi: 10.7554/eLife.01311 . PMC   3927632 . PMID   24554240.
  30. Zhao L, Saelao P, Jones CD, Begun DJ (February 2014). "Origin and spread of de novo genes in Drosophila melanogaster populations". Science. 343 (6172): 769–772. Bibcode:2014Sci...343..769Z. doi:10.1126/science.1248286. PMC   4391638 . PMID   24457212.
  31. Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ (June 2006). "Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression". Proceedings of the National Academy of Sciences of the United States of America. 103 (26): 9935–9939. Bibcode:2006PNAS..103.9935L. doi: 10.1073/pnas.0509809103 . PMC   1502557 . PMID   16777968.
  32. Heinen TJ, Staubach F, Häming D, Tautz D (September 2009). "Emergence of a new gene from an intergenic region". Current Biology. 19 (18): 1527–1531. Bibcode:2009CBio...19.1527H. doi: 10.1016/j.cub.2009.07.049 . PMID   19733073.
  33. Chen S, Zhang YE, Long M (December 2010). "New genes in Drosophila quickly become essential". Science. 330 (6011): 1682–1685. Bibcode:2010Sci...330.1682C. doi:10.1126/science.1196380. PMC   7211344 . PMID   21164016.
  34. Silveira AB, Trontin C, Cortijo S, Barau J, Del Bem LE, Loudet O, et al. (April 2013). "Extensive natural epigenetic variation at a de novo originated gene". PLOS Genetics. 9 (4): e1003437. doi: 10.1371/journal.pgen.1003437 . PMC   3623765 . PMID   23593031.
  35. Neme R, Tautz D (March 2014). "Evolution: dynamics of de novo gene emergence". Current Biology. 24 (6): R238–R240. Bibcode:2014CBio...24.R238N. doi: 10.1016/j.cub.2014.02.016 . PMID   24650912.
  36. Moyers BA, Zhang J (May 2016). "Evaluating Phylostratigraphic Evidence for Widespread De Novo Gene Birth in Genome Evolution". Molecular Biology and Evolution. 33 (5): 1245–1256. doi:10.1093/molbev/msw008. PMC   5010002 . PMID   26758516.
  37. Johnson HR, Blandino JA, Mercado BC, Galván JA, Higgins WJ, Lents NH (June 2022). "The evolution of de novo human-specific microRNA genes on chromosome 21". American Journal of Biological Anthropology. 178 (2): 223–243. doi:10.1002/ajpa.24504. S2CID   247240062.
  38. Lynch M, Katju V (November 2004). "The altered evolutionary trajectories of gene duplicates". Trends in Genetics. 20 (11): 544–549. CiteSeerX   10.1.1.335.7718 . doi:10.1016/j.tig.2004.09.001. PMID   15475113.
  39. Arendsee ZW, Li L, Wurtele ES (November 2014). "Coming of age: orphan genes in plants". Trends in Plant Science. 19 (11): 698–708. doi: 10.1016/j.tplants.2014.07.003 . PMID   25151064.
  40. Mukherjee S, Panda A, Ghosh TC (June 2015). "Elucidating evolutionary features and functional implications of orphan genes in Leishmania major". Infection, Genetics and Evolution. 32: 330–337. doi:10.1016/j.meegid.2015.03.031. PMID   25843649.
  41. Wilson BA, Foy SG, Neme R, Masel J (June 2017). "Young Genes are Highly Disordered as Predicted by the Preadaptation Hypothesis of De Novo Gene Birth". Nature Ecology & Evolution. 1 (6): 0146–146. Bibcode:2017NatEE...1..146W. doi:10.1038/s41559-017-0146. PMC   5476217 . PMID   28642936.
  42. Willis S, Masel J (September 2018). "Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes". Genetics. 210 (1): 303–313. doi:10.1534/genetics.118.301249. PMC   6116962 . PMID   30026186.
  43. Bungard D, Copple JS, Yan J, Chhun JJ, Kumirov VK, Foy SG, et al. (November 2017). "Foldability of a Natural De Novo Evolved Protein". Structure. 25 (11): 1687–1696.e4. doi:10.1016/j.str.2017.09.006. PMC   5677532 . PMID   29033289.
  44. Guerra-Almeida D, Nunes-da-Fonseca R (20 October 2020). "Small Open Reading Frames: How Important Are They for Molecular Evolution?". Frontiers in Genetics. 11. doi: 10.3389/fgene.2020.574737 . PMC   7606980 . PMID   33193682.
  45. Lehmann M, Siegmund T, Lintermann KG, Korge G (23 October 1998). "The pipsqueak protein of Drosophila melanogaster binds to GAGA sequences through a novel DNA-binding domain". The Journal of Biological Chemistry. 273 (43): 28504–28509. doi: 10.1074/jbc.273.43.28504 . ISSN   0021-9258. PMID   9774480.
  46. Fakhar, A. Z., Liu, J., Pajerowska-Mukhtar, K. M., & Mukhtar, M. S. (Year). "The Lost and Found: Unraveling the Functions of Orphan Genes." Journal Name, Volume(Issue), Page numbers.
  47. Schmitz JF, Bornberg-Bauer E (2017). "Fact or fiction: updates on how protein-coding genes might emerge de novo from previously non-coding DNA". F1000Res. 6: 57. doi: 10.12688/f1000research.9736.1.

[1]

  1. Domazet-Loso T, Tautz D (13 October 2003). "An Evolutionary Analysis of Orphan Genes in Drosophila". Genome Research. 13 (10): 2213–2219. doi:10.1101/gr.1311003. PMC   403679 . PMID   14525923.