De novo gene birth

Last updated January 13, 2026

De novo gene birth is the process by which new genes evolve from non-coding DNA.^[1]^[3] De novo genes represent a subset of novel genes, and may be protein-coding or instead act as RNA genes.^[4] The processes that govern de novo gene birth are not well understood, although several models exist that describe possible mechanisms by which de novo gene birth may occur.

Ancient de novo gene birth events are difficult to detect. Most studies of de novo genes to date have thus focused on young genes, typically taxonomically restricted genes (TRGs) that are present in a single species or lineage, including so-called orphan genes, defined as genes that lack any identifiable homolog.

Not all orphan genes arise de novo; some instead arise through fairly well-characterized mechanisms such as gene duplication (including retroposition) or horizontal gene transfer followed by sequence divergence, or by gene fission/fusion.^[5]^[6]

Although de novo gene birth was once viewed as a highly unlikely occurrence,^[7] thousands of examples have now been described,^[8]^[9] and some researchers speculate that de novo gene birth could play a major role in evolutionary innovation, morphological specification, and adaptation,^[10]^[11] probably promoted by the low level of pleiotropy of de novo genes.

Conversely, criticism of the methodology and interpretation used to identify de novo gene candidates has appeared in the literature, including arguments that some reported cases may reflect artefacts of homology detection and annotation rather than genuine novelty.^[12]^[13]^[14] Rapid evolutionary divergence, along with the multiple and parallel loss of orthologs in some lineages, can give the false appearance of the emergence of a novel gene from scratch.^[15] Recent syntheses highlight both rapid progress and ongoing debates in de novo gene research, including challenges in identification, functional validation, and links to microproteins and random-peptide experiments.^[16]^[17]^[18]^[19]

History

As early as the 1930s, J. B. S. Haldane and others suggested that copies of existing genes may lead to new genes with novel functions.^[6] In 1970, Susumu Ohno published the seminal text Evolution by Gene Duplication.^[20] For some time subsequently, the consensus view was that virtually all genes were derived from ancestral genes,^[21] with François Jacob famously remarking in a 1977 essay that "the probability that a functional protein would appear de novo by random association of amino acids is practically zero."^[7]

In the same year, however, Pierre-Paul Grassé coined the term "overprinting" to describe the emergence of genes through the expression of alternative open reading frames (ORFs) that overlap preexisting genes.^[22] These new ORFs may be out of frame with or antisense to the preexisting gene. They may also be in frame with the existing ORF, creating a truncated version of the original gene, or represent 3' extensions of an existing ORF into a nearby ORF. The first two types of overprinting may be thought of as a particular subtype of de novo gene birth; although overlapping with a previously coding region of the genome, the primary amino-acid sequence of the new protein is entirely novel and derived from a frame that did not previously contain a gene. The first examples of this phenomenon in bacteriophages were reported in a series of studies from 1976 to 1978,^[23]^[24]^[25] and since then numerous other examples have been identified in viruses, bacteria, and several eukaryotic species.^[26]^[27]^[28]^[29]^[30]^[31]

The phenomenon of exonization also represents a special case of de novo gene birth, in which, for example, often-repetitive intronic sequences acquire splice sites through mutation, leading to de novo exons. This was first described in 1994 in the context of Alu sequences found in the coding regions of primate mRNAs.^[32] Interestingly, such de novo exons are frequently found in minor splice variants, which may allow the evolutionary "testing" of novel sequences while retaining the functionality of the major splice variant(s).^[33]

Still, it was thought by some that most or all eukaryotic proteins were constructed from a constrained pool of "starter type" exons.^[34] Using the sequence data available at the time, a 1991 review estimated the number of unique, ancestral eukaryotic exons to be < 60,000,^[34] while in 1992 a piece was published estimating that the vast majority of proteins belonged to no more than 1,000 families.^[35] Around the same time, however, the sequence of chromosome III of the budding yeast Saccharomyces cerevisiae was released,^[36] representing the first time an entire chromosome from any eukaryotic organism had been sequenced. Sequencing of the entire yeast nuclear genome was then completed by early 1996 through a massive, collaborative international effort.^[37] In his review of the yeast genome project, Bernard Dujon noted that the unexpected abundance of genes lacking any known homologs was perhaps the most striking finding of the entire project.^[37]

In 2006 and 2007, a series of studies provided arguably the first documented examples of de novo gene birth that did not involve overprinting.^[38]^[39]^[40] These studies were conducted using the accessory gland transcriptomes of Drosophila yakuba and Drosophila erecta and they identified 20 putative lineage-restricted genes that appeared unlikely to have resulted from gene duplication.^[40] Levine and colleagues identified and confirmed five de novo candidate genes specific to Drosophila melanogaster and/or the closely related Drosophila simulans through a rigorous approach that combined bioinformatic and experimental techniques.^[39]

Since these initial studies, many groups have identified specific cases of de novo gene birth events in diverse organisms.^[41] The first de novo gene identified in yeast, BSC4, was described in S. cerevisiae in 2008. This gene shows evidence of purifying selection, is expressed at both the mRNA and protein levels, and when deleted is synthetically lethal with two other yeast genes, all of which indicate a functional role for the BSC4 gene product.^[42] Historically, one argument against the notion of widespread de novo gene birth has been the evolved complexity of protein folding. Interestingly, Bsc4 was later shown to adopt a partially folded state that combines properties of native and non-native protein folding.^[43] In plants, the first de novo gene to be functionally characterized was QQS, an Arabidopsis thaliana gene identified in 2009 that regulates carbon and nitrogen metabolism.^[44] The first functionally characterized de novo gene identified in mice, a noncoding RNA gene, was also described in 2009.^[45] In primates, a 2008 informatic analysis estimated that 15/270 primate orphan genes had been formed de novo.^[46] A 2009 report identified the first three de novo human genes, one of which is a therapeutic target in chronic lymphocytic leukemia.^[47] This gene was, however, shown to lack sufficient evidence for translation and is regarded as a lncRNA.^[48] Since this time, a plethora of genome-level studies have identified large numbers of orphan genes in many organisms, although the extent to which they arose de novo, and the degree to which they can be deemed functional, remain debated.

Novel genes

In general, genes without detectable homologs can be summarized under the term novel genes. These genes can also be called orphan genes or, more precisely, species-/lineage-specific genes. The term de novo describes a specific subclass of novel genes, namely genes emerging from non-genic sequences.^[5]

A key caveat is that orphan (taxonomically restricted) genes are heterogeneous in origin and age—lack of detectable homology can reflect multiple processes (including horizontal transfer, transposable element domestication, overprinting, or extreme divergence) and is not, by itself, evidence of de novo origin.^[49]

Identification

Identification of de novo emerging sequences

There are two major approaches to the systematic identification of novel genes: genomic phylostratigraphy^[50] and synteny-based methods.^[51] Both approaches are widely used, individually or in a complementary fashion. To standardise terminology for translated non-canonical ORFs (often implicated in de novo gene studies), a community proposal introduced the term “translons” to denote all translated regions detected by approaches such as ribosome profiling.^[52]

Genomic phylostratigraphy

Genomic phylostratigraphy involves examining each gene in a focal, or reference, species and inferring the presence or absence of ancestral homologs through the use of the BLAST sequence alignment algorithms^[53] or related tools. Each gene in the focal species can be assigned an age (also known as a "conservation level" or "genomic phylostratum") that is based on a predetermined phylogeny, with the age corresponding to the most distantly related species in which a homolog is detected.^[50] When a gene lacks any detectable homolog outside of its own genome, or outside of close relatives, it is said to be a novel, taxonomically restricted, or orphan gene.
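The age-assignment step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any cited study: it assumes that homology hits have already been computed (for example with BLAST), and that each comparison species has been given an integer rank on the predetermined phylogeny, where a lower rank means a more ancient common ancestor with the focal species. The function name, gene names, species, and ranks are all hypothetical.

```python
def assign_phylostrata(genes, hits, stratum_of, focal_stratum):
    """Assign each focal gene the oldest stratum in which a homolog was detected.

    genes         -- iterable of gene IDs from the focal species
    hits          -- dict: gene ID -> set of species with a detected homolog
    stratum_of    -- dict: species -> rank (1 = oldest node shared with the focal species)
    focal_stratum -- rank given to genes with no hit outside the focal species
    """
    ages = {}
    for gene in genes:
        ranks = [stratum_of[sp] for sp in hits.get(gene, ()) if sp in stratum_of]
        # The most distantly related species with a hit defines the gene's age.
        ages[gene] = min(ranks, default=focal_stratum)
    return ages


# Hypothetical input: three comparison species for a Drosophila focal genome.
stratum_of = {"E. coli": 1, "S. cerevisiae": 2, "D. simulans": 3}
hits = {"geneA": {"E. coli", "D. simulans"}, "geneB": {"D. simulans"}}
ages = assign_phylostrata(["geneA", "geneB", "geneC"], hits, stratum_of, focal_stratum=4)
```

In this toy input, `geneC` has no hit in any comparison species and so receives the youngest stratum, making it an orphan candidate. As the following paragraph explains, such a result does not by itself distinguish de novo origin from divergence beyond recognition.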

Phylostratigraphy is limited by the set of closely related genomes that are available, and results are dependent on BLAST search criteria.^[54] In addition, it is often difficult to determine based on lack of observed sequence similarity whether a novel gene has emerged de novo or has diverged from an ancestral gene beyond recognition, for instance following a duplication event. This was pointed out by a study that simulated the evolution of genes of equal age and found that distant orthologs can be undetectable for rapidly evolving genes.^[55] On the other hand, when accounting for changes in the rate of evolution in young regions of genes, a phylostratigraphic approach was more accurate at assigning gene ages in simulated data.^[56] Subsequent studies using simulated evolution found that phylostratigraphy failed to detect an ortholog in the most distantly related species for 13.9% of D. melanogaster genes and 11.4% of S. cerevisiae genes.^[57]^[58] However, a reanalysis of studies that used phylostratigraphy in yeast, fruit flies and humans found that even when accounting for such error rates and excluding difficult-to-stratify genes from the analyses, the qualitative conclusions were unaffected.^[59] The impact of phylostratigraphic bias on studies examining various features of de novo genes remains debated. Because some “orphan” genes may be ancient but diverged beyond recognition, machine-learning classifiers trained on patterns of sub-threshold similarity-search hits have been proposed to estimate which orphans are more consistent with extreme divergence rather than true de novo origin.^[60]

Synteny-based approaches

Synteny-based approaches use order and relative positioning of genes (or other features) to identify the potential ancestors of candidate de novo genes.^[11]^[54] Syntenic alignments are anchored by conserved "markers." Genes are the most common marker in defining syntenic blocks, although k-mers and exons are also used.^[61]^[51] Confirmation that the syntenic region lacks coding potential in outgroup species allows a de novo origin to be asserted with higher confidence.^[54] The strongest possible evidence for de novo emergence is the inference of the specific "enabling" mutation(s) that created coding potential, typically through the analysis of smaller sequence regions, termed microsyntenic regions, of closely related species.
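The idea of anchoring a candidate gene between conserved markers can be illustrated with a simplified sketch. It assumes that gene order is available as a plain list for each genome and that orthology between anchor genes is already known. The function names and gene identifiers are hypothetical, and real pipelines work on genomic coordinates and sequence alignments rather than on gene lists.

```python
def flanking_anchors(focal_order, candidate, orthologs):
    """Return the nearest upstream and downstream genes that have an outgroup ortholog."""
    i = focal_order.index(candidate)
    left = next((g for g in reversed(focal_order[:i]) if g in orthologs), None)
    right = next((g for g in focal_order[i + 1:] if g in orthologs), None)
    return left, right


def outgroup_interval(focal_order, outgroup_order, candidate, orthologs):
    """Return the outgroup genes lying between the orthologs of the two anchors.

    Returns None when an anchor is missing, i.e. when synteny cannot be established.
    """
    left, right = flanking_anchors(focal_order, candidate, orthologs)
    if left is None or right is None:
        return None
    try:
        a = outgroup_order.index(orthologs[left])
        b = outgroup_order.index(orthologs[right])
    except ValueError:
        return None
    lo, hi = sorted((a, b))
    return outgroup_order[lo + 1:hi]


# Hypothetical gene orders: "cand" sits between f2 and f3 in the focal species.
focal_order = ["f1", "f2", "cand", "f3", "f4"]
orthologs = {"f1": "o1", "f2": "o2", "f3": "o3", "f4": "o4"}
empty_region = outgroup_interval(focal_order, ["o1", "o2", "o3", "o4"], "cand", orthologs)
occupied_region = outgroup_interval(focal_order, ["o1", "o2", "oX", "o3", "o4"], "cand", orthologs)
```

An empty interval means only that no annotated gene lies between the anchors in the outgroup. The underlying sequence would still need to be checked for coding potential, as described above, before a de novo origin could be asserted.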

One challenge in applying synteny-based methods is that synteny can be difficult to detect across longer timescales. To address this, various optimization techniques have been created, such as using exons clustered irrespective of their specific order to define syntenic blocks^[51] or algorithms that use well-conserved genomic regions to expand microsyntenic blocks.^[62] There are also difficulties associated with applying synteny-based approaches to genome assemblies that are fragmented^[63] or in lineages with high rates of chromosomal rearrangements, as is common in insects.^[64] Synteny-based approaches can be applied to genome-wide surveys of de novo genes^[46]^[47]^[65]^[66]^[67]^[68]^[69]^[70] and represent a promising area of algorithmic development for gene birth dating. Some have used synteny-based approaches in combination with similarity searches in an attempt to develop standardized, stringent pipelines^[71] that can be applied to any group of genomes in an attempt to address discrepancies in the various lists of de novo genes that have been generated.

Determination of status

Even when the evolutionary origin of a particular coding sequence has been established, there is still a lack of consensus about what constitutes a genuine de novo gene birth event. One reason for this is a lack of agreement on whether or not the entirety of the sequence must be non-genic in origin. For protein-coding de novo genes, it has been proposed that de novo genes be divided into subtypes based on the proportion of the ORF in question that was derived from a previously noncoding sequence.^[54] Furthermore, for de novo gene birth to occur, the sequence in question must be a gene. This requirement has led to questioning of what constitutes a gene, with some models establishing a strict dichotomy between genic and non-genic sequences, and others proposing a more fluid continuum.^[72]

All definitions of genes are linked to the notion of function, as it is generally agreed that a genuine gene should encode a functional product, be it RNA or protein. There are, however, different views of what constitutes function, depending on whether a given sequence is assessed using genetic, biochemical, or evolutionary approaches.^[54]^[73]^[74]^[75] The ambiguity of the concept of 'function' is especially problematic for the de novo gene birth field, where the objects of study are often rapidly evolving.^[75] To address these challenges, the Pittsburgh Model of Function deconstructs 'function' into five meanings to describe the different properties that are acquired by a locus undergoing de novo gene birth: Expression, Capacities, Interactions, Physiological Implications, and Evolutionary Implications.^[75]

It is generally accepted that a genuine de novo gene is expressed in at least some context,^[5] allowing selection to operate, and many studies use evidence of expression as an inclusion criterion in defining de novo genes. The expression of sequences at the mRNA level may be confirmed individually through techniques such as quantitative PCR, or globally through RNA sequencing (RNA-seq). Similarly, expression at the protein level can be determined with high confidence for individual proteins using techniques such as mass spectrometry or western blotting, while ribosome profiling (Ribo-seq) provides a global survey of translation in a given sample. Ideally, to confirm a gene arose de novo, a lack of expression of the syntenic region of outgroup species would also be demonstrated.^[76]

Genetic approaches that detect a specific phenotype or change in fitness upon disruption of a particular sequence are useful for inferring function.^[74] Other experimental approaches, including screens for protein-protein and/or genetic interactions, may also be employed to confirm a biological effect for a particular de novo ORF.

Evolutionary approaches may be employed to infer the existence of a molecular function from computationally derived signatures of selection. In the case of TRGs, one common signature of selection is the ratio of nonsynonymous to synonymous substitutions (dN/dS ratio), calculated from different species from the same taxon. Similarly, in the case of species-specific genes, polymorphism data may be used to calculate a pN/pS ratio from different strains or populations of the focal species. Given that young, species-specific de novo genes lack deep conservation by definition, detecting statistically significant deviations from 1 can be difficult without an unrealistically large number of sequenced strains/populations. An example of this can be seen in Mus musculus, where three very young de novo genes lack signatures of selection despite well-demonstrated physiological roles.^[77] For this reason, pN/pS approaches are often applied to groups of candidate genes, allowing researchers to infer that at least some of them are evolutionarily conserved, without being able to specify which. Other signatures of selection, such as the degree of nucleotide divergence within syntenic regions, conservation of ORF boundaries, or for protein-coding genes, a coding score based on nucleotide hexamer frequencies, have instead been employed.^[78]^[79]
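The pN/pS calculation, and the practice of pooling candidate genes, can be made concrete with a short sketch. It assumes that polymorphisms have already been classified as nonsynonymous or synonymous and that the numbers of nonsynonymous and synonymous sites per gene have been estimated by some site-counting method. The function names and all numbers are hypothetical, and the sketch omits the statistical test needed to judge whether a ratio deviates significantly from 1.

```python
def pn_ps(nonsyn_polymorphisms, syn_polymorphisms, nonsyn_sites, syn_sites):
    """Return pN/pS, or None when no synonymous polymorphism is observed."""
    pn = nonsyn_polymorphisms / nonsyn_sites   # nonsynonymous polymorphisms per site
    ps = syn_polymorphisms / syn_sites         # synonymous polymorphisms per site
    if ps == 0:
        return None                            # ratio undefined for this gene
    return pn / ps


def pooled_pn_ps(genes):
    """Pool counts across genes before taking the ratio.

    genes -- list of tuples (Pn, Ps, nonsynonymous sites, synonymous sites)
    """
    totals = [sum(column) for column in zip(*genes)]
    return pn_ps(*totals)


# Hypothetical counts for three short, young candidate genes.
candidates = [
    (2, 1, 150.0, 50.0),
    (1, 2, 220.0, 80.0),
    (0, 1, 100.0, 30.0),
]
per_gene = [pn_ps(*gene) for gene in candidates]
pooled = pooled_pn_ps(candidates)
```

With so few polymorphisms per gene, individual ratios swing widely or are undefined, which is one way to see why pooled estimates are often reported for groups of candidates rather than for single genes.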

Standardization and computational pipelines

Because candidate de novo gene sets can differ substantially depending on input data (e.g., annotated genomes versus transcriptomes or Ribo-seq–derived ORFs) and filtering criteria, reviews have emphasized the need to clearly document methodological choices and to standardize reporting across studies.^[18] One proposed approach is to record the detection and validation protocol itself in a structured, reusable form, enabling comparisons between studies even when different operational definitions are used.^[80]

Automated workflows have also been developed to make candidate selection and filtering more reproducible. For example, the Nextflow pipeline DENSE identifies taxonomically restricted genes via phylostratigraphy and then filters for de novo candidates using genome comparisons and synteny searches, while allowing users to select among multiple strategies and parameter settings and providing metrics intended to help assess detectability and potential annotation-related biases.^[81]

In addition, ancestral sequence reconstruction has been proposed as a complementary computational approach for testing whether a locus likely had protein-coding capacity in ancestral lineages, thereby helping to distinguish de novo origin from alternative scenarios such as rapid divergence after duplication; however, this approach can yield ambiguous results for some short or weakly conserved candidates and is sensitive to reconstruction uncertainty.^[82]^[83]

Prevalence

Estimates of numbers

Frequency and number estimates of de novo genes in various lineages vary widely and are highly dependent on methodology. Studies may identify de novo genes by phylostratigraphy/BLAST-based methods alone, or may employ a combination of computational techniques, and may or may not assess experimental evidence for expression and/or biological role.^[11] Furthermore, genome-scale analyses may consider all or most ORFs in the genome,^[72] or may instead limit their analysis to previously annotated genes.

The D. melanogaster lineage is illustrative of these differing approaches. An early survey using a combination of BLAST searches performed on cDNA sequences along with manual searches and synteny information identified 72 new genes specific to D. melanogaster and 59 new genes specific to three of the four species in the D. melanogaster species complex. This report found that only 2/72 (~2.8%) of D. melanogaster-specific new genes and 7/59 (~11.9%) of new genes specific to the species complex were derived de novo,^[69] with the remainder arising via duplication/retroposition. Similarly, an analysis of 195 young (<35 million years old) D. melanogaster genes identified from syntenic alignments found that only 16 had arisen de novo.^[67] In contrast, an analysis focused on transcriptomic data from the testes of six D. melanogaster strains identified 106 fixed and 142 segregating de novo genes.^[68] For many of these, ancestral ORFs were identified but were not expressed. A newer study found that up to 39% of orphan genes in the Drosophila clade may have emerged de novo, as they overlap with non-coding regions of the genome.^[84] Highlighting the differences between inter- and intra-species comparisons, a study in natural Saccharomyces paradoxus populations found that the number of de novo polypeptides identified more than doubled when considering intra-species diversity.^[85] In primates, one early study identified 270 orphan genes (unique to humans, chimpanzees, and macaques), of which 15 were thought to have originated de novo.^[46] Later reports identified many more de novo genes in humans alone that are supported by transcriptional and proteomic evidence.^[70]^[86] Studies in other lineages/organisms have also reached different conclusions with respect to the number of de novo genes present in each organism, as well as the specific sets of genes identified. A sample of these large-scale studies is described in the table below.

Generally speaking, it remains debated whether duplication and divergence or de novo gene birth represents the dominant mechanism for the emergence of new genes,^[67]^[69]^[72]^[87]^[88]^[89] in part because de novo genes are likely to both emerge and be lost more frequently than other young genes. In a study on the origin of orphan genes in three different eukaryotic lineages, the authors found that on average only around 30% of orphan genes can be explained by sequence divergence.^[89]

Dynamics

It is important to distinguish between the frequency of de novo gene birth and the number of de novo genes in a given lineage. If de novo gene birth is frequent, it might be expected that genomes would tend to grow in their gene content over time; however, the gene content of genomes is usually relatively stable.^[11] This implies that a frequent gene death process must balance de novo gene birth, and indeed, de novo genes are distinguished by their rapid turnover relative to established genes. In support of this notion, recently emerged Drosophila genes are much more likely to be lost, primarily through pseudogenization, with the youngest orphans being lost at the highest rate;^[90] this is despite the fact that some Drosophila orphan genes have been shown to rapidly become essential.^[67] A similar trend of frequent loss among young gene families was observed in the nematode genus Pristionchus.^[91] Similarly, an analysis of five mammalian transcriptomes found that most ORFs in mice were either very old or species specific, implying frequent birth and death of de novo transcripts.^[88] A comparable trend could be shown by further analyses of six primate transcriptomes.^[86] In wild S. paradoxus populations, de novo ORFs emerge and are lost at similar rates.^[85] Nevertheless, there remains a positive correlation between the number of species-specific genes in a genome and the evolutionary distance from its most recent ancestor.^[92]^[84] A rapid gain and loss of de novo genes was also found on a population level by analyzing nine natural three-spined stickleback populations.^[93] In addition to the birth and death of de novo genes at the level of the ORF, mutational and other processes also subject genomes to constant "transcriptional turnover". One study in murines found that while all regions of the ancestral genome were transcribed at some point in at least one descendant, the portion of the genome under active transcription in a given strain or subspecies is subject to rapid change.^[94] The transcriptional turnover of noncoding RNA genes is particularly fast compared to coding genes.^[95] De novo open reading frames are expected to undergo substantial early turnover, because neutral-evolution modelling predicts frequent stop-gain losses and only rare, substantial ORF-length increases.^[96]
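The argument that frequent birth must be balanced by frequent loss can be illustrated with a toy model. This sketch is an assumption introduced here for illustration, not a model from the cited literature: genes arise at a constant rate and each existing de novo gene is lost with a fixed probability per time step, so the count changes by `birth_rate - loss_rate * n` per step and settles at `birth_rate / loss_rate`. The rates are hypothetical.

```python
def simulate_gene_count(birth_rate, loss_rate, n0=0.0, steps=1000):
    """Iterate a deterministic birth-death model of de novo gene number.

    birth_rate -- new de novo genes gained per time step (hypothetical)
    loss_rate  -- fraction of existing de novo genes lost per time step (hypothetical)
    """
    n = n0
    for _ in range(steps):
        n += birth_rate - loss_rate * n   # gains minus losses in one step
    return n


# Same birth rate, different loss rates: only the loss rate sets the plateau.
slow_loss = simulate_gene_count(birth_rate=10.0, loss_rate=0.1)
fast_loss = simulate_gene_count(birth_rate=10.0, loss_rate=0.5)
```

Under these assumptions, a high birth rate is compatible with a stable gene count as long as the loss rate is also high, which is why the number of de novo genes in a genome is a poor proxy for how often they arise.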

Examples of de novo genes


Organism/Lineage	Gene	Evidence of de novo origin	Evidence of selection	Phenotypic evidence	Year discovered	Notes	Ref.
Arabidopsis thaliana	QQS		N/A	Excess leaf starch in RNAi knockdowns	2009		^[44]
Drosophila	CG9284	Syntenic alignments of 12 Drosophila species		RNAi knockdown is lethal	2010		^[67]
Drosophila	CG30395	Syntenic alignments of 12 Drosophila species		RNAi knockdown is lethal	2010		^[67]
Drosophila	CG31882	Syntenic alignments of 12 Drosophila species		RNAi knockdown is lethal	2010		^[67]
Drosophila	CG31406	tBLASTn of protein-coding regions to all 12 Drosophila genomes and comparison of BLASTZ alignments	dN/dS <1 indicates purifying selection	RNAi knockdown inhibits fertility	2013		^[97]
Drosophila	CG32582	tBLASTn of protein-coding regions to all 12 Drosophila genomes and comparison of BLASTZ alignments	Possible positive selection but not statistically significant	RNAi knockdown inhibits fertility	2013		^[97]
Drosophila	CG33235	tBLASTn of protein-coding regions to all 12 Drosophila genomes and comparison of BLASTZ alignments	dN/dS <1 indicates purifying selection	RNAi knockdown inhibits fertility	2013		^[97]
Drosophila	CG34434	tBLASTn of protein-coding regions to all 12 Drosophila genomes and comparison of BLASTZ alignments	dN/dS <1 indicates purifying selection	RNAi knockdown inhibits fertility	2013		^[97]
Drosophila melanogaster	goddard	Genome-wide tBLASTn searches and LASTZ- and Exonerate-based analyses of the syntenic regions		Essential for individualization of elongated spermatids; RNAi knockdown experiments in male flies	2017	Structure prediction: half disordered, half alpha-helical	^[98]^[99]
Drosophila simulans and Drosophila sechellia	Dsim_GD19764 and Dsec_GM10790	Exonerate-based analyses of the syntenic regions	Conservation across two sister species	Testes expression	2020	Born inside intron of another gene, contains conserved intron present at time of birth (length not multiple of 3). Structure prediction: contains a transmembrane alpha helix	^[100]
Gadidae	AFGP	Examination of Gadid phylogeny	Gene multiplied in Gadid species in colder habitats but decayed in species not under threat of freezing	Inhibits ice growth		Function is similar to other antifreeze proteins that evolved independently	^[101]^[102]
Mus	Gm13030	Combined phylostratigraphy and synteny approach	ORF only retained in M. m. musculus and M. m. castaneus populations; no evidence of positive selection	Knockout mutant has irregular pregnancy cycles	2019		^[103]
Mus	Poldi	Homologous region not expressed in closely related and outgroup species	Evidence of recent selective sweep in M. m. musculus	Knockout mutant has reduced sperm motility and testis weight	2009	RNA gene	^[45]
Placental Mammals	ORF-Y	PhyloCSF of POLG gene in Homo sapiens, synonymous site conservation across mammals, and tBLASTN of mammals, sauropsids, amphibians, and teleost fish	Disappearance of enhanced synonymous site conservation within the POLG ORF after the ORF-Y's stop codon and high conservation of the initiation context of the start codon indicate purifying selection	41 Clinvar variants that affect the ORF-Y peptide but not the amino acid sequence of POLG	2020		^[31]
Saccharomyces cerevisiae	BSC4	tBLASTN and syntenic alignments of closely related species	Under negative selection based on population data	Has two synthetic lethal partners	2008	Adopts a partially specific three-dimensional structure	^[42]^[43]
Saccharomyces cerevisiae	MDF1	Only identified putative homologs are truncated, non-expressed, non-functional ORFs	Fixed in 39 diverse strains, no frameshift or nonsense mutations	Decreases mating efficiency by binding MATα2; promotes growth through an interaction with Snf1	2010	Expression is suppressed by its antisense gene	^[104]^[105]

Features

General features

Recently emerged de novo genes differ from established genes in a number of ways. Across a broad range of species, young and/or taxonomically restricted genes have been reported to be shorter than established genes, more positively charged, faster evolving,^[106] and less expressed.^[46]^[72]^[90]^[91]^[107]^[108]^[109]^[110]^[111]^[112]^[113]^[114]^[88]^[86]^[84]^[93] Although these trends could be a result of homology detection bias, a reanalysis of several studies that accounted for this bias found that the qualitative conclusions reached were unaffected.^[59] Another feature is the tendency for young genes to have their hydrophobic amino acids more clustered near one another along the primary sequence.^[115]^[116]

The expression of young genes has also been found to be more tissue- or condition-specific than that of established genes.^[38]^[40]^[46]^[68]^[70]^[72]^[112]^[117]^[118]^[119]^[84]^[93] In particular, relatively high expression of de novo genes was observed in male reproductive tissues in Drosophila, stickleback, mice, and humans, and in the human brain.^[70]^[120]^[84]^[93] In animals with adaptive immune systems, higher expression in the brain and testes may be a function of the immune-privileged nature of these tissues. An analysis in mice found specific expression of intergenic transcripts in the thymus and spleen (in addition to the brain and testes). It has been proposed that in vertebrates de novo transcripts must first be expressed in tissues lacking immune cells before they can be expressed in tissues that have immune surveillance.^[119]

Evolutionary rate

For sequence evolution, dN/dS analyses often indicate that de novo genes evolve at a higher rate than other genes.^[121]^[106] For expression evolution and structural evolution, very few quantitative studies across different evolutionary ages or phylostratigraphic branches exist.

Features that promote de novo gene birth

It is also of interest to compare features of recently emerged de novo genes to the pool of non-genic ORFs from which they emerge. Theoretical modeling has shown that such differences are the product both of selection for features that increase the likelihood of functionalization, and of neutral evolutionary forces that influence allelic turnover.^[122] In budding yeast, systematic deletion and overexpression assays of newly emerged ORFs found that overexpression is enriched for fitness benefits, and that adaptive emerging sequences are biased toward encoding transmembrane domains from thymine-rich intergenic regions.^[123] One proposed route to de novo membrane proteins is that poly-A–rich sequences can generate hydrophobic ORFs that are predicted to form transmembrane helices.^[124] Across Saccharomycotina yeasts, intergenic regions show widespread enrichment for putative transmembrane-domain encoding potential, and this enrichment (rather than raw hydrophobicity alone) correlates with the abundance of transmembrane domains in evolutionarily young genes.^[125]
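The connection between thymine-rich sequence and hydrophobic coding potential follows from the standard genetic code, in which every codon with T at the second position encodes phenylalanine, leucine, isoleucine, methionine, or valine. The sketch below translates two short sequences and scores them on the Kyte-Doolittle hydropathy scale. It is an illustration only, not the method of the cited studies; the two example sequences and the function names are hypothetical, and a mean hydropathy score is a far cruder measure than a transmembrane-domain predictor.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, enumerated in TCAG order (first, second, third base).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(dna):
    """Translate a DNA string in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def mean_hydropathy(protein):
    """Average Kyte-Doolittle score over all residues of a non-empty peptide."""
    return sum(KD[aa] for aa in protein) / len(protein)


t_rich = "TTTCTTATTGTTTTG"   # hypothetical thymine-rich sequence
a_rich = "AAAGAAAATAAGGAA"   # hypothetical adenine-rich sequence
t_peptide, a_peptide = translate(t_rich), translate(a_rich)
```

Here the thymine-rich sequence yields a peptide of strongly hydrophobic residues, while the adenine-rich sequence read on the same strand yields charged and polar residues.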

Laboratory studies comparing young de novo proteins to matched unevolved random-sequence proteins found broadly similar predicted biophysical-property distributions, but moderately higher in vitro solubility for de novo proteins (further increased by the DnaK chaperone system).^[126] High-throughput sorting of thousands of putative human de novo sORF-encoded proteins by structural compactness showed that older candidates are, on average, more compact and less disordered than younger ones.^[127] In bacteria, selection from ~10⁸ random-sequence genes identified many variants that promote E. coli growth under stress, including a random protein (RamF) that interacts with chaperones to drive degradation of a toxin and can be improved by beneficial mutations.^[128] Similarly, screening ~100 million short (semi-)random sequences for phage resistance uncovered thousands of novel genes that protect E. coli through distinct mechanisms, showing that unrelated random sequences can converge on similar adaptive phenotypes.^[129]

Experiments in E. coli showed that random peptides tended to have more benign effects when they were enriched for amino acids that were small, and that promoted intrinsic structural disorder.^[130] Comparative work in flies indicates that de novo-originated genes can become integrated into gene regulatory networks through interactions with key transcription factors, potentially contributing to lineage-specific developmental trajectories (“developmental system drift”).^[131]

Lineage-dependent features

Features of de novo genes can depend on the species or lineage being examined. This appears to result partly from variation in genomic GC content, together with the fact that young genes bear more similarity to non-genic sequences from the genome in which they arose than do established genes.^[132] Features of the resulting protein, such as the percentage of transmembrane residues and the relative frequency of various predicted secondary structural features, show a strong GC dependency in orphan genes, whereas in more ancient genes these features are only weakly influenced by GC content.^[132]

The relationship between gene age and the amount of predicted intrinsic structural disorder (ISD) in the encoded proteins has been subject to considerable debate. It has been claimed that ISD is also a lineage-dependent feature, exemplified by the fact that in organisms with relatively high GC content, ranging from D. melanogaster to the parasite Leishmania major, young genes have high ISD,^[133]^[134] while in a low GC genome such as budding yeast, several studies have shown that young genes have low ISD.^[72]^[107]^[114]^[132] However, a study that excluded young genes with dubious evidence for functionality, defined in binary terms as being under selection for gene retention, found that the remaining young yeast genes have high ISD, suggesting that the yeast result may be due to contamination of the set of young genes with ORFs that do not meet this definition, and hence are more likely to have properties that reflect GC content and other non-genic features of the genome.^[135] Beyond the very youngest orphans, this study found that ISD tends to decrease with increasing gene age, and that this is primarily due to amino acid composition rather than GC content.^[135] Within shorter time scales, analysis restricted to the best-validated de novo genes suggests that younger genes are more disordered in Lachancea, but less disordered in Saccharomyces.^[114] Intrinsic structural disorder and aggregation propensity did not show significant differences with age in some studies of mammals^[88] and primates,^[86] but did in other studies of mammals.^[135] One large study of the entire Pfam protein domain database showed enrichment of younger protein domains for disorder-promoting amino acids across animals, but enrichment on the basis of amino acid availability in plants.^[116]

Role of epigenetic modifications

An examination of de novo genes in A. thaliana found that they are both hypermethylated and generally depleted of histone modifications.^[66] In agreement with either the proto-gene model or contamination with non-genes, methylation levels of de novo genes were intermediate between established genes and intergenic regions. The methylation patterns of these de novo genes are stably inherited, and methylation levels were highest, and most similar to established genes, in de novo genes with verified protein-coding ability.^[66] In the pathogenic fungus Magnaporthe oryzae, less conserved genes tend to have methylation patterns associated with low levels of transcription.^[136] A study in yeasts also found that de novo genes are enriched at recombination hotspots, which tend to be nucleosome-free regions.^[114]

In Pristionchus pacificus, orphan genes with confirmed expression display chromatin states that differ from those of similarly expressed established genes.^[113] Orphan gene start sites have epigenetic signatures that are characteristic of enhancers, in contrast to conserved genes that exhibit classical promoters.^[113] Many unexpressed orphan genes are decorated with repressive histone modifications, while a lack of such modifications facilitates transcription of an expressed subset of orphans, supporting the notion that open chromatin promotes the formation of novel genes.^[113]

Structural evolution

De novo proteins typically exhibit less well-defined secondary and three-dimensional structures, often lacking rigid folding but having extensive disordered regions.^[121]^[135] Quantitative analyses are still lacking on the evolution of secondary structural elements and tertiary structures over time. As structure is usually more conserved than sequence, comparing structures between orthologs could provide deeper insights into de novo gene emergence and evolution and help to confirm these genes as true de novo genes.^[137] Nevertheless, so far only very few de novo proteins have been structurally and functionally characterized, largely because of problems with protein purification and subsequent stability. Progress has been made using different purification tags, cell types and chaperones.^[138]

The 'antifreeze glycoprotein' (AFGP) in Arctic codfishes prevents their blood from freezing in arctic waters.^[101]^[102] Bsc4, a short non-essential de novo protein in yeast,^[42] has been shown to be composed mainly of β-sheets and to have a hydrophobic core.^[43] It is associated with DNA repair under nutrient-deficient conditions.^[139] The Drosophila de novo protein Goddard was first characterized in 2017. Knockdown Drosophila melanogaster male flies were not able to produce sperm.^[98] It was later shown that this defect was due to failure of individualization of elongated spermatids. Using computational phylogenomic and structure predictions, experimental structural analyses, and cell biological assays, it was proposed that half of Goddard's structure is disordered and the other half is composed of alpha helices. These analyses also indicated that Goddard's orthologs show similar results. Goddard's structure therefore appears to have been mainly conserved since its emergence.^[99] It has been proposed that these four putative de novo genes have diverged beyond the point at which they can be found.^[140] However, the evidential strength of proposed “hidden homology” remains unclear since the study relies on very relaxed BLAST thresholds (high E-values/low identity, i.e. the “twilight zone”) and on structural resemblance that could also reflect convergent evolution.^[141]

Overall, de novo proteins are often short and enriched in intrinsically disordered regions (IDRs), and many are predicted to lack stable tertiary structure when isolated.^[142] However, comparative genome-wide analyses in rice suggest that the structural properties of de novo proteins can evolve rapidly in some lineages, with predicted decreases in disorder and increases in structured elements over short evolutionary timescales and incorporation of de novo proteins into heteromeric multimers.^[143]

In Drosophila, a genome-wide study combining gene-age dating and structural modeling reported little overall predicted structural change among Drosophilinae de novo candidates, and ancestral sequence reconstruction suggested that many potentially well-folded candidates may be born well-folded.^[83] A broader survey of Drosophila de novo proteins likewise found that most differ from conserved proteins in predicted properties, but that a subset are predicted to adopt known folds and participate in specific cellular processes.^[144]

Inference from prediction tools requires caution because many predictors were trained and benchmarked primarily on conserved, globular proteins, and their performance can be biased for short or low-homology sequences.^[142]^[145] In particular, disorder predictions can be sensitive to parameter choices, and different structure predictors (including alignment-based and protein language model approaches) may disagree on de novo proteins and yield low-confidence models.^[142]^[146] Comparisons of “newly born” orphan proteins to “never born” random polypeptides using multiple deep-learning structure predictors similarly reported that predicted models are often of low quality while still allowing limited qualitative comparisons across sequence sets.^[147]

Mechanisms

Pervasive expression

With the development of technologies such as RNA-seq and Ribo-seq, eukaryotic genomes are now known to be pervasively transcribed^[148]^[149]^[150]^[151] and translated.^[152] Many ORFs that are either unannotated or annotated as long non-coding RNAs (lncRNAs) are translated at some level, often in a condition- or tissue-specific manner.^[72]^[152]^[153]^[154]^[155]^[156] Though infrequent, these translation events expose non-genic sequence to selection. This pervasive expression forms the basis for several models describing de novo gene birth.

It has been speculated that the epigenetic landscape of de novo genes in the early stages of formation may be particularly variable within and among populations, resulting in variable gene expression and thereby allowing young genes to explore the "expression landscape."^[157] The QQS gene in A. thaliana is one example of this phenomenon; its expression is negatively regulated by DNA methylation that, while heritable for several generations, varies widely in its levels both among natural accessions and within wild populations.^[157] Epigenetics are also largely responsible for the permissive transcriptional environment in the testes, particularly through the incorporation into nucleosomes of non-canonical histone variants that are replaced by histone-like protamines during spermatogenesis.^[158]

Intergenic ORFs as elementary structural modules

Analysis of the fold potential diversity shows that the majority of the amino acid sequences encoded by the intergenic ORFs of S. cerevisiae are predicted to be foldable.^[159] More importantly, these amino acid sequences with folding potential can serve as elementary building blocks for de novo genes or integrate into pre-existing genes.^[159]

Order of events

For birth of a de novo protein-coding gene to occur, a non-genic sequence must both be transcribed and acquire an ORF before becoming translated. These events could occur in either order, and there is evidence supporting both an "ORF first" and a "transcription first" model.^[5]^[160] An analysis of de novo genes that are segregating in D. melanogaster found that sequences that are transcribed had similar coding potential to the orthologous sequences from lines lacking evidence of transcription.^[68] This finding supports the notion that many ORFs can exist prior to being transcribed. The antifreeze glycoprotein gene AFGP, which emerged de novo in Arctic codfishes, provides a more definitive example in which the de novo emergence of the ORF was shown to precede the promoter region.^[101] Furthermore, putatively non-genic ORFs long enough to encode functional peptides are numerous in eukaryotic genomes, and expected to occur at high frequency by chance.^[68]^[72] By tracing the evolutionary history of ORF sequences and transcription activation of human de novo genes, a study showed that some ORFs were ready to confer biological significance upon their birth.^[160] At the same time, transcription of eukaryotic genomes is far more extensive than previously thought, and there are documented examples of genomic regions that were transcribed prior to the appearance of an ORF that became a de novo gene.^[97] The proportion of de novo genes that are protein-coding is unknown, but the appearance of "transcription first" has led some to posit that protein-coding de novo genes may first exist as RNA gene intermediates. The case of bifunctional RNAs, which are both translated and function as RNA genes, shows that such a mechanism is plausible.^[161] Neutral evolutionary modelling suggests that de novo protein-coding genes may more often emerge via a transcription-first trajectory, and that antisense overlap with existing genes can increase the probability of ORF emergence and retention.^[162]^[163]
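The expectation that ORFs arise frequently by chance can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a deliberately simple null model that is not taken from the cited studies: bases are independent, A and T are equally frequent, and G and C are equally frequent. The function names are hypothetical.

```python
def stop_codon_probability(gc_content):
    """Probability that a random codon is TAA, TAG or TGA.

    Assumes independent bases with P(G) = P(C) = gc_content / 2 and
    P(A) = P(T) = (1 - gc_content) / 2 (a simplifying assumption).
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    p_at = (1.0 - gc_content) / 2.0  # probability of A, and of T
    p_gc = gc_content / 2.0          # probability of G, and of C
    taa = p_at * p_at * p_at
    tag = p_at * p_at * p_gc
    tga = p_at * p_gc * p_at
    return taa + tag + tga


def open_frame_probability(n_codons, gc_content=0.5):
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1.0 - stop_codon_probability(gc_content)) ** n_codons


for gc in (0.4, 0.5, 0.6):
    print(gc, open_frame_probability(100, gc))
```

At 50% GC the per-codon stop probability is 3/64, so a given 100-codon stretch is rarely open, but a genome contains an enormous number of candidate start positions. Because the three stop codons are AT-rich, the same model predicts longer chance ORFs in GC-rich genomes.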

The gain of both transcription and ORF may occur simultaneously when chromosomal rearrangement is the event that precipitates gene birth.^[164]

Models

Several theoretical models and possible mechanisms of de novo gene birth have been described. The models are generally not mutually exclusive, and it is possible that multiple mechanisms may give rise to de novo genes.^[54] An example is the type III antifreeze protein gene, which originates from an old sialic acid synthase (SAS) gene, in an Antarctic zoarcid fish.

"Out of Testis" hypothesis

An early case study of de novo gene birth, which identified five de novo genes in D. melanogaster, noted preferential expression of these genes in the testes,^[39] and several additional de novo genes were identified using transcriptomic data derived from the testes and male accessory glands of D. yakuba and D. erecta.^[38]^[40] This is in agreement with other studies that showed there is rapid evolution of genes related to reproduction across a range of lineages,^[165]^[166]^[167] suggesting that sexual selection may play a key role in adaptive evolution and de novo gene birth. A subsequent large-scale analysis of six D. melanogaster strains identified 248 testis-expressed de novo genes, of which ~57% were not fixed.^[68] A recent study on twelve Drosophila species additionally identified a higher proportion of de novo genes with testis-biased expression compared to the annotated proteome.^[84] It has been suggested that the large number of de novo genes with male-specific expression identified in Drosophila is likely due to the fact that such genes are preferentially retained relative to other de novo genes, for reasons that are not entirely clear.^[90] Interestingly, two putative de novo genes in Drosophila (Goddard and Saturn) were shown to be required for normal male fertility.^[98]^[99] A genetic screen of over 40 putative de novo genes with testis-enriched expression in Drosophila melanogaster revealed that one of the de novo genes, atlas, was required for proper chromatin condensation during the final stages of spermatogenesis in males. atlas evolved from the fusion of a protein-coding gene that arose at the base of the Drosophila genus and a conserved non-coding RNA.^[168] Comparative analysis of the transcriptomes of testis and accessory glands, a somatic tissue of males that is important for fertility, of D. melanogaster suggests that de novo genes make a greater contribution to the transcriptomic complexity of testis as compared to accessory glands.^[169] Single-cell RNA-seq of D. melanogaster testis revealed that the expression pattern of de novo genes was biased toward early spermatogenesis.^[170]

In humans, a study that identified 60 human-specific de novo genes found that their average expression, as measured by RNA-seq, was highest in the testes.^[70] Another study looking at mammalian-specific genes more generally also found enriched expression in the testes.^[171] Transcription in mammalian testes is thought to be particularly promiscuous, due in part to elevated expression of the transcription machinery^[172]^[173] and an open chromatin environment.^[174] Along with the immune-privileged nature of the testes, this promiscuous transcription is thought to create the ideal conditions for the expression of non-genic sequences required for de novo gene birth. Testes-specific expression seems to be a general feature of all novel genes, as an analysis of Drosophila and vertebrate species found that young genes showed testes-biased expression regardless of their mechanism of origination.^[117]

Preadaptation model

The preadaptation model of de novo gene birth uses mathematical modeling to show that when sequences that are normally hidden are exposed to weak or shielded selection, the resulting pool of "cryptic" sequences (i.e. proto-genes) can be purged of "self-evidently deleterious" variants, such as those prone to lead to protein aggregation, and thus enriched in potential adaptations relative to a completely non-expressed and unpurged set of sequences.^[175] This revealing and purging of cryptic deleterious non-genic sequences is a byproduct of pervasive transcription and translation of intergenic sequences, and is expected to facilitate the birth of functional de novo protein-coding genes.^[155] This is because by eliminating the most deleterious variants, what is left is, by a process of elimination, more likely to be adaptive than expected from random sequences. Using the evolutionary definition of function (i.e. that a gene is by definition under purifying selection against loss), the preadaptation model assumes that "gene birth is a sudden transition to functionality"^[135] that occurs as soon as an ORF acquires a net beneficial effect. In order to avoid being deleterious, newborn genes are expected to display exaggerated versions of genic features associated with the avoidance of harm. This is in contrast to the proto-gene model, which expects newborn genes to have features intermediate between old genes and non-genes.^[135]
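The enrichment argument can be made concrete with a small conditional-probability calculation. This is a toy illustration rather than the published model: the pool is assumed to consist of potentially adaptive, strongly deleterious and remaining (roughly neutral) sequences, and all numbers and names below are hypothetical.

```python
def adaptive_fraction_after_purging(p_adaptive, p_deleterious, purge_efficiency):
    """Fraction of surviving sequences that are potentially adaptive.

    p_adaptive       -- fraction of the original pool that is potentially adaptive
    p_deleterious    -- fraction that is strongly deleterious
    purge_efficiency -- fraction of deleterious sequences removed by selection
    All three inputs are illustrative assumptions.
    """
    if p_adaptive < 0 or p_deleterious < 0 or p_adaptive + p_deleterious > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    if not 0.0 <= purge_efficiency <= 1.0:
        raise ValueError("purge_efficiency must be between 0 and 1")
    surviving = 1.0 - p_deleterious * purge_efficiency
    if surviving == 0.0:
        raise ValueError("no sequences survive purging")
    return p_adaptive / surviving


# Hypothetical pool: 1% potentially adaptive, 50% strongly deleterious.
unpurged = adaptive_fraction_after_purging(0.01, 0.50, 0.0)
purged = adaptive_fraction_after_purging(0.01, 0.50, 0.9)
print(unpurged, purged)
```

With these made-up numbers, removing 90% of the deleterious sequences nearly doubles the adaptive fraction of what remains, without any individual sequence having changed.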

The mathematics of the preadaptation model assume that the distribution of fitness effects is bimodal, with new sequences of mutations tending to break something or tinker, but rarely in between.^[175]^[176] Following this logic, populations may either evolve local solutions, in which selection operates on each individual locus and a relatively high error rate is maintained, or a global solution with a low error rate which permits the accumulation of deleterious cryptic sequences.^[175] De novo gene birth is thought to be favored in populations that evolve local solutions, as the relatively high error rate will result in a pool of cryptic variation that is "preadapted" through the purging of deleterious sequences. Local solutions are more likely in populations with a high effective population size.

In support of the preadaptation model, an analysis of ISD in mice and yeast found that young genes have higher ISD than old genes, while random non-genic sequences tend to show the lowest levels of ISD.^[135] Although the observed trend may have partly resulted from a subset of young genes derived by overprinting,^[177] higher ISD in young genes is also seen among overlapping viral gene pairs.^[178] With respect to other predicted structural features such as β-strand content and aggregation propensity, the peptides encoded by proto-genes are similar to non-genic sequences and categorically distinct from canonical genes.^[179]

Proto-gene model

The proto-gene model agrees with the preadaptation model about the importance of pervasive expression, and refers to the set of pervasively expressed sequences that do not meet all definitions of a gene as "proto-genes".^[72] In contrast to the preadaptation model, the proto-gene model suggests that newborn genes have features intermediate between old genes and non-genes.^[135] Specifically, this model envisages a more gradual process, under selection, from non-genic to genic state, rejecting the binary classification of gene and non-gene.

In an extension of the proto-gene model, it has been proposed that as proto-genes become more gene-like, their potential for adaptive change gives way to selected effects; thus, the predicted impact of mutations on fitness is dependent on the evolutionary status of the ORF.^[180] This notion is supported by the fact that overexpression of established ORFs in S. cerevisiae tends to be less beneficial (and more harmful) than does overexpression of emerging ORFs.^[180]

Several features of ORFs correlate with ORF age as determined by phylostratigraphic analysis, with young ORFs having properties intermediate between old ORFs and non-genes; this has been taken as evidence in favor of the proto-gene model, in which proto-gene state is a continuum.^[72] This evidence has been criticized, because the same apparent trends are also expected under a model in which identity as a gene is binary. Under this model, when each age group contains a different ratio of genes vs. non-genes, Simpson's paradox can generate correlations in the wrong direction.^[135]
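The criticism can be illustrated with a toy mixture calculation. The sketch below assumes a strictly binary world in which every ORF is either a gene or a non-gene with a fixed feature value, and only the proportion of true genes differs between age classes. All values and names are hypothetical and are not drawn from the cited studies.

```python
GENE_VALUE = 0.8      # hypothetical feature value shared by all true genes
NON_GENE_VALUE = 0.2  # hypothetical feature value shared by all non-genic ORFs

# Hypothetical fraction of true genes in each age class, youngest first.
GENE_FRACTION_BY_AGE = {
    "youngest": 0.1,
    "young": 0.4,
    "intermediate": 0.7,
    "old": 1.0,
}


def class_mean(gene_fraction):
    """Mean feature value of an age class that mixes genes and non-genes."""
    return gene_fraction * GENE_VALUE + (1.0 - gene_fraction) * NON_GENE_VALUE


class_means = {age: class_mean(f) for age, f in GENE_FRACTION_BY_AGE.items()}
for age, mean in class_means.items():
    print(f"{age:>12}: {mean:.2f}")
```

The class means rise smoothly with age even though no individual ORF has an intermediate value, so a graded trend in averages cannot by itself distinguish a continuum of proto-gene states from varying contamination of each age class with non-genes.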

Grow slow and moult model

The "grow slow and moult" model describes a potential mechanism of de novo gene birth, particular to protein-coding genes. In this scenario, existing protein-coding ORFs expand at their ends, especially their 3' ends, leading to the creation of novel N- and C-terminal domains.^[181]^[182]^[183]^[184]^[185] Novel C-terminal domains may first evolve under weak selection via occasional expression through read-through translation, as in the preadaptation model, only later becoming constitutively expressed through a mutation that disrupts the stop codon.^[175]^[182] Genes experiencing high translational readthrough tend to have intrinsically disordered C-termini.^[186] Furthermore, existing genes are often close to repetitive sequences that encode disordered domains. These novel, disordered domains may initially confer some non-specific binding capability that becomes gradually refined by selection. Sequences encoding these novel domains may occasionally separate from their parent ORF, leading or contributing to the creation of a de novo gene.^[182] Interestingly, an analysis of 32 insect genomes found that novel domains (i.e. those unique to insects) tend to evolve fairly neutrally, with only a few sites under positive selection, while their host proteins remain under purifying selection, suggesting that new functional domains emerge gradually and somewhat stochastically.^[187]

Escape from adaptive conflict

The escape from adaptive conflict (EAC) model proposes a possible way for new gene duplications to become fixed: conflict between contrasting functions within a single gene drives the fixation of a new duplicate.^[188]^[189]

Pleiotropy-barrier model

The 'pleiotropy-barrier' model suggests, based on observations from human gene-disease data, that newly evolved genes, including de novo genes and duplication-related genes, could facilitate evolutionary innovation or the evolution of specific functions under new selective forces because of their low (or absent) pleiotropic effects.^[190]

Cultivator model

Beyond “ORF-first” and “transcription-first” scenarios, the proposed “cultivator model” emphasises that selection acting on regulatory environments of nearby pre-existing genes can promote stepwise fixation of new transcripts and, more rarely, protein-coding de novo genes.^[191]

Human health

In addition to its significance for the field of evolutionary biology, de novo gene birth has implications for human health. It has been speculated that novel genes, including de novo genes, may play an outsized role in species-specific traits;^[6]^[11]^[41]^[192] however, many species-specific genes lack functional annotation.^[171] Nevertheless, there is evidence to suggest that human-specific de novo genes are involved in diseases such as cancer. NYCM, a de novo gene unique to humans and chimpanzees, regulates the pathogenesis of neuroblastomas in mouse models,^[193] and the primate-specific PART1, an lncRNA gene, has been identified as both a tumor suppressor and an oncogene in different contexts.^[46]^[194] Several other human- or primate-specific de novo genes, including PBOV1,^[195] GR6,^[196]^[197] MYEOV,^[198] and ELFN1-AS1,^[199] are also linked to cancer. Some have even suggested considering tumor-specifically expressed, evolutionary novel genes as their own class of genetic elements, noting that many such genes are under positive selection and may be neofunctionalized in the context of tumors.^[199]

The specific expression of many de novo genes in the human brain^[70] also raises the intriguing possibility that de novo genes influence human cognitive traits. One such example is FLJ33706, a de novo gene that was identified in GWAS and linkage analyses for nicotine addiction and shows elevated expression in the brains of Alzheimer's patients.^[200] Further research, however, showed that FLJ33706 and other de novo gene candidates expressed in the human brain are either not translated or are diverged duplicates.^[201]^[202] Rapid evolutionary divergence, along with the multiple and parallel loss of orthologs in some lineages, can give the false appearance of the emergence of a novel gene from scratch.^[203] Generally speaking, expression of young, primate-specific genes is enriched in the fetal human brain relative to the expression of similarly young genes in the mouse brain.^[204] Most of these young genes, several of which originated de novo, are expressed in the neocortex, which is thought to be responsible for many aspects of human-specific cognition. Many of these young genes show signatures of positive selection, and functional annotations indicate that they are involved in diverse molecular processes, but are enriched for transcription factors.^[204]

In addition to their roles in cancer processes, de novo originated human genes have been implicated in the maintenance of pluripotency^[205] and in immune function.^[46]^[171]^[206] The preferential expression of de novo genes in the testes is also suggestive of a role in reproduction. Given that the function of many de novo human genes remains uncharacterized, it seems likely that an appreciation of their contribution to human health and development will continue to grow.

Genome-scale studies of orphan and de novo genes in various lineages.


Organism/Lineage	Homology Detection Method(s)	Evidence of Expression?	Evidence of Selection?	Evidence of Physiological Role?	# Orphan/De Novo Genes	Notes	Ref.
Arthropods	BLASTP for all 30 species against each other, TBLASTN for Formicidae only, searched by synteny for unannotated orthologs in Formicidae only	ESTs, RNA-seq; RT-PCR on select candidates	37 Formicidae-restricted orthologs appear under positive selection (M1a to M2a and M7 to M8 models using likelihood ratio tests); as a group, Formicidae-restricted orthologs have a significantly higher K_a/K_s rate than non-restricted orthologs	Prediction of signal peptides and subcellular localization for subset of orphans	~65,000 orphan genes across 30 species	Abundance of orphan genes dependent on time since emergence from common ancestor; >40% of orphans from intergenic matches indicating possible de novo origin	^[92]
Arabidopsis thaliana	BLASTP against 62 species, PSI-BLAST against NCBI nonredundant protein database, TBLASTN against PlantGDB-assembled unique transcripts database; searched syntenic region of two closely related species	Transcriptomic and translatomic data from multiple sources	Allele frequencies of de novo genes correlated with their DNA methylation levels	None	782 de novo genes	Also assessed DNA methylation and histone modifications	^[66]
Bombyx mori	BLASTP against four lepidopterans, TBLASTN against lepidopteran EST sequences, BLASTP against NCBI nonredundant protein database	Microarray, RT-PCR	None	RNAi on five de novo genes produced no visible phenotypes	738 orphan genes	Five orphans identified as de novo genes	^[111]
Brassicaceae	BLASTP against NCBI nonredundant protein database, TBLASTN against NCBI nucleotide database, TBLASTN against NCBI EST database, PSI-BLAST against NCBI nonredundant protein database, InterProScan^[207]	Microarray	None	TRGs enriched for expression changes in response to abiotic stresses compared to other genes	1761 nuclear TRGs; 28 mitochondrial TRGs	~2% of TRGs thought to be de novo genes	^[112]
Drosophila melanogaster	BLASTN of query cDNAs against D. melanogaster, D. simulans and D. yakuba genomes; also performed check of syntenic region in sister species	cDNA/expressed sequence tags (ESTs)	K_a/K_s ratios calculated between retained new genes and their parental genes are significantly <1, indicating most new genes are functionally constrained	List includes several genes with characterized molecular roles	72 orphan genes; 2 de novo genes	Gene duplication dominant mechanism for new genes; 7/59 orphans specific to D. melanogaster species complex identified as de novo	^[69]
Drosophila melanogaster	Presence or absence of orthologs in other Drosophila species inferred by synteny based on UCSC genome alignments and FlyBase protein-based synteny; TBLASTN against Drosophila subgroup	Indirect (RNAi)	Youngest essential genes show signatures of positive selection (α=0.25 as a group)	Knockdown with constitutive RNAi lethal for 59 TRGs	195 "young" (<35 myo) TRGs; 16 de novo genes	Gene duplication dominant mechanism for new genes	^[67]
Drosophila melanogaster	RNA-seq in D. melanogaster and close relatives; syntenic alignments with D. simulans and D. yakuba; BLASTP against NCBI nonredundant protein database	RNA-seq	Nucleotide diversity lower in non-expressing relatives; Hudson-Kreitman-Aguade-like statistic lower in fixed de novo genes than in intergenic regions	Structural features of de novo genes (e.g. enrichment of long ORFs) suggestive of function	106 fixed and 142 segregating de novo genes	Specifically expressed in testes	^[68]
Drosophila	All-vs-all BLASTP; phylostratigraphic analysis	Annotated proteomes of twelve Drosophila species	Pairwise d_N/d_S values for all single exon focal ORFs	None	6297 orphan genes; 2467 de novo genes		^[84]
Homo sapiens	BLASTP against other primates; BLAT against chimpanzee and orangutan genomes, manual check of syntenic regions in chimpanzee and orangutan	RNA-seq	Substitution rate provides some evidence for weak selection; 59/60 de novo genes are fixed	None	60 de novo genes	Enabling mutations identified; highest expression seen in brain and testes	^[70]
Homo sapiens	BLASTP against chimpanzee, BLAT and search of syntenic region in chimpanzee, manual check of syntenic regions in chimpanzee and macaque	EST/cDNA	No evidence of selective constraint seen by nucleotide divergence	One of the genes identified has a known role in leukemia	3 de novo genes	Estimated that human genome contains ~18 human-specific de novo genes	^[47]
Homo sapiens and five other primates	BLAST against each of the six primate transcriptomes; analysis of syntenic regions	Transcriptome data from up to six tissue types	Pairwise d_N/d_S ratios for human-chimpanzee homologs indicate mostly neutral or mild purifying selection	None	29,751 transcribed novel human ORFs in total: 2,749 human-restricted, 5,378 primate-restricted	Novel ORF gain and loss found to be mainly stochastic rather than shaped by selection	^[86]
Lachancea and Saccharomyces	BLASTP of all focal species against each other, BLASTP against NCBI nonredundant protein database, PSI-BLAST against NCBI nonredundant protein database, HMM Profile-Profile of TRG families against each other; families then merged and searched against four profile databases	Mass Spectrometry (MS)	K_a/K_s ratios across Saccharomyces indicate that candidates are under weak selection that increases with gene age; in Lachancea species with multiple strains, pN/pS ratios are lower for de novo candidates than for "spurious TRGs"	None	288 candidate de novo TRGs in Saccharomyces, 415 in Lachancea	MS evidence of translation for 25 candidates	^[114]
Mus musculus and Rattus norvegicus	BLASTP of rat and mouse against each other, BLASTP against Ensembl compara database; searched syntenic regions in rat and mouse	UniGene Database	Subset of genes shows low nucleotide diversity and high ORF conservation across 17 strains	Two mouse genes cause morbidity when knocked out	69 de novo genes in mouse and 6 "de novo" genes in rat	Enabling mutations identified for 9 mouse genes	^[208]
Mus musculus	BLASTP against NCBI nonredundant protein database	Microarray	None	None	781 orphan genes	Age-dependent features of genes compatible with de novo emergence of many orphans	^[87]
Mus musculus and four other mammals	DIAMOND and BLASTP; phylostratigraphy analysis	High coverage transcriptomes	None	None	~60,000 transcribed novel ORFs		^[88]
Oryza	Protein-to-protein and nucleotide-to-nucleotide BLAT against eight Oryza species and two outgroup species; searched syntenic regions of these species for coding potential	RNA-seq (all de novo TRGs); Ribosome Profiling and targeted MS (some de novo TRGs)	22 de novo candidates appear under negative selection, and six under positive selection, as measured by K_a/K_s rate	Expression of de novo TRGs is tissue-specific	175 de novo TRGs	~57% of de novo genes have translational evidence; transcription predates coding potential in most cases	^[209]
Primates	BLASTP against 15 eukaryotes, BLASTN against human genome, analysis of syntenic regions	ESTs	K_a/K_s ratios for TRGs below one but higher than established genes; coding scores consistent with translated proteins	Several genes have well-characterized cellular roles	270 TRGs	~5.5% of TRGs estimated to have originated de novo	^[46]
Pristionchus pacificus	BLASTP and tBLASTN, syntenic analysis	RNA-Seq			2 cases of complete de novo gene origination	27 other high-confidence orphans whose modes of origin included annotation artifacts, chimeric origin, alternative reading frame usage, and gene splitting with subsequent gain of de novo exons	^[210]
Rodentia	BLASTP against NCBI nonredundant protein database	None	Mouse genes share 50% identity with rat ortholog	None	84 TRGs	Species-specific genes excluded from analysis; results robust to evolutionary rate	^[135]
Saccharomyces cerevisiae	BLASTP and PSI-BLAST against 18 fungal species, HMMER and HHpred against several databases, TBLASTN against three close relatives	None	None	Majority of orphans have characterized fitness effects	188 orphan genes	Ages of genes determined at level of individual residues	^[107]
Saccharomyces cerevisiae	BLASTP, TBLASTX, and TBLASTN against 14 other yeast species, BLASTP against NCBI nonredundant protein database	Ribosome Profiling	All 25 de novo genes, 115 proto-genes under purifying selection (pN/pS < 1)	None	25 de novo genes; 1,891 "proto-genes"	De novo gene birth more common than new genes from duplication; proto-genes are unique to Saccharomyces (sensu stricto) yeasts	^[72]
Saccharomyces cerevisiae	BLASTN and TBLASTX against nt/nr, manual inspection of syntenic alignment	Transcripts believed to be non-coding, manual inspection of ribosome profiling traces	None	None	1 de novo candidate gene, 217 ribosome-associated transcripts	Candidate de novo gene is polymorphic; ribosome profiling data are the same as in ^[72]	^[155]
Saccharomyces paradoxus	Intergenic ORFs (iORFs) were annotated in orthologous intergenic regions identified by microsynteny; BLASTP against the NCBI RefSeq protein database covering 417 species, including 237 fungal species	Ribo-seq, RNA-seq; in vivo translation assay for a subset (45) of translated iORFs	Global dN/dS ratio for translated iORFs in 24 S. paradoxus strains showed no evidence for purifying selection	None	447 iORFs with significant translation that were specific to 3 S. paradoxus lineages and 1 S. cerevisiae lineage	iORF translation efficiency was generally lower than that of annotated genes, although with significant overlap; only ~2% of the 19,689 iORFs found showed significant translation, but they add up to >8% of the canonical proteome in wild yeast populations	^[85]
Saccharomyces sensu stricto	BLASTP against NCBI nonredundant protein database, TBLASTN against ten outgroup species; BLASTP and phmmer against 20 yeast species reannotated using syntenic alignments	Transcript isoform sequencing (TIF-seq), Ribosome Profiling	Most genes weakly constrained but a subset under strong selection, according to Neutrality Index, Direction of Selection, K_a/K_s, and McDonald-Kreitman tests	Subcellular localization demonstrated for five genes	~13,000 de novo genes	>65% of de novo genes are isoforms of ancient genes; >97% from TIF-seq dataset	^[65]

Note: For purposes of this table, genes are defined as orphan genes (when species-specific) or TRGs (when limited to a closely related group of species) when the mechanism of origination has not been investigated, and as de novo genes when de novo origination has been inferred, irrespective of method of inference. The designation of de novo genes as "candidates" or "proto-genes" reflects the language used by the authors of the respective studies.

References

This article was adapted from the following source under a CC BY 4.0 license (2019) (reviewer reports): Stephen Branden Van Oss; Anne-Ruxandra Carvunis (23 May 2019). "De novo gene birth". PLOS Genetics. 15 (5): e1008160. doi:10.1371/JOURNAL.PGEN.1008160. ISSN 1553-7390. PMC 6542195. PMID 31120894. Wikidata Q86320144.

1 2 Long M, Betrán E, Thornton K, Wang W (November 2003). "The origin of new genes: glimpses from the young and old". Nature Reviews Genetics. 4 (11): 865–75. doi:10.1038/nrg1204. PMID 14634634. S2CID 33999892.
↑ Wang W, Yu H, Long M (May 2004). "Duplication-degeneration as a mechanism of gene fission and the origin of new genes in Drosophila species". Nature Genetics. 36 (5): 523–7. Bibcode:2004NaGen..36..523W. doi: 10.1038/ng1338 . PMID 15064762.
↑ Levy A (October 2019). "How evolution builds genes from scratch". Nature. 574 (7778): 314–316. Bibcode:2019Natur.574..314L. doi: 10.1038/d41586-019-03061-x . PMID 31619796.
↑ Schmitz JF, Bornberg-Bauer E (2017). "Fact or fiction: updates on how protein-coding genes might emerge de novo from previously non-coding DNA". F1000Research. 6: 57. Bibcode:2017JSPS....6...57S. doi: 10.12688/f1000research.10079.1 . PMC 5247788 . PMID 28163910.
1 2 3 4 Schlötterer C (April 2015). "Genes from scratch--the evolutionary fate of de novo genes". Trends in Genetics. 31 (4): 215–9. doi:10.1016/j.tig.2015.02.007. PMC 4383367 . PMID 25773713.
1 2 3 Kaessmann H (October 2010). "Origins, evolution, and phenotypic impact of new genes". Genome Research. 20 (10): 1313–26. doi:10.1101/gr.101386.109. PMC 2945180 . PMID 20651121.
1 2 Jacob F (June 1977). "Evolution and tinkering". Science. 196 (4295): 1161–1166. Bibcode:1977Sci...196.1161J. doi:10.1126/science.860134. PMID 860134. S2CID 29756896.
↑ Van Oss SB, Carvunis AR (May 2019). "De novo gene birth". PLOS Genetics. 15 (5) e1008160. doi: 10.1371/journal.pgen.1008160 . PMC 6542195 . PMID 31120894.
↑ Fesenko, Igor; Shabalina, Svetlana A; Storz, Gisela; Koonin, Eugene V (2025-11-26). "De novo origin of numerous microproteins in enterobacteria". Nucleic Acids Research. 53 (22) gkaf1319. doi:10.1093/nar/gkaf1319. ISSN 0305-1048. PMC 12700099 . PMID 41385328.
↑ Khalturin K, Hemmrich G, Fraune S, Augustin R, Bosch TC (September 2009). "More than just orphans: are taxonomically-restricted genes important in evolution?". Trends in Genetics. 25 (9): 404–413. doi:10.1016/j.tig.2009.07.006. PMID 19716618.
1 2 3 4 5 Tautz D, Domazet-Lošo T (August 2011). "The evolutionary origin of orphan genes". Nature Reviews. Genetics. 12 (10): 692–702. doi:10.1038/nrg3053. PMID 21878963. S2CID 31738556.
↑ Weisman, Caroline M.; Murray, Andrew W.; Eddy, Sean R. (2020-11-02). "Many, but not all, lineage-specific genes can be explained by homology detection failure". PLOS Biology. 18 (11) e3000862. doi: 10.1371/journal.pbio.3000862 . PMC 7660931 . PMID 33137085.
↑ Moyers, Bryan A.; Zhang, Jianzhi (January 2015). "Phylostratigraphic bias creates spurious patterns of genome evolution". Molecular Biology and Evolution. 32 (1): 258–267. doi:10.1093/molbev/msu286. PMC 4271527 . PMID 25312911.
↑ Moyers, Bryan A.; Zhang, Jianzhi (May 2016). "Evaluating Phylostratigraphic Evidence for Widespread De Novo Gene Birth in Genome Evolution". Molecular Biology and Evolution. 33 (5): 1245–1256. doi:10.1093/molbev/msw008. PMC 5010002 . PMID 26758516.
↑ Reinhardt, Josephine A.; Wanjiru, B. M.; Brant, A. T.; Saelao, P.; Begun, David J.; Jones, Corbin D. (2013-10-17). "De Novo ORFs in Drosophila Are Important to Organismal Fitness and Evolved Rapidly from Previously Non-coding Sequences". PLOS Genetics. 9 (10) e1003860. doi: 10.1371/journal.pgen.1003860 . PMC 3798262 . PMID 24146629.
↑ Casola, Claudio; Luria, Victor; Vakirlis, Nikolaos; Zhao, Li (2025-11-28). "De Novo Genes: Current Status and Future Goals". Genome Biology and Evolution. 17 (12) evaf230. doi:10.1093/gbe/evaf230. PMC 12708343 . PMID 41313722.
↑ Zhao, Li; Svetec, Nicolas; Begun, David J. (November 2024). "De Novo Genes". Annual Review of Genetics. 58 (1): 211–232. doi:10.1146/annurev-genet-111523-102413. PMC 12051474 . PMID 39088850.
1 2 Grandchamp, Anna; Aubel, Margaux; Eicholt, Lars A.; Roginski, Paul; Luria, Victor; Karger, Amir; Dohmen, Elias (2025-10-23). "De Novo Gene Emergence: Summary, Classification, and Challenges of Current Methods". Genome Biology and Evolution. 17 (11): evaf197. doi:10.1093/gbe/evaf197. PMC 12605812 . PMID 41126639.
↑ Weisman, Caroline M. (August 2022). "The Origins and Functions of De Novo Genes: Against All Odds?". Journal of Molecular Evolution. 90 (3–4): 244–257. Bibcode:2022JMolE..90..244W. doi:10.1007/s00239-022-10055-3. PMC 9233646 . PMID 35451603.
↑ Ohno S (1970). Evolution by Gene Duplication. Allen & Unwin; Springer-Verlag.
↑ Tautz D (2014). "The discovery of de novo gene evolution". Perspectives in Biology and Medicine. 57 (1): 149–61. doi:10.1353/pbm.2014.0006. hdl: 11858/00-001M-0000-0024-3416-1 . PMID 25345708. S2CID 29552265.
↑ Grassé P-P (1977). Evolution of living organisms: evidence for a new theory of transformation. Academic Press.
↑ Barrell BG, Air GM, Hutchison CA (November 1976). "Overlapping genes in bacteriophage phiX174". Nature. 264 (5581): 34–41. Bibcode:1976Natur.264...34B. doi:10.1038/264034a0. PMID 1004533. S2CID 4264796.
↑ Shaw DC, Walker JE, Northrop FD, Barrell BG, Godson GN, Fiddes JC (April 1978). "Gene K, a new overlapping gene in bacteriophage G4". Nature. 272 (5653): 510–5. Bibcode:1978Natur.272..510S. doi:10.1038/272510a0. PMID 692656. S2CID 4218777.
↑ Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, et al. (February 1977). "Nucleotide sequence of bacteriophage phi X174 DNA". Nature. 265 (5596): 687–95. Bibcode:1977Natur.265..687S. doi:10.1038/265687a0. PMID 870828. S2CID 4206886.
↑ Keese PK, Gibbs A (October 1992). "Origins of genes: "big bang" or continuous creation?". Proceedings of the National Academy of Sciences of the United States of America. 89 (20): 9489–93. Bibcode:1992PNAS...89.9489K. doi: 10.1073/pnas.89.20.9489 . PMC 50157 . PMID 1329098.
↑ Ohno S (April 1984). "Birth of a unique enzyme from an alternative reading frame of the preexisted, internally repetitious coding sequence". Proceedings of the National Academy of Sciences of the United States of America. 81 (8): 2421–5. Bibcode:1984PNAS...81.2421O. doi: 10.1073/pnas.81.8.2421 . PMC 345072 . PMID 6585807.
↑ Sabath N, Wagner A, Karlin D (December 2012). "Evolution of viral proteins originated de novo by overprinting". Molecular Biology and Evolution. 29 (12): 3767–80. doi:10.1093/molbev/mss179. PMC 3494269 . PMID 22821011.
↑ Makałowska I, Lin CF, Hernandez K (October 2007). "Birth and death of gene overlaps in vertebrates". BMC Evolutionary Biology. 7 (1): 193. Bibcode:2007BMCEE...7..193M. doi: 10.1186/1471-2148-7-193 . PMC 2151771 . PMID 17939861.
↑ Samandi S, Roy AV, Delcourt V, Lucier JF, Gagnon J, Beaudoin MC, et al. (October 2017). "Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins". eLife. 6 e27860. doi: 10.7554/eLife.27860 . PMC 5703645 . PMID 29083303.
1 2 Khan YA, Jungreis I, Wright JC, Mudge JM, Choudhary JS, Firth AE, Kellis M (March 2020). "Evidence for a novel overlapping coding sequence in POLG initiated at a CUG start codon". BMC Genetics. 21 (1) 25. doi: 10.1186/s12863-020-0828-7 . PMC 7059407 . PMID 32138667.
↑ Makałowski W, Mitchell GA, Labuda D (June 1994). "Alu sequences in the coding regions of mRNA: a source of protein variability". Trends in Genetics. 10 (6): 188–93. doi:10.1016/0168-9525(94)90254-2. PMID 8073532.
↑ Sorek R (October 2007). "The birth of new exons: mechanisms and evolutionary consequences". RNA. 13 (10): 1603–8. doi:10.1261/rna.682507. PMC 1986822 . PMID 17709368.
1 2 Dorit RL, Gilbert W (December 1991). "The limited universe of exons". Current Opinion in Genetics & Development. 1 (4): 464–9. doi:10.1016/S0959-437X(05)80193-5. PMID 1822278.
↑ Chothia C (June 1992). "Proteins. One thousand families for the molecular biologist". Nature. 357 (6379): 543–4. Bibcode:1992Natur.357..543C. doi: 10.1038/357543a0 . PMID 1608464. S2CID 4355476.
↑ Oliver SG, van der Aart QJ, Agostoni-Carbone ML, Aigle M, Alberghina L, Alexandraki D, et al. (May 1992). "The complete DNA sequence of yeast chromosome III". Nature. 357 (6373): 38–46. Bibcode:1992Natur.357...38O. doi:10.1038/357038a0. PMID 1574125. S2CID 4271784.
1 2 Dujon B (July 1996). "The yeast genome project: what did we learn?". Trends in Genetics. 12 (7): 263–70. doi:10.1016/0168-9525(96)10027-5. PMID 8763498.
1 2 3 Begun DJ, Lindfors HA, Kern AD, Jones CD (June 2007). "Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade". Genetics. 176 (2): 1131–7. Bibcode:2007Genet.176.1131B. doi:10.1534/genetics.106.069245. PMC 1894579 . PMID 17435230.
1 2 3 Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ (June 2006). "Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression". Proceedings of the National Academy of Sciences of the United States of America. 103 (26): 9935–9. Bibcode:2006PNAS..103.9935L. doi: 10.1073/pnas.0509809103 . PMC 1502557 . PMID 16777968.
1 2 3 4 Begun DJ, Lindfors HA, Thompson ME, Holloway AK (March 2006). "Recently evolved genes identified from Drosophila yakuba and D. erecta accessory gland expressed sequence tags". Genetics. 172 (3): 1675–81. doi:10.1534/genetics.105.050336. PMC 1456303 . PMID 16361246.
1 2 McLysaght A, Guerzoni D (September 2015). "New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation". Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 370 (1678) 20140332. doi:10.1098/rstb.2014.0332. PMC 4571571 . PMID 26323763.
1 2 3 Cai J, Zhao R, Jiang H, Wang W (May 2008). "De novo origination of a new protein-coding gene in Saccharomyces cerevisiae". Genetics. 179 (1): 487–96. Bibcode:2008Genet.179..487C. doi:10.1534/genetics.107.084491. PMC 2390625 . PMID 18493065.
1 2 3 Bungard D, Copple JS, Yan J, Chhun JJ, Kumirov VK, Foy SG, et al. (November 2017). "Foldability of a Natural De Novo Evolved Protein". Structure. 25 (11): 1687–1696.e4. doi:10.1016/j.str.2017.09.006. PMC 5677532 . PMID 29033289.
1 2 Li L, Foster CM, Gan Q, Nettleton D, James MG, Myers AM, et al. (May 2009). "Identification of the novel protein QQS as a component of the starch metabolic network in Arabidopsis leaves". The Plant Journal. 58 (3): 485–98. Bibcode:2009PlJ....58..485L. doi: 10.1111/j.1365-313X.2009.03793.x . PMID 19154206.
1 2 Heinen TJ, Staubach F, Häming D, Tautz D (September 2009). "Emergence of a new gene from an intergenic region". Current Biology. 19 (18): 1527–31. Bibcode:2009CBio...19.1527H. doi: 10.1016/j.cub.2009.07.049 . PMID 19733073. S2CID 12446879.
1 2 3 4 5 6 7 8 Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, Estivill X, et al. (March 2009). "Origin of primate orphan genes: a comparative genomics approach". Molecular Biology and Evolution. 26 (3): 603–12. doi: 10.1093/molbev/msn281 . PMID 19064677.
1 2 3 Knowles DG, McLysaght A (October 2009). "Recent de novo origin of human protein-coding genes". Genome Research. 19 (10): 1752–9. doi:10.1101/gr.095026.109. PMC 2765279 . PMID 19726446.
↑ Wu, et al. (November 2011). "De Novo Origin of Human Protein-Coding Genes". PLOS Genetics. 7 (11) e1002379. doi: 10.1371/journal.pgen.1002379 . PMC 3213175 . PMID 22102831.
↑ Pereira, Andres Barboza; Marano, Matthew; Bathala, Ramya; Zaragoza, Rigoberto Ayala; Neira, Andres; Samano, Alex; Owoyemi, Adekola; Casola, Claudio (January 2025). "Orphan genes are not a distinct biological entity". BioEssays. 47 (1) e2400146. doi:10.1002/bies.202400146. PMC 11662153 . PMID 39491810.
1 2 Domazet-Loso T, Brajković J, Tautz D (November 2007). "A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages". Trends in Genetics. 23 (11): 533–9. doi:10.1016/j.tig.2007.08.014. PMID 18029048.
1 2 3 Gehrmann T, Reinders MJ (November 2015). "Proteny: discovering and visualizing statistically significant syntenic clusters at the proteome level". Bioinformatics. 31 (21): 3437–44. doi:10.1093/bioinformatics/btv389. PMC 4612220 . PMID 26116928.
↑ Świrski, Michał I.; Tierney, Matthew T.; Albà, Mar; et al. (October 2025). "Translons: a common name for all translated regions". Nature Methods. 22 (10): 2002–2006. doi:10.1038/s41592-025-02810-3. hdl: 10261/405271 . PMID 40890551.
↑ Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–10. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712. S2CID 14441902.
1 2 3 4 5 6 McLysaght A, Hurst LD (September 2016). "Open questions in the study of de novo genes: what, how and why". Nature Reviews Genetics. 17 (9): 567–78. doi:10.1038/nrg.2016.78. PMID 27452112. S2CID 6033249.
↑ Elhaik E, Sabath N, Graur D (January 2006). "The "inverse relationship between evolutionary rate and age of mammalian genes" is an artifact of increased genetic distance with rate of evolution and time of divergence". Molecular Biology and Evolution. 23 (1): 1–3. doi: 10.1093/molbev/msj006 . PMID 16151190.
↑ Albà MM, Castresana J (April 2007). "On homology searches by protein Blast and the characterization of the age of genes". BMC Evolutionary Biology. 7 (1): 53. Bibcode:2007BMCEE...7...53A. doi: 10.1186/1471-2148-7-53 . PMC 1855329 . PMID 17408474.
↑ Moyers BA, Zhang J (May 2016). "Evaluating Phylostratigraphic Evidence for Widespread De Novo Gene Birth in Genome Evolution". Molecular Biology and Evolution. 33 (5): 1245–56. doi:10.1093/molbev/msw008. PMC 5010002 . PMID 26758516.
↑ Moyers BA, Zhang J (January 2015). "Phylostratigraphic bias creates spurious patterns of genome evolution". Molecular Biology and Evolution. 32 (1): 258–67. doi:10.1093/molbev/msu286. PMC 4271527 . PMID 25312911.
1 2 Domazet-Lošo T, Carvunis AR, Albà MM, Šestak MS, Bakaric R, Neme R, et al. (April 2017). "No Evidence for Phylostratigraphic Bias Impacting Inferences on Patterns of Gene Emergence and Evolution". Molecular Biology and Evolution. 34 (4): 843–856. doi:10.1093/molbev/msw284. PMC 5400388 . PMID 28087778.
↑ Tassios, Emilios; de Leuw, Jori; Nikolaou, Christoforos; Kupczok, Anne; Vakirlis, Nikolaos (2025-12-27). "Machine learning can distinguish orphans that have resulted from sequence divergence beyond recognition". Bioinformatics Advances. vbaf324. doi:10.1093/bioadv/vbaf324.
↑ Ghiurcuta CG, Moret BM (June 2014). "Evaluating synteny for improved comparative studies". Bioinformatics. 30 (12): i9-18. doi:10.1093/bioinformatics/btu259. PMC 4058928 . PMID 24932010.
↑ Jean G, Nikolski M (2011). "SyDiG: uncovering Synteny in Distant Genomes" (PDF). International Journal of Bioinformatics Research and Applications. 7 (1) 39169: 43–62. doi:10.1504/IJBRA.2011.039169. PMID 21441096. S2CID 2644451.
↑ Liu D, Hunt M, Tsai IJ (January 2018). "Inferring synteny between genome assemblies: a systematic evaluation". BMC Bioinformatics. 19 (1) 26. Bibcode:2018BMCBi..19...26L. doi: 10.1186/s12859-018-2026-4 . PMC 5791376 . PMID 29382321.
↑ Ranz JM, Casals F, Ruiz A (February 2001). "How malleable is the eukaryotic genome? Extreme rate of chromosomal rearrangement in the genus Drosophila". Genome Research. 11 (2): 230–9. doi:10.1101/gr.162901. PMC 311025 . PMID 11157786.
1 2 Lu TC, Leu JY, Lin WC (November 2017). "A Comprehensive Analysis of Transcript-Supported De Novo Genes in Saccharomyces sensu stricto Yeasts". Molecular Biology and Evolution. 34 (11): 2823–2838. doi:10.1093/molbev/msx210. PMC 5850716 . PMID 28981695.
1 2 3 4 Li ZW, Chen X, Wu Q, Hagmann J, Han TS, Zou YP, Ge S, Guo YL (August 2016). "On the Origin of De Novo Genes in Arabidopsis thaliana Populations". Genome Biology and Evolution. 8 (7): 2190–202. doi:10.1093/gbe/evw164. PMC 4987118 . PMID 27401176.
1 2 3 4 5 6 7 8 Chen S, Zhang YE, Long M (December 2010). "New genes in Drosophila quickly become essential". Science. 330 (6011): 1682–5. Bibcode:2010Sci...330.1682C. doi:10.1126/science.1196380. PMC 7211344 . PMID 21164016. S2CID 7899890.
1 2 3 4 5 6 7 Zhao L, Saelao P, Jones CD, Begun DJ (February 2014). "Origin and spread of de novo genes in Drosophila melanogaster populations". Science. 343 (6172): 769–72. Bibcode:2014Sci...343..769Z. doi:10.1126/science.1248286. PMC 4391638 . PMID 24457212.
1 2 3 4 Zhou Q, Zhang G, Zhang Y, Xu S, Zhao R, Zhan Z, et al. (September 2008). "On the origin of new genes in Drosophila". Genome Research. 18 (9): 1446–55. doi:10.1101/gr.076588.108. PMC 2527705 . PMID 18550802.
1 2 3 4 5 6 7 Wu DD, Irwin DM, Zhang YP (November 2011). "De novo origin of human protein-coding genes". PLOS Genetics. 7 (11) e1002379. doi: 10.1371/journal.pgen.1002379 . PMC 3213175 . PMID 22102831.
↑ Vakirlis N, McLysaght A (2019). "Computational Prediction of de Novo Emerged Protein-Coding Genes". Computational Methods in Protein Evolution. Methods in Molecular Biology. Vol. 1851. Springer. pp. 63–81. doi:10.1007/978-1-4939-8736-8_4. ISBN 978-1-4939-8735-1. PMID 30298392. S2CID 52942639.
1 2 3 4 5 6 7 8 9 10 11 12 Carvunis AR, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, et al. (July 2012). "Proto-genes and de novo gene birth". Nature. 487 (7407): 370–374. Bibcode:2012Natur.487..370C. doi:10.1038/nature11184. PMC 3401362 . PMID 22722833.
↑ Doolittle WF, Brunet TD, Linquist S, Gregory TR (May 2014). "Distinguishing between "function" and "effect" in genome biology". Genome Biology and Evolution. 6 (5): 1234–1237. doi:10.1093/gbe/evu098. PMC 4041003 . PMID 24814287.
1 2 Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov GK, et al. (April 2014). "Defining functional DNA elements in the human genome". Proceedings of the National Academy of Sciences of the United States of America. 111 (17): 6131–6138. Bibcode:2014PNAS..111.6131K. doi: 10.1073/pnas.1318948111 . PMC 4035993 . PMID 24753594.
1 2 3 Keeling DM, Garza P, Nartey CM, Carvunis AR (November 2019). "The meanings of 'function' in biology and the problematic case of de novo gene emergence". eLife. 8 e47014. doi: 10.7554/eLife.47014 . PMC 6824840 . PMID 31674305.
↑ Andersson DI, Jerlström-Hultqvist J, Näsvall J (June 2015). "Evolution of new functions de novo and from preexisting genes". Cold Spring Harbor Perspectives in Biology. 7 (6) a017996. Bibcode:2015CSHPB...717996A. doi:10.1101/cshperspect.a017996. PMC 4448608 . PMID 26032716.
↑ Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, et al. (January 2019). "Studying the dawn of de novo gene emergence in mice reveals fast integration of new genes into functional networks". bioRxiv 10.1101/510214 .
↑ Ruiz-Orera J, Hernandez-Rodriguez J, Chiva C, Sabidó E, Kondova I, Bontrop R, et al. (December 2015). "Origins of De Novo Genes in Human and Chimpanzee". PLOS Genetics. 11 (12) e1005721. arXiv: 1507.07744 . Bibcode:2015arXiv150707744R. doi: 10.1371/journal.pgen.1005721 . PMC 4697840 . PMID 26720152.
↑ Miyata, Takashi; Yasunaga, Teruo; Nishida, Toshirō (1980). "Nucleotide sequence divergence and functional constraint in mRNA evolution". Proceedings of the National Academy of Sciences of the United States of America. 77 (12): 7328–7332. Bibcode:1980PNAS...77.7328M. doi: 10.1073/pnas.77.12.7328 . PMC 350496 . PMID 6938980.
↑ Dohmen, Elias; Aubel, Margaux; Eicholt, Lars A.; Roginski, Paul; Luria, Victor; Karger, Amir; Grandchamp, Anna (2025-10-06). "DeNoFo: a file format and toolkit for standardized, comparable de novo gene annotation". Bioinformatics. 41 (10): btaf539. doi:10.1093/bioinformatics/btaf539. PMC 12516307 . PMID 41051215.
↑ Roginski, Paul; Grandchamp, Anna; Quignot, Chloé; Lopes, Anne (2024-08-30). "De Novo Emerged Gene Search in Eukaryotes with DENSE". Genome Biology and Evolution. 16 (8): evae159. doi:10.1093/gbe/evae159. PMC 11363675 . PMID 39212967.
↑ Vakirlis, Nikolaos; Acar, Omer; Cherupally, Vijay; Carvunis, Anne-Ruxandra (2024-07-15). "Ancestral Sequence Reconstruction as a Tool to Detect and Study De Novo Gene Emergence". Genome Biology and Evolution. 16 (8): evae151. doi:10.1093/gbe/evae151. PMC 11299112 . PMID 39004885.
1 2 Peng, Junhui; Zhao, Li (2024-01-27). "The origin and structural evolution of de novo genes in Drosophila". Nature Communications. 15 (1): 810. Bibcode:2024NatCo..15..810P. doi:10.1038/s41467-024-45028-1. PMC 10821953 . PMID 38280868.
1 2 3 4 5 6 7 Heames B, Schmitz J, Bornberg-Bauer E (May 2020). "A Continuum of Evolving De Novo Genes Drives Protein-Coding Novelty in Drosophila". Journal of Molecular Evolution. 88 (4): 382–398. Bibcode:2020JMolE..88..382H. doi:10.1007/s00239-020-09939-z. PMC 7162840 . PMID 32253450.
1 2 3 Durand É, Gagnon-Arsenault I, Hallin J, Hatin I, Dubé AK, Nielly-Thibault L, et al. (June 2019). "Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations". Genome Research. 29 (6): 932–943. doi: 10.1101/gr.239822.118 . PMC 6581059 . PMID 31152050.
1 2 3 4 5 Dowling D, Schmitz JF, Bornberg-Bauer E (November 2020). "Stochastic Gain and Loss of Novel Transcribed Open Reading Frames in the Human Lineage". Genome Biology and Evolution. 12 (11): 2183–2195. doi:10.1093/gbe/evaa194. PMC 7674706 . PMID 33210146.
1 2 Neme R, Tautz D (February 2013). "Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution". BMC Genomics. 14 (1): 117. Bibcode:2013BMCG...14..117N. doi: 10.1186/1471-2164-14-117 . PMC 3616865 . PMID 23433480.
1 2 3 4 5 Schmitz JF, Ullrich KK, Bornberg-Bauer E (October 2018). "Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover". Nature Ecology & Evolution. 2 (10): 1626–1632. Bibcode:2018NatEE...2.1626S. doi:10.1038/s41559-018-0639-7. PMID 30201962. S2CID 52181376.
1 2 Vakirlis N, Carvunis AR, McLysaght A (February 2020). "Synteny-based analyses indicate that sequence divergence is not the main source of orphan genes". eLife. 9 e53500. doi: 10.7554/eLife.53500 . PMC 7028367 . PMID 32066524.
1 2 3 Palmieri N, Kosiol C, Schlötterer C (February 2014). "The life cycle of Drosophila orphan genes". eLife. 3 e01311. arXiv: 1401.4956 . Bibcode:2014arXiv1401.4956P. doi: 10.7554/eLife.01311 . PMC 3927632 . PMID 24554240.
1 2 Prabh N, Roeseler W, Witte H, Eberhardt G, Sommer RJ, Rödelsperger C (November 2018). "Pristionchus nematodes". Genome Research. 28 (11): 1664–1674. doi:10.1101/gr.234971.118. PMC 6211646 . PMID 30232197.
1 2 Wissler L, Gadau J, Simola DF, Helmkampf M, Bornberg-Bauer E (2013). "Mechanisms and dynamics of orphan gene emergence in insect genomes". Genome Biology and Evolution. 5 (2): 439–55. doi:10.1093/gbe/evt009. PMC 3590893 . PMID 23348040.
1 2 3 4 Schmitz JF, Chain FJ, Bornberg-Bauer E (August 2020). "Evolution of novel genes in three-spined stickleback populations". Heredity. 125 (1–2): 50–59. Bibcode:2020Hered.125...50S. doi:10.1038/s41437-020-0319-7. PMC 7413265 . PMID 32499660.
↑ Neme R, Tautz D (February 2016). "Fast turnover of genome transcription across evolutionary time exposes entire non-coding DNA to de novo gene emergence". eLife. 5 e09977. Bibcode:2016eLife...509977N. doi: 10.7554/eLife.09977 . PMC 4829534 . PMID 26836309.
↑ Kutter C, Watt S, Stefflova K, Wilson MD, Goncalves A, Ponting CP, Odom DT, Marques AC (2012). "Rapid turnover of long noncoding RNAs and the evolution of gene expression". PLOS Genetics. 8 (7) e1002841. doi: 10.1371/journal.pgen.1002841 . PMC 3406015 . PMID 22844254.
↑ Lebherz, Marie A.; Iyengar, Bharat Ravi; Bornberg-Bauer, Erich (2024-07-03). "Modeling Length Changes in De Novo Open Reading Frames during Neutral Evolution". Genome Biology and Evolution. 16 (7) evae129. doi:10.1093/gbe/evae129. PMC 11339603 . PMID 38879874.
1 2 3 4 5 Reinhardt JA, Wanjiru BM, Brant AT, Saelao P, Begun DJ, Jones CD (2013). "De novo ORFs in Drosophila are important to organismal fitness and evolved rapidly from previously non-coding sequences". PLOS Genetics. 9 (10) e1003860. doi: 10.1371/journal.pgen.1003860 . PMC 3798262 . PMID 24146629.
1 2 3 Gubala AM, Schmitz JF, Kearns MJ, Vinh TT, Bornberg-Bauer E, Wolfner MF, Findlay GD (May 2017). "The Goddard and Saturn Genes Are Essential for Drosophila Male Fertility and May Have Arisen De Novo". Molecular Biology and Evolution. 34 (5): 1066–1082. doi:10.1093/molbev/msx057. PMC 5400382 . PMID 28104747.
1 2 3 Lange A, Patel PH, Heames B, Damry AM, Saenger T, Jackson CJ, et al. (March 2021). "Structural and functional characterization of a putative de novo gene in Drosophila". Nature Communications. 12 (1) 1667. Bibcode:2021NatCo..12.1667L. doi:10.1038/s41467-021-21667-6. PMC 7954818 . PMID 33712569.
↑ Zile K, Dessimoz C, Wurm Y, Masel J (August 2020). "Only a Single Taxonomically Restricted Gene Family in the Drosophila melanogaster Subgroup Can Be Identified with High Confidence". Genome Biology and Evolution. 12 (8): 1355–1366. doi:10.1093/gbe/evaa127. PMC 8059200 . PMID 32589737.
1 2 3 Zhuang X, Yang C, Murphy KR, Cheng CC (March 2019). "Molecular mechanism and history of non-sense to sense evolution of antifreeze glycoprotein gene in northern gadids". Proceedings of the National Academy of Sciences of the United States of America. 116 (10): 4400–4405. Bibcode:2019PNAS..116.4400Z. doi: 10.1073/pnas.1817138116 . PMC 6410882 . PMID 30765531.
1 2 Baalsrud HT, Tørresen OK, Solbakken MH, Salzburger W, Hanel R, Jakobsen KS, Jentoft S (March 2018). "De Novo Gene Evolution of Antifreeze Glycoproteins in Codfishes Revealed by Whole Genome Sequence Data". Molecular Biology and Evolution. 35 (3): 593–606. doi:10.1093/molbev/msx311. PMC 5850335 . PMID 29216381.
↑ Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, et al. (August 2019). "A de novo evolved gene in the house mouse regulates female pregnancy cycles". eLife. 8 e44392. doi: 10.7554/eLife.44392 . PMC 6760900 . PMID 31436535.
↑ Li D, Dong Y, Jiang Y, Jiang H, Cai J, Wang W (April 2010). "A de novo originated gene depresses budding yeast mating pathway and is repressed by the protein encoded by its antisense strand". Cell Research. 20 (4): 408–20. doi: 10.1038/cr.2010.31 . PMID 20195295.
↑ Li D, Yan Z, Lu L, Jiang H, Wang W (December 2014). "Pleiotropy of the de novo-originated gene MDF1". Scientific Reports. 4 7280. Bibcode:2014NatSR...4.7280L. doi:10.1038/srep07280. PMC 4250933 . PMID 25452167.
1 2 Moutinho AF, Eyre-Walker A, Dutheil JY (September 2022). "Strong evidence for the adaptive walk model of gene evolution in Drosophila and Arabidopsis". PLOS Biology. 20 (9) e3001775. doi: 10.1371/journal.pbio.3001775 . PMC 9470001 . PMID 36099311.
1 2 3 Ekman D, Elofsson A (February 2010). "Identifying and quantifying orphan protein sequences in fungi". Journal of Molecular Biology. 396 (2): 396–405. doi:10.1016/j.jmb.2009.11.053. PMID 19944701.
↑ Domazet-Loso T, Tautz D (October 2003). "An evolutionary analysis of orphan genes in Drosophila". Genome Research. 13 (10): 2213–2219. doi:10.1101/gr.1311003. PMC 403679 . PMID 14525923.
↑ Guo WJ, Li P, Ling J, Ye SP (2007). "Significant comparative characteristics between orphan and nonorphan genes in the rice (Oryza sativa L.) genome". Comparative and Functional Genomics. 2007 21676. doi: 10.1155/2007/21676 . PMC 2216055 . PMID 18273382.
↑ Wolf YI, Novichkov PS, Karev GP, Koonin EV, Lipman DJ (May 2009). "The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages". Proceedings of the National Academy of Sciences of the United States of America. 106 (18): 7273–7280. doi: 10.1073/pnas.0901808106 . PMC 2666616 . PMID 19351897.
1 2 Sun W, Zhao XW, Zhang Z (September 2015). "Identification and evolution of the orphan genes in the domestic silkworm, Bombyx mori". FEBS Letters. 589 (19 Pt B): 2731–2738. Bibcode:2015FEBSL.589.2731S. doi: 10.1016/j.febslet.2015.08.008 . PMID 26296317.
1 2 3 Donoghue MT, Keshavaiah C, Swamidatta SH, Spillane C (February 2011). "Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana". BMC Evolutionary Biology. 11 (1) 47. Bibcode:2011BMCEE..11...47D. doi: 10.1186/1471-2148-11-47 . PMC 3049755 . PMID 21332978.
1 2 3 4 Werner MS, Sieriebriennikov B, Prabh N, Loschko T, Lanz C, Sommer RJ (November 2018). "Young genes have distinct gene structure, epigenetic profiles, and transcriptional regulation". Genome Research. 28 (11): 1675–1687. doi:10.1101/gr.234872.118. PMC 6211652 . PMID 30232198.
1 2 3 4 5 Vakirlis N, Hebert AS, Opulente DA, Achaz G, Hittinger CT, Fischer G, et al. (March 2018). "A Molecular Portrait of De Novo Genes in Yeasts". Molecular Biology and Evolution. 35 (3): 631–645. doi:10.1093/molbev/msx315. PMC 5850487 . PMID 29220506.
↑ Foy SG, Wilson BA, Bertram J, Cordes MH, Masel J (April 2019). "A Shift in Aggregation Avoidance Strategy Marks a Long-Term Direction to Protein Evolution". Genetics. 211 (4): 1345–1355. doi:10.1534/genetics.118.301719. PMC 6456324 . PMID 30692195.
1 2 James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J (January 2021). "Universal and taxon-specific trends in protein sequences as a function of age". eLife. 10 e57347. doi: 10.7554/eLife.57347 . PMC 7819706 . PMID 33416492.
1 2 Zhang JY, Zhou Q (January 2019). "On the Regulatory Evolution of New Genes Throughout Their Life History". Molecular Biology and Evolution. 36 (1): 15–27. doi: 10.1093/molbev/msy206 . PMID 30395322. S2CID 53216993.
↑ Wu B, Knudson A (July 2018). "De Novo Origin of Protein-Coding Genes in Yeast". mBio. 9 (4). doi: 10.1128/mBio.01024-18 . PMC 6069113 . PMID 30065088.
1 2 Bekpen C, Xie C, Tautz D (August 2018). "Dealing with the adaptive immune system during de novo evolution of genes from intergenic sequences". BMC Evolutionary Biology. 18 (1) 121. Bibcode:2018BMCEE..18..121B. doi: 10.1186/s12862-018-1232-z . PMC 6091031 . PMID 30075701.
↑ Pertea M, Shumate A, Pertea G, Varabyou A, Chang YC, Madugundu A, et al. (2018). "Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise". bioRxiv 10.1101/332825 .
1 2 Peng, Junhui; Zhao, Li (2023-06-27). "The origin and structural evolution of de novo genes in Drosophila". bioRxiv 10.1101/2023.03.13.532420 .
↑ Nielly-Thibault L, Landry CR (August 2019). "Differences Between the Raw Material and the Products of de Novo Gene Birth Can Result from Mutational Biases". Genetics. 212 (4): 1353–1366. doi:10.1534/genetics.119.302187. PMC 6707459 . PMID 31227545.
↑ Vakirlis, Nikolaos; Acar, Omer; Hsu, Brian (2020-02-07). "De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences". Nature Communications. 11 (1) 781. Bibcode:2020NatCo..11..781V. doi:10.1038/s41467-020-14500-z. PMC 7005711 . PMID 32034123.
↑ Vakirlis, Nikolaos; Fuqua, Zachary (2025-09-25). "De novo transmembrane proteins may emerge from poly-A tracts". Journal of Evolutionary Biology. 38 (9): 1272–1277. doi: 10.1093/jeb/voaf089 . PMID 40650590.
↑ Tassios, Emilios; Nikolaou, Christoforos; Vakirlis, Nikolaos (2023-03-04). "Intergenic Regions of Saccharomycotina Yeasts are Enriched in Potential to Encode Transmembrane Domains". Molecular Biology and Evolution. 40 (3) msad059. doi:10.1093/molbev/msad059. PMC 10063215 . PMID 36917489.
↑ Heames, Brennen; Buchel, Filip; Aubel, Margaux (2023-04-06). "Experimental characterization of de novo proteins and their unevolved random-sequence counterparts". Nature Ecology & Evolution. 7 (4): 570–580. Bibcode:2023NatEE...7..570H. doi:10.1038/s41559-023-02010-2. PMC 10089919 . PMID 37024625.
↑ Aubel, Margaux; Buchel, Filip; Heames, Brennen (2024-04-02). "High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential". Genome Biology and Evolution. 16 (4) evae069. doi:10.1093/gbe/evae069. PMC 11024478 . PMID 38597156.
↑ Frumkin, Idan; Laub, Michael T. (2023-11-09). "Selection of a de novo gene that can promote survival of Escherichia coli by modulating protein homeostasis pathways". Nature Ecology & Evolution. 7 (12): 2067–2079. Bibcode:2023NatEE...7.2067F. doi:10.1038/s41559-023-02224-4. PMC 10697842 . PMID 37945946.
↑ Frumkin, Idan; Vassallo, Christopher N.; Chen, Yi Hua; Laub, Michael T. (2025-10-21). "Emergence of antiphage functions from random sequence libraries reveals mechanisms of gene birth". Proceedings of the National Academy of Sciences of the United States of America. 122 (42) e2513255122. Bibcode:2025PNAS..12213255F. doi:10.1073/pnas.2513255122. PMC 12557735 . PMID 41091762.
↑ Kosinski L, Aviles N, Gomez K, Masel J (June 2022). "Random peptides rich in small and disorder-promoting amino acids are less likely to be harmful". Genome Biology and Evolution. 14 (6) evac085. doi:10.1093/gbe/evac085. PMC 9210321 . PMID 35668555.
↑ Peng, Yingying; et al. (August 2025). "Gene regulatory network integration underlies de novo gene evolution and developmental system drift". Nature Ecology & Evolution. 9 (8): 1487–1498. doi:10.1038/s41559-025-02747-y. PMID 40659874.
1 2 3 Basile W, Sachenkova O, Light S, Elofsson A (March 2017). "High GC content causes orphan proteins to be intrinsically disordered". PLOS Computational Biology. 13 (3) e1005375. Bibcode:2017PLSCB..13E5375B. doi: 10.1371/journal.pcbi.1005375 . PMC 5389847 . PMID 28355220.
↑ Bitard-Feildel T, Heberlein M, Bornberg-Bauer E, Callebaut I (December 2015). "Detection of orphan domains in Drosophila using "hydrophobic cluster analysis"". Biochimie. 119: 244–53. doi:10.1016/j.biochi.2015.02.019. PMID 25736992.
↑ Mukherjee S, Panda A, Ghosh TC (June 2015). "Elucidating evolutionary features and functional implications of orphan genes in Leishmania major". Infection, Genetics and Evolution. 32: 330–7. Bibcode:2015InfGE..32..330M. doi:10.1016/j.meegid.2015.03.031. PMID 25843649.
1 2 3 4 5 6 7 8 9 10 Wilson BA, Foy SG, Neme R, Masel J (June 2017). "Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth". Nature Ecology & Evolution. 1 (6) 0146. Bibcode:2017NatEE...1..146W. doi:10.1038/s41559-017-0146. PMC 5476217. PMID 28642936.
↑ Jeon J, Choi J, Lee GW, Park SY, Huh A, Dean RA, et al. (February 2015). "Genome-wide profiling of DNA methylation provides insights into epigenetic regulation of fungal development in a plant pathogenic fungus, Magnaporthe oryzae". Scientific Reports. 5 8567. Bibcode:2015NatSR...5.8567J. doi:10.1038/srep08567. PMC 4338423 . PMID 25708804.
↑ Bornberg-Bauer E, Hlouchova K, Lange A (June 2021). "Structure and function of naturally evolved de novo proteins". Current Opinion in Structural Biology. 68: 175–183. doi: 10.1016/j.sbi.2020.11.010 . PMID 33567396.
↑ Eicholt, Lars A.; Aubel, Margaux; Berk, Katrin; Bornberg-Bauer, Erich; Lange, Andreas (2022-07-13). "Heterologous expression of naturally evolved putative de novo proteins with chaperones". Protein Science. 31 (8) e4371. Wiley. doi:10.1002/pro.4371. ISSN 0961-8368. PMC 9278007 . PMID 35900020.
↑ Pan X, Ye P, Yuan DS, Wang X, Bader JS, Boeke JD (March 2006). "A DNA integrity network in the yeast Saccharomyces cerevisiae". Cell. 124 (5): 1069–1081. doi: 10.1016/j.cell.2005.12.036 . PMID 16487579. S2CID 84338859.
↑ Hannon Bozorgmehr, Joseph (2024-02-05). "Four classic "de novo" genes all have plausible homologs and likely evolved from retro-duplicated or pseudogenic sequences". Molecular Genetics and Genomics. 299 (1) 6. doi:10.1007/s00438-023-02090-6. PMID 38315248.
↑ Seçkin, Ercan; Colinet, Dominique; Sarti, Edoardo; Danchin, Etienne G. J. (2025-11-25). "Orphan and de novo Genes in Fungi and Animals: Identification, Origins and Functions". Genome Biology and Evolution. 17 (12) evaf220. doi:10.1093/gbe/evaf220. PMC 12684174 . PMID 41289037.
1 2 3 Aubel, Margaux; Eicholt, Lars; Bornberg-Bauer, Erich (2023-03-29). "Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning". F1000Research. 12: 347. doi: 10.12688/f1000research.130443.1 . PMC 10126731 . PMID 37113259.
↑ Chen, Jianhai; Li, Qingrong; Xia, Shengqian; Arsala, Deanna; Sosa, Dylan; Wang, Dong; Long, Manyuan (June 2024). "The Rapid Evolution of De Novo Proteins in Structure and Complex". Genome Biology and Evolution. 16 (6) evae107. doi:10.1093/gbe/evae107. PMC 11149777. PMID 38753069.
↑ Middendorf, Lasse; Iyengar, Bharat Ravi; Eicholt, Lars A (2024-08-05). "Sequence, Structure, and Functional Space of Drosophila De Novo Proteins". Genome Biology and Evolution. 16 (8) evae176. doi:10.1093/gbe/evae176. PMC 11363682. PMID 39212966.
↑ Eicholt, Lars A (2026). "Structure and Disorder Predictions of Microproteins: Usage, Applications, and Pitfalls". In Wenkel, Stephan (ed.). Microproteins. Methods in Molecular Biology. Vol. 2992. New York, NY: Humana. pp. 129–150. doi:10.1007/978-1-0716-5013-4_10. ISBN 978-1-0716-5012-7. PMID 41241904.
↑ Middendorf, Lasse; Eicholt, Lars A (June 2024). "Random, de novo, and conserved proteins: How structure and disorder predictors perform differently". Proteins. 92 (6): 757–767. doi: 10.1002/prot.26652 . PMID 38226524.
↑ Liu, Jing (August 2023). "Do "Newly Born" orphan proteins resemble "Never Born" proteins? A study using three deep learning algorithms". Proteins. 91 (8): 1097–1115. doi: 10.1002/prot.26496 . PMID 37092778.
↑ David L, Huber W, Granovskaia M, Toedling J, Palm CJ, Bofkin L, et al. (April 2006). "A high-resolution map of transcription in the yeast genome". Proceedings of the National Academy of Sciences of the United States of America. 103 (14): 5320–5325. Bibcode:2006PNAS..103.5320D. doi: 10.1073/pnas.0601091103 . PMC 1414796 . PMID 16569694.
↑ Tisseur M, Kwapisz M, Morillon A (November 2011). "Pervasive transcription – Lessons from yeast". Biochimie. 93 (11): 1889–1896. doi:10.1016/j.biochi.2011.07.001. PMID 21771634.
↑ Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (June 2008). "The transcriptional landscape of the yeast genome defined by RNA sequencing". Science. 320 (5881): 1344–1349. Bibcode:2008Sci...320.1344N. doi:10.1126/science.1158441. PMC 2951732 . PMID 18451266.
↑ Clark MB, Amaral PP, Schlesinger FJ, Dinger ME, Taft RJ, Rinn JL, et al. (July 2011). "The reality of pervasive transcription". PLOS Biology. 9 (7): e1000625, discussion e1001102. doi: 10.1371/journal.pbio.1000625 . PMC 3134446 . PMID 21765801.
1 2 Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJ, Jackson SE, et al. (September 2014). "Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes". Cell Reports. 8 (5): 1365–1379. doi:10.1016/j.celrep.2014.07.045. PMC 4216110 . PMID 25159147.
↑ Ruiz-Orera J, Verdaguer-Grau P, Villanueva-Cañas JL, Messeguer X, Albà MM (May 2018). "Translation of neutrally evolving peptides provides a basis for de novo gene evolution". Nature Ecology & Evolution. 2 (5): 890–896. Bibcode:2018NatEE...2..890R. doi:10.1038/s41559-018-0506-6. hdl: 10230/36048 . PMID 29556078. S2CID 4959952.
↑ Ruiz-Orera J, Messeguer X, Subirana JA, Alba MM (September 2014). "Long non-coding RNAs as a source of new peptides". eLife. 3 e03523. arXiv: 1405.4174 . Bibcode:2014arXiv1405.4174R. doi: 10.7554/eLife.03523 . PMC 4359382 . PMID 25233276.
1 2 3 Wilson BA, Masel J (2011). "Putatively noncoding transcripts show extensive association with ribosomes". Genome Biology and Evolution. 3: 1245–1252. doi:10.1093/gbe/evr099. PMC 3209793 . PMID 21948395.
↑ Chen J, Brunner AD, Cogan JZ, Nuñez JK, Fields AP, Adamson B, et al. (March 2020). "Pervasive functional translation of noncanonical human open reading frames". Science. 367 (6482): 1140–1146. Bibcode:2020Sci...367.1140C. doi:10.1126/science.aay0262. PMC 7289059 . PMID 32139545.
1 2 Silveira AB, Trontin C, Cortijo S, Barau J, Del Bem LE, Loudet O, et al. (April 2013). "Extensive natural epigenetic variation at a de novo originated gene". PLOS Genetics. 9 (4) e1003437. doi: 10.1371/journal.pgen.1003437 . PMC 3623765 . PMID 23593031.
↑ Kimmins S, Sassone-Corsi P (March 2005). "Chromatin remodelling and epigenetic features of germ cells". Nature. 434 (7033): 583–9. Bibcode:2005Natur.434..583K. doi:10.1038/nature03368. PMID 15800613. S2CID 4373304.
1 2 Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, et al. (November 2021). "Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution". Genome Research. 31 (12): 2303–2315. doi:10.1101/gr.275638.121. PMC 8647833 . PMID 34810219.
1 2 Vakirlis, Nikolaos; Vance, Zoe; Duggan, Kate M.; McLysaght, Aoife (2022-12-20). "De novo birth of functional microproteins in the human lineage". Cell Reports. 41 (12) 111808. doi: 10.1016/j.celrep.2022.111808 . ISSN 2211-1247. PMC 10073203 . PMID 36543139. S2CID 254966620.
↑ Dinger ME, Pang KC, Mercer TR, Mattick JS (November 2008). "Differentiating protein-coding and noncoding RNA: challenges and ambiguities". PLOS Computational Biology. 4 (11) e1000176. Bibcode:2008PLSCB...4E0176D. doi: 10.1371/journal.pcbi.1000176 . PMC 2518207 . PMID 19043537.
↑ Iyengar, Bharat Ravi; Bornberg-Bauer, Erich (2023-04-04). "Neutral Models of De Novo Gene Emergence Suggest that Gene Evolution has a Preferred Trajectory". Molecular Biology and Evolution. 40 (4) msad079. doi:10.1093/molbev/msad079. PMC 10118301. PMID 37011142.
↑ Iyengar, Bharat Ravi; Grandchamp, Anna; Bornberg-Bauer, Erich (2024-07-23). "How antisense transcripts can evolve to encode novel proteins". Nature Communications. 15 (1) 6187. Bibcode:2024NatCo..15.6187I. doi: 10.1038/s41467-024-50550-3 . PMC 11266595 . PMID 39043684.
↑ Stewart NB, Rogers RL (September 2019). "Chromosomal rearrangements as a source of new gene formation in Drosophila yakuba". PLOS Genetics. 15 (9) e1008314. doi: 10.1371/journal.pgen.1008314 . PMC 6776367 . PMID 31545792.
↑ Swanson WJ, Vacquier VD (February 2002). "The rapid evolution of reproductive proteins". Nature Reviews Genetics. 3 (2): 137–44. doi:10.1038/nrg733. PMID 11836507. S2CID 25696990.
↑ Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D, Adams MD, Cargill M, Clark AG (October 2005). "Natural selection on protein-coding genes in the human genome". Nature. 437 (7062): 1153–7. Bibcode:2005Natur.437.1153B. doi:10.1038/nature04240. PMID 16237444. S2CID 4423768.
↑ Clark NL, Aagaard JE, Swanson WJ (January 2006). "Evolution of reproductive proteins from animals and plants". Reproduction. 131 (1): 11–22. doi: 10.1530/rep.1.00357 . PMID 16388004.
↑ Rivard EL, Ludwig AG, Patel PH, Grandchamp A, Arnold SE, Berger A, et al. (September 2021). "A putative de novo evolved gene required for spermatid chromatin condensation in Drosophila melanogaster". PLOS Genetics. 17 (9) e1009787. doi: 10.1371/journal.pgen.1009787 . PMC 8445463 . PMID 34478447.
↑ Cridland JM, Majane AC, Zhao L, Begun DJ (January 2022). "Population biology of accessory gland-expressed de novo genes in Drosophila melanogaster". Genetics. 220 (1) iyab207. doi:10.1093/genetics/iyab207. PMC 8733444 . PMID 34791207.
↑ Witt, Evan; Benjamin, Sigi; Svetec, Nicolas; Zhao, Li (2019-08-16). Landry, Christian R; Wittkopp, Patricia J; White-Cooper, Helen (eds.). "Testis single-cell RNA-seq reveals the dynamics of de novo gene transcription and germline mutational bias in Drosophila". eLife. 8 e47138. doi: 10.7554/eLife.47138 . ISSN 2050-084X. PMC 6697446 . PMID 31418408. S2CID 198249413.
1 2 3 Luis Villanueva-Cañas J, Ruiz-Orera J, Agea MI, Gallo M, Andreu D, Albà MM (July 2017). "New Genes and Functional Innovation in Mammals". Genome Biology and Evolution. 9 (7): 1886–1900. doi:10.1093/gbe/evx136. PMC 5554394 . PMID 28854603.
↑ Schmidt EE (July 1996). "Transcriptional promiscuity in testes". Current Biology. 6 (7): 768–9. Bibcode:1996CBio....6..768S. doi: 10.1016/S0960-9822(02)00589-4 . PMID 8805310. S2CID 14318566.
↑ White-Cooper H, Davidson I (July 2011). "Unique aspects of transcription regulation in male germ cells". Cold Spring Harbor Perspectives in Biology. 3 (7) a002626. doi:10.1101/cshperspect.a002626. PMC 3119912 . PMID 21555408.
↑ Kleene KC (August 2001). "A possible meiotic function of the peculiar patterns of gene expression in mammalian spermatogenic cells". Mechanisms of Development. 106 (1–2): 3–23. doi: 10.1016/S0925-4773(01)00413-0 . PMID 11472831. S2CID 949694.
1 2 3 4 Rajon E, Masel J (January 2011). "Evolution of molecular error rates and the consequences for evolvability". Proceedings of the National Academy of Sciences of the United States of America. 108 (3): 1082–7. Bibcode:2011PNAS..108.1082R. doi: 10.1073/pnas.1012918108 . PMC 3024668 . PMID 21199946.
↑ Masel J (March 2006). "Cryptic genetic variation is enriched for potential adaptations". Genetics. 172 (3): 1985–1991. Bibcode:2006Genet.172.1985M. doi:10.1534/genetics.105.051649. PMC 1456269 . PMID 16387877.
↑ Casola C (2018). "From de novo to "de nono": most novel protein coding genes identified with phylostratigraphy represent old genes or recent duplicates". bioRxiv 10.1101/287193 .
↑ Willis S, Masel J (September 2018). "Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes". Genetics. 210 (1): 303–313. doi:10.1534/genetics.118.301249. PMC 6116962 . PMID 30026186.
↑ Abrusán G (December 2013). "Integration of new genes into cellular networks, and their structural maturation". Genetics. 195 (4): 1407–1417. doi:10.1534/genetics.113.152256. PMC 3832282 . PMID 24056411.
1 2 Vakirlis N, Acar O, Hsu B, Castilho Coelho N, Van Oss SB, Wacholder A, et al. (February 2020). "De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences". Nature Communications. 11 (1) 781. Bibcode:2020NatCo..11..781V. doi:10.1038/s41467-020-14500-z. PMC 7005711 . PMID 32034123.
↑ Giacomelli MG, Hancock AS, Masel J (February 2007). "The conversion of 3' UTRs into coding regions". Molecular Biology and Evolution. 24 (2): 457–464. doi:10.1093/molbev/msl172. PMC 1808353 . PMID 17099057.
1 2 3 Bornberg-Bauer E, Schmitz J, Heberlein M (October 2015). "Emergence of de novo proteins from 'dark genomic matter' by 'grow slow and moult'". Biochemical Society Transactions. 43 (5): 867–873. doi:10.1042/BST20150089. PMID 26517896.
↑ Wilder JA, Hewett EK, Gansner ME (December 2009). "Molecular evolution of GYPC: evidence for recent structural innovation and positive selection in humans". Molecular Biology and Evolution. 26 (12): 2679–2687. doi:10.1093/molbev/msp183. PMC 2775107 . PMID 19679754.
↑ Vakhrusheva AA, Kazanov MD, Mironov AA, Bazykin GA (February 2011). "Evolution of prokaryotic genes by shift of stop codons". Journal of Molecular Evolution. 72 (2): 138–146. Bibcode:2011JMolE..72..138V. doi:10.1007/s00239-010-9408-1. PMID 21082168. S2CID 812377.
↑ Andreatta ME, Levine JA, Foy SG, Guzman LD, Kosinski LJ, Cordes MH, Masel J (May 2015). "The Recent De Novo Origin of Protein C-Termini". Genome Biology and Evolution. 7 (6): 1686–1701. doi:10.1093/gbe/evv098. PMC 4494051 . PMID 26002864.
↑ Kleppe AS, Bornberg-Bauer E (November 2018). "Robustness by intrinsically disordered C-termini and translational readthrough". Nucleic Acids Research. 46 (19): 10184–10194. doi:10.1093/nar/gky778. PMC 6365619 . PMID 30247639.
↑ Klasberg S, Bitard-Feildel T, Callebaut I, Bornberg-Bauer E (July 2018). "Origins and structural properties of novel and de novo protein domains during insect evolution". The FEBS Journal. 285 (14): 2605–2625. doi: 10.1111/febs.14504 . PMID 29802682.
↑ Deng C, Cheng CH, Ye H, He X, Chen L (December 2010). "Evolution of an antifreeze protein by neofunctionalization under escape from adaptive conflict". Proceedings of the National Academy of Sciences of the United States of America. 107 (50): 21593–21598. Bibcode:2010PNAS..10721593D. doi: 10.1073/pnas.1007883107 . PMC 3003108 . PMID 21115821.
↑ Long M, VanKuren NW, Chen S, Vibranovski MD (2013). "New gene evolution: little did we know". Annual Review of Genetics. 47: 307–333. doi:10.1146/annurev-genet-111212-133301. PMC 4281893 . PMID 24050177.
↑ Chen, Jian-Hai; Landback, Patrick; Arsala, Deanna; Guzzetta, Alexander; Xia, Shengqian; Atlas, Jared; Sosa, Dylan; Zhang, Yong E.; Cheng, Jingqiu; Shen, Bairong; Long, Manyuan (2025-03-01). "Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection". Genome Research. 35 (3): 379–392. doi:10.1101/gr.279498.124. ISSN 1088-9051. PMC 11960464 . PMID 39952680.
↑ Lee, UnJin; Mozeika, Shawn M.; Zhao, Li (2024-06-04). "A Synergistic, Cultivator Model of De Novo Gene Origination". Genome Biology and Evolution. 16 (6) evae103. doi:10.1093/gbe/evae103. PMC 11152449 . PMID 38748819.
↑ Chen S, Krinsky BH, Long M (September 2013). "New genes as drivers of phenotypic evolution". Nature Reviews Genetics. 14 (9): 645–60. doi:10.1038/nrg3521. PMC 4236023 . PMID 23949544.
↑ Suenaga Y, Islam SM, Alagu J, Kaneko Y, Kato M, Tanaka Y, et al. (January 2014). "NCYM, a Cis-antisense gene of MYCN, encodes a de novo evolved protein that inhibits GSK3β resulting in the stabilization of MYCN in human neuroblastomas". PLOS Genetics. 10 (1) e1003996. doi: 10.1371/journal.pgen.1003996 . PMC 3879166 . PMID 24391509.
↑ Lin B, White JT, Ferguson C, Bumgarner R, Friedman C, Trask B, et al. (February 2000). "PART-1: a novel human prostate-specific, androgen-regulated gene that maps to chromosome 5q12". Cancer Research. 60 (4): 858–63. PMID 10706094.
↑ Samusik N, Krukovskaya L, Meln I, Shilov E, Kozlov AP (2013). "PBOV1 is a human de novo gene with tumor-specific expression that is associated with a positive clinical outcome of cancer". PLOS ONE. 8 (2) e56162. Bibcode:2013PLoSO...856162S. doi: 10.1371/journal.pone.0056162 . PMC 3572036 . PMID 23418531.
↑ Guerzoni D, McLysaght A (April 2016). "De Novo Genes Arise at a Slow but Steady Rate along the Primate Lineage and Have Been Subject to Incomplete Lineage Sorting". Genome Biology and Evolution. 8 (4): 1222–32. doi:10.1093/gbe/evw074. PMC 4860702 . PMID 27056411.
↑ Pekarsky Y, Rynditch A, Wieser R, Fonatsch C, Gardiner K (September 1997). "Activation of a novel gene in 3q21 and identification of intergenic fusion transcripts with ecotropic viral insertion site I in leukemia". Cancer Research. 57 (18): 3914–9. PMID 9307271.
↑ Papamichos SI, Margaritis D, Kotsianidis I (2015). "Adaptive Evolution Coupled with Retrotransposon Exaptation Allowed for the Generation of a Human-Protein-Specific Coding Gene That Promotes Cancer Cell Proliferation and Metastasis in Both Haematological Malignancies and Solid Tumours: The Extraordinary Case of MYEOV Gene". Scientifica. 2015 984706. doi: 10.1155/2015/984706 . PMC 4629056 . PMID 26568894.
1 2 Kozlov AP (2016). "Expression of evolutionarily novel genes in tumors". Infectious Agents and Cancer. 11 34. doi: 10.1186/s13027-016-0077-6 . PMC 4949931 . PMID 27437030.
↑ Li CY, Zhang Y, Wang Z, Zhang Y, Cao C, Zhang PW, et al. (March 2010). "A human-specific de novo protein-coding gene associated with human brain functions". PLOS Computational Biology. 6 (3) e1000734. Bibcode:2010PLSCB...6E0734L. doi: 10.1371/journal.pcbi.1000734 . PMC 2845654 . PMID 20376170.
↑ Hannon Bozorgmehr J (December 2024). "The De Novo Emergence of Two Brain Genes in the Human Lineage Appears to be Unsupported". Journal of Molecular Evolution. 93 (1): 3–10. doi:10.1007/s00239-024-10227-3. PMID 39725692.
↑ Leushkin E, Kaessmann H (October 2024). "Identification of old coding regions disproves the hominoid de novo status of genes". Nature Ecology & Evolution. 8 (10): 1826–1830. Bibcode:2024NatEE...8.1826L. doi:10.1038/s41559-024-02513-6. PMID 39187607.
↑ Reinhart JA, Jones CD (December 2013). "Two rapidly evolving genes contribute to male fitness in Drosophila". Journal of Molecular Evolution. 77 (5): 246–259. Bibcode:2013JMolE..77..246R. doi:10.1007/s00239-013-9594-8. PMC 3880551 . PMID 24221639.
1 2 Zhang YE, Landback P, Vibranovski MD, Long M (October 2011). "Accelerated recruitment of new brain development genes into the human genome". PLOS Biology. 9 (10) e1001179. doi: 10.1371/journal.pbio.1001179 . PMC 3196496 . PMID 22028629.
↑ Wang J, Xie G, Singh M, Ghanbarian AT, Raskó T, Szvetnik A, et al. (December 2014). "Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells" (PDF). Nature. 516 (7531): 405–9. Bibcode:2014Natur.516..405W. doi:10.1038/nature13804. PMID 25317556. S2CID 205240839.
↑ Dolstra H, Fredrix H, Maas F, Coulie PG, Brasseur F, Mensink E, et al. (January 1999). "A human minor histocompatibility antigen specific for B cell acute lymphoblastic leukemia". The Journal of Experimental Medicine. 189 (2): 301–8. doi:10.1084/jem.189.2.301. PMC 2192993 . PMID 9892612.
↑ Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, et al. (January 2009). "InterPro: the integrative protein signature database". Nucleic Acids Research. 37 (Database issue): D211-5. doi:10.1093/nar/gkn785. PMC 2686546 . PMID 18940856.
↑ Murphy DN, McLysaght A (2012). "De novo origin of protein-coding genes in murine rodents". PLOS ONE. 7 (11) e48650. Bibcode:2012PLoSO...748650M. doi: 10.1371/journal.pone.0048650 . PMC 3504067 . PMID 23185269.
↑ Zhang L, Ren Y, Yang T, Li G, Chen J, Gschwend AR, et al. (April 2019). "Rapid evolution of protein diversity by de novo origination in Oryza". Nature Ecology & Evolution. 3 (4): 679–690. Bibcode:2019NatEE...3..679Z. doi:10.1038/s41559-019-0822-5. PMID 30858588. S2CID 73728579.
↑ Prabh N, Rödelsperger C (July 2019). "De Novo, Divergence, and Mixed Origin Contribute to the Emergence of Orphan Genes in Pristionchus Nematodes". G3. 9 (7): 2277–2286. doi:10.1534/g3.119.400326. PMC 6643871 . PMID 31088903.


[1] Long M, Betrán E, Thornton K, Wang W (November 2003). "The origin of new genes: glimpses from the young and old". Nature Reviews Genetics. 4 (11): 865–75. doi:10.1038/nrg1204. PMID 14634634. S2CID 33999892.

[2] Wang W, Yu H, Long M (May 2004). "Duplication-degeneration as a mechanism of gene fission and the origin of new genes in Drosophila species". Nature Genetics. 36 (5): 523–7. Bibcode:2004NaGen..36..523W. doi: 10.1038/ng1338 . PMID 15064762.

[3] Levy A (October 2019). "How evolution builds genes from scratch". Nature. 574 (7778): 314–316. Bibcode:2019Natur.574..314L. doi:10.1038/d41586-019-03061-x. PMID 31619796.

[4] Schmitz JF, Bornberg-Bauer E (2017). "Fact or fiction: updates on how protein-coding genes might emerge de novo from previously non-coding DNA". F1000Research. 6: 57. Bibcode:2017JSPS....6...57S. doi: 10.12688/f1000research.10079.1 . PMC 5247788 . PMID 28163910.

[5] Schlötterer C (April 2015). "Genes from scratch--the evolutionary fate of de novo genes". Trends in Genetics. 31 (4): 215–9. doi:10.1016/j.tig.2015.02.007. PMC 4383367. PMID 25773713.

[6] Kaessmann H (October 2010). "Origins, evolution, and phenotypic impact of new genes". Genome Research. 20 (10): 1313–26. doi:10.1101/gr.101386.109. PMC 2945180. PMID 20651121.

[7] Jacob F (June 1977). "Evolution and tinkering". Science. 196 (4295): 1161–1166. Bibcode:1977Sci...196.1161J. doi:10.1126/science.860134. PMID 860134. S2CID 29756896.

[8] Van Oss SB, Carvunis AR (May 2019). "De novo gene birth". PLOS Genetics. 15 (5) e1008160. doi:10.1371/journal.pgen.1008160. PMC 6542195. PMID 31120894.

[9] Fesenko, Igor; Shabalina, Svetlana A; Storz, Gisela; Koonin, Eugene V (2025-11-26). "De novo origin of numerous microproteins in enterobacteria". Nucleic Acids Research. 53 (22) gkaf1319. doi:10.1093/nar/gkaf1319. ISSN 0305-1048. PMC 12700099 . PMID 41385328.

[10] Khalturin K, Hemmrich G, Fraune S, Augustin R, Bosch TC (September 2009). "More than just orphans: are taxonomically-restricted genes important in evolution?". Trends in Genetics. 25 (9): 404–413. doi:10.1016/j.tig.2009.07.006. PMID 19716618.

[11] Tautz D, Domazet-Lošo T (August 2011). "The evolutionary origin of orphan genes". Nature Reviews. Genetics. 12 (10): 692–702. doi:10.1038/nrg3053. PMID 21878963. S2CID 31738556.

[12] Weisman, Caroline M.; Murray, Andrew W.; Eddy, Sean R. (2020-11-02). "Many, but not all, lineage-specific genes can be explained by homology detection failure". PLOS Biology. 18 (11) e3000862. doi: 10.1371/journal.pbio.3000862 . PMC 7660931 . PMID 33137085.

[13] Moyers, Bryan A.; Zhang, Jianzhi (January 2015). "Phylostratigraphic bias creates spurious patterns of genome evolution". Molecular Biology and Evolution. 32 (1): 258–267. doi:10.1093/molbev/msu286. PMC 4271527 . PMID 25312911.

[14] Moyers, Bryan A.; Zhang, Jianzhi (May 2016). "Evaluating Phylostratigraphic Evidence for Widespread De Novo Gene Birth in Genome Evolution". Molecular Biology and Evolution. 33 (5): 1245–1256. doi:10.1093/molbev/msw008. PMC 5010002 . PMID 26758516.

[15] Reinhardt, Josephine A.; Wanjiru, B. M.; Brant, A. T.; Saelao, P.; Begun, David J.; Jones, Corbin D. (2013-10-17). "De Novo ORFs in Drosophila Are Important to Organismal Fitness and Evolved Rapidly from Previously Non-coding Sequences". PLOS Genetics. 9 (10) e1003860. doi:10.1371/journal.pgen.1003860. PMC 3798262. PMID 24146629.

[16] Casola, Claudio; Luria, Victor; Vakirlis, Nikolaos; Zhao, Li (2025-11-28). "De Novo Genes: Current Status and Future Goals". Genome Biology and Evolution. 17 (12) evaf230. doi:10.1093/gbe/evaf230. PMC 12708343. PMID 41313722.

[17] Zhao, Li; Svetec, Nicolas; Begun, David J. (November 2024). "De Novo Genes". Annual Review of Genetics. 58 (1): 211–232. doi:10.1146/annurev-genet-111523-102413. PMC 12051474. PMID 39088850.

[18] Grandchamp, Anna; Aubel, Margaux; Eicholt, Lars A.; Roginski, Paul; Luria, Victor; Karger, Amir; Dohmen, Elias (2025-10-23). "De Novo Gene Emergence: Summary, Classification, and Challenges of Current Methods". Genome Biology and Evolution. 17 (11) evaf197. doi:10.1093/gbe/evaf197. PMC 12605812. PMID 41126639.

[19] Weisman, Caroline M. (August 2022). "The Origins and Functions of De Novo Genes: Against All Odds?". Journal of Molecular Evolution. 90 (3–4): 244–257. Bibcode:2022JMolE..90..244W. doi:10.1007/s00239-022-10055-3. PMC 9233646. PMID 35451603.

[20] Ohno S (1970). Evolution by Gene Duplication. Allen & Unwin; Springer-Verlag.

[21] Tautz D (2014). "The discovery of de novo gene evolution". Perspectives in Biology and Medicine. 57 (1): 149–61. doi:10.1353/pbm.2014.0006. hdl: 11858/00-001M-0000-0024-3416-1 . PMID 25345708. S2CID 29552265.

[22] Grassé P-P (1977). Evolution of living organisms: evidence for a new theory of transformation. Academic Press.

[23] Barrell BG, Air GM, Hutchison CA (November 1976). "Overlapping genes in bacteriophage phiX174". Nature. 264 (5581): 34–41. Bibcode:1976Natur.264...34B. doi:10.1038/264034a0. PMID 1004533. S2CID 4264796.

[24] Shaw DC, Walker JE, Northrop FD, Barrell BG, Godson GN, Fiddes JC (April 1978). "Gene K, a new overlapping gene in bacteriophage G4". Nature. 272 (5653): 510–5. Bibcode:1978Natur.272..510S. doi:10.1038/272510a0. PMID 692656. S2CID 4218777.

[25] Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, et al. (February 1977). "Nucleotide sequence of bacteriophage phi X174 DNA". Nature. 265 (5596): 687–95. Bibcode:1977Natur.265..687S. doi:10.1038/265687a0. PMID 870828. S2CID 4206886.

[26] Keese PK, Gibbs A (October 1992). "Origins of genes: "big bang" or continuous creation?". Proceedings of the National Academy of Sciences of the United States of America. 89 (20): 9489–93. Bibcode:1992PNAS...89.9489K. doi:10.1073/pnas.89.20.9489. PMC 50157. PMID 1329098.

[27] Ohno S (April 1984). "Birth of a unique enzyme from an alternative reading frame of the preexisted, internally repetitious coding sequence". Proceedings of the National Academy of Sciences of the United States of America. 81 (8): 2421–5. Bibcode:1984PNAS...81.2421O. doi:10.1073/pnas.81.8.2421. PMC 345072. PMID 6585807.

[28] Sabath N, Wagner A, Karlin D (December 2012). "Evolution of viral proteins originated de novo by overprinting". Molecular Biology and Evolution. 29 (12): 3767–80. doi:10.1093/molbev/mss179. PMC 3494269. PMID 22821011.

[29] Makałowska I, Lin CF, Hernandez K (October 2007). "Birth and death of gene overlaps in vertebrates". BMC Evolutionary Biology. 7 (1): 193. Bibcode:2007BMCEE...7..193M. doi:10.1186/1471-2148-7-193. PMC 2151771. PMID 17939861.

[30] Samandi S, Roy AV, Delcourt V, Lucier JF, Gagnon J, Beaudoin MC, et al. (October 2017). "Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins". eLife. 6 e27860. doi:10.7554/eLife.27860. PMC 5703645. PMID 29083303.

[31] Khan YA, Jungreis I, Wright JC, Mudge JM, Choudhary JS, Firth AE, Kellis M (March 2020). "Evidence for a novel overlapping coding sequence in POLG initiated at a CUG start codon". BMC Genetics. 21 (1) 25. doi:10.1186/s12863-020-0828-7. PMC 7059407. PMID 32138667.

[32] Makałowski W, Mitchell GA, Labuda D (June 1994). "Alu sequences in the coding regions of mRNA: a source of protein variability". Trends in Genetics. 10 (6): 188–93. doi:10.1016/0168-9525(94)90254-2. PMID 8073532.

[33] Sorek R (October 2007). "The birth of new exons: mechanisms and evolutionary consequences". RNA. 13 (10): 1603–8. doi:10.1261/rna.682507. PMC 1986822. PMID 17709368.

[34] Dorit RL, Gilbert W (December 1991). "The limited universe of exons". Current Opinion in Genetics & Development. 1 (4): 464–9. doi:10.1016/S0959-437X(05)80193-5. PMID 1822278.

[35] Chothia C (June 1992). "Proteins. One thousand families for the molecular biologist". Nature. 357 (6379): 543–4. Bibcode:1992Natur.357..543C. doi:10.1038/357543a0. PMID 1608464. S2CID 4355476.

[36] Oliver SG, van der Aart QJ, Agostoni-Carbone ML, Aigle M, Alberghina L, Alexandraki D, et al. (May 1992). "The complete DNA sequence of yeast chromosome III". Nature. 357 (6373): 38–46. Bibcode:1992Natur.357...38O. doi:10.1038/357038a0. PMID 1574125. S2CID 4271784.

[37] Dujon B (July 1996). "The yeast genome project: what did we learn?". Trends in Genetics. 12 (7): 263–70. doi:10.1016/0168-9525(96)10027-5. PMID 8763498.

[38] Begun DJ, Lindfors HA, Kern AD, Jones CD (June 2007). "Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade". Genetics. 176 (2): 1131–7. Bibcode:2007Genet.176.1131B. doi:10.1534/genetics.106.069245. PMC 1894579. PMID 17435230.

[39] Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ (June 2006). "Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression". Proceedings of the National Academy of Sciences of the United States of America. 103 (26): 9935–9. Bibcode:2006PNAS..103.9935L. doi:10.1073/pnas.0509809103. PMC 1502557. PMID 16777968.

[40] Begun DJ, Lindfors HA, Thompson ME, Holloway AK (March 2006). "Recently evolved genes identified from Drosophila yakuba and D. erecta accessory gland expressed sequence tags". Genetics. 172 (3): 1675–81. doi:10.1534/genetics.105.050336. PMC 1456303. PMID 16361246.

[41] McLysaght A, Guerzoni D (September 2015). "New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation". Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 370 (1678) 20140332. doi:10.1098/rstb.2014.0332. PMC 4571571. PMID 26323763.

[42] Cai J, Zhao R, Jiang H, Wang W (May 2008). "De novo origination of a new protein-coding gene in Saccharomyces cerevisiae". Genetics. 179 (1): 487–96. Bibcode:2008Genet.179..487C. doi:10.1534/genetics.107.084491. PMC 2390625. PMID 18493065.

[43] Bungard D, Copple JS, Yan J, Chhun JJ, Kumirov VK, Foy SG, et al. (November 2017). "Foldability of a Natural De Novo Evolved Protein". Structure. 25 (11): 1687–1696.e4. doi:10.1016/j.str.2017.09.006. PMC 5677532. PMID 29033289.

[44] Li L, Foster CM, Gan Q, Nettleton D, James MG, Myers AM, et al. (May 2009). "Identification of the novel protein QQS as a component of the starch metabolic network in Arabidopsis leaves". The Plant Journal. 58 (3): 485–98. Bibcode:2009PlJ....58..485L. doi:10.1111/j.1365-313X.2009.03793.x. PMID 19154206.

[45] Heinen TJ, Staubach F, Häming D, Tautz D (September 2009). "Emergence of a new gene from an intergenic region". Current Biology. 19 (18): 1527–31. Bibcode:2009CBio...19.1527H. doi:10.1016/j.cub.2009.07.049. PMID 19733073. S2CID 12446879.

[46] Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, Estivill X, et al. (March 2009). "Origin of primate orphan genes: a comparative genomics approach". Molecular Biology and Evolution. 26 (3): 603–12. doi:10.1093/molbev/msn281. PMID 19064677.

[47] Knowles DG, McLysaght A (October 2009). "Recent de novo origin of human protein-coding genes". Genome Research. 19 (10): 1752–9. doi:10.1101/gr.095026.109. PMC 2765279. PMID 19726446.

[48] Wu DD, Irwin DM, Zhang YP (November 2011). "De Novo Origin of Human Protein-Coding Genes". PLOS Genetics. 7 (11) e1002379. doi:10.1371/journal.pgen.1002379. PMC 3213175. PMID 22102831.

[49] Pereira, Andres Barboza; Marano, Matthew; Bathala, Ramya; Zaragoza, Rigoberto Ayala; Neira, Andres; Samano, Alex; Owoyemi, Adekola; Casola, Claudio (January 2025). "Orphan genes are not a distinct biological entity". BioEssays. 47 (1) e2400146. doi:10.1002/bies.202400146. PMC 11662153. PMID 39491810.

[50] Domazet-Loso T, Brajković J, Tautz D (November 2007). "A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages". Trends in Genetics. 23 (11): 533–9. doi:10.1016/j.tig.2007.08.014. PMID 18029048.

[51] Gehrmann T, Reinders MJ (November 2015). "Proteny: discovering and visualizing statistically significant syntenic clusters at the proteome level". Bioinformatics. 31 (21): 3437–44. doi:10.1093/bioinformatics/btv389. PMC 4612220. PMID 26116928.

[52] Świrski, Michał I.; Tierney, Matthew T.; Albà, Mar; et al. (October 2025). "Translons: a common name for all translated regions". Nature Methods. 22 (10): 2002–2006. doi:10.1038/s41592-025-02810-3. hdl:10261/405271. PMID 40890551.

[53] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–10. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712. S2CID 14441902.

[54] McLysaght A, Hurst LD (September 2016). "Open questions in the study of de novo genes: what, how and why". Nature Reviews Genetics. 17 (9): 567–78. doi:10.1038/nrg.2016.78. PMID 27452112. S2CID 6033249.

[55] Elhaik E, Sabath N, Graur D (January 2006). "The "inverse relationship between evolutionary rate and age of mammalian genes" is an artifact of increased genetic distance with rate of evolution and time of divergence". Molecular Biology and Evolution. 23 (1): 1–3. doi:10.1093/molbev/msj006. PMID 16151190.

[56] Albà MM, Castresana J (April 2007). "On homology searches by protein Blast and the characterization of the age of genes". BMC Evolutionary Biology. 7 (1): 53. Bibcode:2007BMCEE...7...53A. doi:10.1186/1471-2148-7-53. PMC 1855329. PMID 17408474.

[57] Moyers BA, Zhang J (May 2016). "Evaluating Phylostratigraphic Evidence for Widespread De Novo Gene Birth in Genome Evolution". Molecular Biology and Evolution. 33 (5): 1245–56. doi:10.1093/molbev/msw008. PMC 5010002. PMID 26758516.

[58] Moyers BA, Zhang J (January 2015). "Phylostratigraphic bias creates spurious patterns of genome evolution". Molecular Biology and Evolution. 32 (1): 258–67. doi:10.1093/molbev/msu286. PMC 4271527. PMID 25312911.

[59] Domazet-Lošo T, Carvunis AR, Albà MM, Šestak MS, Bakaric R, Neme R, et al. (April 2017). "No Evidence for Phylostratigraphic Bias Impacting Inferences on Patterns of Gene Emergence and Evolution". Molecular Biology and Evolution. 34 (4): 843–856. doi:10.1093/molbev/msw284. PMC 5400388. PMID 28087778.

[60] Tassios, Emilios; de Leuw, Jori; Nikolaou, Christoforos; Kupczok, Anne; Vakirlis, Nikolaos (2025-12-27). "Machine learning can distinguish orphans that have resulted from sequence divergence beyond recognition". Bioinformatics Advances vbaf324. doi:10.1093/bioadv/vbaf324.

[61] Ghiurcuta CG, Moret BM (June 2014). "Evaluating synteny for improved comparative studies". Bioinformatics. 30 (12): i9-18. doi:10.1093/bioinformatics/btu259. PMC 4058928. PMID 24932010.

[62] Jean G, Nikolski M (2011). "SyDiG: uncovering Synteny in Distant Genomes" (PDF). International Journal of Bioinformatics Research and Applications. 7 (1): 43–62. doi:10.1504/IJBRA.2011.039169. PMID 21441096. S2CID 2644451.

[63] Liu D, Hunt M, Tsai IJ (January 2018). "Inferring synteny between genome assemblies: a systematic evaluation". BMC Bioinformatics. 19 (1) 26. Bibcode:2018BMCBi..19...26L. doi:10.1186/s12859-018-2026-4. PMC 5791376. PMID 29382321.

[64] Ranz JM, Casals F, Ruiz A (February 2001). "How malleable is the eukaryotic genome? Extreme rate of chromosomal rearrangement in the genus Drosophila". Genome Research. 11 (2): 230–9. doi:10.1101/gr.162901. PMC 311025. PMID 11157786.

[65] Lu TC, Leu JY, Lin WC (November 2017). "A Comprehensive Analysis of Transcript-Supported De Novo Genes in Saccharomyces sensu stricto Yeasts". Molecular Biology and Evolution. 34 (11): 2823–2838. doi:10.1093/molbev/msx210. PMC 5850716. PMID 28981695.

[66] Li ZW, Chen X, Wu Q, Hagmann J, Han TS, Zou YP, Ge S, Guo YL (August 2016). "On the Origin of De Novo Genes in Arabidopsis thaliana Populations". Genome Biology and Evolution. 8 (7): 2190–202. doi:10.1093/gbe/evw164. PMC 4987118. PMID 27401176.

[67] Chen S, Zhang YE, Long M (December 2010). "New genes in Drosophila quickly become essential". Science. 330 (6011): 1682–5. Bibcode:2010Sci...330.1682C. doi:10.1126/science.1196380. PMC 7211344. PMID 21164016. S2CID 7899890.

[68] Zhao L, Saelao P, Jones CD, Begun DJ (February 2014). "Origin and spread of de novo genes in Drosophila melanogaster populations". Science. 343 (6172): 769–72. Bibcode:2014Sci...343..769Z. doi:10.1126/science.1248286. PMC 4391638. PMID 24457212.

[69] Zhou Q, Zhang G, Zhang Y, Xu S, Zhao R, Zhan Z, et al. (September 2008). "On the origin of new genes in Drosophila". Genome Research. 18 (9): 1446–55. doi:10.1101/gr.076588.108. PMC 2527705. PMID 18550802.

[70] Wu DD, Irwin DM, Zhang YP (November 2011). "De novo origin of human protein-coding genes". PLOS Genetics. 7 (11) e1002379. doi:10.1371/journal.pgen.1002379. PMC 3213175. PMID 22102831.

[71] Vakirlis N, McLysaght A (2019). "Computational Prediction of de Novo Emerged Protein-Coding Genes". Computational Methods in Protein Evolution. Methods in Molecular Biology. Vol. 1851. Springer. pp. 63–81. doi:10.1007/978-1-4939-8736-8_4. ISBN 978-1-4939-8735-1. PMID 30298392. S2CID 52942639.

[72] Carvunis AR, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, et al. (July 2012). "Proto-genes and de novo gene birth". Nature. 487 (7407): 370–374. Bibcode:2012Natur.487..370C. doi:10.1038/nature11184. PMC 3401362. PMID 22722833.

[73] Doolittle WF, Brunet TD, Linquist S, Gregory TR (May 2014). "Distinguishing between "function" and "effect" in genome biology". Genome Biology and Evolution. 6 (5): 1234–1237. doi:10.1093/gbe/evu098. PMC 4041003. PMID 24814287.

[74] Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov GK, et al. (April 2014). "Defining functional DNA elements in the human genome". Proceedings of the National Academy of Sciences of the United States of America. 111 (17): 6131–6138. Bibcode:2014PNAS..111.6131K. doi:10.1073/pnas.1318948111. PMC 4035993. PMID 24753594.

[75] Keeling DM, Garza P, Nartey CM, Carvunis AR (November 2019). "The meanings of 'function' in biology and the problematic case of de novo gene emergence". eLife. 8 e47014. doi:10.7554/eLife.47014. PMC 6824840. PMID 31674305.

[76] Andersson DI, Jerlström-Hultqvist J, Näsvall J (June 2015). "Evolution of new functions de novo and from preexisting genes". Cold Spring Harbor Perspectives in Biology. 7 (6) a017996. Bibcode:2015CSHPB...717996A. doi:10.1101/cshperspect.a017996. PMC 4448608. PMID 26032716.

[77] Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, et al. (January 2019). "Studying the dawn of de novo gene emergence in mice reveals fast integration of new genes into functional networks". bioRxiv 10.1101/510214.

[78] Ruiz-Orera J, Hernandez-Rodriguez J, Chiva C, Sabidó E, Kondova I, Bontrop R, et al. (December 2015). "Origins of De Novo Genes in Human and Chimpanzee". PLOS Genetics. 11 (12) e1005721. arXiv:1507.07744. Bibcode:2015arXiv150707744R. doi:10.1371/journal.pgen.1005721. PMC 4697840. PMID 26720152.

[79] Miyata, Takashi; Yasunaga, Teruo; Nishida, Toshirō (1980). "Nucleotide sequence divergence and functional constraint in mRNA evolution". Proceedings of the National Academy of Sciences of the United States of America. 77 (12): 7328–7332. Bibcode:1980PNAS...77.7328M. doi:10.1073/pnas.77.12.7328. PMC 350496. PMID 6938980.

[80] Dohmen, Elias; Aubel, Margaux; Eicholt, Lars A.; Roginski, Paul; Luria, Victor; Karger, Amir; Grandchamp, Anna (2025-10-06). "DeNoFo: a file format and toolkit for standardized, comparable de novo gene annotation". Bioinformatics. 41 (10): btaf539. doi:10.1093/bioinformatics/btaf539. PMC 12516307. PMID 41051215.

[81] Roginski, Paul; Grandchamp, Anna; Quignot, Chloé; Lopes, Anne (2024-08-30). "De Novo Emerged Gene Search in Eukaryotes with DENSE". Genome Biology and Evolution. 16 (8): evae159. doi:10.1093/gbe/evae159. PMC 11363675. PMID 39212967.

[82] Vakirlis, Nikolaos; Acar, Omer; Cherupally, Vijay; Carvunis, Anne-Ruxandra (2024-07-15). "Ancestral Sequence Reconstruction as a Tool to Detect and Study De Novo Gene Emergence". Genome Biology and Evolution. 16 (8): evae151. doi:10.1093/gbe/evae151. PMC 11299112. PMID 39004885.

[83] Peng, Junhui; Zhao, Li (2024-01-27). "The origin and structural evolution of de novo genes in Drosophila". Nature Communications. 15 (1): 810. Bibcode:2024NatCo..15..810P. doi:10.1038/s41467-024-45028-1. PMC 10821953. PMID 38280868.

[84] Heames B, Schmitz J, Bornberg-Bauer E (May 2020). "A Continuum of Evolving De Novo Genes Drives Protein-Coding Novelty in Drosophila". Journal of Molecular Evolution. 88 (4): 382–398. Bibcode:2020JMolE..88..382H. doi:10.1007/s00239-020-09939-z. PMC 7162840. PMID 32253450.

[85] Durand É, Gagnon-Arsenault I, Hallin J, Hatin I, Dubé AK, Nielly-Thibault L, et al. (June 2019). "Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations". Genome Research. 29 (6): 932–943. doi:10.1101/gr.239822.118. PMC 6581059. PMID 31152050.

[86] Dowling D, Schmitz JF, Bornberg-Bauer E (November 2020). "Stochastic Gain and Loss of Novel Transcribed Open Reading Frames in the Human Lineage". Genome Biology and Evolution. 12 (11): 2183–2195. doi:10.1093/gbe/evaa194. PMC 7674706. PMID 33210146.

[87] Neme R, Tautz D (February 2013). "Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution". BMC Genomics. 14 (1): 117. Bibcode:2013BMCG...14..117N. doi:10.1186/1471-2164-14-117. PMC 3616865. PMID 23433480.

[88] Schmitz JF, Ullrich KK, Bornberg-Bauer E (October 2018). "Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover". Nature Ecology & Evolution. 2 (10): 1626–1632. Bibcode:2018NatEE...2.1626S. doi:10.1038/s41559-018-0639-7. PMID 30201962. S2CID 52181376.

[89] Vakirlis N, Carvunis AR, McLysaght A (February 2020). "Synteny-based analyses indicate that sequence divergence is not the main source of orphan genes". eLife. 9 e53500. doi:10.7554/eLife.53500. PMC 7028367. PMID 32066524.

[90] Palmieri N, Kosiol C, Schlötterer C (February 2014). "The life cycle of Drosophila orphan genes". eLife. 3 e01311. arXiv:1401.4956. Bibcode:2014arXiv1401.4956P. doi:10.7554/eLife.01311. PMC 3927632. PMID 24554240.

[91] Prabh N, Roeseler W, Witte H, Eberhardt G, Sommer RJ, Rödelsperger C (November 2018). "Deep taxon sampling reveals the evolutionary dynamics of novel gene families in Pristionchus nematodes". Genome Research. 28 (11): 1664–1674. doi:10.1101/gr.234971.118. PMC 6211646. PMID 30232197.

[92] Wissler L, Gadau J, Simola DF, Helmkampf M, Bornberg-Bauer E (2013). "Mechanisms and dynamics of orphan gene emergence in insect genomes". Genome Biology and Evolution. 5 (2): 439–55. doi:10.1093/gbe/evt009. PMC 3590893. PMID 23348040.

[93] Schmitz JF, Chain FJ, Bornberg-Bauer E (August 2020). "Evolution of novel genes in three-spined stickleback populations". Heredity. 125 (1–2): 50–59. Bibcode:2020Hered.125...50S. doi:10.1038/s41437-020-0319-7. PMC 7413265. PMID 32499660.

[94] Neme R, Tautz D (February 2016). "Fast turnover of genome transcription across evolutionary time exposes entire non-coding DNA to de novo gene emergence". eLife. 5 e09977. Bibcode:2016eLife...509977N. doi:10.7554/eLife.09977. PMC 4829534. PMID 26836309.

[95] Kutter C, Watt S, Stefflova K, Wilson MD, Goncalves A, Ponting CP, Odom DT, Marques AC (2012). "Rapid turnover of long noncoding RNAs and the evolution of gene expression". PLOS Genetics. 8 (7) e1002841. doi:10.1371/journal.pgen.1002841. PMC 3406015. PMID 22844254.

[96] Lebherz, Marie A.; Iyengar, Bharat Ravi; Bornberg-Bauer, Erich (2024-07-03). "Modeling Length Changes in De Novo Open Reading Frames during Neutral Evolution". Genome Biology and Evolution. 16 (7) evae129. doi:10.1093/gbe/evae129. PMC 11339603. PMID 38879874.

[97] Reinhardt JA, Wanjiru BM, Brant AT, Saelao P, Begun DJ, Jones CD (2013). "De novo ORFs in Drosophila are important to organismal fitness and evolved rapidly from previously non-coding sequences". PLOS Genetics. 9 (10) e1003860. doi:10.1371/journal.pgen.1003860. PMC 3798262. PMID 24146629.

[98] Gubala AM, Schmitz JF, Kearns MJ, Vinh TT, Bornberg-Bauer E, Wolfner MF, Findlay GD (May 2017). "The Goddard and Saturn Genes Are Essential for Drosophila Male Fertility and May Have Arisen De Novo". Molecular Biology and Evolution. 34 (5): 1066–1082. doi:10.1093/molbev/msx057. PMC 5400382. PMID 28104747.

[99] Lange A, Patel PH, Heames B, Damry AM, Saenger T, Jackson CJ, et al. (March 2021). "Structural and functional characterization of a putative de novo gene in Drosophila". Nature Communications. 12 (1) 1667. Bibcode:2021NatCo..12.1667L. doi:10.1038/s41467-021-21667-6. PMC 7954818. PMID 33712569.

[100] Zile K, Dessimoz C, Wurm Y, Masel J (August 2020). "Only a Single Taxonomically Restricted Gene Family in the Drosophila melanogaster Subgroup Can Be Identified with High Confidence". Genome Biology and Evolution. 12 (8): 1355–1366. doi:10.1093/gbe/evaa127. PMC 8059200. PMID 32589737.

[101] Zhuang X, Yang C, Murphy KR, Cheng CC (March 2019). "Molecular mechanism and history of non-sense to sense evolution of antifreeze glycoprotein gene in northern gadids". Proceedings of the National Academy of Sciences of the United States of America. 116 (10): 4400–4405. Bibcode:2019PNAS..116.4400Z. doi:10.1073/pnas.1817138116. PMC 6410882. PMID 30765531.

[102] Baalsrud HT, Tørresen OK, Solbakken MH, Salzburger W, Hanel R, Jakobsen KS, Jentoft S (March 2018). "De Novo Gene Evolution of Antifreeze Glycoproteins in Codfishes Revealed by Whole Genome Sequence Data". Molecular Biology and Evolution. 35 (3): 593–606. doi:10.1093/molbev/msx311. PMC 5850335. PMID 29216381.

[103] Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, et al. (August 2019). "A de novo evolved gene in the house mouse regulates female pregnancy cycles". eLife. 8 e44392. doi:10.7554/eLife.44392. PMC 6760900. PMID 31436535.

[104] Li D, Dong Y, Jiang Y, Jiang H, Cai J, Wang W (April 2010). "A de novo originated gene depresses budding yeast mating pathway and is repressed by the protein encoded by its antisense strand". Cell Research. 20 (4): 408–20. doi:10.1038/cr.2010.31. PMID 20195295.

[105] Li D, Yan Z, Lu L, Jiang H, Wang W (December 2014). "Pleiotropy of the de novo-originated gene MDF1". Scientific Reports. 4 7280. Bibcode:2014NatSR...4.7280L. doi:10.1038/srep07280. PMC 4250933. PMID 25452167.

[106] Moutinho AF, Eyre-Walker A, Dutheil JY (September 2022). "Strong evidence for the adaptive walk model of gene evolution in Drosophila and Arabidopsis". PLOS Biology. 20 (9) e3001775. doi:10.1371/journal.pbio.3001775. PMC 9470001. PMID 36099311.

[107] Ekman D, Elofsson A (February 2010). "Identifying and quantifying orphan protein sequences in fungi". Journal of Molecular Biology. 396 (2): 396–405. doi:10.1016/j.jmb.2009.11.053. PMID 19944701.

[108] Domazet-Loso T, Tautz D (October 2003). "An evolutionary analysis of orphan genes in Drosophila". Genome Research. 13 (10): 2213–2219. doi:10.1101/gr.1311003. PMC 403679. PMID 14525923.

[109] Guo WJ, Li P, Ling J, Ye SP (2007). "Significant comparative characteristics between orphan and nonorphan genes in the rice (Oryza sativa L.) genome". Comparative and Functional Genomics. 2007 21676. doi:10.1155/2007/21676. PMC 2216055. PMID 18273382.

[110] Wolf YI, Novichkov PS, Karev GP, Koonin EV, Lipman DJ (May 2009). "The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages". Proceedings of the National Academy of Sciences of the United States of America. 106 (18): 7273–7280. doi:10.1073/pnas.0901808106. PMC 2666616. PMID 19351897.

[111] Sun W, Zhao XW, Zhang Z (September 2015). "Identification and evolution of the orphan genes in the domestic silkworm, Bombyx mori". FEBS Letters. 589 (19 Pt B): 2731–2738. Bibcode:2015FEBSL.589.2731S. doi:10.1016/j.febslet.2015.08.008. PMID 26296317.

[112] Donoghue MT, Keshavaiah C, Swamidatta SH, Spillane C (February 2011). "Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana". BMC Evolutionary Biology. 11 (1) 47. Bibcode:2011BMCEE..11...47D. doi:10.1186/1471-2148-11-47. PMC 3049755. PMID 21332978.

[113] Werner MS, Sieriebriennikov B, Prabh N, Loschko T, Lanz C, Sommer RJ (November 2018). "Young genes have distinct gene structure, epigenetic profiles, and transcriptional regulation". Genome Research. 28 (11): 1675–1687. doi:10.1101/gr.234872.118. PMC 6211652. PMID 30232198.

[114] Vakirlis N, Hebert AS, Opulente DA, Achaz G, Hittinger CT, Fischer G, et al. (March 2018). "A Molecular Portrait of De Novo Genes in Yeasts". Molecular Biology and Evolution. 35 (3): 631–645. doi:10.1093/molbev/msx315. PMC 5850487. PMID 29220506.

[115] Foy SG, Wilson BA, Bertram J, Cordes MH, Masel J (April 2019). "A Shift in Aggregation Avoidance Strategy Marks a Long-Term Direction to Protein Evolution". Genetics. 211 (4): 1345–1355. doi:10.1534/genetics.118.301719. PMC 6456324. PMID 30692195.

[116] James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J (January 2021). "Universal and taxon-specific trends in protein sequences as a function of age". eLife. 10 e57347. doi:10.7554/eLife.57347. PMC 7819706. PMID 33416492.

[117] Zhang JY, Zhou Q (January 2019). "On the Regulatory Evolution of New Genes Throughout Their Life History". Molecular Biology and Evolution. 36 (1): 15–27. doi:10.1093/molbev/msy206. PMID 30395322. S2CID 53216993.

[118] Wu B, Knudson A (July 2018). "De Novo Origin of Protein-Coding Genes in Yeast". mBio. 9 (4). doi:10.1128/mBio.01024-18. PMC 6069113. PMID 30065088.

[119] Bekpen C, Xie C, Tautz D (August 2018). "Dealing with the adaptive immune system during de novo evolution of genes from intergenic sequences". BMC Evolutionary Biology. 18 (1) 121. Bibcode:2018BMCEE..18..121B. doi:10.1186/s12862-018-1232-z. PMC 6091031. PMID 30075701.

[120] Pertea M, Shumate A, Pertea G, Varabyou A, Chang YC, Madugundu A, et al. (2018). "Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise". bioRxiv 10.1101/332825.

[121] Peng, Junhui; Zhao, Li (2023-06-27). "The origin and structural evolution of de novo genes in Drosophila". bioRxiv 10.1101/2023.03.13.532420.

[122] Nielly-Thibault L, Landry CR (August 2019). "Differences Between the Raw Material and the Products of de Novo Gene Birth Can Result from Mutational Biases". Genetics. 212 (4): 1353–1366. doi:10.1534/genetics.119.302187. PMC 6707459. PMID 31227545.

[123] Vakirlis, Nikolaos; Acar, Omer; Hsu, Brian (2020-02-07). "De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences". Nature Communications. 11 (1) 781. Bibcode:2020NatCo..11..781V. doi:10.1038/s41467-020-14500-z. PMC 7005711. PMID 32034123.

[124] Vakirlis, Nikolaos; Fuqua, Zachary (2025-09-25). "De novo transmembrane proteins may emerge from poly-A tracts". Journal of Evolutionary Biology. 38 (9): 1272–1277. doi:10.1093/jeb/voaf089. PMID 40650590.

[125] Tassios, Emilios; Nikolaou, Christoforos; Vakirlis, Nikolaos (2023-03-04). "Intergenic Regions of Saccharomycotina Yeasts are Enriched in Potential to Encode Transmembrane Domains". Molecular Biology and Evolution. 40 (3) msad059. doi:10.1093/molbev/msad059. PMC 10063215. PMID 36917489.

[126] Heames, Brennen; Buchel, Filip; Aubel, Margaux (2023-04-06). "Experimental characterization of de novo proteins and their unevolved random-sequence counterparts". Nature Ecology & Evolution. 7 (4): 570–580. Bibcode:2023NatEE...7..570H. doi:10.1038/s41559-023-02010-2. PMC 10089919. PMID 37024625.

[127] Aubel, Margaux; Buchel, Filip; Heames, Brennen (2024-04-02). "High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential". Genome Biology and Evolution. 16 (4) evae069. doi:10.1093/gbe/evae069. PMC 11024478. PMID 38597156.

[128] Frumkin, Idan; Laub, Michael T. (2023-11-09). "Selection of a de novo gene that can promote survival of Escherichia coli by modulating protein homeostasis pathways". Nature Ecology & Evolution. 7 (12): 2067–2079. Bibcode:2023NatEE...7.2067F. doi:10.1038/s41559-023-02224-4. PMC 10697842. PMID 37945946.

[129] Frumkin, Idan; Vassallo, Christopher N.; Chen, Yi Hua; Laub, Michael T. (2025-10-21). "Emergence of antiphage functions from random sequence libraries reveals mechanisms of gene birth". Proceedings of the National Academy of Sciences of the United States of America. 122 (42) e2513255122. Bibcode:2025PNAS..12213255F. doi:10.1073/pnas.2513255122. PMC 12557735. PMID 41091762.

[130] Kosinski L, Aviles N, Gomez K, Masel J (June 2022). "Random peptides rich in small and disorder-promoting amino acids are less likely to be harmful". Genome Biology and Evolution. 14 (6) evac085. doi:10.1093/gbe/evac085. PMC 9210321. PMID 35668555.

[131] Peng, Yingying; et al. (August 2025). "Gene regulatory network integration underlies de novo gene evolution and developmental system drift". Nature Ecology & Evolution. 9 (8): 1487–1498. doi:10.1038/s41559-025-02747-y. PMID 40659874.

[132] Basile W, Sachenkova O, Light S, Elofsson A (March 2017). "High GC content causes orphan proteins to be intrinsically disordered". PLOS Computational Biology. 13 (3) e1005375. Bibcode:2017PLSCB..13E5375B. doi:10.1371/journal.pcbi.1005375. PMC 5389847. PMID 28355220.

[133] Bitard-Feildel T, Heberlein M, Bornberg-Bauer E, Callebaut I (December 2015). "Detection of orphan domains in Drosophila using "hydrophobic cluster analysis"". Biochimie. 119: 244–53. doi:10.1016/j.biochi.2015.02.019. PMID 25736992.

[134] Mukherjee S, Panda A, Ghosh TC (June 2015). "Elucidating evolutionary features and functional implications of orphan genes in Leishmania major". Infection, Genetics and Evolution. 32: 330–7. Bibcode:2015InfGE..32..330M. doi:10.1016/j.meegid.2015.03.031. PMID 25843649.

[135] Wilson BA, Foy SG, Neme R, Masel J (June 2017). "Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth". Nature Ecology & Evolution. 1 (6) 0146. Bibcode:2017NatEE...1..146W. doi:10.1038/s41559-017-0146. PMC 5476217. PMID 28642936.

[136] Jeon J, Choi J, Lee GW, Park SY, Huh A, Dean RA, et al. (February 2015). "Genome-wide profiling of DNA methylation provides insights into epigenetic regulation of fungal development in a plant pathogenic fungus, Magnaporthe oryzae". Scientific Reports. 5 8567. Bibcode:2015NatSR...5.8567J. doi:10.1038/srep08567. PMC 4338423. PMID 25708804.

[137] Bornberg-Bauer E, Hlouchova K, Lange A (June 2021). "Structure and function of naturally evolved de novo proteins". Current Opinion in Structural Biology. 68: 175–183. doi:10.1016/j.sbi.2020.11.010. PMID 33567396.

[138] Eicholt, Lars A.; Aubel, Margaux; Berk, Katrin; Bornberg-Bauer, Erich; Lange, Andreas (2022-07-13). "Heterologous expression of naturally evolved putative de novo proteins with chaperones". Protein Science. 31 (8) e4371. Wiley. doi:10.1002/pro.4371. ISSN 0961-8368. PMC 9278007. PMID 35900020.

[139] Pan X, Ye P, Yuan DS, Wang X, Bader JS, Boeke JD (March 2006). "A DNA integrity network in the yeast Saccharomyces cerevisiae". Cell. 124 (5): 1069–1081. doi:10.1016/j.cell.2005.12.036. PMID 16487579. S2CID 84338859.

[140] Hannon Bozorgmehr, Joseph (2024-02-05). "Four classic "de novo" genes all have plausible homologs and likely evolved from retro-duplicated or pseudogenic sequences". Molecular Genetics and Genomics. 299 (1) 6. doi:10.1007/s00438-023-02090-6. PMID 38315248.

[141] Seçkin, Ercan; Colinet, Dominique; Sarti, Edoardo; Danchin, Etienne G. J. (2025-11-25). "Orphan and de novo Genes in Fungi and Animals: Identification, Origins and Functions". Genome Biology and Evolution. 17 (12) evaf220. doi:10.1093/gbe/evaf220. PMC 12684174. PMID 41289037.

[142] Aubel, Margaux; Eicholt, Lars; Bornberg-Bauer, Erich (2023-03-29). "Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning". F1000Research. 12: 347. doi:10.12688/f1000research.130443.1. PMC 10126731. PMID 37113259.

[Chen2024-143] Chen, Jianhai; Li, Qingrong; Xia, Shengqian; Arsala, Deanna; Sosa, Dylan; Wang, Dong; Long, Manyuan (June 2024). "The Rapid Evolution of De Novo Proteins in Structure and Complex". Genome Biology and Evolution. 16 (6): evae107. doi:10.1093/gbe/evae107. PMC 11149777 . PMID 38753069.{{cite journal}}: CS1 maint: article number as page number (link)

[MiddendorfIyengarEicholt2024-144] Middendorf, Lasse; Iyengar, Bharat Ravi; Eicholt, Lars A (2024-08-05). "Sequence, Structure, and Functional Space of Drosophila De Novo Proteins". Genome Biology and Evolution. 16 (8): evae176. doi:10.1093/gbe/evae176. PMC 11363682 . PMID 39212966.{{cite journal}}: CS1 maint: article number as page number (link)

[Eicholt2026-145] Eicholt, Lars A (2026). "Structure and Disorder Predictions of Microproteins: Usage, Applications, and Pitfalls". In Wenkel, Stephan (ed.). Microproteins. Methods in Molecular Biology. Vol. 2992. New York, NY: Humana. pp. 129–150. doi:10.1007/978-1-0716-5013-4_10. ISBN 978-1-0716-5012-7. PMID 41241904.

[MiddendorfEicholt2024Proteins-146] Middendorf, Lasse; Eicholt, Lars A (June 2024). "Random, de novo, and conserved proteins: How structure and disorder predictors perform differently". Proteins. 92 (6): 757–767. doi: 10.1002/prot.26652 . PMID 38226524.

[Liu2023-147] Liu, Jing (August 2023). "Do 'Newly Born' orphan proteins resemble 'Never Born' proteins? A study using three deep learning algorithms". Proteins. 91 (8): 1097–1115. doi: 10.1002/prot.26496 . PMID 37092778.

[148] David L, Huber W, Granovskaia M, Toedling J, Palm CJ, Bofkin L, et al. (April 2006). "A high-resolution map of transcription in the yeast genome". Proceedings of the National Academy of Sciences of the United States of America. 103 (14): 5320–5325. Bibcode:2006PNAS..103.5320D. doi: 10.1073/pnas.0601091103 . PMC 1414796 . PMID 16569694.

[149] Tisseur M, Kwapisz M, Morillon A (November 2011). "Pervasive transcription – Lessons from yeast". Biochimie. 93 (11): 1889–1896. doi:10.1016/j.biochi.2011.07.001. PMID 21771634.

[150] Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (June 2008). "The transcriptional landscape of the yeast genome defined by RNA sequencing". Science. 320 (5881): 1344–1349. Bibcode:2008Sci...320.1344N. doi:10.1126/science.1158441. PMC 2951732 . PMID 18451266.

[151] Clark MB, Amaral PP, Schlesinger FJ, Dinger ME, Taft RJ, Rinn JL, et al. (July 2011). "The reality of pervasive transcription". PLOS Biology. 9 (7): e1000625, discussion e1001102. doi: 10.1371/journal.pbio.1000625 . PMC 3134446 . PMID 21765801.

[#25159147-152] Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJ, Jackson SE, et al. (September 2014). "Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes". Cell Reports. 8 (5): 1365–1379. doi:10.1016/j.celrep.2014.07.045. PMC 4216110 . PMID 25159147.

[#29556078-153] Ruiz-Orera J, Verdaguer-Grau P, Villanueva-Cañas JL, Messeguer X, Albà MM (May 2018). "Translation of neutrally evolving peptides provides a basis for de novo gene evolution". Nature Ecology & Evolution. 2 (5): 890–896. Bibcode:2018NatEE...2..890R. doi:10.1038/s41559-018-0506-6. hdl: 10230/36048 . PMID 29556078. S2CID 4959952.

[154] Ruiz-Orera J, Messeguer X, Subirana JA, Alba MM (September 2014). "Long non-coding RNAs as a source of new peptides". eLife. 3 e03523. arXiv: 1405.4174 . Bibcode:2014arXiv1405.4174R. doi: 10.7554/eLife.03523 . PMC 4359382 . PMID 25233276.

[#21948395-155] Wilson BA, Masel J (2011). "Putatively noncoding transcripts show extensive association with ribosomes". Genome Biology and Evolution. 3: 1245–1252. doi:10.1093/gbe/evr099. PMC 3209793 . PMID 21948395.

[#32139545-156] Chen J, Brunner AD, Cogan JZ, Nuñez JK, Fields AP, Adamson B, et al. (March 2020). "Pervasive functional translation of noncanonical human open reading frames". Science. 367 (6482): 1140–1146. Bibcode:2020Sci...367.1140C. doi:10.1126/science.aay0262. PMC 7289059 . PMID 32139545.

[#23593031-157] Silveira AB, Trontin C, Cortijo S, Barau J, Del Bem LE, Loudet O, et al. (April 2013). "Extensive natural epigenetic variation at a de novo originated gene". PLOS Genetics. 9 (4) e1003437. doi: 10.1371/journal.pgen.1003437 . PMC 3623765 . PMID 23593031.

[158] Kimmins S, Sassone-Corsi P (March 2005). "Chromatin remodelling and epigenetic features of germ cells". Nature. 434 (7033): 583–9. Bibcode:2005Natur.434..583K. doi:10.1038/nature03368. PMID 15800613. S2CID 4373304.

[:2-159] Papadopoulos C, Callebaut I, Gelly JC, Hatin I, Namy O, Renard M, et al. (November 2021). "Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution". Genome Research. 31 (12): 2303–2315. doi:10.1101/gr.275638.121. PMC 8647833 . PMID 34810219.

[:8-160] Vakirlis, Nikolaos; Vance, Zoe; Duggan, Kate M.; McLysaght, Aoife (2022-12-20). "De novo birth of functional microproteins in the human lineage". Cell Reports. 41 (12) 111808. doi: 10.1016/j.celrep.2022.111808 . ISSN 2211-1247. PMC 10073203 . PMID 36543139. S2CID 254966620.

[161] Dinger ME, Pang KC, Mercer TR, Mattick JS (November 2008). "Differentiating protein-coding and noncoding RNA: challenges and ambiguities". PLOS Computational Biology. 4 (11) e1000176. Bibcode:2008PLSCB...4E0176D. doi: 10.1371/journal.pcbi.1000176 . PMC 2518207 . PMID 19043537.

[162] Iyengar, Bharat Ravi; Bornberg-Bauer, Erich (2023-04-04). "Neutral Models of De Novo Gene Emergence Suggest that Gene Evolution has a Preferred Trajectory". Molecular Biology and Evolution. 40 (4) msad079. doi: 10.1093/molbev/msad079 . PMC 10118301 . PMID 37011142.

[163] Iyengar, Bharat Ravi; Grandchamp, Anna; Bornberg-Bauer, Erich (2024-07-23). "How antisense transcripts can evolve to encode novel proteins". Nature Communications. 15 (1) 6187. Bibcode:2024NatCo..15.6187I. doi: 10.1038/s41467-024-50550-3 . PMC 11266595 . PMID 39043684.

[164] Stewart NB, Rogers RL (September 2019). "Chromosomal rearrangements as a source of new gene formation in Drosophila yakuba". PLOS Genetics. 15 (9) e1008314. doi: 10.1371/journal.pgen.1008314 . PMC 6776367 . PMID 31545792.

[165] Swanson WJ, Vacquier VD (February 2002). "The rapid evolution of reproductive proteins". Nature Reviews Genetics. 3 (2): 137–44. doi:10.1038/nrg733. PMID 11836507. S2CID 25696990.

[166] Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D, Adams MD, Cargill M, Clark AG (October 2005). "Natural selection on protein-coding genes in the human genome". Nature. 437 (7062): 1153–7. Bibcode:2005Natur.437.1153B. doi:10.1038/nature04240. PMID 16237444. S2CID 4423768.

[167] Clark NL, Aagaard JE, Swanson WJ (January 2006). "Evolution of reproductive proteins from animals and plants". Reproduction. 131 (1): 11–22. doi: 10.1530/rep.1.00357 . PMID 16388004.

[168] Rivard EL, Ludwig AG, Patel PH, Grandchamp A, Arnold SE, Berger A, et al. (September 2021). "A putative de novo evolved gene required for spermatid chromatin condensation in Drosophila melanogaster". PLOS Genetics. 17 (9) e1009787. doi: 10.1371/journal.pgen.1009787 . PMC 8445463 . PMID 34478447.

[169] Cridland JM, Majane AC, Zhao L, Begun DJ (January 2022). "Population biology of accessory gland-expressed de novo genes in Drosophila melanogaster". Genetics. 220 (1) iyab207. doi:10.1093/genetics/iyab207. PMC 8733444 . PMID 34791207.

[170] Witt, Evan; Benjamin, Sigi; Svetec, Nicolas; Zhao, Li (2019-08-16). Landry, Christian R; Wittkopp, Patricia J; White-Cooper, Helen (eds.). "Testis single-cell RNA-seq reveals the dynamics of de novo gene transcription and germline mutational bias in Drosophila". eLife. 8 e47138. doi: 10.7554/eLife.47138 . ISSN 2050-084X. PMC 6697446 . PMID 31418408. S2CID 198249413.

[#28854603-171] Villanueva-Cañas JL, Ruiz-Orera J, Agea MI, Gallo M, Andreu D, Albà MM (July 2017). "New Genes and Functional Innovation in Mammals". Genome Biology and Evolution. 9 (7): 1886–1900. doi:10.1093/gbe/evx136. PMC 5554394 . PMID 28854603.

[172] Schmidt EE (July 1996). "Transcriptional promiscuity in testes". Current Biology. 6 (7): 768–9. Bibcode:1996CBio....6..768S. doi: 10.1016/S0960-9822(02)00589-4 . PMID 8805310. S2CID 14318566.

[173] White-Cooper H, Davidson I (July 2011). "Unique aspects of transcription regulation in male germ cells". Cold Spring Harbor Perspectives in Biology. 3 (7) a002626. doi:10.1101/cshperspect.a002626. PMC 3119912 . PMID 21555408.

[174] Kleene KC (August 2001). "A possible meiotic function of the peculiar patterns of gene expression in mammalian spermatogenic cells". Mechanisms of Development. 106 (1–2): 3–23. doi: 10.1016/S0925-4773(01)00413-0 . PMID 11472831. S2CID 949694.

[#21199946-175] Rajon E, Masel J (January 2011). "Evolution of molecular error rates and the consequences for evolvability". Proceedings of the National Academy of Sciences of the United States of America. 108 (3): 1082–7. Bibcode:2011PNAS..108.1082R. doi: 10.1073/pnas.1012918108 . PMC 3024668 . PMID 21199946.

[176] Masel J (March 2006). "Cryptic genetic variation is enriched for potential adaptations". Genetics. 172 (3): 1985–1991. Bibcode:2006Genet.172.1985M. doi:10.1534/genetics.105.051649. PMC 1456269 . PMID 16387877.

[#287193-177] Casola C (2018). "From de novo to 'de nono': most novel protein coding genes identified with phylostratigraphy represent old genes or recent duplicates". bioRxiv 10.1101/287193 .

[178] Willis S, Masel J (September 2018). "Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes". Genetics. 210 (1): 303–313. doi:10.1534/genetics.118.301249. PMC 6116962 . PMID 30026186.

[#24056411-179] Abrusán G (December 2013). "Integration of new genes into cellular networks, and their structural maturation". Genetics. 195 (4): 1407–1417. doi:10.1534/genetics.113.152256. PMC 3832282 . PMID 24056411.

[:9-180] Vakirlis N, Acar O, Hsu B, Castilho Coelho N, Van Oss SB, Wacholder A, et al. (February 2020). "De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences". Nature Communications. 11 (1) 781. Bibcode:2020NatCo..11..781V. doi:10.1038/s41467-020-14500-z. PMC 7005711 . PMID 32034123.

[181] Giacomelli MG, Hancock AS, Masel J (February 2007). "The conversion of 3' UTRs into coding regions". Molecular Biology and Evolution. 24 (2): 457–464. doi:10.1093/molbev/msl172. PMC 1808353 . PMID 17099057.

[#26517896-182] Bornberg-Bauer E, Schmitz J, Heberlein M (October 2015). "Emergence of de novo proteins from 'dark genomic matter' by 'grow slow and moult'". Biochemical Society Transactions. 43 (5): 867–873. doi:10.1042/BST20150089. PMID 26517896.

[183] Wilder JA, Hewett EK, Gansner ME (December 2009). "Molecular evolution of GYPC: evidence for recent structural innovation and positive selection in humans". Molecular Biology and Evolution. 26 (12): 2679–2687. doi:10.1093/molbev/msp183. PMC 2775107 . PMID 19679754.

[184] Vakhrusheva AA, Kazanov MD, Mironov AA, Bazykin GA (February 2011). "Evolution of prokaryotic genes by shift of stop codons". Journal of Molecular Evolution. 72 (2): 138–146. Bibcode:2011JMolE..72..138V. doi:10.1007/s00239-010-9408-1. PMID 21082168. S2CID 812377.

[185] Andreatta ME, Levine JA, Foy SG, Guzman LD, Kosinski LJ, Cordes MH, Masel J (May 2015). "The Recent De Novo Origin of Protein C-Termini". Genome Biology and Evolution. 7 (6): 1686–1701. doi:10.1093/gbe/evv098. PMC 4494051 . PMID 26002864.

[186] Kleppe AS, Bornberg-Bauer E (November 2018). "Robustness by intrinsically disordered C-termini and translational readthrough". Nucleic Acids Research. 46 (19): 10184–10194. doi:10.1093/nar/gky778. PMC 6365619 . PMID 30247639.

[187] Klasberg S, Bitard-Feildel T, Callebaut I, Bornberg-Bauer E (July 2018). "Origins and structural properties of novel and de novo protein domains during insect evolution". The FEBS Journal. 285 (14): 2605–2625. doi: 10.1111/febs.14504 . PMID 29802682.

[188] Deng C, Cheng CH, Ye H, He X, Chen L (December 2010). "Evolution of an antifreeze protein by neofunctionalization under escape from adaptive conflict". Proceedings of the National Academy of Sciences of the United States of America. 107 (50): 21593–21598. Bibcode:2010PNAS..10721593D. doi: 10.1073/pnas.1007883107 . PMC 3003108 . PMID 21115821.

[189] Long M, VanKuren NW, Chen S, Vibranovski MD (2013). "New gene evolution: little did we know". Annual Review of Genetics. 47: 307–333. doi:10.1146/annurev-genet-111212-133301. PMC 4281893 . PMID 24050177.

[190] Chen, Jian-Hai; Landback, Patrick; Arsala, Deanna; Guzzetta, Alexander; Xia, Shengqian; Atlas, Jared; Sosa, Dylan; Zhang, Yong E.; Cheng, Jingqiu; Shen, Bairong; Long, Manyuan (2025-03-01). "Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection". Genome Research. 35 (3): 379–392. doi:10.1101/gr.279498.124. ISSN 1088-9051. PMC 11960464 . PMID 39952680.

[191] Lee, UnJin; Mozeika, Shawn M.; Zhao, Li (2024-06-04). "A Synergistic, Cultivator Model of De Novo Gene Origination". Genome Biology and Evolution. 16 (6) evae103. doi:10.1093/gbe/evae103. PMC 11152449 . PMID 38748819.

[192] Chen S, Krinsky BH, Long M (September 2013). "New genes as drivers of phenotypic evolution". Nature Reviews Genetics. 14 (9): 645–60. doi:10.1038/nrg3521. PMC 4236023 . PMID 23949544.

[193] Suenaga Y, Islam SM, Alagu J, Kaneko Y, Kato M, Tanaka Y, et al. (January 2014). "NCYM, a Cis-antisense gene of MYCN, encodes a de novo evolved protein that inhibits GSK3β resulting in the stabilization of MYCN in human neuroblastomas". PLOS Genetics. 10 (1) e1003996. doi: 10.1371/journal.pgen.1003996 . PMC 3879166 . PMID 24391509.

[194] Lin B, White JT, Ferguson C, Bumgarner R, Friedman C, Trask B, et al. (February 2000). "PART-1: a novel human prostate-specific, androgen-regulated gene that maps to chromosome 5q12". Cancer Research. 60 (4): 858–63. PMID 10706094.

[195] Samusik N, Krukovskaya L, Meln I, Shilov E, Kozlov AP (2013). "PBOV1 is a human de novo gene with tumor-specific expression that is associated with a positive clinical outcome of cancer". PLOS ONE. 8 (2) e56162. Bibcode:2013PLoSO...856162S. doi: 10.1371/journal.pone.0056162 . PMC 3572036 . PMID 23418531.

[196] Guerzoni D, McLysaght A (April 2016). "De Novo Genes Arise at a Slow but Steady Rate along the Primate Lineage and Have Been Subject to Incomplete Lineage Sorting". Genome Biology and Evolution. 8 (4): 1222–32. doi:10.1093/gbe/evw074. PMC 4860702 . PMID 27056411.

[197] Pekarsky Y, Rynditch A, Wieser R, Fonatsch C, Gardiner K (September 1997). "Activation of a novel gene in 3q21 and identification of intergenic fusion transcripts with ecotropic viral insertion site I in leukemia". Cancer Research. 57 (18): 3914–9. PMID 9307271.

[198] Papamichos SI, Margaritis D, Kotsianidis I (2015). "Adaptive Evolution Coupled with Retrotransposon Exaptation Allowed for the Generation of a Human-Protein-Specific Coding Gene That Promotes Cancer Cell Proliferation and Metastasis in Both Haematological Malignancies and Solid Tumours: The Extraordinary Case of MYEOV Gene". Scientifica. 2015 984706. doi: 10.1155/2015/984706 . PMC 4629056 . PMID 26568894.

[#27437030-199] Kozlov AP (2016). "Expression of evolutionarily novel genes in tumors". Infectious Agents and Cancer. 11 34. doi: 10.1186/s13027-016-0077-6 . PMC 4949931 . PMID 27437030.

[200] Li CY, Zhang Y, Wang Z, Zhang Y, Cao C, Zhang PW, et al. (March 2010). "A human-specific de novo protein-coding gene associated with human brain functions". PLOS Computational Biology. 6 (3) e1000734. Bibcode:2010PLSCB...6E0734L. doi: 10.1371/journal.pcbi.1000734 . PMC 2845654 . PMID 20376170.

[201] Hannon Bozorgmehr J (December 2024). "The De Novo Emergence of Two Brain Genes in the Human Lineage Appears to be Unsupported". Journal of Molecular Evolution. 93 (1): 3–10. doi:10.1007/s00239-024-10227-3. PMID 39725692.

[202] Leushkin E, Kaessmann H (October 2024). "Identification of old coding regions disproves the hominoid de novo status of genes". Nature Ecology & Evolution. 8 (10): 1826–1830. Bibcode:2024NatEE...8.1826L. doi:10.1038/s41559-024-02513-6. PMID 39187607.

[Two_rapidly_evolving_genes_contribu-203] Reinhart JA, Jones CD (December 2013). "Two rapidly evolving genes contribute to male fitness in Drosophila". Journal of Molecular Evolution. 77 (5): 246–259. Bibcode:2013JMolE..77..246R. doi:10.1007/s00239-013-9594-8. PMC 3880551 . PMID 24221639.

[#22028629-204] Zhang YE, Landback P, Vibranovski MD, Long M (October 2011). "Accelerated recruitment of new brain development genes into the human genome". PLOS Biology. 9 (10) e1001179. doi: 10.1371/journal.pbio.1001179 . PMC 3196496 . PMID 22028629.

[205] Wang J, Xie G, Singh M, Ghanbarian AT, Raskó T, Szvetnik A, et al. (December 2014). "Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells" (PDF). Nature. 516 (7531): 405–9. Bibcode:2014Natur.516..405W. doi:10.1038/nature13804. PMID 25317556. S2CID 205240839.

[206] Dolstra H, Fredrix H, Maas F, Coulie PG, Brasseur F, Mensink E, et al. (January 1999). "A human minor histocompatibility antigen specific for B cell acute lymphoblastic leukemia". The Journal of Experimental Medicine. 189 (2): 301–8. doi:10.1084/jem.189.2.301. PMC 2192993 . PMID 9892612.

[207] Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, et al. (January 2009). "InterPro: the integrative protein signature database". Nucleic Acids Research. 37 (Database issue): D211-5. doi:10.1093/nar/gkn785. PMC 2686546 . PMID 18940856.

[#23185269-208] Murphy DN, McLysaght A (2012). "De novo origin of protein-coding genes in murine rodents". PLOS ONE. 7 (11) e48650. Bibcode:2012PLoSO...748650M. doi: 10.1371/journal.pone.0048650 . PMC 3504067 . PMID 23185269.

[209] Zhang L, Ren Y, Yang T, Li G, Chen J, Gschwend AR, et al. (April 2019). "Rapid evolution of protein diversity by de novo origination in Oryza". Nature Ecology & Evolution. 3 (4): 679–690. Bibcode:2019NatEE...3..679Z. doi:10.1038/s41559-019-0822-5. PMID 30858588. S2CID 73728579.

[#31088903-210] Prabh N, Rödelsperger C (July 2019). "De Novo, Divergence, and Mixed Origin Contribute to the Emergence of Orphan Genes in Pristionchus Nematodes". G3. 9 (7): 2277–2286. doi:10.1534/g3.119.400326. PMC 6643871 . PMID 31088903.
