Pseudogene

Pseudogenes are nonfunctional segments of DNA that resemble functional genes. Most arise as superfluous copies of functional genes, either directly by gene duplication or indirectly by reverse transcription of an mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons. Pseudogenes are a type of junk DNA.

Most non-bacterial genomes contain many pseudogenes, often as many as functional genes. This is not surprising, since various biological processes are expected to accidentally create pseudogenes, and there are no specialized mechanisms to remove them from genomes. Eventually pseudogenes may be deleted from their genomes by chance errors in DNA replication or DNA repair, or they may accumulate so many mutational changes that they are no longer recognizable as former genes. Analysis of these degeneration events helps clarify the effects of non-selective processes in genomes.

Pseudogene sequences may be transcribed into RNA at low levels, due to promoter elements inherited from the ancestral gene or arising by new mutations. Although most of these transcripts will have no more functional significance than chance transcripts from other parts of the genome, some have given rise to beneficial regulatory RNAs and new proteins.

Properties

Pseudogenes are usually characterized by a combination of similarity or homology to a known gene, together with a loss of some functionality. That is, although every pseudogene has a DNA sequence that is similar to some functional gene, they are usually unable to produce functional final protein products. [1] Pseudogenes are sometimes difficult to identify and characterize in genomes, because the two requirements of similarity and loss of functionality are usually implied through sequence alignments rather than biologically proven.

  1. Homology is implied by sequence similarity between the DNA sequences of the pseudogene and a known gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences have evolved independently (see Convergent evolution).
  2. Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps to a fully functional protein: Transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are premature stop codons and frameshifts, which almost universally prevent the translation of a functional protein product.
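The two criteria above can be illustrated with a minimal Python sketch. It assumes the candidate and the known gene have already been aligned by some external tool and that gaps are written as `-`; the function names and the gap-handling convention (gapped columns are skipped when computing identity) are choices made for this illustration, not a standard from the pseudogene literature.

```python
import re
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical bases over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def first_premature_stop(cds: str) -> Optional[int]:
    """0-based index of the first in-frame stop codon before the last codon, else None."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the final codon may be the normal stop
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def has_frameshift(aligned_gene: str, aligned_candidate: str) -> bool:
    """True if any gap run in either aligned sequence has a length not divisible by 3."""
    for seq in (aligned_gene, aligned_candidate):
        for run in re.finditer(r"-+", seq):
            if len(run.group()) % 3 != 0:
                return True
    return False
```

For example, `first_premature_stop("ATGTAAGGGTGA")` reports the stop at codon index 1, and a single-base gap in either aligned sequence is flagged by `has_frameshift`. Real pipelines also have to account for introns, sequencing errors, and alternative genetic codes, which this sketch ignores.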

Pseudogenes for RNA genes are usually more difficult to discover as they do not need to be translated and thus do not have "reading frames". A number of rRNA pseudogenes have been identified on the basis of changes in rDNA array ends. [2]

Pseudogenes can complicate molecular genetic studies. For example, amplification of a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
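Whether a primer is likely to bind a pseudogene as well as its parent gene can be roughly screened by counting mismatches at each position of both templates. The sketch below is only an illustration: the function names, the toy sequences, and the `max_mismatches` tolerance are assumptions, and it ignores the reverse strand, melting temperature, and 3'-end effects that real primer design tools consider.

```python
from typing import List


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def binding_sites(template: str, primer: str, max_mismatches: int = 1) -> List[int]:
    """0-based start positions where the primer matches the template within the tolerance."""
    template = template.upper()
    primer = primer.upper()
    k = len(primer)
    return [
        i
        for i in range(len(template) - k + 1)
        if count_mismatches(template[i:i + k], primer) <= max_mismatches
    ]


# Hypothetical toy sequences: the pseudogene differs from the gene by one base.
gene = "TTTACGTACGTAGGG"
pseudogene = "TTTACGTTCGTAGGG"
primer = "ACGTACGTA"

gene_sites = binding_sites(gene, primer)              # exact match in the gene
pseudo_sites = binding_sites(pseudogene, primer)      # one mismatch is tolerated
strict_sites = binding_sites(pseudogene, primer, 0)   # no site when no mismatch is allowed
```

In this toy case the primer finds a site in both templates under the default tolerance, which is the situation in which co-amplification would be a concern.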

Processed pseudogenes often pose a problem for gene prediction programs, which may misidentify them as real genes or exons. It has been proposed that the identification of processed pseudogenes can help improve the accuracy of gene prediction methods. [3]

In 2014, 140 human pseudogenes were shown to be translated. [4] However, the function, if any, of the protein products is unknown.

Types and origin

Mechanism of classical and processed pseudogene formation

There are four main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:

Processed

Processed pseudogene production

In higher eukaryotes, particularly mammals, retrotransposition is a fairly common event that has had a huge impact on the composition of the genome. For example, somewhere between 30 and 44% of the human genome consists of repetitive elements such as SINEs and LINEs (see retrotransposons). [7] [8] In the process of retrotransposition, a portion of the mRNA or hnRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an in vitro system that they can create retrotransposed copies of random genes, too. [9] Once these pseudogenes are inserted back into the genome, they usually contain a poly-A tail, and usually have had their introns spliced out; these are both hallmark features of cDNAs. However, because they are derived from an RNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event. [10] Even so, these insertions occasionally contribute exons to existing genes, usually via alternatively spliced transcripts. [11] A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes. [12] Processed pseudogenes are continually being created in primates. [13] In human populations, for example, individuals carry distinct sets of processed pseudogenes. [14]
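The hallmark features described above (a poly-A tail and the absence of the parent gene's introns) suggest a simple screening heuristic, sketched here in Python. The `Candidate` record, its field names, and the `min_tail` threshold of 10 bases are hypothetical choices for this illustration; real annotation pipelines use alignment-based evidence and tolerate degraded tails.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str          # candidate locus, 5'->3' on the coding strand
    n_introns: int         # introns inferred when aligned to the parent mRNA
    parent_n_introns: int  # introns in the parent gene


def trailing_poly_a(seq: str) -> int:
    """Length of the uninterrupted run of A bases at the 3' end."""
    s = seq.upper()
    return len(s) - len(s.rstrip("A"))


def looks_processed(c: Candidate, min_tail: int = 10) -> bool:
    """Heuristic: the parent has introns, the copy has none, and a poly-A tail is present."""
    return (
        c.parent_n_introns > 0
        and c.n_introns == 0
        and trailing_poly_a(c.sequence) >= min_tail
    )


# Hypothetical examples: an intronless copy with a tail, and a duplicated copy with introns.
retrocopy = Candidate("ATGCCGTT" + "A" * 15, n_introns=0, parent_n_introns=4)
duplicate = Candidate("ATGCCGTT", n_introns=4, parent_n_introns=4)
```

Here `looks_processed(retrocopy)` is true and `looks_processed(duplicate)` is false. The check cannot classify copies of single-exon parent genes, since intron loss is then uninformative.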

It has been shown that processed pseudogenes accumulate mutations faster than non-processed pseudogenes. [15]

Non-processed (duplicated)

One way a pseudogene may arise

Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event caused by homologous recombination at, for example, repetitive SINE sequences on misaligned chromosomes, and may subsequently acquire mutations that cause the copy to lose the original gene's function. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact exon-intron structure and regulatory sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates. [16] If pseudogenization is due to gene duplication, it usually occurs in the first few million years after the gene duplication, provided the gene has not been subjected to any selection pressure. [17] Gene duplication generates functional redundancy and it is not normally advantageous to carry two identical genes. Mutations that disrupt either the structure or the function of either of the two genes are not deleterious and will not be removed through the selection process. As a result, the gene that has been mutated gradually becomes a pseudogene and will be either unexpressed or functionless. This kind of evolutionary fate is shown by population genetic modeling [18] [19] and also by genome analysis. [17] [20] Depending on the evolutionary context, these pseudogenes will either be deleted or become so distinct from the parental genes that they are no longer identifiable. Relatively young pseudogenes can be recognized due to their sequence similarity. [21]

Unitary pseudogenes

2 ways a pseudogene may be produced

Various mutations (such as indels and nonsense mutations) can prevent a gene from being normally transcribed or translated, and thus the gene may become less- or non-functional or "deactivated". These are the same mechanisms by which non-processed genes become pseudogenes, but the difference in this case is that the gene was not duplicated before pseudogenization. Normally, such a pseudogene would be unlikely to become fixed in a population, but various population effects, such as genetic drift, a population bottleneck, or, in some cases, natural selection, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably coded the enzyme L-gulono-γ-lactone oxidase (GULO) in primates. In all mammals studied other than primates and guinea pigs, GULO aids in the biosynthesis of ascorbic acid (vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates. [22] [23] Another more recent example of a disabled gene links the deactivation of the caspase 12 gene (through a nonsense mutation) to positive selection in humans. [24]

Polymorphic pseudogenes

Some pseudogenes are still intact in some individuals but inactivated (mutated) in others. Abascal et al. have called these pseudogenes "polymorphic". [25] They are often homozygous for loss-of-function (LoF) variants, that is, in many people both copies are inactive. Polymorphic pseudogenes often represent non-essential (or dispensable) genes, as opposed to essential genes, and their frequent mutations are actually a criterion to establish them as non-essential. [26] Lopes-Marques et al. define polymorphic pseudogenes as genes that carry a LoF allele with a frequency higher than 1% (in global or certain sub-populations) and without overt pathogenic consequences when homozygous. [27]
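The frequency-based definition attributed above to Lopes-Marques et al. can be written as a small predicate. This is a sketch only: the function name, the input layout (a mapping from population label to LoF allele frequency), and the population labels are assumptions for illustration, and the 1% cut-off is taken directly from the definition quoted in the text.

```python
from typing import Mapping


def is_polymorphic_pseudogene(
    lof_allele_freq: Mapping[str, float],
    pathogenic_when_homozygous: bool,
    threshold: float = 0.01,
) -> bool:
    """True if some population carries a LoF allele above the threshold
    and homozygosity has no overt pathogenic consequence."""
    if pathogenic_when_homozygous:
        return False
    # "Higher than 1%" is read as a strict inequality.
    return any(freq > threshold for freq in lof_allele_freq.values())


# Hypothetical frequencies: rare globally, but common in one sub-population.
example = {"global": 0.004, "subpopulation_A": 0.03}
qualifies = is_polymorphic_pseudogene(example, pathogenic_when_homozygous=False)
```

With these made-up numbers the gene qualifies because a single sub-population exceeds the threshold, matching the "global or certain sub-populations" wording of the definition.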

Examples of pseudogene function

While the vast majority of pseudogenes have lost their function, some cases have emerged in which a pseudogene either re-gained its original or a similar function or evolved a new function. In the human genome, a number of examples have been identified that were originally classified as pseudogenes but later discovered to have a functional, although not necessarily protein-coding, role. [28] [29]

Examples include the following:

Protein-coding: "pseudo-pseudogenes"

Drosophila melanogaster

The rapid proliferation of DNA sequencing technologies has led to the identification of many apparent pseudogenes using gene prediction techniques. Pseudogenes are often identified by the appearance of a premature stop codon in a predicted mRNA sequence, which would, in theory, prevent synthesis (translation) of the normal protein product of the original gene. There have been some reports of translational readthrough of such premature stop codons in mammals. As alluded to in the figure above, a small amount of the protein product of such readthrough may still be recognizable and function at some level. If so, the pseudogene can be subject to natural selection. That appears to have happened during the evolution of Drosophila species.

In 2016 it was reported that four predicted pseudogenes in multiple Drosophila species actually encode proteins with biologically important functions, [30] "suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon". For example, the functional protein (a glutamate olfactory receptor) from gene Ir75a is found only in neurons. This finding of tissue-specific biologically-functional genes that could have been classified as pseudogenes by in silico analysis complicates the analysis of sequence data. [30] Another Drosophila pseudo-pseudogene is jingwei, [31] [32] which encodes a functional alcohol dehydrogenase enzyme in vivo. [33]

As of 2012, it appeared that there are approximately 12,000–14,000 pseudogenes in the human genome. [34] A 2016 proteogenomics analysis using mass spectrometry of peptides identified at least 19,262 human proteins produced from 16,271 genes or clusters of genes, with 8 new protein-coding genes identified that were previously considered pseudogenes. [35] An earlier analysis found that human PGAM4 (phosphoglycerate mutase), [36] previously thought to be a pseudogene, is not only functional, but also causes infertility if mutated. [37] [38]

A number of pseudo-pseudogenes were also found in prokaryotes, where some stop codon substitutions in essential genes appear to be retained, even positively selected for. [39] [40]

Non-protein-coding

siRNAs. Some endogenous siRNAs appear to be derived from pseudogenes, and thus some pseudogenes play a role in regulating protein-coding transcripts, as reviewed. [41] One of the many examples is psiPPM1K. Processing of RNAs transcribed from psiPPM1K yields siRNAs that can act to suppress the most common type of liver cancer, hepatocellular carcinoma. [42] This and much other research has led to considerable excitement about the possibility of targeting pseudogenes with/as therapeutic agents. [43]

piRNAs. Some piRNAs are derived from pseudogenes located in piRNA clusters. [44] Those piRNAs regulate genes via the piRNA pathway in mammalian testes and are crucial for limiting transposable element damage to the genome. [45]

BRAF pseudogene acts as a ceRNA

microRNAs. There are many reports of pseudogene transcripts acting as microRNA decoys. Perhaps the earliest definitive example of such a pseudogene involved in cancer is the pseudogene of BRAF. The BRAF gene is a proto-oncogene that, when mutated, is associated with many cancers. Normally, the amount of BRAF protein is kept under control in cells through the action of miRNA. In normal situations, RNAs from BRAF and the pseudogene BRAFP1 compete for miRNA, but the balance of the two RNAs is such that cells grow normally. However, when BRAFP1 RNA expression is increased (either experimentally or by natural mutations), less miRNA is available to control the expression of BRAF, and the increased amount of BRAF protein causes cancer. [46] This sort of competition for regulatory elements by RNAs that are endogenous to the genome has given rise to the term ceRNA (competing endogenous RNA).
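The competition described above can be pictured with a deliberately crude toy model in Python. It assumes, purely for illustration, that gene and pseudogene transcripts carry equivalent miRNA binding sites and that a fixed miRNA pool distributes across them in proportion to transcript abundance; the function name and the abundance units are hypothetical, and the model is not taken from the cited study.

```python
def mirna_share_on_gene(gene_rna: float, pseudogene_rna: float) -> float:
    """Fraction of the miRNA pool bound to gene transcripts under proportional sharing."""
    if gene_rna < 0 or pseudogene_rna < 0:
        raise ValueError("abundances must be non-negative")
    total = gene_rna + pseudogene_rna
    if total == 0:
        raise ValueError("at least one transcript must be present")
    return gene_rna / total


# Hypothetical abundances in arbitrary units.
balanced = mirna_share_on_gene(100, 100)          # equal transcript levels
decoy_elevated = mirna_share_on_gene(100, 300)    # pseudogene RNA tripled
```

Under this assumption, tripling the pseudogene RNA halves the share of miRNA acting on the gene's transcripts (from 0.5 to 0.25), which is the qualitative effect the paragraph describes. Actual miRNA kinetics depend on binding affinities, site numbers, and turnover, none of which are modeled here.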

PTEN. The PTEN gene is a known tumor suppressor gene. The PTEN pseudogene, PTENP1, is a processed pseudogene that is very similar in its genetic sequence to the wild-type gene. However, PTENP1 has a missense mutation which eliminates the codon for the initiating methionine and thus prevents translation of the normal PTEN protein. [47] In spite of that, PTENP1 appears to play a role in oncogenesis. The 3' UTR of PTENP1 mRNA functions as a decoy of PTEN mRNA by targeting microRNAs due to its similarity to the PTEN gene, and overexpression of the 3' UTR resulted in an increase of PTEN protein level. [48] That is, overexpression of the PTENP1 3' UTR leads to increased regulation and suppression of cancerous tumors. The biology of this system is basically the inverse of the BRAF system described above.

Potogenes. Pseudogenes can, over evolutionary time scales, participate in gene conversion and other mutational events that may give rise to new or newly functional genes. This has led to the concept that pseudogenes could be viewed as potogenes: potential genes for evolutionary diversification. [49]

Bacterial pseudogenes

Pseudogenes are found in bacteria. [50] Most are found in bacteria that are not free-living; that is, they are either symbionts or obligate intracellular parasites. Thus, they do not require many genes that are needed by free-living bacteria, such as genes associated with metabolism and DNA repair. However, there is no fixed order in which functional genes are lost. For example, the oldest pseudogenes in Mycobacterium leprae are in RNA polymerases and the biosynthesis of secondary metabolites while the oldest ones in Shigella flexneri and Shigella typhi are in DNA replication, recombination, and repair. [51]

Since most bacteria that carry pseudogenes are either symbionts or obligate intracellular parasites, genome size eventually reduces. An extreme example is the genome of Mycobacterium leprae, an obligate parasite and the causative agent of leprosy. It has been reported to have 1,133 pseudogenes which give rise to approximately 50% of its transcriptome. [51] The effect of pseudogenes and genome reduction can be further seen when compared to Mycobacterium marinum, a pathogen from the same family. Mycobacterium marinum has a larger genome compared to Mycobacterium leprae because it can survive outside the host; therefore, the genome must contain the genes needed to do so. [52]

Although genome reduction eliminates unneeded genes by getting rid of pseudogenes, selective pressures from the host can influence what is kept. In the case of a symbiont from the Verrucomicrobiota phylum, there are seven additional copies of the gene coding the mandelalide pathway. [53] The hosts, species of Lissoclinum, use mandelalides as part of their defense mechanism. [53]

The relationship between epistasis and the domino theory of gene loss was observed in Buchnera aphidicola. The domino theory suggests that if one gene of a cellular process becomes inactivated, then selection in other genes involved relaxes, leading to gene loss. [51] When comparing Buchnera aphidicola and Escherichia coli, it was found that positive epistasis furthers gene loss while negative epistasis hinders it.

The proS loci in Mycobacterium leprae and M. tuberculosis, showing three pseudogenes (indicated by crosses) in M. leprae that still have functional homologs in M. tuberculosis. Homologous genes are indicated by identical colors and thin blue vertical bars. Modified after Cole et al. 2001.

References

  1. Mighell AJ, Smith NR, Robinson PA, Markham AF (February 2000). "Vertebrate pseudogenes". FEBS Letters. 468 (2–3): 109–114. doi: 10.1016/S0014-5793(00)01199-6 . PMID   10692568. S2CID   42204036.
  2. Robicheau BM, Susko E, Harrigan AM, Snyder M (February 2017). "Ribosomal RNA Genes Contribute to the Formation of Pseudogenes and Junk DNA in the Human Genome". Genome Biology and Evolution. 9 (2): 380–397. doi:10.1093/gbe/evw307. PMC   5381670 . PMID   28204512.
  3. van Baren MJ, Brent MR (May 2006). "Iterative gene prediction and pseudogene removal improves genome annotation". Genome Research. 16 (5): 678–685. doi:10.1101/gr.4766206. PMC   1457044 . PMID   16651666.
  4. Kim MS, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, et al. (May 2014). "A draft map of the human proteome". Nature. 509 (7502): 575–581. Bibcode:2014Natur.509..575K. doi:10.1038/nature13302. PMC   4403737 . PMID   24870542.
  5. Max EE (1986). "Plagiarized Errors and Molecular Genetics". Creation Evolution Journal. 6 (3): 34–46.
  6. Chandrasekaran C, Betrán E (2008). "Origins of new genes and pseudogenes". Nature Education. 1 (1): 181.
  7. Jurka J (December 2004). "Evolutionary impact of human Alu repetitive elements". Current Opinion in Genetics & Development. 14 (6): 603–608. doi:10.1016/j.gde.2004.08.008. PMID   15531153.
  8. Dewannieux M, Heidmann T (2005). "LINEs, SINEs and processed pseudogenes: parasitic strategies for genome modeling". Cytogenetic and Genome Research. 110 (1–4): 35–48. doi:10.1159/000084936. PMID   16093656. S2CID   25083962.
  9. Dewannieux M, Esnault C, Heidmann T (September 2003). "LINE-mediated retrotransposition of marked Alu sequences". Nature Genetics. 35 (1): 41–48. doi:10.1038/ng1223. PMID   12897783. S2CID   32151696.
  10. Graur D, Shuali Y, Li WH (April 1989). "Deletions in processed pseudogenes accumulate faster in rodents than in humans". Journal of Molecular Evolution. 28 (4): 279–285. Bibcode:1989JMolE..28..279G. doi:10.1007/BF02103423. PMID   2499684. S2CID   22437436.
  11. Baertsch R, Diekhans M, Kent WJ, Haussler D, Brosius J (October 2008). "Retrocopy contributions to the evolution of the human genome". BMC Genomics. 9: 466. doi: 10.1186/1471-2164-9-466 . PMC   2584115 . PMID   18842134.
  12. Pavlícek A, Paces J, Zíka R, Hejnar J (October 2002). "Length distribution of long interspersed nucleotide elements (LINEs) and processed pseudogenes of human endogenous retroviruses: implications for retrotransposition and pseudogene detection". Gene. 300 (1–2): 189–194. doi:10.1016/S0378-1119(02)01047-8. PMID   12468100.
  13. Navarro FC, Galante PA (July 2015). "A Genome-Wide Landscape of Retrocopies in Primate Genomes". Genome Biology and Evolution. 7 (8): 2265–2275. doi:10.1093/gbe/evv142. PMC   4558860 . PMID   26224704.
  14. Schrider DR, Navarro FC, Galante PA, Parmigiani RB, Camargo AA, Hahn MW, de Souza SJ (2013-01-24). "Gene copy-number polymorphism caused by retrotransposition in humans". PLOS Genetics. 9 (1): e1003242. doi: 10.1371/journal.pgen.1003242 . PMC   3554589 . PMID   23359205.
  15. Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, et al. (June 2007). "Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution". Genome Research. 17 (6): 839–851. doi:10.1101/gr.5586307. PMC   1891343 . PMID   17568002.
  16. Max EE (2003-05-05). "Plagiarized Errors and Molecular Genetics". TalkOrigins Archive . Retrieved 2008-07-22.
  17. Lynch M, Conery JS (November 2000). "The evolutionary fate and consequences of duplicate genes". Science. 290 (5494): 1151–1155. Bibcode:2000Sci...290.1151L. doi:10.1126/science.290.5494.1151. PMID   11073452.
  18. Walsh JB (January 1995). "How often do duplicated genes evolve new functions?". Genetics. 139 (1): 421–428. doi:10.1093/genetics/139.1.421. PMC   1206338 . PMID   7705642.
  19. Lynch M, O'Hely M, Walsh B, Force A (December 2001). "The probability of preservation of a newly arisen gene duplicate". Genetics. 159 (4): 1789–1804. doi:10.1093/genetics/159.4.1789. PMC   1461922 . PMID   11779815.
  20. Harrison PM, Hegyi H, Balasubramanian S, Luscombe NM, Bertone P, Echols N, et al. (February 2002). "Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22". Genome Research. 12 (2): 272–280. doi:10.1101/gr.207102. PMC   155275 . PMID   11827946.
  21. Zhang J (2003). "Evolution by gene duplication: an update". Trends in Ecology and Evolution. 18 (6): 292–298. doi:10.1016/S0169-5347(03)00033-8.
  22. Nishikimi M, Kawai T, Yagi K (October 1992). "Guinea pigs possess a highly mutated gene for L-gulono-gamma-lactone oxidase, the key enzyme for L-ascorbic acid biosynthesis missing in this species". The Journal of Biological Chemistry. 267 (30): 21967–21972. doi: 10.1016/S0021-9258(19)36707-9 . PMID   1400507.
  23. Nishikimi M, Fukuyama R, Minoshima S, Shimizu N, Yagi K (May 1994). "Cloning and chromosomal mapping of the human nonfunctional gene for L-gulono-gamma-lactone oxidase, the enzyme for L-ascorbic acid biosynthesis missing in man". The Journal of Biological Chemistry. 269 (18): 13685–13688. doi: 10.1016/S0021-9258(17)36884-9 . PMID   8175804.
  24. Xue Y, Daly A, Yngvadottir B, Liu M, Coop G, Kim Y, et al. (April 2006). "Spread of an inactive form of caspase-12 in humans is due to recent positive selection". American Journal of Human Genetics. 78 (4): 659–670. doi:10.1086/503116. PMC   1424700 . PMID   16532395.
  25. Abascal F, Juan D, Jungreis I, Kellis M, Martinez L, Rigau M, Rodriguez JM, Vazquez J, Tress ML (2018-08-21). "Loose ends: almost one in five human genes still have unresolved coding status". Nucleic Acids Research. 46 (14): 7070–7084. doi:10.1093/nar/gky587. ISSN   1362-4962. PMC   6101605 . PMID   29982784.
  26. Rausell A, Luo Y, Lopez M, Seeleuthner Y, Rapaport F, Favier A, Stenson PD, Cooper DN, Patin E, Casanova JL, Quintana-Murci L, Abel L (2020-06-16). "Common homozygosity for predicted loss-of-function variants reveals both redundant and advantageous effects of dispensable human genes". Proceedings of the National Academy of Sciences of the United States of America. 117 (24): 13626–13636. doi:10.1073/pnas.1917993117. ISSN   1091-6490. PMC   7306792 . PMID   32487729.
  27. Lopes-Marques M, Peixoto MJ, Cooper DN, Prata MJ, Azevedo L, Castro LF (2024-11-02). "Polymorphic pseudogenes in the human genome - a comprehensive assessment". Human Genetics. doi:10.1007/s00439-024-02715-9. ISSN   1432-1203.
  28. Cheetham SW, Faulkner GJ, Dinger ME (March 2020). "Overcoming challenges and dogmas to understand the functions of pseudogenes". Nature Reviews. Genetics. 21 (3): 191–201. doi:10.1038/s41576-019-0196-1. PMID   31848477. S2CID   209393216.
  29. Zerbino DR, Frankish A, Flicek P (August 2020). "Progress, Challenges, and Surprises in Annotating the Human Genome". Annual Review of Genomics and Human Genetics. 21 (1): 55–79. doi: 10.1146/annurev-genom-121119-083418 . PMC   7116059 . PMID   32421357.
  30. Prieto-Godino LL, Rytz R, Bargeton B, Abuin L, Arguello JR, Peraro MD, Benton R (November 2016). "Olfactory receptor pseudo-pseudogenes". Nature. 539 (7627): 93–97. Bibcode:2016Natur.539...93P. doi:10.1038/nature19824. PMC   5164928 . PMID   27776356.
  31. Jeffs P, Ashburner M (May 1991). "Processed pseudogenes in Drosophila". Proceedings. Biological Sciences. 244 (1310): 151–159. Bibcode:1991RSPSB.244..151J. doi:10.1098/rspb.1991.0064. PMID   1679549. S2CID   1665885.
  32. Wang W, Zhang J, Alvarez C, Llopart A, Long M (September 2000). "The origin of the Jingwei gene and the complex modular structure of its parental gene, yellow emperor, in Drosophila melanogaster". Molecular Biology and Evolution. 17 (9): 1294–1301. doi: 10.1093/oxfordjournals.molbev.a026413 . PMID   10958846.
  33. Long M, Langley CH (April 1993). "Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila". Science. 260 (5104): 91–95. Bibcode:1993Sci...260...91L. doi:10.1126/science.7682012. PMID   7682012.
  34. Pei B, Sisu C, Frankish A, Howald C, Habegger L, Mu XJ, et al. (September 2012). "The GENCODE pseudogene resource". Genome Biology. 13 (9): R51. doi: 10.1186/gb-2012-13-9-r51 . PMC   3491395 . PMID   22951037.
  35. Wright JC, Mudge J, Weisser H, Barzine MP, Gonzalez JM, Brazma A, et al. (June 2016). "Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow". Nature Communications. 7: 11778. Bibcode:2016NatCo...711778W. doi:10.1038/ncomms11778. PMC   4895710 . PMID   27250503.
  36. Dierick HA, Mercer JF, Glover TW (October 1997). "A phosphoglycerate mutase brain isoform (PGAM 1) pseudogene is localized within the human Menkes disease gene (ATP7 A)". Gene. 198 (1–2): 37–41. doi:10.1016/s0378-1119(97)00289-8. PMID   9370262.
  37. Betrán E, Wang W, Jin L, Long M (May 2002). "Evolution of the phosphoglycerate mutase processed gene in human and chimpanzee revealing the origin of a new primate gene". Molecular Biology and Evolution. 19 (5): 654–663. doi: 10.1093/oxfordjournals.molbev.a004124 . PMID   11961099.
  38. Okuda H, Tsujimura A, Irie S, Yamamoto K, Fukuhara S, Matsuoka Y, et al. (2012). "A single nucleotide polymorphism within the novel sex-linked testis-specific retrotransposed PGAM4 gene influences human male fertility". PLOS ONE. 7 (5): e35195. Bibcode:2012PLoSO...735195O. doi: 10.1371/journal.pone.0035195 . PMC   3348931 . PMID   22590500.
  39. Belinky F, Ganguly I, Poliakov E, Yurchenko V, Rogozin IB (February 2021). "Analysis of Stop Codons within Prokaryotic Protein-Coding Genes Suggests Frequent Readthrough Events". International Journal of Molecular Sciences. 22 (4): 1876. doi: 10.3390/ijms22041876 . PMC   7918605 . PMID   33672790.
  40. Feng Y, Wang Z, Chien KY, Chen HL, Liang YH, Hua X, Chiu CH (May 2022). ""Pseudo-pseudogenes" in bacterial genomes: Proteogenomics reveals a wide but low protein expression of pseudogenes in Salmonella enterica". Nucleic Acids Research. 50 (9): 5158–5170. doi: 10.1093/nar/gkac302 . PMC   9122581 . PMID   35489061.
  41. Chan WL, Chang JG (2014). "Pseudogene-Derived Endogenous siRNAs and Their Function". Pseudogenes. Methods in Molecular Biology. Vol. 1167. pp. 227–39. doi:10.1007/978-1-4939-0835-6_15. ISBN   978-1-4939-0834-9. PMID   24823781.
  42. Chan WL, Yuo CY, Yang WK, Hung SY, Chang YS, Chiu CC, et al. (April 2013). "Transcribed pseudogene ψPPM1K generates endogenous siRNA to suppress oncogenic cell growth in hepatocellular carcinoma". Nucleic Acids Research. 41 (6): 3734–3747. doi:10.1093/nar/gkt047. PMC   3616710 . PMID   23376929.
  43. Roberts TC, Morris KV (December 2013). "Not so pseudo anymore: pseudogenes as therapeutic targets". Pharmacogenomics. 14 (16): 2023–2034. doi:10.2217/pgs.13.172. PMC   4068744 . PMID   24279857.
  44. Olovnikov I, Le Thomas A, Aravin AA (2014). "A Framework for piRNA Cluster Manipulation". PIWI-Interacting RNAs. Methods in Molecular Biology. Vol. 1093. pp. 47–58. doi:10.1007/978-1-62703-694-8_5. ISBN   978-1-62703-693-1. PMID   24178556.
  45. Siomi MC, Sato K, Pezic D, Aravin AA (April 2011). "PIWI-interacting small RNAs: the vanguard of genome defence". Nature Reviews. Molecular Cell Biology. 12 (4): 246–258. doi:10.1038/nrm3089. PMID   21427766. S2CID   5710813.
  46. Karreth FA, Reschke M, Ruocco A, Ng C, Chapuy B, Léopold V, et al. (April 2015). "The BRAF pseudogene functions as a competitive endogenous RNA and induces lymphoma in vivo". Cell. 161 (2): 319–332. doi:10.1016/j.cell.2015.02.043. PMC   6922011 . PMID   25843629.
  47. Dahia PL, FitzGerald MG, Zhang X, Marsh DJ, Zheng Z, Pietsch T, et al. (May 1998). "A highly conserved processed PTEN pseudogene is located on chromosome band 9p21". Oncogene. 16 (18): 2403–2406. doi: 10.1038/sj.onc.1201762 . PMID   9620558.
  48. Poliseno L, Salmena L, Zhang J, Carver B, Haveman WJ, Pandolfi PP (June 2010). "A coding-independent function of gene and pseudogene mRNAs regulates tumour biology". Nature. 465 (7301): 1033–1038. Bibcode:2010Natur.465.1033P. doi:10.1038/nature09144. PMC   3206313 . PMID   20577206.
  49. Balakirev ES, Ayala FJ (2003). "Pseudogenes: are they "junk" or functional DNA?". Annual Review of Genetics. 37: 123–151. doi:10.1146/annurev.genet.37.040103.103949. PMID   14616058.
  50. Goodhead I, Darby AC (February 2015). "Taking the pseudo out of pseudogenes" (PDF). Current Opinion in Microbiology. 23: 102–109. doi:10.1016/j.mib.2014.11.012. PMID   25461580.
  51. Dagan T, Blekhman R, Graur D (February 2006). "The "domino theory" of gene death: gradual and mass gene extinction events in three lineages of obligate symbiotic bacterial pathogens". Molecular Biology and Evolution. 23 (2): 310–316. doi: 10.1093/molbev/msj036 . PMID   16237210.
  52. Malhotra S, Vedithi SC, Blundell TL (August 2017). "Decoding the similarities and differences among mycobacterial species". PLOS Neglected Tropical Diseases. 11 (8): e0005883. doi: 10.1371/journal.pntd.0005883 . PMC   5595346 . PMID   28854187.
  53. Lopera J, Miller IJ, McPhail KL, Kwan JC (November 21, 2017). "Increased Biosynthetic Gene Dosage in a Genome-Reduced Defensive Bacterial Symbiont". mSystems. 2 (6): 1–18. doi:10.1128/msystems.00096-17. PMC   5698493 . PMID   29181447.
  54. Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, Wheeler PR, et al. (February 2001). "Massive gene decay in the leprosy bacillus". Nature. 409 (6823): 1007–1011. Bibcode:2001Natur.409.1007C. doi:10.1038/35059006. PMID   11234002. S2CID   4307207.

Further reading