Periannan Senapathy

Alma mater: Loyola College; Madras University; Indian Institute of Science
Known for: genomics, clinical genomics, RNA splicing, split genes
Institutions: National Institutes of Health; University of Wisconsin, Madison
Website: Genome International Corporation

Periannan Senapathy is a molecular biologist, geneticist, author and entrepreneur. He is the founder, president and chief scientific officer of Genome International Corporation, a biotechnology, bioinformatics and information technology firm based in Madison, Wisconsin, that develops computational genomics applications of next-generation DNA sequencing (NGS) and clinical decision support systems that analyze patient genome data to aid in the diagnosis and treatment of diseases.

Senapathy is known for his contributions to genetics, genomics and clinical genomics, especially to the biology of RNA splicing and the split structure of eukaryotic genes. [1] [2] [3] [4] [5] [6] [7] He developed the Shapiro & Senapathy algorithm (S&S) for predicting the splice sites, exons and genes of eukaryotes, which has become a primary methodology for discovering disease-causing mutations at splice junctions. The S&S algorithm has been implemented in many gene-finding and mutation-detection tools that are used extensively in clinical and research institutions around the world to uncover mutations in thousands of patients with numerous diseases, including cancers and inherited disorders. [8] [9] [10] [11] [12] It is increasingly used in the NGS era, as it is widely recognized that more than 50% of the mutations underlying diseases and adverse drug reactions in humans and other animals may lie within the splicing regions of genes. [13] [14] [15] [16] [17] [18] [19] The S&S algorithm has been cited in about 6,000 publications on finding splicing mutations in cancers and inherited disorders.

Senapathy offered a new hypothesis on the origin of introns, split genes and splice junctions in eukaryotic genes. Because the split structure of genes is central to eukaryotic biology, its origin has been a major question in biology. Senapathy proposed the "split gene theory", which states that the split structure arose because split genes originated from random DNA sequences, and he provided supporting evidence from the genome sequences of several organisms. [1] [2] [4] [5] Based on analyses of eukaryotic genomic DNA sequences, he also showed that the splice junctions of eukaryotic genes could have originated from the stop-codon ends of open reading frames (ORFs) in random DNA sequences. Marshall Nirenberg, the Nobel laureate who had deciphered the codons, communicated the papers to PNAS. [1] [2] Senapathy has published other scientific findings in journals including Science, Nucleic Acids Research, PNAS, the Journal of Biological Chemistry and the Journal of Molecular Biology, and is an inventor on several patents in the genomics field.

Biography

Senapathy holds a Ph.D. in molecular biology from the Indian Institute of Science, Bangalore, India. He spent twelve years in genome research, first at the National Institutes of Health's Laboratory of Molecular and Cell Biology (NIADDK) and the Laboratory of Statistical and Mathematical Methodology in the Division of Computer Research and Technology (DCRT) in Bethesda, Maryland (1980–87), and then at the Biotechnology Center and the Department of Genetics of the University of Wisconsin, Madison (1987–91). Senapathy founded Genome International in 1992 to develop computational biology research, products and services.

Notable research contributions

Senapathy has made major contributions to RNA splicing biology. These have shaped the understanding of the structure, function and origin of eukaryotic exons, introns, splice junctions and split genes, and have been applied in human medicine, benefiting thousands of patients with hundreds of diseases, including cancers and inherited disorders. His research is an example of basic molecular biology findings being applied to human medicine with substantial impact, and it also has a variety of basic-science and practical applications in animals and plants.

Origin of split genes from random DNA sequences

The split gene theory seeks to answer the major questions of why and how the split genes of eukaryotes originated. It states that if coding sequences for biological proteins originated from random primordial genetic sequences, the random occurrence of 3 stop codons among the 64 codons would limit open reading frames (ORFs) to a very short average length of ~60 bases. Coding sequences for biological proteins, which average ~1,200 bases and can reach 6,000 bases, could therefore practically never occur in random sequences. Genes would thus have had to occur in pieces, in a split form, with short coding sequences (ORFs) that became exons, interrupted by very long random sequences that became introns. When eukaryotic DNA was tested for its ORF length distribution, it matched that of random DNA, with very short ORFs that matched the lengths of exons, and very long introns, as predicted, supporting the split gene theory. [1] [2] In this view, introns are relics of their random-sequence origin and are therefore marked for removal at the primary RNA stage, although they may incidentally contain a few genetic elements useful to the cell. The Nobel laureate Marshall Nirenberg, who deciphered the codons, communicated the paper to PNAS. [1] New Scientist covered the publication in an article titled "A long explanation for introns". [20]
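The ~60-base figure follows from simple probability. If every codon of a random sequence is independently one of 64 equally likely triplets, each codon has a 3/64 chance of being a stop, so the number of codons up to and including the first stop is geometrically distributed with mean 64/3, about 21 codons or 64 bases. The sketch below checks this by simulation. It assumes equal base frequencies and reads a single frame, which is a simplification rather than Senapathy's published analysis, and the function names are illustrative.

```python
import random
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_dna(n: int, seed: int = 1) -> str:
    """Random DNA with equal A/C/G/T frequencies (a simplifying assumption)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def orf_lengths(seq: str, frame: int = 0) -> List[int]:
    """Lengths in bases of the stop-delimited ORFs in one reading frame.

    Each length runs from just after the previous stop codon (or the start
    of the frame) up to and including the next stop codon; a trailing run
    with no stop codon is discarded.
    """
    lengths = []
    run = 0
    for i in range(frame, len(seq) - 2, 3):
        run += 3
        if seq[i:i + 3] in STOP_CODONS:
            lengths.append(run)
            run = 0
    return lengths


# Geometric expectation: 64/3 codons per stop-terminated run, 3 bases each.
EXPECTED_MEAN_BASES = 3 * 64 / 3

lengths = orf_lengths(random_dna(300_000))
mean_len = sum(lengths) / len(lengths)
print(f"{len(lengths)} ORFs, mean {mean_len:.1f} bases "
      f"(expected ~{EXPECTED_MEAN_BASES:.0f}), longest {max(lengths)} bases")
```

The printed maximum also shows how far short of typical protein-coding lengths the longest random ORFs fall.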

Noted molecular biologist and biophysicist Colin Blake, of the Laboratory of Molecular Biophysics and the Oxford Centre for Molecular Sciences, University of Oxford, commented on Senapathy's theory: [21] "Recent work by Senapathy, when applied to RNA, comprehensively explains the origin of the segregated form of RNA into coding and non-coding regions. It also suggests why a splicing mechanism was developed at the start of primordial evolution. The presence of random sequence was therefore sufficient to create in the primordial ancestor the segregated form of RNA observed in the eukaryotic gene structure."

Origin of RNA splice junction signals from stop codons of ORFs

Senapathy's research also addresses the origin of the splice junctions of eukaryotic genes, that is, why and how the splice junction signals originated. Senapathy predicted that, if the split gene theory was true, the stop-codon ends of the primordial ORFs would have become the ends of exons bordering the introns, and would thus define the splice junctions. He found that almost all splice junctions in eukaryotic genes contained stop codons exactly at the ends of introns, bordering the exons, as predicted. [2] These stop codons were found to form part of the "canonical" AG:GT splicing sequence, with the three stop codons occurring within the strong consensus signals. Senapathy also observed that mutations in these stop-codon bases within splice junctions caused the majority of diseases attributed to splicing mutations, emphasizing the importance of stop codons in the splice junctions. Thus, the basic split gene theory led to the hypothesis that the splice junctions originated from stop codons. [2] Marshall Nirenberg supported the publication of this paper in PNAS. New Scientist covered the publication in an article titled "Exons, Introns and Evolution". [22]
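The presence of stop codons in the canonical splice signals can be checked directly on consensus sequences. The sketch below uses one common textbook rendering of the donor consensus, MAG|GTRAGT (M = A or C, R = A or G), and a pyrimidine-rich acceptor ending in YAG|G, instantiated here as TAG|G. These particular instances are illustrative choices; many real splice sites deviate from them, and an acceptor ending in CAG contains no stop codon. The function names are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = ("TAA", "TAG", "TGA")


def stops_in(seq: str) -> List[Tuple[int, str]]:
    """(index, codon) for every stop-codon triplet in seq, in any frame."""
    seq = seq.upper()
    return [(i, seq[i:i + 3]) for i in range(len(seq) - 2)
            if seq[i:i + 3] in STOP_CODONS]


def stops_around_junction(site: str) -> List[Tuple[int, str]]:
    """Stop codons in a site written with '|' at the exon-intron boundary.

    Positions are relative to the boundary: a negative value means the
    codon starts upstream of '|', zero or positive means at or after it.
    """
    junction = site.index("|")
    return [(i - junction, codon)
            for i, codon in stops_in(site.replace("|", ""))]


SITES = {
    "donor, R = A": "CAG|GTAAGT",   # exon ...CAG | GTAAGT... intron
    "donor, R = G": "CAG|GTGAGT",   # exon ...CAG | GTGAGT... intron
    "acceptor": "TTTTTAG|G",        # intron ...TTTTTAG | G... exon
}

for name, site in SITES.items():
    print(f"{name:13s} {site:11s} {stops_around_junction(site)}")
```

In both donor variants the stop codon (TAA or TGA) begins at the T of the invariant GT, and in the acceptor the TAG occupies the last three intronic bases.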

Why exons are short and introns are long

Research based on the split gene theory also sheds light on other basic questions about exons and introns. The exons of eukaryotes are generally short (human exons average ~120 bases and can be as short as 10 bases), whereas introns are usually very long (averaging ~3,000 bases and reaching several hundred thousand bases, as in the genes RBFOX1, CNTNAP2, PTPRD and DLG2). Senapathy has provided a plausible answer to why this is so, which has remained the only explanation offered so far. According to the split gene theory, if the exons of eukaryotic genes originated from random DNA sequences, their lengths should match those of ORFs in random sequence, at around 100 bases (close to the median length of ORFs in random sequence). The genome sequences of living organisms, for example the human, show an average exon length of ~120 bases and longest exons of ~600 bases (with few exceptions), the same length as the longest random ORFs. The theory also predicts that introns can be very long, which is found to be true in eukaryotic organisms.
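How rarely long ORFs arise by chance can be quantified with a simple model of random sequence: codons are independent and equally likely, and 3 of the 64 are stops, so the chance that n consecutive codons contain no stop codon is (61/64)^n. The sketch below evaluates this for a typical exon length, the ~600-base upper limit mentioned above, and the ~1,200- and 6,000-base coding lengths discussed earlier in the article. The model and the function name are illustrative assumptions, not Senapathy's published computation.

```python
P_STOP = 3 / 64  # probability that a random codon is TAA, TAG or TGA


def prob_no_stop(bases: int) -> float:
    """Probability that bases // 3 consecutive random codons contain no stop."""
    return (1 - P_STOP) ** (bases // 3)


for length in (120, 600, 1200, 6000):
    print(f"no stop codon in {length:>4} bases: {prob_no_stop(length):.1e}")
```

Under these assumptions a 600-base stretch is stop-free with a probability of roughly 7 in 100,000, falling to about 5 in a billion for 1,200 bases and about 10^-42 for 6,000 bases.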

Why genomes are large

This work also offers an explanation for why genomes are so large, for example the human genome with three billion bases, and why only a very small fraction of the human genome (~2%) codes for proteins and other regulatory elements. [23] [24] If split genes originated from random primordial DNA sequences, the genome would contain a significant amount of DNA in the form of introns. Furthermore, a genome assembled from random DNA containing split genes would also include intergenic random DNA. Thus, nascent genomes that originated from random DNA sequences had to be large, regardless of the complexity of the organism. The observation that the genomes of organisms such as the onion (~16 billion bases [25]) and the salamander (~32 billion bases [26]) are much larger than that of the human (~3 billion bases [23] [24]), although these organisms are no more complex than humans, lends credence to the split gene theory. The finding that several organisms have much smaller genomes yet contain essentially the same number of genes as the human, such as C. elegans (genome size ~100 million bases, ~19,000 genes) [27] and Arabidopsis (genome size ~125 million bases, ~25,000 genes), [28] adds further support. The split gene theory predicts that the split genes in these genomes could be "reduced" forms, with shortened or deleted introns, compared to larger genes with long introns, thus leading to reduced genomes. [1] [4] Researchers have recently proposed that these smaller genomes are indeed reduced genomes, which supports the split gene theory. [29]
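The comparison can be made concrete with the approximate figures quoted in the paragraph above. The sketch adds one number the paragraph does not give, roughly 20,000 protein-coding genes for the human genome, which is an assumption used only for illustration. With it, genome size varies more than tenfold among organisms of comparable complexity, and the human genome carries roughly 30 times more DNA per gene than C. elegans or Arabidopsis.

```python
from typing import Dict

# Approximate genome sizes (bases) quoted in the text.
GENOME_SIZE: Dict[str, float] = {
    "human": 3e9,
    "onion": 16e9,
    "salamander": 32e9,
    "C. elegans": 100e6,
    "Arabidopsis": 125e6,
}

# Gene counts quoted in the text, plus an ASSUMED ~20,000 for human
# (not stated in the text; used only for comparison).
GENE_COUNT: Dict[str, int] = {
    "human": 20_000,
    "C. elegans": 19_000,
    "Arabidopsis": 25_000,
}


def relative_to_human(sizes: Dict[str, float]) -> Dict[str, float]:
    """Genome size of each organism as a multiple of the human genome."""
    return {name: size / sizes["human"] for name, size in sizes.items()}


def kb_per_gene(sizes: Dict[str, float],
                genes: Dict[str, int]) -> Dict[str, float]:
    """Kilobases of genome per gene, for organisms with a gene count."""
    return {name: sizes[name] / n / 1e3 for name, n in genes.items()}


for name, ratio in relative_to_human(GENOME_SIZE).items():
    print(f"{name:12s} {ratio:6.2f} x human genome size")
for name, kb in kb_per_gene(GENOME_SIZE, GENE_COUNT).items():
    print(f"{name:12s} ~{kb:5.1f} kb of genome per gene")
```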

Origin of the spliceosomal machinery and the eukaryotic cell nucleus

Senapathy's research also addresses the origin of the spliceosomal machinery that removes introns from the RNA transcripts of genes. If split genes had originated from random DNA, introns would have become an unnecessary but integral part of eukaryotic genes, along with the splice junctions at their ends. The spliceosomal machinery would then be required to remove them and to splice the short exons together into a contiguously coding mRNA that can be translated into a complete protein. Thus, according to the split gene theory, the whole spliceosomal machinery originated as a consequence of split genes arising from random DNA sequences, in order to remove the unnecessary introns. [1] [2]
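The role the theory assigns to splicing, removing stop-containing introns so that short exons form one uninterrupted reading frame, can be illustrated with a toy transcript. The sequences are invented for illustration (a 9-base exon, a 15-base intron beginning with GT and ending with AG, and a 6-base exon ending in a stop codon), the DNA alphabet is used for simplicity, and the exon coordinates are supplied directly rather than recognized as the spliceosome would. The point is only that reading the unspliced transcript hits a stop inside the intron, while the spliced mRNA reads through to its final codon.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate half-open [start, end) exon intervals, discarding introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def first_stop(seq: str) -> Optional[int]:
    """Index of the first stop codon in reading frame 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical gene: exon 1 + intron + exon 2.
EXON1, INTRON, EXON2 = "ATGGCCAAG", "GTAAGTTAGTTTCAG", "GGTTGA"
pre_mrna = EXON1 + INTRON + EXON2
exons = [(0, len(EXON1)), (len(EXON1) + len(INTRON), len(pre_mrna))]
mrna = splice(pre_mrna, exons)

print("unspliced: first stop at", first_stop(pre_mrna), "of", len(pre_mrna))
print("spliced:   first stop at", first_stop(mrna), "of", len(mrna))
```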

Senapathy also proposed a plausible mechanistic and functional rationale for why the eukaryotic nucleus originated, a major unanswered question in biology. [1] [2] If the transcripts of split genes and the spliced mRNAs were present together in a cell without a nucleus, ribosomes would bind to both the unspliced primary RNA transcripts and the spliced mRNAs, resulting in molecular chaos. A boundary separating RNA splicing from mRNA translation would avoid this problem. This is what is found in eukaryotic cells, where the primary RNA transcript is spliced within the nucleus and the spliced mRNA is transported to the cytoplasm, where ribosomes translate it into protein. The nuclear boundary thus provides a clear separation between primary RNA splicing and mRNA translation.

Origin of the eukaryotic cell

These investigations thus raised the possibility that primordial DNA with essentially random sequence gave rise to the complex structure of split genes, with their exons, introns and splice junctions. They also predict that the cells harboring these split genes had to be complex, with a nuclear-cytoplasmic boundary and a spliceosomal machinery. Thus, the earliest cell may have been complex and eukaryotic. [1] [2] [4] [5] Findings from extensive comparative genomics research on several organisms over the past 15 years increasingly indicate that the earliest organisms could have been highly complex and eukaryotic, and could have contained complex proteins, [30] [31] [32] [33] [34] [35] [36] [37] as predicted by Senapathy's theory.

The spliceosome is a highly complex machinery within the eukaryotic cell, containing ~200 proteins and several snRNPs. In their paper "Complex spliceosomal organization ancestral to extant eukaryotes", [34] molecular biologists Lesley Collins and David Penny state: "We begin with the hypothesis that ... the spliceosome has increased in complexity throughout eukaryotic evolution. However, examination of the distribution of spliceosomal components indicates that not only was a spliceosome present in the eukaryotic ancestor but it also contained most of the key components found in today's eukaryotes. ... the last common ancestor of extant eukaryotes appears to show much of the molecular complexity seen today." This suggests that the earliest eukaryotic organisms were highly complex and contained sophisticated genes and proteins, as the split gene theory predicts.

The Shapiro-Senapathy algorithm

The split gene theory culminated in the Shapiro-Senapathy algorithm, which aids in identifying splicing mutations that cause numerous diseases and adverse drug reactions. [3] [7] The algorithm is increasingly used in clinical practice and research, not only to find mutations in known disease-causing genes in patients but also to discover novel genes that cause different diseases. It is also used to determine the mechanism of aberrant splicing in individual patients as well as in cohorts of patients with a particular disease, and to identify cryptic splice sites and deduce how mutations in them can disrupt normal splicing and lead to disease. It is also employed to address various questions in basic research in humans, animals and plants.
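The S&S donor-site score is commonly described as a position-weight-matrix "consensus value": the percentages of the observed bases at each position of the site are summed into t and rescaled as 100 * (t - min) / (max - min), where min and max are the sums of the lowest and highest percentages at each position, so that a perfect consensus match scores 100. The sketch below implements that formula and a simple scan for GT-initiated candidate donor sites. The 9-base window and the frequency matrix are hypothetical stand-ins for the published tables, and the function names are illustrative; tools that apply S&S in practice use matrices derived from large sets of annotated splice sites.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL percentage frequencies for a 9-nt donor (5') splice-site window:
# 3 exonic bases followed by 6 intronic bases beginning with GT. Invented for
# illustration; these are NOT the published Shapiro & Senapathy tables.
DONOR_PFM: List[Dict[str, float]] = [
    {"A": 35, "C": 35, "G": 20, "T": 10},   # exon -3
    {"A": 60, "C": 10, "G": 15, "T": 15},   # exon -2
    {"A": 10, "C": 5, "G": 80, "T": 5},     # exon -1
    {"A": 0, "C": 0, "G": 100, "T": 0},     # intron +1
    {"A": 0, "C": 0, "G": 0, "T": 100},     # intron +2
    {"A": 60, "C": 3, "G": 35, "T": 2},     # intron +3
    {"A": 70, "C": 10, "G": 10, "T": 10},   # intron +4
    {"A": 10, "C": 5, "G": 80, "T": 5},     # intron +5
    {"A": 15, "C": 15, "G": 20, "T": 50},   # intron +6
]


def ss_score(window: str, pfm: List[Dict[str, float]]) -> float:
    """S&S-style consensus value, 100 * (t - min) / (max - min).

    t sums the percentages of the observed bases; min and max sum the
    lowest and highest percentage at each position. Only A, C, G, T allowed.
    """
    if len(window) != len(pfm):
        raise ValueError("window length must equal the matrix length")
    t = sum(col[base] for col, base in zip(pfm, window.upper()))
    lowest = sum(min(col.values()) for col in pfm)
    highest = sum(max(col.values()) for col in pfm)
    return 100.0 * (t - lowest) / (highest - lowest)


def scan_donors(seq: str, pfm: List[Dict[str, float]],
                exon_len: int = 3) -> List[Tuple[int, float]]:
    """Score every window whose intronic part starts with GT.

    Returns (index of the G of GT in seq, score) pairs.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(pfm) + 1):
        window = seq[i:i + len(pfm)]
        if window[exon_len:exon_len + 2] == "GT":
            hits.append((i + exon_len, ss_score(window, pfm)))
    return hits


print(scan_donors("TTCAGGTAAGTCC", DONOR_PFM))
print(round(ss_score("CAGGTGAGT", DONOR_PFM), 2))
```

Comparing the score of a reference site with that of a variant (here GTAAGT changed to GTGAGT lowers the toy score from 100 to about 95.7) is one way such scores are used to flag candidate splice-site mutations.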

These contributions bear on major questions in eukaryotic biology and on their applications to human medicine. These applications may expand as clinical genomics and pharmacogenomics scale up through large sequencing projects, such as the All of Us project, which plans to sequence a million individuals, and through the sequencing of millions of patients in clinical practice and research.



References

  1. 1 2 3 4 5 6 7 8 9 Senapathy, P (April 1986). "Origin of eukaryotic introns: a hypothesis, based on codon distribution statistics in genes, and its implications". Proceedings of the National Academy of Sciences of the United States of America. 83 (7): 2133–2137. Bibcode:1986PNAS...83.2133S. doi: 10.1073/pnas.83.7.2133 . ISSN   0027-8424. PMC   323245 . PMID   3457379.
  2. 1 2 3 4 5 6 7 8 9 Senapathy, P (February 1988). "Possible evolution of splice-junction signals in eukaryotic genes from stop codons". Proceedings of the National Academy of Sciences of the United States of America. 85 (4): 1129–1133. Bibcode:1988PNAS...85.1129S. doi: 10.1073/pnas.85.4.1129 . ISSN   0027-8424. PMC   279719 . PMID   3422483.
  3. 1 2 Shapiro, M. B.; Senapathy, P. (11 September 1987). "RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression". Nucleic Acids Research. 15 (17): 7155–7174. doi:10.1093/nar/15.17.7155. ISSN   0305-1048. PMC   306199 . PMID   3658675.
  4. 1 2 3 4 Senapathy, Periannan; Singh, Chandan Kumar; Bhasi, Ashwini; Regulapati, Rahul (20 October 2008). "Origination of the Split Structure of Spliceosomal Genes from Random Genetic Sequences". PLOS ONE. 3 (10): e3456. Bibcode:2008PLoSO...3.3456R. doi: 10.1371/journal.pone.0003456 . ISSN   1932-6203. PMC   2565106 . PMID   18941625.
  5. 1 2 3 Senapathy, P. (2 June 1995). "Introns and the origin of protein-coding genes". Science. 268 (5215): 1366–1367. Bibcode:1995Sci...268.1366S. doi: 10.1126/science.7761858 . ISSN   1095-9203. PMID   7761858.
  6. Harris, N L; Senapathy, P (25 May 1990). "Distribution and consensus of branch point signals in eukaryotic genes: a computerized statistical analysis". Nucleic Acids Research. 18 (10): 3015–3019. doi:10.1093/nar/18.10.3015. ISSN   0305-1048. PMC   330832 . PMID   2349097.
  7. 1 2 Senapathy, P.; Shapiro, M. B.; Harris, N. L. (1990). [16] Splice junctions, branch point sites, and exons: Sequence statistics, identification, and applications to genome project. Methods in Enzymology. Vol. 183. pp.  252–278. doi:10.1016/0076-6879(90)83018-5. ISBN   9780121820848. ISSN   0076-6879. PMID   2314278.
  8. Béroud, Christophe; Claustres, Mireille; Collod-Béroud, Gwenaëlle; Lalande, Marine; Hamroun, Dalil; Desmet, François-Olivier (1 May 2009). "Human Splicing Finder: an online bioinformatics tool to predict splicing signals". Nucleic Acids Research. 37 (9): e67. doi:10.1093/nar/gkp215. ISSN   0305-1048. PMC   2685110 . PMID   19339519.
  9. "Splice-Site Analyzer Tool". ibis.tau.ac.il. Retrieved 5 December 2018.
  10. Buratti, Emanuele; Chivers, Martin; Hwang, Gyulin; Vorechovsky, Igor (January 2011). "DBASS3 and DBASS5: databases of aberrant 3'- and 5'-splice sites". Nucleic Acids Research. 39 (Database issue): D86–91. doi:10.1093/nar/gkq887. ISSN   1362-4962. PMC   3013770 . PMID   20929868.
  11. Houdayer, Claude (2011). "In Silico Prediction of Splice-Affecting Nucleotide Variants". In Silico Tools for Gene Discovery. Methods in Molecular Biology. Vol. 760. pp. 269–281. doi:10.1007/978-1-61779-176-5_17. ISBN   978-1-61779-175-8. PMID   21780003.
  12. Schwartz, Schraga; Hall, Eitan; Ast, Gil (July 2009). "SROOGLE: webserver for integrative, user-friendly visualization of splicing signals". Nucleic Acids Research. 37 (Web Server issue): W189–192. doi:10.1093/nar/gkp320. ISSN   1362-4962. PMC   2703896 . PMID   19429896.
  13. López-Bigas, Núria; Audit, Benjamin; Ouzounis, Christos; Parra, Genís; Guigó, Roderic (28 March 2005). "Are splicing mutations the most frequent cause of hereditary disease?". FEBS Letters. 579 (9): 1900–1903. doi: 10.1016/j.febslet.2005.02.047 . ISSN   1873-3468. PMID   15792793.
  14. Estivill, Xavier; Lázaro, Conxi; Gaona, Antonia; Kruyer, Helena; García, Judit; Serra, Eduard; Ars, Elisabet (22 January 2000). "Mutations affecting mRNA splicing are the most common molecular defects in patients with neurofibromatosis type 1". Human Molecular Genetics. 9 (2): 237–247. doi: 10.1093/hmg/9.2.237 . ISSN   0964-6906. PMID   10607834.
  15. Concannon, Patrick; Gatti, Richard A.; Bernatowska, Eva; Sanal, Özden; Chessa, Luciana; Tolun, Asli; Önengüt, Suna; Liang, Teresa; Becker-Catania, Sara (1 June 1999). "Splicing Defects in the Ataxia-Telangiectasia Gene, ATM: Underlying Mutations and Consequences". The American Journal of Human Genetics. 64 (6): 1617–1631. doi:10.1086/302418. ISSN   1537-6605. PMC   1377904 . PMID   10330348.
  16. Lázaro, C.; Estivill, X.; Ravella, A.; Serra, E.; Pros, E.; Morell, M.; Kruyer, H.; Ars, E. (1 June 2003). "Recurrent mutations in the NF1 gene are common among neurofibromatosis type 1 patients". Journal of Medical Genetics. 40 (6): e82. doi:10.1136/jmg.40.6.e82. ISSN   1468-6244. PMC   1735494 . PMID   12807981.
  17. Bozon, Dominique; Rousson, Robert; Rouvet, Isabelle; Bonnet, Véronique; Albuisson, Juliette; Millat, Gilles; Crehalet, Hervé (5 June 2012). "Combined use of in silico and in vitro splicing assays for interpretation of genomic variants of unknown significance in cardiomyopathies and channelopathies". Cardiogenetics. 2 (1): e6. doi: 10.4081/cardiogenetics.2012.e6 . ISSN   2035-8148.
  18. Schmutzler, Rita K.; Meindl, Alfons; Hahnen, Eric; Rhiem, Kerstin; Arnold, Norbert; Kast, Karin; Köhler, Juliane; Engert, Stefanie; Weber, Ute (11 December 2012). "Analysis of 30 Putative BRCA1 Splicing Mutations in Hereditary Breast and Ovarian Cancer Families Identifies Exonic Splice Site Mutations That Escape In Silico Prediction". PLOS ONE. 7 (12): e50800. Bibcode:2012PLoSO...750800W. doi: 10.1371/journal.pone.0050800 . ISSN   1932-6203. PMC   3519833 . PMID   23239986.
  19. Barta, Andrea; Schümperli, Daniel (2010). "Editorial on alternative splicing and disease". RNA Biology. 7 (4): 388–389. doi:10.4161/rna.7.4.12818. PMID 21140604.
  20. New Scientist. Reed Business Information. 26 June 1986.
  21. Holland, S. K.; Blake, C. C. F. (1990). "Proteins, Exons, and Molecular Evolution". In Stone, Edwin M.; Schwartz, Robert Joel (eds.). Intervening Sequences in Evolution and Development. New York: Oxford University Press. ISBN 978-0195043372.
  22. New Scientist. Reed Business Information. 31 March 1988.
  23. Lander, E. S.; Linton, L. M.; Birren, B.; Nusbaum, C.; Zody, M. C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M. (15 February 2001). "Initial sequencing and analysis of the human genome". Nature. 409 (6822): 860–921. Bibcode:2001Natur.409..860L. doi:10.1038/35057062. ISSN 0028-0836. PMID 11237011.
  24. Venter, J. C.; Adams, M. D.; Myers, E. W.; Li, P. W.; Mural, R. J.; Sutton, G. G.; Smith, H. O.; Yandell, M.; Evans, C. A. (16 February 2001). "The sequence of the human genome". Science. 291 (5507): 1304–1351. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. ISSN 0036-8075. PMID 11181995.
  25. Jo, Jinkwan; Purushotham, Preethi M.; Han, Koeun; Lee, Heung-Ryul; Nah, Gyoungju; Kang, Byoung-Cheorl (14 September 2017). "Development of a Genetic Map for Onion (Allium cepa L.) Using Reference-Free Genotyping-by-Sequencing and SNP Assays". Frontiers in Plant Science. 8: 1606. doi:10.3389/fpls.2017.01606. ISSN 1664-462X. PMC 5604068. PMID 28959273.
  26. Keinath, Melissa C.; Timoshevskiy, Vladimir A.; Timoshevskaya, Nataliya Y.; Tsonis, Panagiotis A.; Voss, S. Randal; Smith, Jeramiah J. (10 November 2015). "Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing". Scientific Reports. 5: 16413. Bibcode:2015NatSR...516413K. doi:10.1038/srep16413. ISSN 2045-2322. PMC 4639759. PMID 26553646.
  27. The C. elegans Sequencing Consortium (11 December 1998). "Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology". Science. 282 (5396): 2012–2018. Bibcode:1998Sci...282.2012.. doi:10.1126/science.282.5396.2012. ISSN 1095-9203. PMID 9851916. S2CID 16873716.
  28. Arabidopsis Genome Initiative (14 December 2000). "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana". Nature. 408 (6814): 796–815. Bibcode:2000Natur.408..796T. doi:10.1038/35048692. ISSN 0028-0836. PMID 11130711.
  29. Bennetzen, Jeffrey L.; Brown, James K. M.; Devos, Katrien M. (1 July 2002). "Genome Size Reduction through Illegitimate Recombination Counteracts Genome Expansion in Arabidopsis". Genome Research. 12 (7): 1075–1079. doi:10.1101/gr.132102. ISSN 1549-5469. PMC 186626. PMID 12097344.
  30. Kurland, C. G.; Canbäck, B.; Berg, O. G. (December 2007). "The origins of modern proteomes". Biochimie. 89 (12): 1454–1463. doi:10.1016/j.biochi.2007.09.004. ISSN 0300-9084. PMID 17949885.
  31. Caetano-Anollés, Gustavo; Caetano-Anollés, Derek (July 2003). "An evolutionarily structured universe of protein architecture". Genome Research. 13 (7): 1563–1571. doi:10.1101/gr.1161903. ISSN 1088-9051. PMC 403752. PMID 12840035.
  32. Glansdorff, Nicolas; Xu, Ying; Labedan, Bernard (9 July 2008). "The last universal common ancestor: emergence, constitution and genetic legacy of an elusive forerunner". Biology Direct. 3: 29. doi:10.1186/1745-6150-3-29. ISSN 1745-6150. PMC 2478661. PMID 18613974.
  33. Kurland, C. G.; Collins, L. J.; Penny, D. (19 May 2006). "Genomics and the irreducible nature of eukaryote cells". Science. 312 (5776): 1011–1014. Bibcode:2006Sci...312.1011K. doi:10.1126/science.1121674. ISSN 1095-9203. PMID 16709776. S2CID 30768101.
  34. Collins, Lesley; Penny, David (April 2005). "Complex spliceosomal organization ancestral to extant eukaryotes". Molecular Biology and Evolution. 22 (4): 1053–1066. doi:10.1093/molbev/msi091. ISSN 0737-4038. PMID 15659557.
  35. Poole, A. M.; Jeffares, D. C.; Penny, D. (January 1998). "The path from the RNA world". Journal of Molecular Evolution. 46 (1): 1–17. Bibcode:1998JMolE..46....1P. doi:10.1007/PL00006275. ISSN 0022-2844. PMID 9419221. S2CID 17968659.
  36. Penny, David; Collins, Lesley J.; Daly, Toni K.; Cox, Simon J. (December 2014). "The relative ages of eukaryotes and akaryotes". Journal of Molecular Evolution. 79 (5–6): 228–239. Bibcode:2014JMolE..79..228P. doi:10.1007/s00239-014-9643-y. ISSN 1432-1432. PMID 25179144. S2CID 17512331.
  37. Fuerst, John A.; Sagulenko, Evgeny (4 May 2012). "Keys to Eukaryality: Planctomycetes and Ancestral Evolution of Cellular Complexity". Frontiers in Microbiology. 3: 167. doi:10.3389/fmicb.2012.00167. ISSN 1664-302X. PMC 3343278. PMID 22586422.
  38. Senapathy, Periannan (1994). Independent Birth of Organisms: A New Theory that Distinct Organisms Arose Independently from the Primordial Pond, Showing that Evolutionary Theories are Fundamentally Incorrect. Genome Press. ISBN 0964130408.