List of sequenced eukaryotic genomes

Saccharomyces cerevisiae was the first eukaryotic organism to have its complete genome sequence determined.

This list of "sequenced" eukaryotic genomes contains all the eukaryotes known to have publicly available complete nuclear and organelle genome sequences that have been assembled, annotated and published. Draft genomes and organelle-only sequences are not included.

DNA was first sequenced in 1977. The first free-living organism to have its genome completely sequenced was the bacterium Haemophilus influenzae, in 1995. In 1996 the genome of Saccharomyces cerevisiae (baker's yeast) became the first eukaryotic genome sequence to be released, and in 1998 the first genome sequence for a multicellular eukaryote, Caenorhabditis elegans, was released.

Protists

Following are the nine earliest sequenced genomes of protists. For a more complete list, see the List of sequenced protist genomes.

| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
| --- | --- | --- | --- | --- | --- | --- |
| Guillardia theta | Cryptomonad | Model organism | 0.551 Mb (nucleomorph genome only) | 465 [1]; 513, 598 (UniProt) | Canadian Institute of Advanced Research, Philipps-University Marburg and the University of British Columbia | 2001 [1] |
| Plasmodium falciparum (Clone: 3D7) | Apicomplexan | Human pathogen (malaria) | 22.9 Mb | 5,268 [2] | Malaria Genome Project Consortium | 2002 [2] |
| Plasmodium yoelii yoelii (Strain: 17XNL) | Apicomplexan | Rodent pathogen (malaria) | 23.1 Mb | 5,878 [3] | TIGR and NMRC | 2002 [3] |
| Cryptosporidium hominis (Strain: TU502) | Apicomplexan | Human pathogen | 10.4 Mb | 3,994 [4] | Virginia Commonwealth University | 2004 [4] |
| Cryptosporidium parvum (C- or genotype 2 isolate) | Apicomplexan | Human pathogen | 16.5 Mb | 3,807 [5] | UCSF and University of Minnesota | 2004 [5] |
| Thalassiosira pseudonana (Strain: CCMP 1335) | Diatom | Model organism | 34.5 Mb | 11,242 [6] | Joint Genome Institute and the University of Washington | 2004 [6] |
| Trypanosoma cruzi (Strain: CL-Brener) | Kinetoplastid | Human pathogen | 67 Mb | 22,570 [7] | The Institute for Genomic Research (TIGR), Karolinska Institutet (KI) and Seattle Biomedical Research Institute (SBRI) | 2005 [7] |
| Trypanosoma brucei (Clone: TREU 927/4) | Kinetoplastid | Human pathogen | 26 Mb | 9,068 [8] | Wellcome Trust Sanger Institute and The Institute for Genomic Research (TIGR) | 2005 [8] |
| Leishmania major (Strain: Friedlin) | Kinetoplastid | Human pathogen | 32.8 Mb | 8,272 [9] | Wellcome Trust Sanger Institute and Seattle Biomedical Research Institute (SBRI) | 2005 [9] |

Plants

Following are the five earliest sequenced genomes of plants. For a more complete list, see the List of sequenced plant genomes.

| Organism | Type | Relevance | Genome size | Number of chromosomes | Number of genes predicted | Organization | Year of completion |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Arabidopsis thaliana (Ecotype: Columbia) | Wild mustard, thale cress | Model plant | 135 Mb [10] | 5 | 25,498 [11]; 27,400 [12]; 31,670 (UniProt) | Arabidopsis Genome Initiative [13] | 2000 [11] |
| Cyanidioschyzon merolae (Strain: 10D) | Red algae | Simple eukaryote | 16.5 Mb | 20 | 5,331 [14] | University of Tokyo, Rikkyo University, Saitama University and Kumamoto University | 2004 [14] |
| Oryza sativa (ssp. indica) | Rice | Crop and model organism | 420 Mb | 12 | 32,000–50,000 [15] | Beijing Genomics Institute, Zhejiang University and the Chinese Academy of Sciences | 2002 [15] |
| Ostreococcus tauri | Green algae | Simple eukaryote, small genome | 12.6 Mb | | 7,969 (UniProt) | Laboratoire Arago | 2006 [16] |
| Populus trichocarpa | Balsam poplar or black cottonwood | Carbon sequestration, model tree, commercial use (timber), and comparison to A. thaliana | 550 Mb | 19 | 45,555 [17] | The International Poplar Genome Consortium | 2006 [17] |

Fungi

Following are the five earliest sequenced genomes of fungi. For a more complete list, see the List of sequenced fungi genomes.

| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
| --- | --- | --- | --- | --- | --- | --- |
| Saccharomyces cerevisiae (Strain: S288C) | Saccharomycetes | Baker's yeast; model eukaryote | 12.1 Mb | 6,294 [18] | International Collaboration for the Yeast Genome Sequencing [19] | 1996 [18] |
| Encephalitozoon cuniculi | Microsporidium | Human pathogen | 2.9 Mb | 1,997 [20] | Genoscope and Université Blaise Pascal | 2001 [20] |
| Schizosaccharomyces pombe (Strain: 972h-) | Schizosaccharomycetes | Model eukaryote | 14 Mb | 4,824 [21] | Sanger Institute and Cold Spring Harbor Laboratory | 2002 [21] |
| Neurospora crassa | Sordariomycetes | Model eukaryote | 40 Mb | 10,082 [22] | Broad Institute, Oregon Health and Science University, University of Kentucky, and the University of Kansas | 2003 [22] |
| Phanerochaete chrysosporium (Strain: RP78) | Agaricomycetes | Wood rotting fungus, use in mycoremediation | 30 Mb | 11,777 [23] | Joint Genome Institute | 2004 [23] |

Animals

Following are the five earliest sequenced genomes of animals. For a more complete list, see the List of sequenced animal genomes.

| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
| --- | --- | --- | --- | --- | --- | --- |
| Caenorhabditis elegans (Strain: Bristol N2) | Nematode | Model animal | 100 Mb | 19,000 [24] | Washington University and the Sanger Institute | 1998 [24] |
| Drosophila melanogaster | Fruit fly | Model animal | 165 Mb | 13,600 [25] | Celera, UC Berkeley, Baylor College of Medicine, European DGP | 2000 [25] |
| Anopheles gambiae (Strain: PEST) | Mosquito | Vector of malaria | 278 Mb | 13,683 [26] | Celera Genomics and Genoscope | 2002 [26] |
| Takifugu rubripes | Puffer fish | Vertebrate with small genome | 390 Mb | 22,000–29,000 [27] | International Fugu Genome Consortium [28] | 2002 [29] |
| Homo sapiens | Human | | 3.2 Gb [30] | 18,826 (CCDS consortium) | Human Genome Project Consortium and Celera Genomics | Draft 2001 [31] [32]; complete 2006 [33] |

References

  1. Douglas S, Zauner S, Fraunholz M, et al. (April 2001). "The highly reduced genome of an enslaved algal nucleus". Nature. 410 (6832): 1091–6. Bibcode:2001Natur.410.1091D. doi:10.1038/35074092. PMID 11323671.
  2. Gardner MJ, Hall N, Fung E, et al. (October 2002). "Genome sequence of the human malaria parasite Plasmodium falciparum". Nature. 419 (6906): 498–511. Bibcode:2002Natur.419..498G. doi:10.1038/nature01097. PMC 3836256. PMID 12368864.
  3. Carlton JM, Angiuoli SV, Suh BB, et al. (October 2002). "Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii". Nature. 419 (6906): 512–9. Bibcode:2002Natur.419..512C. doi:10.1038/nature01099. PMID 12368865.
  4. Xu P, Widmer G, Wang Y, et al. (October 2004). "The genome of Cryptosporidium hominis". Nature. 431 (7012): 1107–12. Bibcode:2004Natur.431.1107X. doi:10.1038/nature02977. PMID 15510150.
  5. Abrahamsen MS, Templeton TJ, Enomoto S, et al. (April 2004). "Complete genome sequence of the apicomplexan, Cryptosporidium parvum". Science. 304 (5669): 441–5. Bibcode:2004Sci...304..441A. doi:10.1126/science.1094786. PMID 15044751. S2CID 26434820.
  6. Armbrust EV, Berges JA, Bowler C, et al. (October 2004). "The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism". Science. 306 (5693): 79–86. Bibcode:2004Sci...306...79A. CiteSeerX 10.1.1.690.4884. doi:10.1126/science.1101156. PMID 15459382. S2CID 8593895.
  7. El-Sayed NM, Myler P, Bartholomeu DC, et al. (July 2005). "The Genome Sequence of Trypanosoma cruzi, Etiologic Agent of Chagas Disease". Science. 309 (5733): 409–415. Bibcode:2005Sci...309..409E. doi:10.1126/science.1112631. hdl:11336/80500. PMID 16020725. S2CID 3830267.
  8. Berriman M, Ghedin E, Hertz-Fowler CH, et al. (July 2005). "The genome of the African trypanosome Trypanosoma brucei". Science. 309 (5733): 416–422. Bibcode:2005Sci...309..416B. doi:10.1126/science.1112642. PMID 16020726. S2CID 18649858.
  9. Ivens AC, Peacock CS, Worthey EA, et al. (July 2005). "The genome of the kinetoplastid parasite, Leishmania major". Science. 309 (5733): 436–442. Bibcode:2005Sci...309..436I. doi:10.1126/science.1112680. PMC 1470643. PMID 16020728.
  10. "TAIR - Genome Assembly".
  11. The Arabidopsis Genome Initiative (December 2000). "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana". Nature. 408 (6814): 796–815. Bibcode:2000Natur.408..796T. doi:10.1038/35048692. PMID 11130711.
  12. Ensembl entry.
  13. Arabidopsis Genome Initiative. Archived 2006-02-07 at the Wayback Machine.
  14. Matsuzaki M, Misumi O, Shin-I T, et al. (April 2004). "Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D". Nature. 428 (6983): 653–7. Bibcode:2004Natur.428..653M. doi:10.1038/nature02398. PMID 15071595.
  15. Goff SA, Ricke D, Lan TH, et al. (April 2002). "A draft sequence of the rice genome (Oryza sativa L. ssp. japonica)". Science. 296 (5565): 92–100. Bibcode:2002Sci...296...92G. doi:10.1126/science.1068275. PMID 11935018. S2CID 2960202.
  16. Derelle E, Ferraz C, Rombauts S, et al. (August 2006). "Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features". PNAS. 103 (31): 11647–52. Bibcode:2006PNAS..10311647D. doi:10.1073/pnas.0604795103. PMC 1544224. PMID 16868079.
  17. Tuskan GA, Difazio S, Jansson S, et al. (September 2006). "The genome of black cottonwood, Populus trichocarpa (Torr. & Gray)". Science. 313 (5793): 1596–604. Bibcode:2006Sci...313.1596T. doi:10.1126/science.1128691. PMID 16973872. S2CID 7717980.
  18. Goffeau A, Barrell BG, Bussey H, et al. (October 1996). "Life with 6000 genes". Science. 274 (5287): 546, 563–7. Bibcode:1996Sci...274..546G. doi:10.1126/science.274.5287.546. PMID 8849441. S2CID 16763139.
  19. International Collaboration for the Yeast Genome Sequencing. Archived 2007-09-27 at the Wayback Machine.
  20. Katinka MD, Duprat S, Cornillot E, et al. (November 2001). "Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi". Nature. 414 (6862): 450–3. Bibcode:2001Natur.414..450K. doi:10.1038/35106579. PMID 11719806.
  21. Wood V, Gwilliam R, Rajandream MA, et al. (February 2002). "The genome sequence of Schizosaccharomyces pombe". Nature. 415 (6874): 871–80. doi:10.1038/nature724. PMID 11859360.
  22. Galagan JE, Calvo SE, Borkovich KA, et al. (April 2003). "The genome sequence of the filamentous fungus Neurospora crassa". Nature. 422 (6934): 859–68. Bibcode:2003Natur.422..859G. doi:10.1038/nature01554. PMID 12712197.
  23. Martinez D, Larrondo LF, Putnam N, et al. (2004). "Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78". Nature Biotechnology. 22 (6): 695–700. doi:10.1038/nbt967. PMID 15122302.
  24. C. elegans Sequencing Consortium (December 1998). "Genome sequence of the nematode C. elegans: a platform for investigating biology". Science. 282 (5396): 2012–8. Bibcode:1998Sci...282.2012.. doi:10.1126/science.282.5396.2012. PMID 9851916.
  25. Adams MD, Celniker SE, Holt RA, et al. (March 2000). "The genome sequence of Drosophila melanogaster". Science. 287 (5461): 2185–95. Bibcode:2000Sci...287.2185.. CiteSeerX 10.1.1.549.8639. doi:10.1126/science.287.5461.2185. PMID 10731132.
  26. Holt RA, Subramanian GM, Halpern A, et al. (October 2002). "The genome sequence of the malaria mosquito Anopheles gambiae". Science. 298 (5591): 129–49. Bibcode:2002Sci...298..129H. CiteSeerX 10.1.1.149.9058. doi:10.1126/science.1076181. PMID 12364791. S2CID 4512225.
  27. International Fugu Genome Consortium. Forth Genome Assembly. Archived 2012-02-05 at the Wayback Machine.
  28. International Fugu Genome Consortium. Archived 2012-02-05 at the Wayback Machine.
  29. Aparicio S, Chapman J, Stupka E, et al. (August 2002). "Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes". Science. 297 (5585): 1301–10. Bibcode:2002Sci...297.1301A. doi:10.1126/science.1072104. PMID 12142439. S2CID 10310355.
  30. International Human Genome Sequencing Consortium (October 2004). "Finishing the euchromatic sequence of the human genome". Nature. 431 (7011): 931–45. Bibcode:2004Natur.431..931H. doi:10.1038/nature03001. PMID 15496913.
  31. McPherson JD, Marra M, Hillier L, et al. (February 2001). "A physical map of the human genome". Nature. 409 (6822): 934–41. Bibcode:2001Natur.409..934M. doi:10.1038/35057157. PMID 11237014.
  32. Venter JC, Adams MD, Myers EW, et al. (February 2001). "The sequence of the human genome". Science. 291 (5507): 1304–51. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. PMID 11181995.
  33. Gregory SG, Barlow KF, McLay KE, et al. (May 2006). "The DNA sequence and biological annotation of human chromosome 1". Nature. 441 (7091): 315–21. Bibcode:2006Natur.441..315G. doi:10.1038/nature04727. PMID 16710414.