List of sequenced eukaryotic genomes

Saccharomyces cerevisiae was the first eukaryotic organism to have its complete genome sequence determined.

This list of sequenced eukaryotic genomes contains all the eukaryotes known to have publicly available complete nuclear and organelle genome sequences that have been assembled, annotated and published. Draft genomes and organelle-only sequences are not included.

DNA was first sequenced in 1977. The first free-living organism to have its genome completely sequenced was the bacterium Haemophilus influenzae, in 1995. In 1996, Saccharomyces cerevisiae (baker's yeast) became the first eukaryote to have its genome sequence released, and in 1998 the first genome sequence of a multicellular eukaryote, Caenorhabditis elegans, was released.

Protists

Following are the nine earliest sequenced genomes of protists. For a more complete list, see the List of sequenced protist genomes.

| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Guillardia theta | Cryptomonad | Model organism | 0.551 Mb (nucleomorph genome only) | 465 [1], 513, 598 (UniProt) | Canadian Institute of Advanced Research, Philipps-University Marburg and the University of British Columbia | 2001 [1] |
| Plasmodium falciparum (Clone: 3D7) | Apicomplexan | Human pathogen (malaria) | 22.9 Mb | 5,268 [2] | Malaria Genome Project Consortium | 2002 [2] |
| Plasmodium yoelii yoelii (Strain: 17XNL) | Apicomplexan | Rodent pathogen (malaria) | 23.1 Mb | 5,878 [3] | TIGR and NMRC | 2002 [3] |
| Cryptosporidium hominis (Strain: TU502) | Apicomplexan | Human pathogen | 10.4 Mb | 3,994 [4] | Virginia Commonwealth University | 2004 [4] |
| Cryptosporidium parvum (C- or genotype 2 isolate) | Apicomplexan | Human pathogen | 16.5 Mb | 3,807 [5] | UCSF and University of Minnesota | 2004 [5] |
| Thalassiosira pseudonana (Strain: CCMP 1335) | Diatom | Model organism | 34.5 Mb | 11,242 [6] | Joint Genome Institute and the University of Washington | 2004 [6] |
| Trypanosoma cruzi (Strain: CL-Brener) | Kinetoplastid | Human pathogen | 67 Mb | 22,570 [7] | The Institute for Genomic Research (TIGR), Karolinska Institutet (KI) and Seattle Biomedical Research Institute (SBRI) | 2005 [7] |
| Trypanosoma brucei (Clone: TREU 927/4) | Kinetoplastid | Human pathogen | 26 Mb | 9,068 [8] | Wellcome Trust Sanger Institute and The Institute for Genomic Research (TIGR) | 2005 [8] |
| Leishmania major (Strain: Friedlin) | Kinetoplastid | Human pathogen | 32.8 Mb | 8,272 [9] | Wellcome Trust Sanger Institute and Seattle Biomedical Research Institute (SBRI) | 2005 [9] |

Plants

Following are the five earliest sequenced genomes of plants. For a more complete list, see the List of sequenced plant genomes.

| Organism | Type | Relevance | Genome size | Number of chromosomes | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|---|
| Arabidopsis thaliana (Ecotype: Columbia) | Wild mustard, Thale Cress | Model plant | 135 Mb [10] | 5 | 25,498 [11], 27,400 [12], 31,670 (UniProt) | Arabidopsis Genome Initiative [13] | 2000 [11] |
| Cyanidioschyzon merolae (Strain: 10D) | Red algae | Simple eukaryote | 16.5 Mb | 20 | 5,331 [14] | University of Tokyo, Rikkyo University, Saitama University and Kumamoto University | 2004 [14] |
| Oryza sativa (ssp indica) | Rice | Crop and model organism | 420 Mb | 12 | 32-50,000 [15] | Beijing Genomics Institute, Zhejiang University and the Chinese Academy of Sciences | 2002 [15] |
| Ostreococcus tauri | Green algae | Simple eukaryote, small genome | 12.6 Mb | | 7,969 (UniProt) | Laboratoire Arago | 2006 [16] |
| Populus trichocarpa | Balsam poplar or Black Cottonwood | Carbon sequestration, model tree, commercial use (timber), and comparison to A. thaliana | 550 Mb | 19 | 45,555 [17] | The International Poplar Genome Consortium | 2006 [17] |

Fungi

Following are the five earliest sequenced genomes of fungi. For a more complete list, see the List of sequenced fungi genomes.

| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Saccharomyces cerevisiae (Strain: S288C) | Saccharomycetes | Baker's yeast; model eukaryote | 12.1 Mb | 6,294 [18] | International Collaboration for the Yeast Genome Sequencing [19] | 1996 [18] |
| Encephalitozoon cuniculi | Microsporidium | Human pathogen | 2.9 Mb | 1,997 [20] | Genoscope and Université Blaise Pascal | 2001 [20] |
| Schizosaccharomyces pombe (Strain: 972h-) | Schizosaccharomycetes | Model eukaryote | 14 Mb | 4,824 [21] | Sanger Institute and Cold Spring Harbor Laboratory | 2002 [21] |
| Neurospora crassa | Sordariomycetes | Model eukaryote | 40 Mb | 10,082 [22] | Broad Institute, Oregon Health and Science University, University of Kentucky, and the University of Kansas | 2003 [22] |
| Phanerochaete chrysosporium (Strain: RP78) | Agaricomycetes | Wood rotting fungus, use in mycoremediation | 30 Mb | 11,777 [23] | Joint Genome Institute | 2004 [23] |

Animals

Following are the five earliest sequenced genomes of animals. For a more complete list, see the List of sequenced animal genomes.

| Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
|---|---|---|---|---|---|---|
| Caenorhabditis elegans (Strain: Bristol N2) | Nematode | Model animal | 100 Mb | 19,000 [24] | Washington University and the Sanger Institute | 1998 [24] |
| Drosophila melanogaster | Fruit fly | Model animal | 165 Mb | 13,600 [25] | Celera, UC Berkeley, Baylor College of Medicine, European DGP | 2000 [25] |
| Anopheles gambiae (Strain: PEST) | Mosquito | Vector of malaria | 278 Mb | 13,683 [26] | Celera Genomics and Genoscope | 2002 [26] |
| Takifugu rubripes | Puffer fish | Vertebrate with small genome | 390 Mb | 22–29,000 [27] | International Fugu Genome Consortium [28] | 2002 [29] |
| Homo sapiens | Human | | 3.2 Gb [30] | 18,826 (CCDS consortium) | Human Genome Project Consortium and Celera Genomics | Draft 2001 [31] [32]; complete 2006 [33] |
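
The genome sizes and predicted gene counts listed for animals can be compared by dividing one by the other to get an approximate gene density. The following is only a minimal sketch of that arithmetic: the names `ANIMAL_GENOMES`, `genes_per_mb` and `density_table` are hypothetical, the figures are copied by hand from the animal entries above, and Takifugu rubripes is left out because its gene count is given as a range.

```python
# Hypothetical helper names; the figures are copied from the animal table above.
# Each entry maps organism -> (genome size in Mb, number of predicted genes).
ANIMAL_GENOMES = {
    "Caenorhabditis elegans": (100.0, 19_000),
    "Drosophila melanogaster": (165.0, 13_600),
    "Anopheles gambiae": (278.0, 13_683),
    "Homo sapiens": (3_200.0, 18_826),  # 3.2 Gb expressed in Mb
}


def genes_per_mb(size_mb: float, gene_count: int) -> float:
    """Return the number of predicted genes per megabase of genome."""
    if size_mb <= 0:
        raise ValueError("genome size must be positive")
    return gene_count / size_mb


def density_table(genomes: dict) -> dict:
    """Map each organism to its gene density, rounded to one decimal place."""
    return {
        name: round(genes_per_mb(size_mb, genes), 1)
        for name, (size_mb, genes) in genomes.items()
    }


if __name__ == "__main__":
    densities = density_table(ANIMAL_GENOMES)
    # Print from the most gene-dense genome to the least gene-dense one.
    for name, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {density} genes/Mb")
```

Because the predicted gene counts depend on the annotation used, the resulting densities are rough comparisons rather than precise measurements.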


References

  1. Douglas S, Zauner S, Fraunholz M, et al. (April 2001). "The highly reduced genome of an enslaved algal nucleus". Nature. 410 (6832): 1091–6. Bibcode:2001Natur.410.1091D. doi:10.1038/35074092. PMID 11323671.
  2. Gardner MJ, Hall N, Fung E, et al. (October 2002). "Genome sequence of the human malaria parasite Plasmodium falciparum". Nature. 419 (6906): 498–511. Bibcode:2002Natur.419..498G. doi:10.1038/nature01097. PMC 3836256. PMID 12368864.
  3. Carlton JM, Angiuoli SV, Suh BB, et al. (October 2002). "Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii". Nature. 419 (6906): 512–9. Bibcode:2002Natur.419..512C. doi:10.1038/nature01099. PMID 12368865.
  4. Xu P, Widmer G, Wang Y, et al. (October 2004). "The genome of Cryptosporidium hominis". Nature. 431 (7012): 1107–12. Bibcode:2004Natur.431.1107X. doi:10.1038/nature02977. PMID 15510150.
  5. Abrahamsen MS, Templeton TJ, Enomoto S, et al. (April 2004). "Complete genome sequence of the apicomplexan, Cryptosporidium parvum". Science. 304 (5669): 441–5. Bibcode:2004Sci...304..441A. doi:10.1126/science.1094786. PMID 15044751. S2CID 26434820.
  6. Armbrust EV, Berges JA, Bowler C, et al. (October 2004). "The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism". Science. 306 (5693): 79–86. Bibcode:2004Sci...306...79A. CiteSeerX 10.1.1.690.4884. doi:10.1126/science.1101156. PMID 15459382. S2CID 8593895.
  7. El-Sayed NM, Myler P, Bartholomeu DC, et al. (July 2005). "The Genome Sequence of Trypanosoma cruzi, Etiologic Agent of Chagas Disease". Science. 309 (5733): 409–415. doi:10.1126/science.1112631. hdl:11336/80500. PMID 16020725. S2CID 3830267.
  8. Berriman M, Ghedin E, Hertz-Fowler CH, et al. (July 2005). "The genome of the African trypanosome Trypanosoma brucei". Science. 309 (5733): 416–422. doi:10.1126/science.1112642. PMID 16020726. S2CID 18649858.
  9. Ivens AC, Peacock CS, Worthey EA, et al. (July 2005). "The genome of the kinetoplastid parasite, Leishmania major". Science. 309 (5733): 436–442. doi:10.1126/science.1112680. PMC 1470643. PMID 16020728.
  10. "TAIR - Genome Assembly".
  11. The Arabidopsis Genome Initiative (December 2000). "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana". Nature. 408 (6814): 796–815. Bibcode:2000Natur.408..796T. doi:10.1038/35048692. PMID 11130711.
  12. Ensembl entry
  13. Arabidopsis Genome Initiative. Archived 2006-02-07 at the Wayback Machine.
  14. Matsuzaki M, Misumi O, Shin-I T, et al. (April 2004). "Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D". Nature. 428 (6983): 653–7. Bibcode:2004Natur.428..653M. doi:10.1038/nature02398. PMID 15071595.
  15. Goff SA, Ricke D, Lan TH, et al. (April 2002). "A draft sequence of the rice genome (Oryza sativa L. ssp. japonica)". Science. 296 (5565): 92–100. Bibcode:2002Sci...296...92G. doi:10.1126/science.1068275. PMID 11935018. S2CID 2960202.
  16. Derelle E, Ferraz C, Rombauts S, et al. (August 2006). "Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features". PNAS. 103 (31): 11647–52. Bibcode:2006PNAS..10311647D. doi:10.1073/pnas.0604795103. PMC 1544224. PMID 16868079.
  17. Tuskan GA, Difazio S, Jansson S, et al. (September 2006). "The genome of black cottonwood, Populus trichocarpa (Torr. & Gray)". Science. 313 (5793): 1596–604. Bibcode:2006Sci...313.1596T. doi:10.1126/science.1128691. PMID 16973872. S2CID 7717980.
  18. Goffeau A, Barrell BG, Bussey H, et al. (October 1996). "Life with 6000 genes". Science. 274 (5287): 546, 563–7. Bibcode:1996Sci...274..546G. doi:10.1126/science.274.5287.546. PMID 8849441. S2CID 16763139.
  19. International Collaboration for the Yeast Genome Sequencing. Archived 2007-09-27 at the Wayback Machine.
  20. Katinka MD, Duprat S, Cornillot E, et al. (November 2001). "Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi". Nature. 414 (6862): 450–3. Bibcode:2001Natur.414..450K. doi:10.1038/35106579. PMID 11719806.
  21. Wood V, Gwilliam R, Rajandream MA, et al. (February 2002). "The genome sequence of Schizosaccharomyces pombe". Nature. 415 (6874): 871–80. doi:10.1038/nature724. PMID 11859360.
  22. Galagan JE, Calvo SE, Borkovich KA, et al. (April 2003). "The genome sequence of the filamentous fungus Neurospora crassa". Nature. 422 (6934): 859–68. Bibcode:2003Natur.422..859G. doi:10.1038/nature01554. PMID 12712197.
  23. Martinez D, Larrondo LF, Putnam N, et al. (2004). "Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78". Nature Biotechnology. 22 (6): 695–700. doi:10.1038/nbt967. PMID 15122302.
  24. C. elegans Sequencing Consortium (December 1998). "Genome sequence of the nematode C. elegans: a platform for investigating biology". Science. 282 (5396): 2012–8. Bibcode:1998Sci...282.2012.. doi:10.1126/science.282.5396.2012. PMID 9851916.
  25. Adams MD, Celniker SE, Holt RA, et al. (March 2000). "The genome sequence of Drosophila melanogaster". Science. 287 (5461): 2185–95. Bibcode:2000Sci...287.2185.. CiteSeerX 10.1.1.549.8639. doi:10.1126/science.287.5461.2185. PMID 10731132.
  26. Holt RA, Subramanian GM, Halpern A, et al. (October 2002). "The genome sequence of the malaria mosquito Anopheles gambiae". Science. 298 (5591): 129–49. Bibcode:2002Sci...298..129H. CiteSeerX 10.1.1.149.9058. doi:10.1126/science.1076181. PMID 12364791. S2CID 4512225.
  27. International Fugu Genome Consortium. Forth Genome Assembly. Archived 2012-02-05 at the Wayback Machine.
  28. International Fugu Genome Consortium. Archived 2012-02-05 at the Wayback Machine.
  29. Aparicio S, Chapman J, Stupka E, et al. (August 2002). "Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes". Science. 297 (5585): 1301–10. Bibcode:2002Sci...297.1301A. doi:10.1126/science.1072104. PMID 12142439. S2CID 10310355.
  30. Human Genome Sequencing Consortium, International (October 2004). "Finishing the euchromatic sequence of the human genome". Nature. 431 (7011): 931–45. Bibcode:2004Natur.431..931H. doi:10.1038/nature03001. PMID 15496913.
  31. McPherson JD, Marra M, Hillier L, et al. (February 2001). "A physical map of the human genome". Nature. 409 (6822): 934–41. Bibcode:2001Natur.409..934M. doi:10.1038/35057157. PMID 11237014.
  32. Venter JC, Adams MD, Myers EW, et al. (February 2001). "The sequence of the human genome". Science. 291 (5507): 1304–51. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. PMID 11181995.
  33. Gregory SG, Barlow KF, McLay KE, et al. (May 2006). "The DNA sequence and biological annotation of human chromosome 1". Nature. 441 (7091): 315–21. Bibcode:2006Natur.441..315G. doi:10.1038/nature04727. PMID 16710414.