This list of "sequenced" eukaryotic genomes contains all the eukaryotes known to have publicly available complete nuclear and organelle genome sequences that have been sequenced, assembled, annotated and published; draft genomes are not included, nor are organelle-only sequences.
DNA was first sequenced in 1977. The first free-living organism to have its genome completely sequenced was the bacterium Haemophilus influenzae, in 1995. In 1996 the genome of Saccharomyces cerevisiae (baker's yeast) became the first eukaryotic genome sequence to be released, and in 1998 the first genome sequence for a multicellular eukaryote, Caenorhabditis elegans, was released.
Following are the nine earliest sequenced genomes of protists. For a more complete list, see the List of sequenced protist genomes.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Guillardia theta | Cryptomonad | Model organism | 0.551 Mb (nucleomorph genome only) | 465, [1] 513, 598 (UniProt) | Canadian Institute of Advanced Research, Philipps-University Marburg and the University of British Columbia | 2001 [1] |
Plasmodium falciparum Clone:3D7 | Apicomplexan | Human pathogen (malaria) | 22.9 Mb | 5,268 [2] | Malaria Genome Project Consortium | 2002 [2] |
Plasmodium yoelii yoelii Strain:17XNL | Apicomplexan | Rodent pathogen (malaria) | 23.1 Mb | 5,878 [3] | TIGR and NMRC | 2002 [3] |
Cryptosporidium hominis Strain:TU502 | Apicomplexan | Human pathogen | 10.4 Mb | 3,994 [4] | Virginia Commonwealth University | 2004 [4] |
Cryptosporidium parvum C- or genotype 2 isolate | Apicomplexan | Human pathogen | 16.5 Mb | 3,807 [5] | UCSF and University of Minnesota | 2004 [5] |
Thalassiosira pseudonana Strain:CCMP 1335 | Diatom | Model organism | 34.5 Mb | 11,242 [6] | Joint Genome Institute and the University of Washington | 2004 [6] |
Trypanosoma cruzi Strain:CL-Brener | Kinetoplastid | Human pathogen | 67 Mb | 22,570 [7] | The Institute for Genomic Research (TIGR), Karolinska Institutet (KI) and Seattle Biomedical Research Institute (SBRI) | 2005 [7] |
Trypanosoma brucei Clone:TREU 927/4 | Kinetoplastid | Human pathogen | 26 Mb | 9,068 [8] | Wellcome Trust Sanger Institute and The Institute for Genomic Research (TIGR) | 2005 [8] |
Leishmania major Strain:Friedlin | Kinetoplastid | Human pathogen | 32.8 Mb | 8,272 [9] | Wellcome Trust Sanger Institute and Seattle Biomedical Research Institute (SBRI) | 2005 [9] |
Following are the five earliest sequenced genomes of plants. For a more complete list, see the List of sequenced plant genomes.
Organism | Type | Relevance | Genome size | Number of chromosomes | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|---|
Arabidopsis thaliana Ecotype:Columbia | Wild mustard (thale cress) | Model plant | 135 Mb [10] | 5 | 25,498, [11] 27,400, [12] 31,670 (UniProt) | Arabidopsis Genome Initiative [13] | 2000 [11] |
Cyanidioschyzon merolae Strain:10D | Red algae | Simple eukaryote | 16.5 Mb | 20 | 5,331 [14] | University of Tokyo, Rikkyo University, Saitama University and Kumamoto University | 2004 [14] |
Oryza sativa ssp indica | Rice | Crop and model organism | 420 Mb | 12 | 32,000–50,000 [15] | Beijing Genomics Institute, Zhejiang University and the Chinese Academy of Sciences | 2002 [15] |
Ostreococcus tauri | Green algae | Simple eukaryote, small genome | 12.6 Mb | | 7,969 (UniProt) | Laboratoire Arago | 2006 [16] |
Populus trichocarpa | Balsam poplar or Black Cottonwood | Carbon sequestration, model tree, commercial use (timber), and comparison to A. thaliana | 550 Mb | 19 | 45,555 [17] | The International Poplar Genome Consortium | 2006 [17] |
Following are the five earliest sequenced genomes of fungi. For a more complete list, see the List of sequenced fungi genomes.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Saccharomyces cerevisiae Strain:S288C | Saccharomycetes | Baker's Yeast; Model eukaryote | 12.1 Mb | 6,294 [18] | International Collaboration for the Yeast Genome Sequencing [19] | 1996 [18] |
Encephalitozoon cuniculi | Microsporidium | Human pathogen | 2.9 Mb | 1,997 [20] | Genoscope and Université Blaise Pascal | 2001 [20] |
Schizosaccharomyces pombe Strain:972h- | Schizosaccharomycetes | Model eukaryote | 14 Mb | 4,824 [21] | Sanger Institute and Cold Spring Harbor Laboratory | 2002 [21] |
Neurospora crassa | Sordariomycetes | Model eukaryote | 40 Mb | 10,082 [22] | Broad Institute, Oregon Health and Science University, University of Kentucky, and the University of Kansas | 2003 [22] |
Phanerochaete chrysosporium Strain:RP78 | Agaricomycetes | Wood rotting fungus, use in mycoremediation | 30 Mb | 11,777 [23] | Joint Genome Institute | 2004 [23] |
Following are the five earliest sequenced genomes of animals. For a more complete list, see the List of sequenced animal genomes.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Caenorhabditis elegans Strain:Bristol N2 | Nematode | Model animal | 100 Mb | 19,000 [24] | Washington University and the Sanger Institute | 1998 [24] |
Drosophila melanogaster | Fruit fly | Model animal | 165 Mb | 13,600 [25] | Celera, UC Berkeley, Baylor College of Medicine, European DGP | 2000 [25] |
Anopheles gambiae Strain: PEST | Mosquito | Vector of malaria | 278 Mb | 13,683 [26] | Celera Genomics and Genoscope | 2002 [26] |
Takifugu rubripes | Puffer fish | Vertebrate with small genome | 390 Mb | 22,000–29,000 [27] | International Fugu Genome Consortium [28] | 2002 [29] |
Homo sapiens | Human | | 3.2 Gb [30] | 18,826 (CCDS consortium) | Human Genome Project Consortium and Celera Genomics | Draft 2001 [31] [32] Complete 2006 [33] |
In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA. The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as regulatory sequences, and often a substantial fraction of junk DNA with no evident function. Almost all eukaryotes have mitochondria and a small mitochondrial genome. Algae and plants also contain chloroplasts with a chloroplast genome.
In genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random shot grouping of a shotgun.
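The random-sampling idea can be illustrated with a small Python sketch that draws fixed-length reads from random positions of a toy sequence and reports how much of it ends up covered. The helper names `simulate_shotgun_reads` and `per_base_depth`, the read length, and the toy sequence are all assumptions made for illustration. Real shotgun projects also involve library preparation, sequencing errors, and assembly, none of which is modelled here.

```python
import random


def simulate_shotgun_reads(genome, read_len, n_reads, seed=0):
    """Draw n_reads substrings of length read_len from random start positions.

    Returns a list of (start, read) tuples. Hypothetical helper for illustration.
    """
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    last_start = len(genome) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive on both ends
        reads.append((start, genome[start:start + read_len]))
    return reads


def per_base_depth(genome_len, reads):
    """Count how many reads overlap each position of the genome."""
    depth = [0] * genome_len
    for start, read in reads:
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return depth


if __name__ == "__main__":
    toy_genome = "ACGT" * 250  # 1,000-base toy sequence, not real data
    reads = simulate_shotgun_reads(toy_genome, read_len=50, n_reads=100)
    depth = per_base_depth(len(toy_genome), reads)
    covered = sum(1 for d in depth if d > 0)
    print(f"mean depth: {sum(depth) / len(depth):.1f}x")
    print(f"bases covered at least once: {covered} of {len(depth)}")
```

Because start positions are random, some bases are sampled many times while others may be missed entirely, which is why shotgun projects typically sequence to several-fold average depth.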
The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.
Comparative genomics is a field of biological research in which the genomic features of different organisms are compared. The genomic features may include the DNA sequence, genes, gene order, regulatory sequences, and other genomic structural landmarks. In this branch of genomics, whole or large parts of genomes resulting from genome projects are compared to study basic biological similarities and differences as well as evolutionary relationships between organisms. The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Therefore, comparative genomic approaches start with making some form of alignment of genome sequences and looking for orthologous sequences in the aligned genomes and checking to what extent those sequences are conserved. Based on these, genome and molecular evolution are inferred and this may in turn be put in the context of, for example, phenotypic evolution or population genetics.
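As a minimal sketch of the "checking to what extent those sequences are conserved" step, the Python function below computes percent identity over two sequences that are assumed to be already aligned, with `-` marking gaps. The function name `percent_identity` and the example strings are hypothetical. The alignment itself is not shown, and conventions for the denominator vary between tools; this version counts only columns where neither sequence has a gap.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0  # nothing to compare, e.g. all columns contain gaps
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned pair (made-up sequences, not real orthologs).
    seq_a = "ACGT-ACGTA"
    seq_b = "ACGTTACCTA"
    print(f"identity: {percent_identity(seq_a, seq_b):.1f}%")
```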
Viridiplantae constitute a clade of eukaryotic organisms that comprises approximately 450,000–500,000 species that play important roles in both terrestrial and aquatic ecosystems. They include the green algae, which are primarily aquatic, and the land plants (embryophytes), which emerged from within them. Green algae traditionally excludes the land plants, rendering them a paraphyletic group. However, it is accurate to think of land plants as a kind of alga. Since the realization that the embryophytes emerged from within the green algae, some authors are starting to include them. They have cells with cellulose in their cell walls, and primary chloroplasts derived from endosymbiosis with cyanobacteria that contain chlorophylls a and b and lack phycobilins. Corroborating this, a basal phagotroph archaeplastida group has been found in the Rhodelphydia.
The Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) was established by Richard A. Gibbs in 1996 when Baylor College of Medicine was chosen as one of six worldwide sites to complete the final phase of the international Human Genome Project. Gibbs is the current director of the BCM-HGSC.
Gerald Mayer Rubin is an American biologist, notable for pioneering the use of transposable P elements in genetics, and for leading the public project to sequence the Drosophila melanogaster genome. Related to his genomics work, Rubin's lab is notable for development of genetic and genomics tools and studies of signal transduction and gene regulation. Rubin also serves as a vice president of the Howard Hughes Medical Institute and executive director of the Janelia Research Campus.
RING finger protein unkempt-like is a protein that in humans is encoded by the UNKL gene.
Whole genome sequencing (WGS), also known as full genome sequencing, complete genome sequencing, or entire genome sequencing, is the process of determining the entirety, or nearly the entirety, of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast.
Cancer genome sequencing is the whole genome sequencing of a single, homogeneous or heterogeneous group of cancer cells. It is a biochemical laboratory method for the characterization and identification of the DNA or RNA sequences of cancer cell(s).
A reference genome is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. As they are assembled from the sequencing of DNA from a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. For example, one of the most recent human reference genomes, assembly GRCh38/hg38, is derived from more than 60 genomic clone libraries. There are reference genomes for multiple species of viruses, bacteria, fungi, plants, and animals. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than the initial Human Genome Project. Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or UCSC Genome Browser.
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.
Arabidopsis thaliana is a first-class model organism and the single most important species for fundamental research in plant molecular genetics.
A plant genome assembly represents the complete genomic sequence of a plant species, assembled into chromosomes and organelle genomes from DNA fragments obtained with different types of sequencing technology.
The G-value paradox arises from the lack of correlation between the number of protein-coding genes among eukaryotes and their relative biological complexity. The microscopic nematode Caenorhabditis elegans, for example, is composed of only a thousand cells but has about the same number of genes as a human. Researchers suggest resolution of the paradox may lie in mechanisms such as alternative splicing and complex gene regulation that make the genes of humans and other complex eukaryotes relatively more productive.
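The figures in the tables above make the point concrete. The following Python sketch takes genome sizes and predicted gene counts as listed there (using the first gene count given where several are listed) and computes genes per megabase. The names `GENOMES`, `genes_per_mb`, and `density_table` are illustrative assumptions, and the numbers are only as reliable as the table entries themselves.

```python
# Genome size (Mb) and predicted gene counts as listed in the tables above.
GENOMES = {
    "Saccharomyces cerevisiae": (12.1, 6294),
    "Caenorhabditis elegans": (100.0, 19000),
    "Drosophila melanogaster": (165.0, 13600),
    "Arabidopsis thaliana": (135.0, 25498),
    "Homo sapiens": (3200.0, 18826),  # 3.2 Gb expressed in Mb
}


def genes_per_mb(size_mb, gene_count):
    """Predicted genes per megabase of genome sequence."""
    return gene_count / size_mb


def density_table(genomes):
    """Return (name, size_mb, genes, genes_per_mb) rows, most genes first."""
    rows = [
        (name, size, genes, genes_per_mb(size, genes))
        for name, (size, genes) in genomes.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, size, genes, density in density_table(GENOMES):
        print(f"{name:26s} {size:8.1f} Mb {genes:7d} genes {density:8.1f} genes/Mb")
```

With these table values, the nematode's predicted gene count is slightly higher than the human count even though the human genome is listed as about 32 times larger, which is the mismatch the paradox describes.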
MNase-seq, short for micrococcal nuclease digestion with deep sequencing, is a molecular biological technique that was pioneered in 2006 to measure nucleosome occupancy in the C. elegans genome, and was subsequently applied to the human genome in 2008. The term "MNase-seq", however, was not coined until a year later, in 2009. Briefly, this technique relies on the use of the non-specific endo-exonuclease micrococcal nuclease, an enzyme derived from the bacterium Staphylococcus aureus, to bind and cleave protein-unbound regions of DNA on chromatin. DNA bound to histones or other chromatin-bound proteins may remain undigested. The uncut DNA is then purified from the proteins and sequenced through one or more of the various next-generation sequencing methods.