This list of "sequenced" eukaryotic genomes contains all the eukaryotes known to have publicly available complete nuclear and organelle genome sequences that have been sequenced, assembled, annotated and published; draft genomes are not included, nor are organelle-only sequences.
DNA was first sequenced in 1977. The first free-living organism to have its genome completely sequenced was the bacterium Haemophilus influenzae, in 1995. In 1996 Saccharomyces cerevisiae (baker's yeast) became the first eukaryote to have its genome sequence released, and in 1998 the first genome sequence for a multicellular eukaryote, Caenorhabditis elegans, was released.
Following are the nine earliest sequenced genomes of protists. For a more complete list, see the List of sequenced protist genomes.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Guillardia theta | Cryptomonad | Model organism | 0.551 Mb (nucleomorph genome only) | 465, [1] 513, 598 (UniProt) | Canadian Institute of Advanced Research, Philipps-University Marburg and the University of British Columbia | 2001 [1] |
Plasmodium falciparum Clone:3D7 | Apicomplexan | Human pathogen (malaria) | 22.9 Mb | 5,268 [2] | Malaria Genome Project Consortium | 2002 [2] |
Plasmodium yoelii yoelii Strain:17XNL | Apicomplexan | Rodent pathogen (malaria) | 23.1 Mb | 5,878 [3] | TIGR and NMRC | 2002 [3] |
Cryptosporidium hominis Strain:TU502 | Apicomplexan | Human pathogen | 10.4 Mb | 3,994 [4] | Virginia Commonwealth University | 2004 [4] |
Cryptosporidium parvum C- or genotype 2 isolate | Apicomplexan | Human pathogen | 16.5 Mb | 3,807 [5] | UCSF and University of Minnesota | 2004 [5] |
Thalassiosira pseudonana Strain:CCMP 1335 | Diatom | Model organism | 34.5 Mb | 11,242 [6] | Joint Genome Institute and the University of Washington | 2004 [6] |
Trypanosoma cruzi Strain:CL-Brener | Kinetoplastid | Human pathogen | 67 Mb | 22,570 [7] | The Institute for Genomic Research (TIGR), Karolinska Institutet (KI) and Seattle Biomedical Research Institute (SBRI) | 2005 [7] |
Trypanosoma brucei Clone:TREU 927/4 | Kinetoplastid | Human pathogen | 26 Mb | 9,068 [8] | Wellcome Trust Sanger Institute and The Institute for Genomic Research (TIGR) | 2005 [8] |
Leishmania major Strain: Friedlin | Kinetoplastid | Human pathogen | 32.8 Mb | 8,272 [9] | Wellcome Trust Sanger Institute and Seattle Biomedical Research Institute (SBRI) | 2005 [9] |
Following are the five earliest sequenced genomes of plants. For a more complete list, see the List of sequenced plant genomes.
Organism | Type | Relevance | Genome size | Number of chromosomes | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|---|
Arabidopsis thaliana Ecotype:Columbia | Wild mustard Thale Cress | Model plant | 135 Mb [10] | 5 | 25,498, [11] 27,400, [12] 31,670 (UniProt) | Arabidopsis Genome Initiative [13] | 2000 [11] |
Cyanidioschyzon merolae Strain:10D | Red algae | Simple eukaryote | 16.5 Mb | 20 | 5,331 [14] | University of Tokyo, Rikkyo University, Saitama University and Kumamoto University | 2004 [14] |
Oryza sativa ssp indica | Rice | Crop and model organism | 420 Mb | 12 | 32,000–50,000 [15] | Beijing Genomics Institute, Zhejiang University and the Chinese Academy of Sciences | 2002 [15] |
Ostreococcus tauri | Green algae | Simple eukaryote, small genome | 12.6 Mb | | 7,969 (UniProt) | Laboratoire Arago | 2006 [16] |
Populus trichocarpa | Balsam poplar or Black Cottonwood | Carbon sequestration, model tree, commercial use (timber), and comparison to A. thaliana | 550 Mb | 19 | 45,555 [17] | The International Poplar Genome Consortium | 2006 [17] |
Following are the five earliest sequenced genomes of fungi. For a more complete list, see the List of sequenced fungi genomes.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Saccharomyces cerevisiae Strain:S288C | Saccharomycetes | Baker's Yeast; Model eukaryote | 12.1 Mb | 6,294 [18] | International Collaboration for the Yeast Genome Sequencing [19] | 1996 [18] |
Encephalitozoon cuniculi | Microsporidium | Human pathogen | 2.9 Mb | 1,997 [20] | Genoscope and Université Blaise Pascal | 2001 [20] |
Schizosaccharomyces pombe Strain:972h- | Schizosaccharomycetes | Model eukaryote | 14 Mb | 4,824 [21] | Sanger Institute and Cold Spring Harbor Laboratory | 2002 [21] |
Neurospora crassa | Sordariomycetes | Model eukaryote | 40 Mb | 10,082 [22] | Broad Institute, Oregon Health and Science University, University of Kentucky, and the University of Kansas | 2003 [22] |
Phanerochaete chrysosporium Strain:RP78 | Agaricomycetes | Wood rotting fungus, use in mycoremediation | 30 Mb | 11,777 [23] | Joint Genome Institute | 2004 [23] |
Following are the five earliest sequenced genomes of animals. For a more complete list, see the List of sequenced animal genomes.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Caenorhabditis elegans Strain:Bristol N2 | Nematode | Model animal | 100 Mb | 19,000 [24] | Washington University and the Sanger Institute | 1998 [24] |
Drosophila melanogaster | Fruit fly | Model animal | 165 Mb | 13,600 [25] | Celera, UC Berkeley, Baylor College of Medicine, European DGP | 2000 [25] |
Anopheles gambiae Strain: PEST | Mosquito | Vector of malaria | 278 Mb | 13,683 [26] | Celera Genomics and Genoscope | 2002 [26] |
Takifugu rubripes | Puffer fish | Vertebrate with small genome | 390 Mb | 22,000–29,000 [27] | International Fugu Genome Consortium [28] | 2002 [29] |
Homo sapiens | Human | | 3.2 Gb [30] | 18,826 (CCDS consortium) | Human Genome Project Consortium and Celera Genomics | Draft 2001 [31] [32] Complete 2006 [33] |
In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA. The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as regulatory sequences, and often a substantial fraction of junk DNA with no evident function. Almost all eukaryotes have mitochondria and a small mitochondrial genome. Algae and plants also contain chloroplasts with a chloroplast genome.
The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.
Heterochromatin is a tightly packed form of DNA or condensed DNA, which comes in multiple varieties. These varieties lie on a continuum between the two extremes of constitutive heterochromatin and facultative heterochromatin. Both play a role in the expression of genes. Because it is tightly packed, it was thought to be inaccessible to polymerases and therefore not transcribed; however, according to Volpe et al. (2002), and many other papers since, much of this DNA is in fact transcribed, but it is continuously turned over via RNA-induced transcriptional silencing (RITS). Recent studies with electron microscopy and OsO4 staining reveal that the dense packing is not due to the chromatin.
Small RNA (sRNA) are polymeric RNA molecules that are less than 200 nucleotides in length, and are usually non-coding. RNA silencing is often a function of these molecules, with the most common and well-studied example being RNA interference (RNAi), in which endogenously expressed microRNA (miRNA) or exogenously derived small interfering RNA (siRNA) induces the degradation of complementary messenger RNA. Other classes of small RNA have been identified, including piwi-interacting RNA (piRNA) and its subspecies repeat associated small interfering RNA (rasiRNA). Small RNA "is unable to induce RNAi alone, and to accomplish the task it must form the core of the RNA–protein complex termed the RNA-induced silencing complex (RISC), specifically with Argonaute protein".
Comparative genomics is a branch of biological research that examines genome sequences across a wide range of species, from bacteria to mice, chimpanzees and humans. This large-scale holistic approach compares two or more genomes to discover the similarities and differences between them and to study the biology of the individual genomes. Comparison of whole genome sequences provides a highly detailed view of how organisms are related to each other at the gene level and lets researchers study evolutionary changes. The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Comparative genomics therefore provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics. Moreover, these studies can be performed at different levels of the genomes to obtain multiple perspectives about the organisms.
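As a rough illustration of comparing two sequences for shared content, the following Python sketch computes the Jaccard similarity of their k-mer sets (the fraction of distinct length-k substrings that the two sequences have in common). This is only one simple, alignment-free similarity measure, chosen here for brevity; it is not the method used by any of the genome projects listed above. The function names `kmers` and `jaccard_similarity`, the default `k = 4`, and the toy sequences are all assumptions made for this example.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(seq_a: str, seq_b: str, k: int = 4) -> float:
    """Fraction of distinct k-mers shared by the two sequences.

    Returns 0.0 when neither sequence is long enough to contain a k-mer.
    """
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy sequences invented for illustration; they are not real genomic data.
toy_a = "ATGCGTACGTTAGC"
toy_b = "ATGCGTACGTTAGC"
toy_c = "TTTTTTTTTTTTTT"

if __name__ == "__main__":
    print(jaccard_similarity(toy_a, toy_b))  # identical sequences
    print(jaccard_similarity(toy_a, toy_c))  # no shared 4-mers
```

Real comparative analyses work on whole genomes and usually rely on sequence alignment and dedicated tools; a k-mer measure like this one only gives a coarse indication of shared sequence.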
The Chimpanzee Genome Project was an effort to determine the DNA sequence of the chimpanzee genome. Sequencing began in 2005 and by 2013 twenty-four individual chimpanzees had been sequenced. This project was folded into the Great Ape Genome Project.
Gerald Mayer Rubin is an American biologist, notable for pioneering the use of transposable P elements in genetics, and for leading the public project to sequence the Drosophila melanogaster genome. Related to his genomics work, Rubin's lab is notable for development of genetic and genomics tools and studies of signal transduction and gene regulation. Rubin also served as a vice president of the Howard Hughes Medical Institute (2003-2020) and founding executive director of its Janelia Research Campus.
Reticulon 4 receptor (RTN4R), also known as Nogo-66 Receptor (NgR) or Nogo receptor 1, is a protein that in humans is encoded by the RTN4R gene. This gene encodes the receptor for reticulon 4, oligodendrocyte myelin glycoprotein and myelin-associated glycoprotein. This receptor mediates axonal growth inhibition and may play a role in regulating axonal regeneration and plasticity in the adult central nervous system.
Serine/threonine-protein kinase Sgk2 is an enzyme that in humans is encoded by the SGK2 gene.
Whole genome sequencing (WGS) is the process of determining the entirety, or nearly the entirety, of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast.
Cancer genome sequencing is the whole genome sequencing of a single, homogeneous or heterogeneous group of cancer cells. It is a biochemical laboratory method for the characterization and identification of the DNA or RNA sequences of cancer cell(s).
A reference genome is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. Because they are assembled from the sequencing of DNA from a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. For example, one of the most recent human reference genomes, assembly GRCh38/hg38, is derived from more than 60 genomic clone libraries. There are reference genomes for multiple species of viruses, bacteria, fungi, plants, and animals. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than the initial Human Genome Project. Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or UCSC Genome Browser.
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.
Arabidopsis thaliana is a first-class model organism and the single most important species for fundamental research in plant molecular genetics.
A plant genome assembly represents the complete genomic sequence of a plant species, assembled into chromosomes and organelle genomes from DNA fragments obtained with different types of sequencing technology.
(Thomas) Martin Embley is a professor at Newcastle University who has made contributions to our understanding of the origin of eukaryotes and the evolution of organelles such as mitochondria, mitosomes and hydrogenosomes, that are found in parasitic protists.
MNase-seq, short for micrococcal nuclease digestion with deep sequencing, is a molecular biological technique that was pioneered in 2006 to measure nucleosome occupancy in the C. elegans genome, and was subsequently applied to the human genome in 2008. The term "MNase-seq", however, was not coined until a year later, in 2009. Briefly, this technique relies on the non-specific endo-exonuclease micrococcal nuclease, an enzyme derived from the bacterium Staphylococcus aureus, to bind and cleave protein-unbound regions of DNA on chromatin. DNA bound to histones or other chromatin-bound proteins may remain undigested. The uncut DNA is then purified from the proteins and sequenced through one or more of the various next-generation sequencing methods.
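Once the undigested fragments have been sequenced and mapped back to a genome, occupancy can be summarised as the number of fragments covering each position. The Python sketch below shows that counting step only. It assumes the fragments are already available as 0-based, end-exclusive `(start, end)` coordinates on a single region; the function name `occupancy` and the example coordinates are hypothetical, and real analyses use dedicated read aligners and normalisation steps that are not shown.

```python
def occupancy(fragments, region_length):
    """Per-base count of mapped fragments covering each position.

    fragments: iterable of (start, end) pairs, 0-based and end-exclusive.
    region_length: number of bases in the region being summarised.
    Fragments are clipped to the region; empty ones are ignored.
    """
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (region_length + 1)
    for start, end in fragments:
        start = max(start, 0)
        end = min(end, region_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives per-base coverage.
    coverage, running = [], 0
    for delta in diff[:region_length]:
        running += delta
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    # Made-up fragment coordinates, for illustration only.
    print(occupancy([(0, 3), (2, 5)], 6))
```

Positions with high counts correspond to DNA that was protected from digestion, which is consistent with nucleosome or other protein binding; positions with low counts were more accessible to the nuclease.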