This list of sequenced protist genomes contains all the protist species known to have publicly available complete genome sequences that have been assembled, annotated and published; draft genomes are not included, nor are organelle-only sequences.
Alveolata are a group of protists which includes the Ciliophora, Apicomplexa and Dinoflagellata. Members of this group are of particular interest to science as the cause of serious human and livestock diseases.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion | Assembly status | Links |
---|---|---|---|---|---|---|---|---|
Babesia bovis | Apicomplexan | Cattle pathogen | 8.2 Mb | 3,671 | | 2007 [1] | | |
Breviolum minutum (Symbiodinium minutum; clade B1) | Dinoflagellate | Coral symbiont | 1.5 Gb | 47,014 | Okinawa Institute of Science and Technology | 2013 [2] | Draft | OIST Marine Genomics [3] |
Cladocopium goreaui (Symbiodinium goreaui; clade C1) | Dinoflagellate | Coral symbiont | 1.19 Gb | 35,913 | Reef Future Genomics (ReFuGe) 2020 / University of Queensland | 2018 [4] | Draft | ReFuGe 2020 [5] |
Cladocopium C92 strain Y103 (Symbiodinium sp. clade C; putative type C92) | Dinoflagellate | Foraminiferan symbiont | Unknown (assembly size 0.70 Gb) | 65,832 | Okinawa Institute of Science and Technology | 2018 [6] | Draft | OIST Marine Genomics [3] |
Cryptosporidium hominis Strain:TU502 | Apicomplexan | Human pathogen | 10.4 Mb | 3,994 [7] | Virginia Commonwealth University | 2004 [7] | | |
Cryptosporidium parvum C- or genotype 2 isolate | Apicomplexan | Human pathogen | 16.5 Mb | 3,807 [8] | UCSF and University of Minnesota | 2004 [8] | | |
Eimeria tenella Houghton strain | Apicomplexan | Intestinal parasite of domestic fowl | 55-60 Mb [9] | | The Wellcome Trust Sanger Institute [10] | Available for download; [10] 2007 for Chr 1 [11] | | |
Fugacium kawagutii CS156=CCMP2468 (Symbiodinium kawagutii; clade F1) | Dinoflagellate | Coral symbiont? | 1.07 Gb | 26,609 | Reef Future Genomics (ReFuGe) 2020 / University of Queensland | 2018 [4] | Draft | ReFuGe 2020 [5] |
Fugacium kawagutii CCMP2468 (Symbiodinium kawagutii; clade F1) | Dinoflagellate | Coral symbiont? | 1.18 Gb | 36,850 | University of Connecticut / Xiamen University | 2015 [12] | Draft | S. kawagutii genome project [13] |
Neospora caninum | Apicomplexan | Pathogen for cattle and dogs | 62 Mb [14] | | The Wellcome Trust Sanger Institute [15] | Available for download [15] | | |
Paramecium tetraurelia | Ciliate | Model organism | 72 Mb | 39,642 [16] | Genoscope | 2006 [16] | | |
Polarella glacialis CCMP1383 | Dinoflagellate | Psychrophile, Antarctic | 3.02 Gb (diploid), 1.48 Gb (haploid) | 58,232 | University of Queensland | 2020 [17] | Draft | UQ eSpace [18] |
Polarella glacialis CCMP2088 | Dinoflagellate | Psychrophile, Arctic | 2.65 Gb (diploid), 1.30 Gb (haploid) | 51,713 | University of Queensland | 2020 [17] | Draft | UQ eSpace [18] |
Plasmodium berghei ANKA | Apicomplexan | Rodent malaria | 18.5 Mb [19] | 4,900; [19] 11,654 (UniProt) | | | | |
Plasmodium chabaudi | Apicomplexan | Rodent malaria | 19.8 Mb [20] | 5,000 [20] | | | | |
Plasmodium falciparum Clone:3D7 | Apicomplexan | Human pathogen (malaria) | 22.9 Mb | 5,268 [21] | Malaria Genome Project Consortium | 2002 [21] | | |
Plasmodium knowlesi | Apicomplexan | Primate pathogen (malaria) | 23.5 Mb | 5,188 [22] | | 2008 [22] | | |
Plasmodium vivax | Apicomplexan | Human pathogen (malaria) | 26.8 Mb | 5,433 [23] | | 2008 [23] | | |
Plasmodium yoelii yoelii Strain:17XNL | Apicomplexan | Rodent pathogen (malaria) | 23.1 Mb | 5,878 [24] | TIGR and NMRC | 2002 [24] | | |
Symbiodinium microadriaticum (clade A) | Dinoflagellate | Coral symbiont | 1.1 Gb | 49,109 | King Abdullah University of Science and Technology | 2016 [25] | Draft | Reef Genomics [26] |
Symbiodinium A3 strain Y106 (Symbiodinium sp. clade A3) | Dinoflagellate | Symbiont | Unknown (assembly size 0.77 Gb) | 69,018 | Okinawa Institute of Science and Technology | 2018 [6] | Draft | OIST Marine Genomics [3] |
Tetrahymena thermophila | Ciliate | Model organism | 104 Mb | 27,000 [27] | | 2006 [27] | | |
Theileria annulata Ankara clone C9 | Apicomplexan | Cattle pathogen | 8.3 Mb | 3,792 | Sanger | 2005 [28] | | |
Theileria parva Strain:Muguga | Apicomplexan | Cattle pathogen (African east coast fever) | 8.3 Mb | 4,035 [29] | TIGR and the International Livestock Research Institute | 2005 [29] | | |
Toxoplasma gondii GT1, ME49, VEG strains | Apicomplexan | Mammal pathogen | 63 Mb (RefSeq) | 8,100 (UniProt) - 9,000 (EuPathDB) | J. Craig Venter Inst., TIGR, UPenn. | 2008 [30] | | |
Amoebozoa are a group of motile amoeboid protists that move or feed by means of temporary projections called pseudopods. The best-known members of this group are the slime molds, which have been studied for centuries; other members include the Archamoebae, Tubulinea and Flabellinia. Some Amoebozoa cause disease.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Dictyostelium discoideum Strain:AX4 | Slime mold | Model organism | 34 Mb | 12,500 [31] | Consortium from University of Cologne, Baylor College of Medicine and the Sanger Centre | 2005 [31] |
Entamoeba histolytica HM1:IMSS | Parasitic protozoan | Human pathogen (amoebic dysentery) | 23.8 Mb | 9,938 [32] | TIGR, Sanger Institute and the London School of Hygiene and Tropical Medicine | 2005 [32] |
Polysphondylium pallidum Strain:PN500 | Slime mold | Model organism | | 12,939, [33] 12,350 (UniProt) | Leibniz Institute for Age Research | 2009 [33] |
The Chromista are a group of protists that contains the algal phyla Heterokontophyta (stramenopiles), Haptophyta and Cryptophyta. Members of this group are mostly studied for evolutionary interest.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Albugo laibachii | Oomycete | Arabidopsis parasite, biotroph | 37 Mb [34] | 13,032 [34] | | 2011 [34] |
Aureococcus anophagefferens Strain:CCMP1984 | Pelagophyte | | | | DOE Joint Genome Institute | 2011 [35] |
Bigelowiella natans | Chlorarachniophyte | Model organism | nucleomorph: 0.331 Mb nuclear: 95 Mb | nucleomorph: 373 [36] nuclear: >21,000 [37] | nucleomorph: Hall Institute Australia, Univ. Melbourne, Univ. BC nuclear: Dalhousie University, Halifax, Nova Scotia, Canada | 2006, [36] 2012 [37] |
Chroomonas mesostigmatica CCMP1168 | Cryptophyta | | | | | 2012 [38] |
Cryptomonas paramecium | Cryptophyta | | | | | 2010 [39] |
Emiliania huxleyi CCMP1516 | Coccolithophore (phytoplankton) | | 141.7 Mb [40] | 30,569 [40] | Joint Genome Institute | 2013 [40] |
Emiliania huxleyi RCC1217 | Coccolithophore (phytoplankton) | | | | | Available for download [41] |
Fragilariopsis cylindrus | Diatom | | 61.1 Mb [42] | 21,066 [42] | Joint Genome Institute | 2017 [42] |
Guillardia theta | Cryptomonad | Model organism | 0.551 Mb (nucleomorph genome only) 87 Mb (nuclear genome) | nucleomorph: 465 [43] 513, 598 (UniProt) nuclear: >21,000 [37] | nucleomorph: Canadian Institute of Advanced Research, Philipps-University Marburg and the University of British Columbia nuclear: Dalhousie University, Halifax, Nova Scotia, Canada | 2001, [43] 2012 [37] |
Hemiselmis andersenii CCMP7644 | Cryptomonad | Model organism | 0.572 Mb (nucleomorph genome only) | 472, [44] 502 (UniProt) | Canadian Institute of Advanced Research | 2007 [44] |
Hyaloperonospora arabidopsidis | Oomycete | obligate biotroph, Arabidopsis pathogen | | | WUGSC | 2010 [45] |
Nannochloropsis gaditana Strain: CCMP526 | Eustigmatophyte | Lipid-producing, biotechnology applications | | | Virginia Bioinformatics Institute | 2012 [46] |
Phaeodactylum tricornutum Strain: CCAP1055/1 | Diatom | | 27.4 Mb | 10,402 | Joint Genome Institute | 2008 [47] |
Phytophthora infestans Strain:T30-4 | Oomycete | Great Famine of Ireland pathogen | | | Broad Institute | 2009 [48] |
Phytophthora ramorum | Oomycete | Sudden oak death pathogen | 65 Mb (7x) | 15,743 | Joint Genome Institute et al. | 2006 [49] |
Phytophthora sojae | Oomycete | Soybean pathogen | 95 Mb (9x) | 19,027 | Joint Genome Institute et al. | 2006 [49] |
Pseudo-nitzschia multiseries | Diatom | | | | Joint Genome Institute | |
Plasmodiophora brassicae | Plasmodiophorid | Clubroot disease pathogen | 25.5 Mb | 9,730 | SLU Uppsala et al. | 2015 [50] |
Pythium ultimum | Oomycete | ubiquitous plant pathogen | 42.8 Mb | 15,290 | Michigan State University et al. | 2010 [51] |
Thalassiosira pseudonana Strain:CCMP 1335 | Diatom | | 34.5 Mb | 11,242 [52] | Joint Genome Institute and the University of Washington | 2004 [52] |
Excavata is a group of related free-living and symbiotic protists; it includes the Metamonada, Loukozoa, Euglenozoa and Percolozoa. They are researched for their role in human disease.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Giardia enterica (G. duodenalis assemblage B) | Parasitic protozoan | Human pathogen (Giardiasis) | 11.7 Mb | 4,470 [53] | multicenter collaboration | 2009 [53] |
Giardia duodenalis ATCC 50803 (Giardia duodenalis assemblage A) | Parasitic protozoan | Human pathogen (Giardiasis) | 11.7 Mb | 6,470, [54] 7,153 (UniProt) | Karolinska Institutet, Marine Biological Laboratory | 2007 [54] |
Leishmania braziliensis MHOM/BR/75M2904 | Parasitic protozoan | Human pathogen (Leishmaniasis) | 33 Mb | 8,314 [55] | Sanger Institute, Universidade de São Paulo, Imperial College | 2007 [55] |
Leishmania infantum JPCM5 | Parasitic protozoan | Human pathogen (Visceral leishmaniasis) | 33 Mb | 8,195 [55] | Sanger Institute, Imperial College and University of Glasgow | 2007 [55] |
Leishmania major Strain:Friedlin | Parasitic protozoan | Human pathogen (Cutaneous leishmaniasis) | 32.8 Mb | 8,272 [56] | Sanger Institute and Seattle Biomedical Research Institute | 2005 [56] |
Naegleria gruberi | amoeboflagellate | Diverged from other eukaryotes over 1 billion years ago | 41 Mb [57] | 15,727 [57] | | 2010 [57] |
Trichomonas vaginalis | Parasitic protozoan | Human pathogen (Trichomoniasis) | 160 Mb | 59,681 [58] | TIGR | 2007 [58] |
Trypanosoma brucei Strain:TREU927/4 GUTat10.1 | Parasitic protozoan | Human pathogen (Sleeping sickness) | 26 Mb | 9,068 [59] | Sanger Institute and TIGR | 2005 [59] |
Trypanosoma cruzi Strain:CL Brener TC3 | Parasitic protozoan | Human pathogen (Chagas disease) | 34 Mb | 22,570 [60] | TIGR, Seattle Biomedical Research Institute and Uppsala University | 2005 [60] |
Opisthokonts are a group of eukaryotes that includes both animals and fungi, as well as basal lineages that belong to neither. These basal opisthokonts are reasonably categorized as protists and include the choanoflagellates, which are the sister or near-sister group of animals.
Organism | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion |
---|---|---|---|---|---|---|
Monosiga brevicollis | Choanoflagellate | close relative of metazoans | 41.6 Mb | 9,200 [61] | Joint Genome Institute | 2007 [61] |