This list of sequenced algal genomes contains algal species known to have publicly available complete genome sequences that have been assembled, annotated and published. Unassembled genomes are not included, nor are organelle-only sequences. For plant genomes see the list of sequenced plant genomes. For plastid sequences, see the list of sequenced plastomes. For all kingdoms, see the list of sequenced genomes.
See also List of sequenced protist genomes.
Organism strain | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion | Assembly status | Links |
---|---|---|---|---|---|---|---|---|
Breviolum minutum (Symbiodinium minutum; clade B1) | Dinoflagellate | Coral symbiont | 1.5 Gb | 47,014 | Okinawa Institute of Science and Technology | 2013 [1] | Draft | OIST Marine Genomics [2] |
Cladocopium goreaui (Symbiodinium goreaui; clade C, type C1) | Dinoflagellate | Coral symbiont | 1.19 Gb | 35,913 | Reef Future Genomics (ReFuGe) 2020 / University of Queensland | 2018 [3] | Draft | ReFuGe 2020 [4] |
Cladocopium C92 strain Y103 (Symbiodinium sp. clade C; putative type C92) | Dinoflagellate | Foraminiferan symbiont | Unknown (assembly size 0.70 Gb) | 65,832 | Okinawa Institute of Science and Technology | 2018 [5] | Draft | OIST Marine Genomics [2] |
Fugacium kawagutii CS156=CCMP2468 (Symbiodinium kawagutii; clade F1) | Dinoflagellate | Coral symbiont? | 1.07 Gb | 26,609 | Reef Future Genomics (ReFuGe) 2020 / University of Queensland | 2018 [3] | Draft | ReFuGe 2020 [4] |
Fugacium kawagutii CCMP2468 (Symbiodinium kawagutii; clade F1) | Dinoflagellate | Coral symbiont? | 1.18 Gb | 36,850 | University of Connecticut / Xiamen University | 2015 [6] | Draft | S. kawagutii genome project [7] |
Polarella glacialis CCMP1383 | Dinoflagellate | Psychrophile, Antarctic | 3.02 Gb (diploid), 1.48 Gb (haploid) | 58,232 | University of Queensland | 2020 [8] | Draft | UQ eSpace [9] |
Polarella glacialis CCMP2088 | Dinoflagellate | Psychrophile, Arctic | 2.65 Gb (diploid), 1.30 Gb (haploid) | 51,713 | University of Queensland | 2020 [8] | Draft | UQ eSpace [9] |
Symbiodinium microadriaticum (clade A) | Dinoflagellate | Coral symbiont | 1.1 Gb | 49,109 | King Abdullah University of Science and Technology | 2016 [10] | Draft | Reef Genomics [11] |
Symbiodinium A3 strain Y106 (Symbiodinium sp. clade A3) | Dinoflagellate | Symbiont | Unknown (assembly size 0.77 Gb) | 69,018 | Okinawa Institute of Science and Technology | 2018 [5] | Draft | OIST Marine Genomics [2] |
Organism strain | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion | Assembly status | Links |
---|---|---|---|---|---|---|---|---|
Cryptophyceae sp. CCMP2293 | Nanoflagellate | Nucleomorph, Psychrophile | 534.5 Mb | 33,051 | Joint Genome Institute | 2016 [12] | | JGI Genome Portal [13] |
Guillardia theta | | Eukaryote Endosymbiosis | 87.2 Mb | 24,840 | Dalhousie University | 2012 [14] | | The Greenhouse [15] |
Organism strain | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion | Assembly status | Links |
---|---|---|---|---|---|---|---|---|
Cyanophora | | Model Organism | 70.2 Mb | 3,900 | Rutgers University | 2012 [16] | Draft v1 | The Greenhouse [15] Cyanophora Genome Project [17] |
Cyanophora | | Model Organism | 99.94 Mb | 25,831 | Rutgers University | 2019 [18] | Draft v2 | Cyanophora Genome Project [19] |
Organism strain | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion | Assembly status | Links |
---|---|---|---|---|---|---|---|---|
Asterochloris sp. Cgr/DA1pho | | Photobiont | 55.8 Mb | 10,025 | Duke University | 2011 [20] | | JGI Genome Portal [13] |
Auxenochlorella protothecoides | | Biofuels | 22.9 Mb | 7,039 | Tsinghua University | 2014 [21] | | The Greenhouse [15] |
Bathycoccus prasinos | | Comparative analysis | 15.1 Mb | 7,900 | Joint Genome Institute | 2012 [22] | | JGI Genome Portal [13] |
Chlamydomonas reinhardtii CC-503 cw92 mt+ | | Model Organism | 111.1 Mb | 17,741 | Joint Genome Institute | 2017 [23] | | Phytozome [24] The Greenhouse [15] |
Chlorella sorokiniana str. 1228 | | Biofuels | 61.4 Mb | | Los Alamos National Lab | 2018 [25] | | The Greenhouse [15] |
Chlorella sorokiniana UTEX 1230 | | Biofuels | 58.5 Mb | | Los Alamos National Lab | 2018 [26] | | The Greenhouse [15] |
Chlorella sorokiniana DOE1412 | | Biofuels | 57.8 Mb | | Los Alamos National Lab | 2018 [27] | | The Greenhouse [15] |
Chlorella variabilis NC64A | | Biofuels | 46.2 Mb | 9,791 | | 2010 [28] | | The Greenhouse [15] |
Chlorella vulgaris | | Biofuels | 37.3 Mb | | National Renewable | 2015 [29] | | The Greenhouse [15] |
Coccomyxa subellipsoidea sp. C-169 | | Biofuels | 48.8 Mb | 9,839 | Joint Genome Institute | 2012 [30] | | Phytozome [24] The Greenhouse [15] |
Dunaliella salina CCAP19/18 | | Halophile, Biofuels, Beta-carotene and glycerol production | 343.7 Mb | 16,697 | Joint Genome Institute | 2017 [31] | | Phytozome [24] |
Eudorina sp. | | Multicellular alga, model organism | ~180 Mb | | University of Tokyo | 2018 [32] | | |
Gonium pectorale | | | 148.81 Mb | | Kansas State University | 2016 [33] | | |
Micromonas commoda NOUM17 (RCC288) | | Marine phytoplankton | 21.0 Mb | 10,262 | Monterey Bay Aquarium Research Institute | 2013 [34] [35] | | JGI Genome Portal [13] |
Micromonas pusilla CCMP-1545 | | Marine | 21.9 Mb | 10,575 | Micromonas Genome Consortium | 2009 [36] | | Phytozome [24] The Greenhouse [15] |
Micromonas RCC299/NOUM17 | | Marine | 20.9 Mb | 10,056 | Joint Genome | 2009 [36] | | Phytozome [24] The Greenhouse [15] |
Monoraphidium | | Biofuels | 69.7 Mb | 16,755 | Bielefeld | 2013 [37] | | The Greenhouse [15] |
Ostreococcus CCE9901 | | Small genome | 13.2 Mb | 7,603 | Joint Genome Institute | 2007 [38] | | Phytozome [24] |
Ostreococcus tauri OTH95 | | Small genome | 12.9 Mb | 7,699 | CNRS | 2014 [39] | | The Greenhouse [15] |
Ostreococcus sp. RCC809 | | Small genome | 13.3 Mb | 7,492 | Joint Genome | 2009 [40] | | JGI [41] |
Picochlorum DOE101 | | Biofuels | 15.2 Mb | 7,844 | Los Alamos | 2017 [42] | | The Greenhouse [15] |
Picochlorum SENEW3 | | Biofuels | 13.5 Mb | 7,367 | Rutgers University | 2014 [43] | | The Greenhouse [15] |
Scenedesmus obliquus DOE0152Z | | Biofuels | 210.3 Mb | | Brooklyn College | 2017 [44] | | The Greenhouse [15] |
Symbiochloris reticulata (Metagenome) | | Photobiont | 58.6 Mb | 12,720 | Joint Genome Institute | 2018 [45] | | JGI Genome Portal [13] |
Tetraselmis sp. | | Biofuels | 228 Mb | | Los Alamos | 2018 [15] | | The Greenhouse [15] |
Pedinomonas minor (Chlorophyta) | | | 55 Mb | | New Phytologist | 2022 [46] | | |
Volvox carteri | | Multicellular alga, model organism | 131.2 Mb | 14,247 | Joint Genome | 2010 [47] | | Phytozome [24] The Greenhouse [15] |
Yamagishiella unicocca | | Multicellular alga, model organism | ~140 Mb | | University of Tokyo | 2018 [32] | | |
Organism strain | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion | Assembly status | Links |
---|---|---|---|---|---|---|---|---|
Chrysochromulina | | Biofuels | 65.8 Mb | | Los Alamos National Laboratory | 2018 [48] | | The Greenhouse [15] |
Chrysochromulina tobinii CCMP291 | | Model organism, Biofuels | 59.1 Mb | 16,765 | University of Washington | 2015 [49] | | The Greenhouse [15] |
Emiliania huxleyi | Coccolithophore | Alkenone production, Algal blooms | 167.7 Mb | 38,554 | Joint Genome Institute | 2013 [50] | | The Greenhouse [15] |
Pavlovales sp. CCMP2436 | | Psychrophile | 165.4 Mb | 26,034 | Joint Genome Institute | 2016 [51] | | JGI Genome Portal [13] |
Organism strain | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion | Assembly status | Links |
---|---|---|---|---|---|---|---|---|
Aureococcus | | Harmful Algal Bloom | 50.1 Mb | 11,522 | Joint Genome Institute | 2011 [52] | | The Greenhouse [15] |
Ectocarpus siliculosus | Brown algae | Model organism | 198.5 Mb | 16,269 | Genoscope | 2012 [53] | | The Greenhouse [15] |
Fragilariopsis cylindrus CCMP1102 | | Psychrophile | 61.1 Mb | 21,066 | University of East Anglia, Joint Genome Institute | 2017 [54] | | JGI Genome Portal [13] |
Nannochloropsis | | Biofuels | 28.5 Mb | 10,486 | University of Padua | 2014 [55] | | The Greenhouse [15] |
Nannochloropsis | | Biofuels | 31.5 Mb | | Chinese Academy of Sciences, Qingdao Institute of Bioenergy and Bioprocess Technology | 2016 [56] | | The Greenhouse [15] |
Nannochloropsis salina CCMP1766 | | Biofuels | 24.4 Mb | | Chinese Academy of Sciences, Qingdao Institute of Bioenergy and Bioprocess Technology | 2016 [57] | | The Greenhouse [15] |
Ochromonadaceae sp. CCMP2298 | | Psychrophile | 61.1 Mb | 20,195 | Joint Genome Institute | 2016 [58] | | JGI Genome Portal [13] |
Pelagophyceae sp. CCMP2097 | | Psychrophile | 85.2 Mb | 19,402 | Joint Genome Institute | 2016 [59] | | JGI Genome Portal [13] |
Phaeodactylum tricornutum | | Model organism | 27.5 Mb | 10,408 | Diatom Consortium | 2008 [60] | | The Greenhouse [15] |
Pseudo-nitzschia multiseries CLN-47 | | | 218.7 Mb | 19,703 | Joint Genome Institute | 2011 [61] | | JGI Genome Portal [13] |
Saccharina japonica | Brown algae | Commercial crop | 543.4 Mb | | Chinese Academy of Sciences, Beijing Institutes of Life Science | 2015 [62] | | The Greenhouse [15] |
Thalassiosira oceanica CCMP 1005 | | Model organism | 92.2 Mb | 34,642 | The Future Ocean | 2012 [63] | | The Greenhouse [15] |
Thalassiosira pseudonana | | Model organism | 32.4 Mb | 11,673 | Diatom Consortium | 2009 [64] | | The Greenhouse [15] |
Organism strain | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion | Assembly status | Links |
---|---|---|---|---|---|---|---|---|
Chondrus crispus | | Carrageenan production, model organism | 105 Mb | 9,606 | Genoscope | 2013 | | The Greenhouse [15] |
Cyanidioschyzon merolae 10D | | Model organism | 16.5 Mb | 4,775 | National Institute of Genetics, Japan | 2007 [65] | | The Greenhouse [15] |
Galdieria sulphuraria | | Extremophile | 12.1 Mb | | The University of York | 2016 [66] | | The Greenhouse [15] |
Gracilariopsis chorda | | Mesophile | 92.1 Mb | 10,806 | Sungkyunkwan University | 2018 [67] | | |
Porphyridium purpureum | | Mesophile | 19.7 Mb | 8,355 | Rutgers University | 2013 [68] | | |
Porphyra umbilicalis | | Mariculture | 87.6 Mb | 13,360 | University of Maine | 2017 [69] | | Phytozome [24] |
Pyropia yezoensis | | Mariculture | 43.5 Mb | 10,327 | National Research Institute of Fisheries Science | 2013 [70] | | |
Organism strain | Type | Relevance | Genome size | Number of genes predicted | Organization | Year of completion | Assembly status | Links |
---|---|---|---|---|---|---|---|---|
Bigelowiella natans | | Model organism | 94. Mb | 21,708 | Dalhousie University | 2012 [14] | | The Greenhouse [15] |