1000 Plant Genomes Project

Duration: 2008 – 2019
Website: onekp.com

The 1000 Plant Transcriptomes Initiative (1KP) was an international research effort to establish the most detailed catalogue of genetic variation in plants. It was announced in 2008 and headed by Gane Ka-Shu Wong and Michael Deyholos of the University of Alberta. The project successfully sequenced the transcriptomes (expressed genes) of 1,000 different plant species by 2014; [1] [2] its final capstone products were published in 2019. [3] [4] [5]

1KP was one of several large-scale (involving many organisms) sequencing projects designed to take advantage of the wider availability of high-throughput ("next-generation") DNA sequencing technologies. The similar 1000 Genomes Project, for example, obtained high-coverage genome sequences of 1,000 individual people between 2008 and 2015 to better understand human genetic variation. [6] [7] The initiative provided a template for further planetary-scale genome projects, including the 10KP Project, which aims to sequence the whole genomes of 10,000 plants, [8] and the Earth BioGenome Project, which aims to sequence, catalogue, and characterize the genomes of all of Earth's eukaryotic biodiversity. [9]

Goals

As of 2002, the number of classified green plant species was estimated at around 370,000, and there are probably many thousands more that remain unclassified. [10] Despite this number, very few of these species have detailed DNA sequence information: as of 11 April 2012, GenBank held sequence for 125,426 species, [11] but most (>95%) were represented by only one or two genes. As one source put it, "...almost none of the roughly half million plant species known to humanity has been touched by genomics at any level". [1] The 1000 Plant Genomes Project aimed to produce a roughly 100-fold increase in the number of plant species with broad genome sequence available.

Evolutionary relationships

There have been efforts to determine the evolutionary relationships between the known plant species, [12] [13] but phylogenies (or phylogenetic trees) built solely from morphological data, cellular structures, single enzymes, or only a few sequences (such as rRNA) can be prone to error. [14] Morphological features are especially misleading when two species look physically similar although they are not closely related (for example, as a result of convergent evolution rather than homology), or when two closely related species look very different because, for example, they change readily in response to their environment. These situations are very common in the plant kingdom. An alternative method for reconstructing evolutionary relationships is to compare changes in the DNA sequence of many genes between species, which is often more robust to the problem of similar-appearing species. [14] With the amount of sequence produced by this project, many predicted evolutionary relationships could be tested more rigorously by sequence alignment. The final analysis used 383,679 nuclear gene family phylogenies and 2,306 gene age distributions with Ks plots, which were shared in GigaDB alongside the capstone paper. [15]
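The idea of comparing DNA sequence across many genes can be illustrated with a minimal Python sketch. It is not the project's phylogenomic pipeline: it only computes the proportion of differing sites (the p-distance) between pre-aligned sequences and averages it over genes. The species labels, gene names, and sequences are made up for illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def mean_distances(alignments):
    """Average the p-distance of every species pair over all genes."""
    species = sorted(next(iter(alignments.values())))
    result = {}
    for s1, s2 in combinations(species, 2):
        per_gene = [p_distance(genes[s1], genes[s2]) for genes in alignments.values()]
        result[(s1, s2)] = sum(per_gene) / len(per_gene)
    return result


# Hypothetical aligned fragments: two genes, three species (A, B, C).
ALIGNMENTS = {
    "gene1": {"A": "ATGGCCAAGT", "B": "ATGGCCAAGA", "C": "ATGACGTAGA"},
    "gene2": {"A": "TTGACCGGTA", "B": "TTGACCGGTA", "C": "TAGACTGGCA"},
}

distances = mean_distances(ALIGNMENTS)
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```

In this toy data set, species A and B differ at the fewest sites across both genes, so they would be grouped together first. Real analyses use explicit substitution models and tree-building methods rather than raw distances.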

Biotechnology applications

The list of plants sequenced in the project was not random; instead, the project focused on plants that produce valuable chemicals or other products (secondary metabolites in many cases), in the hope that characterizing the genes involved would allow the underlying biosynthetic processes to be used or modified. [1] For example, many plants are known to produce oils (such as olives and the oil palm), and the oils from certain hydrocarbon-producing species bear a strong chemical resemblance to petroleum products. [16] If these plant mechanisms could be used, or modified, to produce large quantities of industrially useful oil, they would be of great value. Knowing the sequence of the plant genes involved in the metabolic pathway that produces the oil is a large first step toward such use. A well-known example of engineering a natural biochemical pathway is Golden rice, in which the pathway was genetically modified so that a precursor to vitamin A is produced in large quantities, making the golden-colored rice a potential solution for vitamin A deficiency. [17] This concept of engineering plants to do "work" is popular, [18] and its potential would increase dramatically with gene information on these 1,000 plant species. Biosynthetic pathways could also be used to mass-produce medicinal compounds in plants rather than through the organic chemical synthesis by which most are currently made.

One of the most unexpected results of the project was the discovery of multiple novel light-sensitive ion channels, used extensively for optogenetic control of neurons, through sequencing and physiological characterization of opsins from over 100 algal species. [19] The characterization of these novel channelrhodopsin sequences provided resources for protein engineers who would normally have no interest in, or ability to generate, sequence data from so many plant species. [20] A number of biotech companies are developing these channelrhodopsin proteins for medical purposes, and many of the resulting optogenetic therapy candidates are in clinical trials to restore vision in retinal blindness. The first published results of one such treatment, for retinitis pigmentosa, appeared in July 2021. [21]

Project approach

Sequencing was initially done on the Illumina Genome Analyzer GAII next-generation DNA sequencing platform at the Beijing Genomics Institute (BGI Shenzhen, China), and later samples were run on the faster Illumina HiSeq 2000 platform. The institute started with 28 Illumina Genome Analyzer machines, which were eventually upgraded to 100 HiSeq 2000 sequencers. The initial capacity of 3 Gb per run (3 billion base pairs per experiment) of each of these machines enabled fast and accurate sequencing of the plant samples. [22]

Species selection

The list of plant species to be sequenced was compiled through an international collaboration among the funding agencies and research groups, each of which expressed interest in certain plants. [1] The focus was on plant species known to have useful biosynthetic capacity, to support the biotechnology goals of the project, with other species selected to fill gaps in the current plant phylogeny and resolve some unknown evolutionary relationships. In addition to species with industrial compound biosynthetic capacity, plant species known or suspected to produce medically active chemicals (such as poppies, which produce opiates) were assigned a high priority in order to better understand the synthesis process, explore commercial production potential, and discover new pharmaceutical options. A large number of plant species with medicinal properties were selected from traditional Chinese medicine (TCM). [1] The completed list of selected species can be viewed publicly on the website, [23] and the methods and data access arrangements have been published in detail. [5] [24]

Transcriptome vs. genome sequencing

Rather than sequencing the entire genome (all DNA sequence) of each plant species, the project sequenced only the transcriptome: those regions of the genome that produce a protein product (coding genes). [1] This approach is justified by the focus on biochemical pathways, where only the genes producing the proteins involved are needed to understand the synthetic mechanism, and because these thousands of sequences provide enough detail to construct very robust evolutionary relationships through sequence comparison. The number of coding genes varies considerably between plant species, but all have tens of thousands or more, making the transcriptome a large collection of information. Non-coding sequence nevertheless makes up the majority (>90%) of genome content. [25] Although the approach is conceptually similar to expressed sequence tags (ESTs), it is fundamentally different in that the entire sequence of each gene is acquired with high coverage, rather than just the small portion of the gene sequence that an EST provides. [26] To distinguish the two, the non-EST method is known as "shotgun transcriptome sequencing". [26]

Transcriptome shotgun sequencing

mRNA (messenger RNA) is collected from a sample, converted to cDNA by a reverse transcriptase enzyme, and then fragmented so that it can be sequenced. [1] [22] Besides transcriptome shotgun sequencing, this technique has been called RNA-seq and whole transcriptome shotgun sequencing (WTSS). [26] Once the cDNA fragments are sequenced, they are assembled de novo (without aligning to a reference genome sequence) back into complete gene sequences by combining all of the fragments from each gene during the data analysis phase. A new de novo transcriptome assembler designed specifically for RNA-Seq, SOAPdenovo-Trans, was produced for this project; [27] it is part of the SOAP suite of genome assembly tools from the BGI.
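The principle of de novo assembly, rebuilding a transcript from overlapping fragments without a reference, can be sketched in a few lines of Python. This is a toy greedy overlap-merging routine written for illustration only; it is not how SOAPdenovo-Trans works internally, and the fragments, the `min_len` threshold, and the function names are assumptions.

```python
def overlap(left, right, min_len):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        # Discard contigs fully contained in another contig.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                size = overlap(a, b, min_len)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Hypothetical reads covering a single made-up transcript.
reads = ["GTTAGCCTGA", "ATGGCGTAC", "CGTACGTTAG"]
assembled = greedy_assemble(reads)
print(assembled)
```

Greedy merging is easy to follow but is misled by repeated sequence, which is one reason production assemblers for short reads are typically built on graph-based methods instead.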

Plant tissue sampling

The samples came from around the world, with a number of particularly rare species supplied by botanical gardens such as the Fairy Lake Botanical Garden (Shenzhen, China). [citation needed] The type of tissue collected was determined by the expected location of biosynthetic activity; for example, if an interesting process or chemical was known to occur primarily in the leaves, a leaf sample was used. A number of RNA-sequencing protocols were adapted and tested for different tissue types, [24] and these were openly shared via the protocols.io platform. [28]

Potential limitations

Since only the transcriptome was sequenced, the project did not reveal information about gene regulatory sequences, non-coding RNAs, repetitive DNA elements, or other genomic features that are not part of the coding sequence. Based on the few whole plant genomes sequenced so far, these non-coding regions make up the majority of the genome, [25] [29] and non-coding DNA may actually be the primary driver of the trait differences seen between species. [30]

Since mRNA was the starting material, the amount of sequence representation for a given gene is based on the expression level (how many mRNA molecules it produces). This means that highly expressed genes get better coverage because there is more sequence to work from. [30] The result, then, is that some important genes may not have been reliably detected by the project if they are expressed at a low level yet still have important biochemical functions.
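The link between expression level and coverage can be made concrete with a small calculation. The Python sketch below assumes a simplified model in which reads are sampled in proportion to transcript abundance multiplied by transcript length; the gene names, abundances, read count, and read length are hypothetical and do not come from the project.

```python
def expected_coverage(genes, total_reads, read_length):
    """Expected per-base coverage of each transcript.

    `genes` maps a gene name to (transcript_length_bp, relative_abundance).
    Assumption: the share of reads drawn from a transcript is proportional
    to abundance * length, since longer transcripts yield more fragments.
    """
    weights = {name: length * abundance
               for name, (length, abundance) in genes.items()}
    total_weight = sum(weights.values())
    coverage = {}
    for name, (length, _abundance) in genes.items():
        reads = total_reads * weights[name] / total_weight
        coverage[name] = reads * read_length / length
    return coverage


# Two hypothetical genes of equal length but very different expression.
GENES = {
    "highly_expressed": (1000, 990),
    "rare_enzyme": (1000, 10),
}

coverage = expected_coverage(GENES, total_reads=10_000, read_length=100)
for name, depth in coverage.items():
    print(f"{name}: about {depth:.0f}x coverage")
```

Under these assumptions the rare transcript receives 99 times less coverage than the abundant one from the same sequencing run, which illustrates why weakly expressed genes are the ones most likely to be assembled poorly or missed.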

Many plant species (especially agriculturally manipulated ones) [29] are known to have undergone large genome-wide changes through duplication of the whole genome. The rice and wheat genomes are examples: wheat can carry 4–6 copies of its whole genome, [29] whereas animals typically have only 2 (diploidy). The resulting duplicated genes may pose a problem for the de novo assembly of sequence fragments, because repeated sequences confuse the assembly programs when they try to put the fragments together, and the duplicates can be difficult to track through evolution.

Comparison with the 1000 Genomes Project

Similarities

Just as the Beijing Genomics Institute in Shenzhen, China, was one of the major genomics centers involved in the 1000 Genomes Project, the institute was the site of sequencing for the 1000 Plant Genomes Project. [31] Both projects were large-scale efforts to obtain detailed DNA sequence information to improve our understanding of the organisms, and both used next-generation sequencing to facilitate timely completion.

Differences

The goals of the two projects were significantly different. While the 1000 Genomes Project focused on genetic variation within a single species, the 1000 Plant Genomes Project examined the evolutionary relationships and genes of 1,000 different plant species.

While the 1000 Genomes Project was estimated to cost up to $50 million USD, [6] the 1000 Plant Genomes Project was less expensive; the difference in cost came from the portion of each genome that was targeted. [1] Since the 1000 Plant Genomes Project sequenced only the transcriptome, whereas the human project sequenced as much of the genome as was judged feasible, [6] this more specific approach required much less sequencing effort. Although this meant less overall sequence output than the 1000 Genomes Project, the non-coding portions of the genomes excluded by the 1000 Plant Genomes Project were not as important to its goals as they were to the human project. The more focused approach of the 1000 Plant Genomes Project therefore minimized cost while still achieving its goals.

Funding

The project was funded by Alberta Innovates - Technology Futures (a merger that included iCORE), Genome Alberta, the University of Alberta, the Beijing Genomics Institute (BGI), and Musea Ventures (a USA-based private investment firm). [32] The project received $1.5 million CAD from the Alberta Government and another $0.5 million from Musea Ventures. [32] In January 2010, BGI announced that it would contribute $100 million to large-scale sequencing projects of plants and animals (including the 1000 Plant Genomes Project and, following on from it, the 10,000 Plant Genome Project [8]). [31]

References

  1. Retrieved Feb. 25, 2010.
  2. Matasci N, Hung LH, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S, et al. (2014). "Data access for the 1,000 Plants (1KP) project". GigaScience. 3 (17): 17. doi:10.1186/2047-217X-3-17. PMC 4306014. PMID 25625010.
  3. One Thousand Plant Transcriptomes Initiative (October 2019). "One thousand plant transcriptomes and the phylogenomics of green plants". Nature. 574 (7780): 679–685. doi:10.1038/s41586-019-1693-2. PMC 6872490. PMID 31645766.
  4. Wong GK, Soltis DE, Leebens-Mack J, Wickett NJ, Barker MS, Van de Peer Y, et al. (April 2020). "Sequencing and Analyzing the Transcriptomes of a Thousand Species Across the Tree of Life for Green Plants". Annual Review of Plant Biology. 71: 741–765. doi:10.1146/annurev-arplant-042916-041040. ISSN 1543-5008. PMID 31851546. S2CID 209416841.
  5. Carpenter EJ, Matasci N, Ayyampalayam S, Wu S, Sun J, Yu J, et al. (October 2019). "Access to RNA-sequencing data from 1,173 plant species: The 1000 Plant transcriptomes initiative (1KP)". GigaScience. 8 (10). doi:10.1093/gigascience/giz126. PMC 6808545. PMID 31644802.
  6. Hayden EC (January 2008). "International genome project launched". Nature. 451 (7177): 378–9. Bibcode:2008Natur.451R.378C. doi:10.1038/451378b. PMID 18216809. S2CID 205035320.
  7. "About IGSR and the 1000 Genomes Project". IGSR: The International Genome Sample Resource. Retrieved October 2, 2018.
  8. Cheng S, Melkonian M, Smith SA, Brockington S, Archibald JM, Delaux PM, et al. (March 1, 2018). "10KP: A phylodiverse genome sequencing plan". GigaScience. 7 (3): 1–9. doi:10.1093/gigascience/giy013. PMC 5869286. PMID 29618049.
  9. Lewin HA, Robinson GE, Kress WJ, Baker WJ, Coddington J, Crandall KA, et al. (April 24, 2018). "Earth BioGenome Project: Sequencing life for the future of life". Proceedings of the National Academy of Sciences. 115 (17): 4325–4333. Bibcode:2018PNAS..115.4325L. doi:10.1073/pnas.1720115115. ISSN 0027-8424. PMC 5924910. PMID 29686065.
  10. Pitman NC, Jørgensen PM (November 2002). "Estimating the size of the world's threatened flora". Science. 298 (5595): 989. doi:10.1126/science.298.5595.989. PMID 12411696. S2CID 891010.
  11. "NCBI Taxonomy". NCBI. Retrieved April 11, 2012.
  12. Bremer K (1985). "Summary of Green Plant Phylogeny and Classification". Cladistics. 1 (4): 369–385. doi:10.1111/j.1096-0031.1985.tb00434.x. PMID 34965683. S2CID 84961691.
  13. Graham LE, Delwiche CF, Mishler BD (1991). "Phylogenetic connections between the 'green algae' and the 'bryophytes'". Advances in Bryology. 213–44 (3): 451–483. JSTOR 2399900.
  14. Doyle JJ (January 1992). "Gene trees and species trees: molecular systematics as one-character taxonomy". Systematic Botany. 1 (1): 144–63. doi:10.2307/2419070. JSTOR 2419070.
  15. Li Z, Barker MS (February 1, 2020). "Inferring putative ancient whole-genome duplications in the 1000 Plants (1KP) initiative: access to gene family phylogenies and age distributions". GigaScience. 9 (2). doi:10.1093/gigascience/giaa004. PMC 7011446. PMID 32043527.
  16. Augustus GD, Jayabalan M, Rajarathinam K, Ray AK, Seiler GJ (2002). "Potential hydrocarbon producing species of Western Ghats, Tamil Nadu, India". Biomass and Bioenergy. 23 (3): 165–169. Bibcode:2002BmBe...23..165A. doi:10.1016/S0961-9534(02)00045-4.
  17. Ye X, Al-Babili S, Klöti A, Zhang J, Lucca P, Beyer P, et al. (January 2000). "Engineering the provitamin A (beta-carotene) biosynthetic pathway into (carotenoid-free) rice endosperm". Science. 287 (5451): 303–5. Bibcode:2000Sci...287..303Y. doi:10.1126/science.287.5451.303. PMID 10634784. S2CID 40258379.
  18. Taiz L, Zeiger E (2006). "Chapter 13: Secondary metabolites and plant defense". Plant physiology (4th ed.). Sinauer Associates. ISBN 978-0-87893-856-8.
  19. Klapoetke NC, Murata Y, Kim SS, Pulver SR, Birdsey-Benson A, Cho YK, et al. (March 2014). "Independent optical excitation of distinct neural populations". Nature Methods. 11 (3): 338–346. doi:10.1038/nmeth.2836. PMC 3943671. PMID 24509633.
  20. Wong GK, Soltis DE, Leebens-Mack J, Wickett NJ, Barker MS, Van de Peer Y, et al. (April 2020). "Sequencing and Analyzing the Transcriptomes of a Thousand Species Across the Tree of Life for Green Plants". Annual Review of Plant Biology. 71: 741–765. doi:10.1146/annurev-arplant-042916-041040. PMID 31851546. S2CID 209416841.
  21. Sahel JA, Boulanger-Scemama E, Pagot C, Arleo A, Galluppi F, Martel JN, et al. (July 2021). "Partial recovery of visual function in a blind patient after optogenetic therapy". Nature Medicine. 27 (7): 1223–1229. doi:10.1038/s41591-021-01351-4. PMID 34031601. S2CID 235203605.
  22. "Retrieved Feb. 25, 2010". Archived from the original on March 7, 2010. Retrieved March 3, 2010.
  23. "1kP Sample List Viewer". www.onekp.com. Retrieved April 10, 2020.
  24. Johnson MT, Carpenter EJ, Tian Z, Bruskiewich R, Burris JN, Carrigan CT, et al. (November 21, 2012). "Evaluating Methods for Isolating Total RNA and Predicting the Success of Sequencing Phylogenetically Diverse Plant Transcriptomes". PLOS ONE. 7 (11): e50226. Bibcode:2012PLoSO...750226J. doi:10.1371/journal.pone.0050226. ISSN 1932-6203. PMC 3504007. PMID 23185583.
  25. Morgante M (April 2006). "Plant genome organisation and diversity: the year of the junk!". Current Opinion in Biotechnology. 17 (2): 168–73. doi:10.1016/j.copbio.2006.03.001. PMID 16530402.
  26. Morozova O, Hirst M, Marra MA (2009). "Applications of new sequencing technologies for transcriptome analysis". Annual Review of Genomics and Human Genetics. 10: 135–51. doi:10.1146/annurev-genom-082908-145957. PMID 19715439. S2CID 26713396.
  27. Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, et al. (June 15, 2014). "SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads". Bioinformatics. 30 (12): 1660–1666. arXiv:1305.6760. doi:10.1093/bioinformatics/btu077. ISSN 1367-4803. PMID 24532719.
  28. T M, J E, Tian Z, Bruskiewich R, N J, T C, et al. (August 15, 2019). "RNA Isolation from Plant Tissue v1 (protocols.io.439gyr6)". Protocols.io. doi:10.17504/protocols.io.439gyr6.
  29. Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, et al. (April 2002). "A draft sequence of the rice genome (Oryza sativa L. ssp. indica)". Science. 296 (5565): 79–92. Bibcode:2002Sci...296...79Y. doi:10.1126/science.1068037. PMID 11935017. S2CID 208529258.
  30. Bird CP, Stranger BE, Liu M, Thomas DJ, Ingle CE, Beazley C, et al. (2007). "Fast-evolving noncoding sequences in the human genome". Genome Biology. 8 (6): R118. doi:10.1186/gb-2007-8-6-r118. PMC 2394770. PMID 17578567.
  31. "BGI Seeks Proposals to Sequence 1,000 Plant, Animal Genomes; Pledges $100M Toward Effort". GenomeWeb. January 12, 2010. Retrieved February 25, 2010.
  32. "Alberta iCORE researcher leads international genome project". Government of Alberta. November 13, 2008. Archived from the original on September 25, 2012. Retrieved August 21, 2018.
  33. Weigel D, Mott R (2009). "The 1001 genomes project for Arabidopsis thaliana". Genome Biology. 10 (5): 107. doi:10.1186/gb-2009-10-5-107. PMC 2718507. PMID 19519932.
  34. Genome 10K Community of Scientists (2009). "Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species". The Journal of Heredity. 100 (6): 659–74. doi:10.1093/jhered/esp086. PMC 2877544. PMID 19892720.