Genome survey sequence


In the fields of bioinformatics and computational biology, genome survey sequences (GSS) are nucleotide sequences similar to expressed sequence tags (ESTs); the only difference is that most of them are genomic in origin rather than derived from mRNA. [1]


Genome survey sequences are typically generated and submitted to NCBI by labs performing genome sequencing and are used, amongst other things, as a framework for the mapping and sequencing of genome size pieces included in the standard GenBank divisions. [1]

Contributions

Genome survey sequencing is a way to map genome sequences that does not depend on mRNA. Current genome sequencing approaches are mostly high-throughput shotgun methods, and GSS is often used as the first step of sequencing. Unlike ESTs, GSSs can provide an initial global view of a genome that includes both coding and non-coding DNA, including repetitive sections of the genome. Estimating repetitive sequence content from GSS data plays an important role in the early assessment of a sequencing project, since these data can affect the assessment of sequence coverage, library quality and the construction process. [2] For example, in the survey of the dog genome, GSS data were used to estimate global parameters such as the neutral mutation rate and repeat content. [3]
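How much of a genome a low-coverage survey is expected to touch can be illustrated with the Lander-Waterman approximation. This model is not named in the sources cited here and is used only as a common rule of thumb; it assumes reads are sampled uniformly at random. The function names and the example numbers below are hypothetical.

```python
import math


def coverage_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth c = N * L / G."""
    return num_reads * read_length / genome_size


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases hit by at least one read: 1 - exp(-c).

    Assumes uniformly random read placement (an idealisation).
    """
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Hypothetical survey: one million 500 bp reads from a 1 Gb genome.
    c = coverage_depth(1_000_000, 500, 1_000_000_000)
    print(f"depth = {c:.2f}x, expected fraction covered = {expected_fraction_covered(c):.3f}")
```

Real libraries are rarely uniform, so a figure computed this way is better read as an upper bound on what a survey will sample than as a prediction.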

GSS is also an effective way to characterize the genomes of related species rapidly and on a large scale when only a few gene sequences or maps are available. [4] Low-coverage GSS can generate abundant information about the gene content and putative regulatory elements of the species being compared. [5] The genes of related species can then be compared to identify gene families that have relatively expanded or contracted. Combined with physical clone coverage, this lets researchers navigate the genome easily and characterize specific genomic regions by more extensive sequencing. [3]

Limitation

The main limitation of genome survey sequences is that they lack long-range continuity because of their fragmentary nature, which makes it harder to predict gene and marker order. For example, when detecting repetitive sequences in GSS data, it may not be possible to find all the repeats, since a repetitive region may be longer than the reads and is therefore difficult to recognize. [2]
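One simple way to look for candidate repeats in unassembled reads is to count k-mers and flag those seen more often than a chosen threshold. The sketch below only illustrates that general idea and is not the method of the ReRep tool cited above; the function name, the value of k and the threshold are hypothetical choices.

```python
from collections import Counter


def overrepresented_kmers(reads, k=8, min_count=3):
    """Return {kmer: count} for k-mers seen at least min_count times.

    Only the given strand is counted; reverse complements are ignored
    to keep the sketch short.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip windows with ambiguous bases
                counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy reads that share one 8-mer.
    toy_reads = ["ACGTACGTAA", "TTACGTACGT", "GGGACGTACGTC"]
    print(overrepresented_kmers(toy_reads))
```

A count like this can show that a short motif is frequent, but it cannot recover the full extent of a repeat that is longer than any single read.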

Types of data

The GSS division contains (but is not limited to) the following types of data:

Random "single pass read" genome survey sequences

Random "single pass read" genome survey sequences are GSSs generated from a single-pass read of randomly selected genomic DNA. Single-pass sequencing allows the rapid accumulation of genomic data, but at a lower accuracy. [6] This category includes RAPD, RFLP, AFLP and so on. [7]

Cosmid/BAC/YAC end sequences

Cosmid/BAC/YAC end sequences are obtained by sequencing genomic DNA cloned in cosmids, bacterial artificial chromosomes (BACs) or yeast artificial chromosomes (YACs), reading in from the ends of the insert. These vectors behave like very low copy plasmids, sometimes with only one copy per cell. To obtain enough DNA, a large volume of E. coli culture is needed; 2.5 to 5 litres may be a reasonable amount. [8]

Cosmids, BACs and YACs can also carry larger cloned DNA fragments than vectors such as plasmids and phagemids. A larger insert is often helpful for organizing clones in a sequencing project. [9]

Eukaryotic proteins can be expressed with post-translational modification by using YACs. [10] BACs cannot do this, but they represent human DNA much more reliably than YACs or cosmids. [11]

Exon trapped genomic sequences

Exon trapping is used to identify genes in cloned DNA by recognizing and trapping vector constructs that contain exon sequences. Exon trapping has two main features. First, it is independent of the availability of RNA expressing the target DNA. Second, the isolated sequences can be derived directly from the clone without knowing which tissues express the gene to be identified. [12] During splicing, exons are retained in the mRNA, and the information they carry can be passed on to the protein. Because DNA fragments can be inserted into other sequences, when a fragment containing an exon is inserted into an intron, the resulting transcript is longer than usual, and this transcript can be trapped by analysis.

Alu PCR sequences

The Alu repetitive element is a member of the short interspersed elements (SINEs) in mammalian genomes. There are about 300 to 500 thousand copies of the Alu repetitive element in the human genome, which means that on average one Alu element occurs every 4 to 6 kb. Alu elements are widely distributed in mammalian genomes, and their repetitiveness is one of their defining characteristics, which is why they are described as repetitive elements. By using a specific Alu sequence as the target locus, human-specific DNA can be obtained from TAC, BAC or PAC clones or from human-mouse cell hybrids.

PCR is an approach used to amplify a small fragment of DNA. The fragment can be a whole gene or just part of a gene. PCR can only amplify small fragments of DNA, which generally do not exceed 10 kbp.
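The size limit mentioned above can be checked in silico before a primer pair is ordered. The following is a minimal sketch, assuming exact primer matches on a known template and ignoring mismatches, melting temperature and multiple binding sites; the function names and the `max_len` default are illustrative and do not come from any particular tool.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def amplicon_length(template: str, fwd_primer: str, rev_primer: str, max_len: int = 10_000):
    """Length of the first product bounded by both primers, or None.

    Both primers are written 5'->3'. The reverse primer binds the opposite
    strand, so its reverse complement is searched for on the template,
    downstream of the forward primer site.
    """
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    length = end + len(rev_site) - start
    return length if length <= max_len else None


if __name__ == "__main__":
    # Toy template and 4-base "primers", far shorter than real primers.
    print(amplicon_length("TTTACGTGGGGGGCCATAAA", "ACGT", "ATGG"))
```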

Alu PCR is a "DNA fingerprinting" technique that is rapid and easy to use. The fingerprint is obtained by analysing many genomic loci flanked by Alu repetitive elements, which are non-autonomous retrotransposons present in a high number of copies in primate genomes. [13] Because Alu elements serve as the basis for this PCR-based genome fingerprinting, the technique is called Alu PCR.

Transposon-tagged sequences

There are several ways to analyze the function of a particular gene sequence. The most direct is to replace or mutate it and then analyze the resulting effects. Three methods have been developed for this purpose: gene replacement, sense and anti-sense suppression, and insertional mutagenesis. Among these, insertional mutagenesis has proved to be a very successful approach.

At first, T-DNA was used for insertional mutagenesis. However, using transposable elements offers more advantages. Transposable elements were first discovered by Barbara McClintock in maize plants. She identified the first transposable genetic element, which she called the Dissociation (Ds) locus. [14] Transposable elements range in size from 750 to 40,000 bp. They fall into two main classes: the simpler class is called insertion sequences (IS), and the more complex class is called transposons. A transposon carries one or more characterized genes, which can be easily identified. An IS carries the transposase gene.

A transposon can be used as a tag for a DNA with a known sequence. A transposon can appear at another locus through transcription or reverse transcription under the action of a nuclease. This mobility shows that the genome is not static, but is always changing its own structure.

Transposon tagging has two advantages. First, if a transposon is inserted into a gene sequence, the insertion is single and intact, and this intactness makes the tagged sequence easy to analyze at the molecular level. Second, many transposons can be found to have been excised from the tagged gene sequence when transposase activity is examined. This provides confirmation that the gene sequence was really tagged by the transposon. [15]

Example of GSS file

The following is an example of a GSS file that can be submitted to GenBank: [16]

TYPE: GSS
STATUS: New
CONT_NAME: Sikela JM
GSS#: Ayh00001
CLONE: HHC189
SOURCE: ATCC
SOURCE_INHOST: 65128
OTHER_GSS: GSS00093, GSS000101
CITATION: Genomic sequences from Human brain tissue
SEQ_PRIMER: M13 Forward
P_END: 5'
HIQUAL_START: 1
HIQUAL_STOP: 285
DNA_TYPE: Genomic
CLASS: shotgun
LIBRARY: Hippocampus, Stratagene (cat. #936205)
PUBLIC:
PUT_ID: Actin, gamma, skeletal
COMMENT:
SEQUENCE:
AATCAGCCTGCAAGCAAAAGATAGGAATATTCACCTACAGTGGGCACCTCCTTAAGAAGCTG
ATAGCTTGTTACACAGTAATTAGATTGAAGATAATGGACACGAAACATATTCCGGGATTAAA
CATTCTTGTCAAGAAAGGGGGAGAGAAGTCTGTTGTGCAAGTTTCAAAGAAAAAGGGTACCA
GCAAAAGTGATAATGATTTGAGGATTTCTGTCTCTAATTGGAGGATGATTCTCATGTAAGGT
GCAAAAGTGATAATGATTTGAGGATTTCTGTCTCTAATTGGAGGATGATTCTCATGTAAGGT
TGTTAGGAAATGGCAAAGTATTGATGATTGTGTGCTATGTGATTGGTGCTAGATACTTTAAC
TGAGTATACGAGTGAAATACTTGAGACTCGTGTCACTT
||
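A record in this tagged layout is straightforward to read programmatically. The sketch below is a hypothetical helper, not part of any NCBI toolkit. It assumes each field sits on its own line as `NAME: value`, that the sequence follows the `SEQUENCE:` tag on one or more lines, and that `||` ends the record.

```python
import re

# A field line looks like "NAME: value"; names use capitals, "_" and "#".
FIELD_RE = re.compile(r"^([A-Z_#]+):\s*(.*)$")


def parse_gss_record(text: str) -> dict:
    """Parse one GSS submission record into a dict of field name -> value."""
    fields = {}
    seq_parts = []
    in_sequence = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        finished = line.endswith("||")  # record terminator
        if finished:
            line = line[:-2].strip()
        if in_sequence:
            if line:
                seq_parts.append(line.replace(" ", ""))
        elif line:
            match = FIELD_RE.match(line)
            if match:
                name, value = match.groups()
                if name == "SEQUENCE":
                    in_sequence = True
                    if value:
                        seq_parts.append(value.replace(" ", ""))
                else:
                    fields[name] = value
        if finished:
            break
    fields["SEQUENCE"] = "".join(seq_parts)
    return fields


if __name__ == "__main__":
    sample = "TYPE: GSS\nCLONE: HHC189\nPUBLIC:\nSEQUENCE:\nAATCAG\nCCTGCA\n||\n"
    record = parse_gss_record(sample)
    print(record["CLONE"], len(record["SEQUENCE"]))
```

Empty fields such as `PUBLIC:` are kept with an empty string so that a caller can tell a blank field from a missing one.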


References

  1. GenBank Flat File 96.0 Release Notes
  2. Otto, Thomas D., et al. "ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)." BMC Bioinformatics 9.1 (2008): 366.
  3. Kirkness, E. F. (2003-09-26). "The Dog Genome: Survey Sequencing and Comparative Analysis". Science. American Association for the Advancement of Science (AAAS). 301 (5641): 1898–1903. Bibcode:2003Sci...301.1898K. doi:10.1126/science.1086432. ISSN 0036-8075. PMID 14512627. S2CID 22366556.
  4. Venkatesh, Byrappa, et al. "Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome." PLoS biology 5.4 (2007): e101.
  5. Hitte, Christophe, et al. "Facilitating genome navigation: survey sequencing and dense radiation-hybrid gene mapping." Nature Reviews Genetics 6.8 (2005): 643-648.
  6. "DNA sequencing How to determine the sequence of bases in a DNA molecule". Archived from the original on 2013-10-21. Retrieved 2013-10-21.
  7. DDBJ-GSS
  8. MEGA- and GIGA preps of cosmid-, BAC-, PAC, YAC-, and P1-DNA with JETSTAR 2.0
  9. "WSSP-04 Chapter 2 – Vectors" (PDF). Archived from the original (PDF) on 2013-10-23. Retrieved 2013-10-22.
  10. Yeast artificial chromosome
  11. Venter, J. Craig, Hamilton O. Smith, and Leroy Hood. "A New Cooperative Strategy for Sequencing the Human and Other Genomes."
  12. Martin C. Wapenaar; Johan T. Den Dunnen (2001). Exon Trapping: Application of a Large-Insert Multiple-Exon-Trapping System. Methods in Molecular Biology. 175. pp. 201–215. doi:10.1385/1-59259-235-X:201. ISBN 978-1-59259-235-7. PMID 11462836.
  13. Cardelli M (2011). "Alu PCR". PCR Protocols. Methods in Molecular Biology. 687. pp. 221–9. doi:10.1007/978-1-60761-944-4_15. ISBN 978-1-60761-943-7. PMID 20967611.
  14. Tsugeki R, Olson ML, Fedoroff NV (May 2007). "Transposon tagging and the study of root development in Arabidopsis". Gravitational and Space Biology. 11 (2): 79–87. PMID 11540642.
  15. Ramachandran S, Sundaresan V (2001). "Transposons as tools for functional genomics". Plant Physiology and Biochemistry. 39 (3–4): 243–252. doi:10.1016/s0981-9428(01)01243-8.
  16. dbGSS_submit