Consed

Developer(s) David Gordon
Operating system UNIX, Linux, Mac OS X
Type Bioinformatics
License Proprietary
Website http://bozeman.mbt.washington.edu/consed/consed.html

Consed [1] is a program for viewing, editing, and finishing DNA sequence assemblies. Originally developed for sequence assemblies created with phrap, recent versions also support other sequence assembly programs like Newbler.

History

Consed was originally developed as a contig editing and finishing tool for large-scale cosmid shotgun sequencing in the Human Genome Project. At genome sequencing centers, Consed was used to check assemblies generated by phrap, to solve assembly problems such as those caused by highly identical repeats, and to carry out finishing tasks such as primer picking and gap closure. Development of Consed has continued after the completion of the Human Genome Project. Current Consed versions support very large projects with millions of reads, enabling their use with newer sequencing methods such as 454 sequencing and Solexa sequencing. Consed also has advanced tools for finishing tasks such as automated primer picking.[2]

Related Research Articles

In genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random firing pattern of a shotgun.
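The random sampling at the heart of shotgun sequencing can be simulated in a few lines. The sketch below is a simplified model, not a description of any real instrument: it assumes fixed-length, error-free reads taken from one strand at uniformly random start positions, and the function names are hypothetical. It also reports what fraction of the genome is covered by at least one read, which shows why many more reads than the minimum are needed before every base is seen.

```python
import random


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[tuple[int, str]]:
    """Sample fixed-length reads from uniformly random start positions.

    Returns (start, read) pairs. The start position is kept only so the
    simulation can be checked; a real sequencer does not report it.
    """
    if not 0 < read_len <= len(genome):
        raise ValueError("read_len must be between 1 and len(genome)")
    rng = random.Random(seed)  # seeded so runs are reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(genome) - read_len)  # randint's upper bound is inclusive
        reads.append((start, genome[start:start + read_len]))
    return reads


def covered_fraction(genome_len: int, reads: list[tuple[int, str]]) -> float:
    """Fraction of genome positions covered by at least one read."""
    covered = [False] * genome_len
    for start, read in reads:
        for i in range(start, start + len(read)):
            covered[i] = True
    return sum(covered) / genome_len


if __name__ == "__main__":
    genome = "ATTAGACCTGCCGGAATAC"
    reads = shotgun_reads(genome, read_len=8, n_reads=6, seed=1)
    print(reads)
    print(covered_fraction(len(genome), reads))
```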

Genomics: discipline in genetics

Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and influence on the organism. Genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues as well as control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through the use of high-throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems such as the brain.

DNA sequencer

A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This is then reported as a text string, called a read. Some DNA sequencers can also be considered optical instruments, as they analyze light signals originating from fluorochromes attached to nucleotides.

Genome project

Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism and to annotate protein-coding genes and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences.

In bioinformatics, sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence. This is needed because DNA sequencing technology cannot read whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically the short fragments, called reads, result from shotgun sequencing of genomic DNA or gene transcripts (ESTs).
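The idea of merging overlapping reads can be illustrated with a toy greedy assembler. This is a sketch under strong assumptions (error-free reads, all from the same strand, exact suffix/prefix overlaps of at least `min_len` bases) and is not the algorithm used by phrap or any other production assembler; `overlap` and `greedy_assemble` are hypothetical names. At each step it merges the pair of sequences with the longest overlap and stops when no overlaps remain, returning the resulting contigs.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` bases long.
    """
    if len(a) < min_len or len(b) < min_len:
        return 0
    start = 0
    while True:
        # Find the next place where the first min_len bases of b occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Scanning left to right, the first full suffix match is the longest one.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap.

    Returns the remaining contigs; more than one means the reads left gaps.
    """
    contigs = list(reads)
    while True:
        # Drop duplicates and any sequence contained in a longer one.
        contigs = list(dict.fromkeys(contigs))
        contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For example, the four 10-base reads `ATTAGACCTG`, `CCTGCCGGAA`, `AGACCTGCCG` and `GCCGGAATAC` are merged into the single contig `ATTAGACCTGCCGGAATAC`. A greedy strategy is easy to follow but can mis-join reads from repeats, which is one reason real assemblies need checking and editing in tools like Consed.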

DNA sequencing: process of determining the nucleic acid sequence, the order of nucleotides in DNA

DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.

Sanger sequencing: method of DNA sequencing developed in 1977

Sanger sequencing is a method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. Developed by Frederick Sanger and colleagues in 1977, it was the most widely used sequencing method for approximately 40 years. It was first commercialized by Applied Biosystems in 1986. More recently, higher-volume Sanger sequencing has been replaced by "Next-Gen" sequencing methods, especially for large-scale, automated genome analyses. However, the Sanger method remains in wide use for smaller-scale projects and for validation of Next-Gen results. It still has the advantage over short-read sequencing technologies that it can produce DNA sequence reads of more than 500 nucleotides.
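Why chain termination yields a readable sequence can be shown with an idealized model. The sketch assumes one terminated fragment ends at every position of the newly synthesized strand, each labeled by the dideoxynucleotide that stopped it, and that separation by length (as in electrophoresis) is perfect; real reactions produce a statistical mix of fragment lengths and noisy signals. The template is written 3' to 5' so its plain complement is the new strand read 5' to 3'. All names are hypothetical.

```python
import random

# Watson-Crick base pairing used to derive the newly synthesized strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def synthesized_strand(template_3_to_5: str) -> str:
    """Strand built by DNA polymerase (read 5'->3') for a template written 3'->5'."""
    return template_3_to_5.translate(COMPLEMENT)


def terminated_fragments(strand: str, seed: int = 0) -> list[tuple[int, str]]:
    """Idealized reaction: one chain terminated at every position.

    Each fragment is (length, terminal_base); the terminal base is the
    dideoxynucleotide that stopped extension and carries the label.
    """
    fragments = [(n, strand[n - 1]) for n in range(1, len(strand) + 1)]
    random.Random(seed).shuffle(fragments)  # the reaction yields fragments in no particular order
    return fragments


def read_by_length(fragments: list[tuple[int, str]]) -> str:
    """Mimic size separation: order fragments by length, then read their terminal labels."""
    return "".join(base for _, base in sorted(fragments))
```

With the template `TACGGA`, `read_by_length(terminated_fragments(synthesized_strand("TACGGA")))` returns `ATGCCT`, the complement of the template, because the labels are read from the shortest fragment to the longest.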

Jim Kent: American bioinformatician

William James Kent is an American research scientist and computer programmer. He has been a contributor to genome database projects and the 2003 winner of the Benjamin Franklin Award.

Phred quality score

A Phred quality score is a measure of the quality of the identification of the nucleobases generated by automated DNA sequencing. It was originally developed for Phred base calling to help in the automation of DNA sequencing in the Human Genome Project. Phred quality scores are assigned to each nucleotide base call in automated sequencer traces. The FASTQ format encodes Phred scores as ASCII characters alongside the read sequences. Phred quality scores have become widely accepted to characterize the quality of DNA sequences, and can be used to compare the efficacy of different sequencing methods. Perhaps the most important use of Phred quality scores is the automatic determination of accurate, quality-based consensus sequences.
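A Phred score Q is defined from the estimated probability P that a base call is wrong as Q = -10 log10 P, so Q = 20 corresponds to a 1 in 100 chance of error and Q = 30 to a 1 in 1,000 chance. The sketch below converts between the two and encodes scores as FASTQ quality characters. It assumes the common offset-33 encoding (Sanger, Illumina 1.8 and later); some older Illumina files used an offset of 64, so the offset is left as a parameter. The function names are illustrative.

```python
import math


def phred_from_error(p: float) -> int:
    """Phred quality Q = -10 * log10(P), rounded to the nearest integer."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10 * math.log10(p))


def error_from_phred(q: float) -> float:
    """Inverse mapping: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def encode_fastq_quality(quals: list[int], offset: int = 33) -> str:
    """Turn per-base Phred scores into a FASTQ quality string."""
    for q in quals:
        # Keep characters within printable ASCII ('~' is code 126).
        if q < 0 or q + offset > 126:
            raise ValueError(f"quality {q} cannot be encoded with offset {offset}")
    return "".join(chr(q + offset) for q in quals)


def decode_fastq_quality(qual_string: str, offset: int = 33) -> list[int]:
    """Recover per-base Phred scores from a FASTQ quality string."""
    return [ord(c) - offset for c in qual_string]
```

For example, `phred_from_error(0.001)` gives 30, and `encode_fastq_quality([40, 30, 20])` gives the quality string `I?5`.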

BBS5: protein-coding gene in the species Homo sapiens

Bardet–Biedl syndrome 5 protein is a protein that in humans is encoded by the BBS5 gene.

UBE1L2: protein-coding gene in the species Homo sapiens

Ubiquitin-like modifier-activating enzyme 6 is a protein that in humans is encoded by the UBA6 gene.

ZNF452: protein-coding gene in the species Homo sapiens

SCAN domain-containing protein 3 is a protein that in humans is encoded by the SCAND3 gene.

MOBKL2A: protein-coding gene in the species Homo sapiens

Mps one binder kinase activator-like 2A is an enzyme that in humans is encoded by the MOBKL2A gene.

MEGAN is a computer program that allows optimized analysis of large metagenomic datasets.

Phred base-calling is a computer program for identifying a base (nucleobase) sequence from fluorescence "trace" data generated by an automated DNA sequencer that uses electrophoresis and a four-fluorescent-dye method. When originally developed, Phred produced significantly fewer errors in the data sets examined than other methods, averaging 40–50% fewer errors. Phred quality scores have become widely accepted to characterize the quality of DNA sequences, and can be used to compare the efficacy of different sequencing methods.

Phrap is a widely used program for DNA sequence assembly. It is part of the Phred-Phrap-Consed package.

The Staden Package is computer software, a set of tools for DNA sequence assembly, editing, and sequence analysis. It is open-source software, released under a BSD 3-clause license.

SOAP is a suite of bioinformatics software tools from the BGI Bioinformatics department enabling the assembly, alignment, and analysis of next-generation DNA sequencing data. It is particularly suited to short-read sequencing data.

HMGB4: protein-coding gene in the species Homo sapiens

High mobility group protein B4 is a transcription factor that in humans is encoded by the HMGB4 gene.

Scaffolding (bioinformatics)

Scaffolding is a technique used in bioinformatics. It is defined as follows:

To link together a non-contiguous series of genomic sequences into a scaffold, consisting of sequences separated by gaps of known length. The sequences that are linked are typically contiguous sequences (contigs) corresponding to read overlaps.
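Once contigs have been ordered and the gap lengths between them estimated (commonly from paired reads whose approximate separation is known), a scaffold can be written out as a single sequence. The sketch below assumes the order and gap lengths are already known and pads each gap with runs of `N`, a widespread convention for unknown bases in FASTA files; `build_scaffold` is a hypothetical name, and real pipelines often record gaps in a separate layout file instead.

```python
def build_scaffold(contigs: list[str], gaps: list[int], gap_char: str = "N") -> str:
    """Join ordered contigs into one scaffold, filling each gap with gap_char.

    gaps[i] is the assumed number of unknown bases between contigs[i] and contigs[i + 1].
    """
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap length between each pair of adjacent contigs")
    if any(g < 0 for g in gaps):
        raise ValueError("gap lengths must be non-negative")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        parts.append(gap_char * gap)  # placeholder bases of the estimated gap length
        parts.append(contig)
    return "".join(parts)
```

For instance, `build_scaffold(["ACGT", "GGCC", "TTA"], [3, 2])` returns `ACGTNNNGGCCNNTTA`: the contig sequences are preserved and their relative positions are fixed, even though the bases inside the gaps remain unknown.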

References

  1. Gordon D, Abajian C, Green P (1998). "Consed: A Graphical Tool for Sequence Finishing". Genome Research. 8 (3): 195–202. doi:10.1101/gr.8.3.195. PMID 9521923.
  2. Gordon D, Desmarais C, Green P (2001). "Automated Finishing with Autofinish". Genome Research. 11 (4): 614–625. doi:10.1101/gr.171401. PMC 311035. PMID 11282977.