Developer(s) | CodonCode Corporation |
---|---|
Stable release | 4.2.5 / 2013 |
Operating system | Mac OS X, Windows |
Type | Bioinformatics |
License | commercial; free for limited use (trace viewing & editing) |
Website | http://www.codoncode.com/aligner |
CodonCode Aligner is a commercial application for DNA sequence assembly, sequence alignment, and editing on Mac OS X and Windows.
Features include chromatogram editing, end clipping, and vector trimming; sequence assembly and contig editing; alignment of cDNA against genomic templates; sequence alignment and editing; alignment of contigs to each other with ClustalW, MUSCLE, or built-in algorithms; mutation detection, including detection of heterozygous single-nucleotide polymorphisms; analysis of heterozygous insertions and deletions; launching online BLAST searches; restriction analysis (finding and viewing restriction cut sites); trace sharpening; and support for Phred, Phrap, ClustalW, and MUSCLE.
The first beta version of CodonCode Aligner was released in April 2003, followed by the first full version in June 2003. Major upgrades were released in 2003, 2004, 2005, 2006, 2007, and 2008.
By April 2009, CodonCode Aligner had been cited in more than 400 scientific publications. Citations cover a wide variety of biomedical research areas, including HIV research, [1] [2] [3] biogeography and environmental biology, [4] [5] DNA methylation studies, [6] genetic diseases, [7] [8] [9] clinical microbiology, [10] [11] and evolution research and phylogenetics. [12] [13] [14]
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data.
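The gap-insertion idea described above can be illustrated with a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). This is only an illustration: the function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, not parameters taken from any particular tool.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (aligned_a, aligned_b, score).

    Illustrative scoring only: the default values are assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for placing a[i-1] and b[j-1] in the same column.
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("ACGT", "AGT")
    print(top)     # ACGT
    print(bottom)  # A-GT
    print(total)   # 2
```

Each row of the result corresponds to one row of the alignment matrix described above, with `-` marking an inserted gap. Practical aligners typically use substitution matrices and affine gap penalties rather than the flat scores assumed here.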
In genetics, an expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and were instrumental in gene discovery and in gene-sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs now available in public databases. EST approaches have largely been superseded by whole genome and transcriptome sequencing and metagenome sequencing.
Clustal is a series of widely used computer programs for multiple sequence alignment in bioinformatics. Many versions of Clustal have been released over the course of the algorithm's development, and operating system support varies between them. Clustal Omega supports the widest variety of operating systems of all the Clustal tools.
In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids or proteins across species, or within a genome, or between donor and receptor taxa. Conservation indicates that a sequence has been maintained by natural selection.
The phi X 174 (ΦX174) bacteriophage is a single-stranded DNA (ssDNA) virus that infects Escherichia coli, and was the first DNA-based genome to be sequenced. This work was completed by Fred Sanger and his team in 1977. In 1962, Walter Fiers and Robert Sinsheimer had already demonstrated the physical, covalently closed circularity of ΦX174 DNA. Nobel Prize winner Arthur Kornberg used ΦX174 as a model to first prove that DNA synthesized in a test tube by purified enzymes could produce all the features of a natural virus, ushering in the age of synthetic biology. In 1972–1974, Jerard Hurwitz, Sue Wickner, and Reed Wickner with collaborators identified the genes required to produce the enzymes that catalyze conversion of the single-stranded form of the virus to the double-stranded replicative form. In 2003, Craig Venter's group reported that the genome of ΦX174 was the first to be completely assembled in vitro from synthesized oligonucleotides. The ΦX174 virus particle has also been successfully assembled in vitro. In 2012, it was shown how its highly overlapping genome can be fully decompressed and still remain functional.
Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship by which the sequences share a linkage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations, which appear as differing characters in a single alignment column, and insertion or deletion mutations, which appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
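The way point mutations and indels show up in alignment columns can be sketched in a few lines. The example below assumes an alignment is already available as equal-length strings with `-` for gaps; the function name `summarize_columns` and the three labels are hypothetical choices for this illustration, not the output of any alignment program.

```python
from collections import Counter


def summarize_columns(alignment):
    """Label each column of an alignment and report a simple conservation value.

    `alignment` is a list of equal-length strings using '-' for gaps.
    Returns a list of (label, fraction) tuples, where fraction is the share
    of rows carrying the most common residue in that column.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")

    summary = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        if len(residues) < len(column):
            label = "indel"         # at least one row has a gap here
        elif len(counts) == 1:
            label = "conserved"     # every row has the same residue
        else:
            label = "substitution"  # differing characters, no gaps
        top = counts.most_common(1)[0][1] if counts else 0
        summary.append((label, top / len(column)))
    return summary


if __name__ == "__main__":
    rows = ["ACGT-A", "ACCTGA", "ACGTGA"]
    for position, (label, fraction) in enumerate(summarize_columns(rows), start=1):
        print(position, label, round(fraction, 2))
```

In this toy input, the third column is reported as a substitution and the fifth as an indel, while the remaining columns are fully conserved. Real conservation analyses usually weight sequences and account for residue similarity, which this sketch does not attempt.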
T-Coffee is a multiple sequence alignment program that uses a progressive approach. It generates a library of pairwise alignments to guide the multiple sequence alignment. It can also combine multiple sequence alignments obtained previously, and the latest versions can use structural information from PDB files (3D-Coffee). It has advanced features to evaluate the quality of the alignments and some capacity for identifying occurrences of motifs (Mocca). It produces alignments in the aln format (Clustal) by default, but can also produce PIR, MSF, and FASTA formats. The most common input formats are supported.
The retroviral psi packaging element, also known as the Ψ RNA packaging signal, is a cis-acting RNA element identified in the genomes of the retroviruses Human immunodeficiency virus (HIV) and Simian immunodeficiency virus (SIV). It is involved in regulating the essential process of packaging the retroviral RNA genome into the viral capsid during replication. The final virion contains a dimer of two identical unspliced copies of the viral genome.
COP9 signalosome complex subunit 6 is a protein that in humans is encoded by the COPS6 gene.
DNA damage-binding protein 1 is a protein that in humans is encoded by the DDB1 gene.
Poliovirus receptor-related 1 (PVRL1), also known as nectin-1 and CD111 (formerly herpesvirus entry mediator C, HVEC), is a human protein of the immunoglobulin superfamily (IgSF), also considered a member of the nectins. It is a membrane protein with three extracellular immunoglobulin domains, a single transmembrane helix, and a cytoplasmic tail. The protein can mediate Ca2+-independent cellular adhesion, further characterizing it as an IgSF cell adhesion molecule (IgSF CAM).
DNA dC->dU-editing enzyme APOBEC-3F is a protein that in humans is encoded by the APOBEC3F gene.
HERV-K_19q12 provirus ancestral Pol protein is a protein that in humans is encoded by the ERVK6 gene.
Eukaryotic translation initiation factor 3, subunit M (eIF3m) also known as PCI domain containing 1 (herpesvirus entry mediator) (PCID1), is a protein that in humans is encoded by the EIF3M gene.
Human bocavirus (HBoV) is the name given to all viruses in the genus Bocaparvovirus of virus family Parvoviridae that are known to infect humans. HBoV1 and HBoV3 are members of species Primate bocaparvovirus 1 whereas viruses HBoV2 and HBoV4 belong to species Primate bocaparvovirus 2. Some of these viruses cause human disease. HBoV1 is strongly implicated in causing some cases of lower respiratory tract infection, especially in young children, and several of the viruses have been linked to gastroenteritis, although the full clinical role of this emerging infectious disease remains to be elucidated.
MacVector is a commercial sequence analysis application for Apple Macintosh computers running Mac OS X. It is intended to be used by molecular biologists to help analyze, design, research and document their experiments in the laboratory. MacVector 18.1 is a Universal Binary capable of running on Intel and Apple Silicon Macs.
A late protein is a viral protein that is formed after replication of the virus. One example is VP4 from simian virus 40 (SV40).
The Staden Package is computer software, a set of tools for DNA sequence assembly, editing, and sequence analysis. It is open-source software, released under a BSD 3-clause license.
Digital transcriptome subtraction (DTS) is a bioinformatics method to detect the presence of novel pathogen transcripts through computational removal of the host sequences. DTS is the direct in silico analogue of the wet-lab approach representational difference analysis (RDA), and is made possible by unbiased high-throughput sequencing and the availability of a high-quality, annotated reference genome of the host. The method specifically examines the etiological agent of infectious diseases and is best known for discovering Merkel cell polyomavirus, the suspect causative agent in Merkel-cell carcinoma.
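The subtraction step can be illustrated with a toy sketch. It assumes exact k-mer matching against the host sequence as a stand-in for the alignment step a real pipeline would use, and it ignores reverse complements and sequencing errors. The function names, the value of `k`, and the `max_host_fraction` threshold are all hypothetical choices for this example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host(reads, host_sequences, k=8, max_host_fraction=0.5):
    """Keep reads that share few k-mers with the host sequences.

    Toy stand-in for digital subtraction: a read is discarded as host-derived
    when at least `max_host_fraction` of its k-mers occur in the host.
    """
    host_index = set()
    for host in host_sequences:
        host_index |= kmers(host, k)

    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: too short to classify, dropped
        shared = len(read_kmers & host_index) / len(read_kmers)
        if shared < max_host_fraction:
            kept.append(read)  # candidate non-host transcript
    return kept


if __name__ == "__main__":
    host = "ACGTACGTTTGACCAGTAGGCATCGA"
    reads = [host[2:18], "TTTTTTTTCCCCCCCCGGGG"]
    print(subtract_host(reads, [host]))
```

Here the first read is a fragment of the host sequence and is removed, so only the second read remains as a candidate for further analysis. The reads that survive subtraction are what would then be examined for possible pathogen origin.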
Desmond Gerard Higgins is a Professor of Bioinformatics at University College Dublin, widely known for CLUSTAL, a series of computer programs for performing multiple sequence alignment. According to Nature, Higgins' papers describing CLUSTAL are among the top ten most highly cited scientific papers of all time.