DECIPHER (software)

DECIPHER
Developer(s) Erik Wright
Stable release 3.0.0 / 2024
Written in R, C
Operating system Unix, Linux, macOS, Windows
Platform IA-32, x86-64, ARM
Available in English
Type Bioinformatics
License GPL 3
Website decipher.codes

DECIPHER is a software package for deciphering and managing biological sequences efficiently in the R programming language.


