DECIPHER (software)

DECIPHER
Developer(s) Erik Wright
Stable release 2.24.0 / 2022
Written in R, C
Operating system Unix, Linux, macOS, Windows
Platform IA-32, x86-64
Available in English
Type Bioinformatics
License GPL 3
Website decipher.codes

DECIPHER is a software toolset that can be used to decipher and manage biological sequences efficiently using the programming language R. Some functions of the program are accessible online through web tools.


Related Research Articles

Oligonucleotides are short DNA or RNA molecules, oligomers, that have a wide range of applications in genetic testing, research, and forensics. Commonly made in the laboratory by solid-phase chemical synthesis, these small bits of nucleic acids can be manufactured as single-stranded molecules with any user-specified sequence, and so are vital for artificial gene synthesis, polymerase chain reaction (PCR), DNA sequencing, molecular cloning and as molecular probes. In nature, oligonucleotides are usually found as small RNA molecules that function in the regulation of gene expression, or are degradation intermediates derived from the breakdown of larger nucleic acid molecules.

DNA microarray: Collection of microscopic DNA spots attached to a solid surface

A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles of a specific DNA sequence, known as a probe. A probe can be a short section of a gene or other DNA element that is used to hybridize a cDNA or cRNA sample under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine the relative abundance of nucleic acid sequences in the target. The original nucleic acid arrays were macro arrays approximately 9 cm × 12 cm, and the first computerized image-based analysis was published in 1981. The DNA microarray was invented by Patrick O. Brown. Applications include SNP arrays for polymorphisms in cardiovascular diseases, cancer, pathogens and GWAS analysis, as well as the identification of structural variations and the measurement of gene expression.

The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription.

Metagenomics: Study of genes found in the environment

Metagenomics is the study of genetic material recovered directly from environmental or clinical samples by a method called sequencing. The broad field may also be referred to as environmental genomics, ecogenomics, community genomics or microbiomics.

Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data. In combination with computational and statistical approaches to understanding the function of genes and with statistical association analysis, the field is also often referred to as computational and statistical genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means of biological discovery.

Bisulfite sequencing: Lab procedure detecting 5-methylcytosines in DNA

Bisulfite sequencing (also known as bisulphite sequencing) is the use of bisulfite treatment of DNA before routine sequencing to determine the pattern of methylation. DNA methylation was the first epigenetic mark to be discovered and remains the most studied. In animals it predominantly involves the addition of a methyl group to the carbon-5 position of cytosine residues of the dinucleotide CpG, and is implicated in repression of transcriptional activity.

C0719 RNA

The C0719 RNA is a bacterial non-coding RNA of 222 nucleotides in length that is found between the yghK and glcB genes in the genomes of Escherichia coli and Shigella flexneri. This non-coding RNA was originally identified in E. coli using high-density oligonucleotide probe arrays (microarrays). The function of this ncRNA is unknown.

16S ribosomal RNA: RNA component

16S ribosomal RNA is the RNA component of the 30S subunit of a prokaryotic ribosome. It binds to the Shine-Dalgarno sequence and provides most of the SSU structure.

Human Microbiome Project: Former research initiative

The Human Microbiome Project (HMP) was a United States National Institutes of Health (NIH) research initiative to improve understanding of the microbiota involved in human health and disease. Launched in 2007, the first phase (HMP1) focused on identifying and characterizing human microbiota. The second phase, known as the Integrative Human Microbiome Project (iHMP), launched in 2014 with the aim of generating resources to characterize the microbiome and elucidating the roles of microbes in health and disease states. The program received $170 million in funding from the NIH Common Fund from 2007 to 2016.

OLIGO Primer Analysis Software is a program for DNA primer design. The first paper describing it was published in 1989. The program is a real-time PCR primer and probe search and analysis tool that also supports siRNA and molecular beacon searches, open reading frame analysis, and restriction enzyme analysis. It was created and maintained by Wojciech Rychlik and Piotr Rychlik.

In molecular biology, and particularly in high-throughput DNA sequencing, a chimera is a single DNA sequence that arises when multiple transcripts or DNA sequences become joined. Chimeras can be considered artifacts and filtered out of the data during processing to prevent spurious inferences of biological variation. However, chimeras should not be confused with chimeric reads, which are generally used by structural variant callers to detect structural variation events and are not always an indication of the presence of a chimeric transcript or gene.

SOAP is a suite of bioinformatics software tools from the BGI Bioinformatics department enabling the assembly, alignment, and analysis of next generation DNA sequencing data. It is particularly suited to short read sequencing data.

Bacterial small RNAs (bsRNA) are small RNAs produced by bacteria; they are 50- to 500-nucleotide non-coding RNA molecules, highly structured and containing several stem-loops. Numerous sRNAs have been identified using both computational analysis and laboratory-based techniques such as Northern blotting, microarrays and RNA-Seq in a number of bacterial species including Escherichia coli, the model pathogen Salmonella, the nitrogen-fixing alphaproteobacterium Sinorhizobium meliloti, marine cyanobacteria, Francisella tularensis, Streptococcus pyogenes, the pathogen Staphylococcus aureus, and the plant pathogen Xanthomonas oryzae pathovar oryzae. Bacterial sRNAs affect how genes are expressed within bacterial cells via interaction with mRNA or protein, and thus can affect a variety of bacterial functions like metabolism, virulence, environmental stress response, and structure.

Escherichia coli sRNA

Escherichia coli contains a number of small RNAs located in intergenic regions of its genome. The presence of at least 55 of these has been verified experimentally. 275 potential sRNA-encoding loci were identified computationally using the QRNA program. These loci will include false positives, so the number of sRNA genes in E. coli is likely to be less than 275. A computational screen based on promoter sequences recognised by the sigma factor sigma 70 and on Rho-independent terminators predicted 24 putative sRNA genes; 14 of these were verified experimentally by northern blotting. The experimentally verified sRNAs included the well-characterised sRNAs RprA and RyhB. Many of the sRNAs identified in this screen, including RprA, RyhB, SraB and SraL, are only expressed in the stationary phase of bacterial cell growth. A screen for sRNA genes based on homology to Salmonella and Klebsiella identified 59 candidate sRNA genes. From this set of candidate genes, microarray analysis and northern blotting confirmed the existence of 17 previously undescribed sRNAs, many of which bind to the chaperone protein Hfq and regulate the translation of RpoS. UptR sRNA, transcribed from the uptR gene, is implicated in suppressing extracytoplasmic toxicity by reducing the amount of membrane-bound toxic hybrid protein.

In silico PCR

In silico PCR refers to computational tools used to calculate theoretical polymerase chain reaction (PCR) results using a given set of primers (probes) to amplify DNA sequences from a sequenced genome or transcriptome.
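As a rough illustration of the idea, the following Python sketch predicts amplicons by exact string matching. It is a simplified toy, not the method used by any particular tool: the function names `reverse_complement` and `in_silico_pcr` and the `max_len` parameter are hypothetical, only one strand of the template is searched, and mismatches, melting temperature and primer-dimer effects are all ignored.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_pcr(template: str, forward: str, reverse: str,
                  max_len: int = 2000) -> list[str]:
    """Predict amplicons on the given strand of `template`.

    Assumption: primers must match exactly. The forward primer is searched
    verbatim; the reverse primer anneals to the opposite strand, so its
    reverse complement is searched downstream of the forward primer.
    """
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse)

    amplicons = []
    start = template.find(forward)
    while start != -1:
        # Look for every reverse-primer site that begins after the forward primer.
        end = template.find(reverse_site, start + len(forward))
        while end != -1:
            product = template[start:end + len(reverse_site)]
            if len(product) <= max_len:
                amplicons.append(product)
            end = template.find(reverse_site, end + 1)
        start = template.find(forward, start + 1)
    return amplicons


if __name__ == "__main__":
    # Made-up template and primers, for illustration only.
    print(in_silico_pcr("GGATGCCATTTTTGATCCGAA", "ATGCCA", "CGGATC"))
    # prints ['ATGCCATTTTTGATCCG']
```

Practical tools typically also tolerate mismatches and search both strands, which this sketch deliberately leaves out to keep the core idea visible.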

Metatranscriptomics is the set of techniques used to study gene expression of microbes within natural environments, i.e., the metatranscriptome.

Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.

Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.

CarA ncRNA motif

The carA non-coding RNA (ncRNA) is an RNA motif proposed as a Strong Riboswitch Candidate (SRC). It was recognized by comparative sequence analysis of GC-rich intergenic regions (IGRs) of bacteria, using a pipeline called Discovery of Intergenic Motifs PipeLine (DIMPL). The motif is located upstream of the carA gene, which codes for the small subunit of carbamoyl phosphate synthase, an enzyme that catalyzes the first committed step in pyrimidine and arginine biosynthesis. CarA ncRNA has been found in bacteria of the class Betaproteobacteria, particularly in the genus Polynucleobacter. Its proposed secondary structure consists of an extended imperfect hairpin immediately upstream of the predicted ribosome binding site (RBS) of the adjacent open reading frame (ORF), suggesting a possible cis-regulatory function in which ligand binding regulates translation initiation. The motif was reported twice: carA was recognised in the Polynucleobacter necessarius genome, and carA-2 in a genome of Beta proteobacterium CB.

References

  1. Wright ES (2015). "DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment". BMC Bioinformatics. 16: 322. doi:10.1186/s12859-015-0749-z. PMC 4595117. PMID 26445311.
  2. Noguera DR, Wright ES, Camejo P, Yilmaz LS (2014). "Mathematical tools to optimize the design of oligonucleotide probes and primers". Applied Microbiology and Biotechnology. 98 (23): 9595–608. doi:10.1007/s00253-014-6165-x. PMID 25359473. S2CID 903222.
  3. Wright ES, Yilmaz LS, Ram S, Gasser JM, Harrington GW, Noguera DR (2014). "Exploiting extension bias in polymerase chain reaction to improve primer specificity in ensembles of nearly identical DNA templates". Environmental Microbiology. 16 (5): 1354–1365. doi:10.1111/1462-2920.12259. PMID 24750536.
  4. Wright ES, Vetsigian KH (2016). "DesignSignatures: a tool for designing primers that yields amplicons with distinct signatures". Bioinformatics. 32 (10): 1565–1567. doi:10.1093/bioinformatics/btw047. PMID 26803162.
  5. Wright ES, Yilmaz LS, Corcoran AM, Okten HE, Noguera DR (2014). "Automated Design of Probes for rRNA-Targeted Fluorescence In Situ Hybridization Reveals the Advantages of Using Dual Probes for Accurate Identification". Applied and Environmental Microbiology. 80 (16): 5124–5133. doi:10.1128/AEM.01685-14. PMC 4135741. PMID 24928876.
  6. Yilmaz LS, Loy A, Wright ES, Wagner M, Noguera DR (2012). "Modeling formamide denaturation of probe-target hybrids for improved microarray probe design in microbial diagnostics". PLOS ONE. 7 (8): e43862. Bibcode:2012PLoSO...743862Y. doi:10.1371/journal.pone.0043862. PMC 3428302. PMID 22952791.
  7. Wright ES, Yilmaz LS, Noguera DR (2012). "DECIPHER, a search-based approach to chimera identification for 16S rRNA sequences". Applied and Environmental Microbiology. 78 (3): 717–725. doi:10.1128/AEM.06516-11. PMC 3264099. PMID 22101057.
  8. Murali A, Bhargava A, Wright ES (2018). "IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences". Microbiome. 6 (140): 140. doi:10.1186/s40168-018-0521-5. PMC 6085705. PMID 30092815.
  9. Wright E (2021). "FindNonCoding: rapid and simple detection of non-coding RNAs in genomes". Bioinformatics. Oct12: btab708. doi:10.1093/bioinformatics/btab708. PMC 10060727. PMID 34636849.