Nullomers

Nullomers are short sequences of DNA that do not occur in the genome of a species (for example, humans), even though they are theoretically possible. [1] [2] Their absence is thought to reflect selective pressure; for example, they may be toxic to the cell. [2] Peptides derived from some nullomers have shown lethal effects on leukemia, breast, and prostate cancer cells in vitro, while normal cells become less sensitive to them over time. [2] Nullomers are also being developed as DNA tags to prevent cross-contamination when analyzing crime scene material. [3]

Background

Nullomers are naturally available but potentially unused sequences of DNA. Determining these "forbidden" sequences can improve the understanding of the basic rules that govern sequence evolution. [4] Whole-genome sequencing has shown that genomic sequences are highly non-uniform. When a codon is artificially replaced with a synonymous codon, the change can be lethal to the cell. This is believed to be due to ribosomal stalling and early termination of protein synthesis. For example, both AGA and CGA code for arginine in bacteria; however, bacteria almost never use AGA, and substituting it proves lethal. [5] Such codon biases have been seen in all species, [6] and are examples of constraints on sequence evolution.

Other sequences may be under selective pressure; for example, GG-rich sequences are used as sacrificial sinks for oxidative damage because oxidizing agents are attracted to regions with GG-rich sequences and then induce strand breakage. [7] Moreover, statistically significant nullomers (i.e. absent short sequences that would be strongly expected to exist) in virus genomes have been shown to be restriction enzyme recognition sites, indicating that viruses have probably lost these motifs to facilitate invasion of bacterial hosts. [8] The Nullomers Database provides a comprehensive collection of minimal absent sequences from hundreds of species and viruses, as well as from the human and mouse proteomes.

Human 11 bp nullomers and rare sequences, by number of occurrences in the genome [4]
No occurrence in the human genome: CGCTCGACGTA, GTCCGAGCGTA, CGACGAACGGT, CCGATACGTCG
One occurrence in the human genome: TACGCGCGACA, CGCGACGCATA, TCGGTACGCTA, TCGCGACCGTA, CGATCGTGCGA, CGCGTATCGGT
Two occurrences in the human genome: CGTCGCTCGAA, TCGCGCGAATA, TCGACGCGATA, ATCGTCGACGA, CTACGCGTCGA, CGTATACGCGA, CGATTACGCGA, CGATTCGGCGA, CGACGTACCGT, CGACGAACGAG, CGCGTAATACG, CGCGCTATACG
Three occurrences in the human genome: CGCGCATAATA, CGACGGCAGTA, CGAATCGCGTA, CGGTCGTACGA, GCGCGTACCGA, CGCGTAATCGA, CGTCGTTCGAC, CCGTCGAACGC, ACGCGCGATAT, CGAACGGTCGT, CGCGTAACGCG, CCGAATACGCG, CATATCGCGCG
Number of nullomers of each length found in different organisms [4]
Organism   10bp   11bp   12bp   13bp
Arabidopsis 10723646116701220237388
C Elegans 27686115203823339534
Chicken 25901315154722702
Chimpanzee 0136459382426474
Cow 096450602432554
Dog 040252171868964
Fruitfly 020622161612399300
Human 080398522232448
Mouse 0178543832625646
Rat 050307081933220
Zebrafish 02155612469558
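
As a rough illustration of how absent sequences like those listed above can be enumerated, the following minimal sketch lists every possible sequence of length k and keeps those never observed in a given sequence. There are 4^k possible sequences of length k (4,194,304 for k = 11), so this brute-force approach is only practical for short lengths. The function names are hypothetical, and counting both strands (each observed sequence together with its reverse complement) is an assumption of this sketch rather than a description of the method used in [4].

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_nullomers(genome: str, k: int) -> list[str]:
    """Return all length-k sequences absent from both strands of `genome`.

    Hypothetical helper for illustration only; windows containing
    characters other than A, C, G, T (e.g. N) are skipped.
    """
    genome = genome.upper()
    seen = set()
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        if set(kmer) <= set("ACGT"):
            seen.add(kmer)
            # Assumption: a sequence counts as present if it occurs on either strand.
            seen.add(reverse_complement(kmer))
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [kmer for kmer in all_kmers if kmer not in seen]


# Toy example on a made-up 11-base sequence (not real genomic data).
toy_genome = "ACGTACGGTCA"
missing = find_nullomers(toy_genome, 2)
print(missing)
```

For real genomes, a set of strings quickly becomes memory-hungry; dedicated k-mer counting tools or bit-packed arrays would be a more realistic choice.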

Cancer Treatment

Nullomers have been used as an approach to drug discovery and development. Nullomer-derived peptides were screened for anti-cancer activity. Short polyarginine tails are added to the absent sequences to increase solubility and uptake into the cell, producing peptides called PolyArgNulloPs. One successful sequence, RRRRRNWMWC, was demonstrated to have lethal effects on breast and prostate cancer cells. It damaged mitochondria by increasing ROS production, which reduced ATP production, leading to inhibition of cell growth and cell death. Normal cells show decreased sensitivity to PolyArgNulloPs over time. [2]

Forensics

Accidental transfer of biological material containing DNA can produce misleading results. This is a particularly important consideration in forensic and crime labs, where mistakes can cause an innocent person to be convicted of a crime. Previously, there was no way to detect whether a reference sample had been mislabeled as evidence or whether a forensic sample had been contaminated. A nullomer barcode can be added to reference samples to distinguish them from evidence on analysis. Tagging can be carried out during sample collection without affecting genotype or quantification results. Filter paper impregnated with various nullomers can be used to soak up and store DNA samples from a crime scene, making the technology simple and effective. [3] The tags remain clearly detectable even when diluted a million-fold and spilled on evidence. [3] Tagging in this way supports the National Research Council's recommendations on quality control to reduce fraud and mistakes. [3]
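
The idea behind the barcode can be sketched in code: because a nullomer does not occur in the human genome, finding it in sequence data from an evidence sample suggests contamination by tagged reference material. The example below is a simplified illustration using plain string search on made-up reads; it is not the laboratory detection method described in [3], and the function names and read data are hypothetical. The two tags are taken from the sequences listed above as having no occurrence in the human genome.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tags(reads: list[str], tags: list[str]) -> set[str]:
    """Return the tags found in any read, on either strand.

    Hypothetical helper: exact substring matching only, with no
    tolerance for sequencing errors.
    """
    hits = set()
    for tag in tags:
        rc = reverse_complement(tag)
        if any(tag in read or rc in read for read in reads):
            hits.add(tag)
    return hits


# Tags chosen from the 11 bp sequences with no occurrence in the human genome.
tags = ["CGCTCGACGTA", "GTCCGAGCGTA"]

# Made-up reads for illustration only.
reference_reads = ["TTAGG" + "CGCTCGACGTA" + "CCATG"]  # tagged reference sample
clean_evidence = ["TTAGGCCATGACCA"]                    # untagged evidence
# Evidence contaminated by the reference; the tag appears on the opposite strand.
contaminated_evidence = ["AAT" + reverse_complement("CGCTCGACGTA") + "GG"]

print(find_tags(reference_reads, tags))
print(find_tags(clean_evidence, tags))
print(find_tags(contaminated_evidence, tags))
```

Searching both strands matters here because a read may come from either strand of the tagged molecule.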

References

  1. Acquisti, Claudia; Poste, George; Curtiss, David; Kumar, Sudhir (2007). Salzberg, Steven (ed.). "Nullomers: Really a Matter of Natural Selection?". PLOS ONE. 2 (10): e1022. Bibcode:2007PLoSO...2.1022A. doi:10.1371/journal.pone.0001022. PMC 1995752. PMID 17925870.
  2. Alileche, Abdelkrim; Goswami, Jayita; Bourland, William; Davis, Michael; Hampikian, Greg (2012). "Nullomer derived anticancer peptides (NulloPs): Differential lethal effects on normal and cancer cells in vitro". Peptides. 38 (2): 302–11. doi:10.1016/j.peptides.2012.09.015. PMID 23000474. S2CID 4207067.
  3. Goswami, Jayita; Davis, Michael C.; Andersen, Tim; Alileche, Abdelkrim; Hampikian, Greg (2013). "Safeguarding forensic DNA reference samples with nullomer barcodes". Journal of Forensic and Legal Medicine. 20 (5): 513–519. doi:10.1016/j.jflm.2013.02.003. PMID 23756524.
  4. Hampikian, Greg; Andersen, Tim (2007). "Absent Sequences: Nullomers and Primes". Pacific Symposium on Biocomputing: 355–66. doi:10.1142/9789812772435_0034. ISBN 978-981-270-417-7. PMID 17990505.
  5. Cruz-Vera, Luis Rogelio; Magos-Castro, Marco Antonio; Zamora-Romo, Efraín; Guarneros, Gabriel (2004). "Ribosome stalling and peptidyl-tRNA drop-off during translational delay at AGA codons". Nucleic Acids Research. 32 (15): 4462–8. doi:10.1093/nar/gkh784. PMC 516057. PMID 15317870.
  6. dos Reis, Mario; Savva, Renos; Wernisch, Lorenz (2004). "Solving the riddle of codon usage preferences: A test for translational selection". Nucleic Acids Research. 32 (17): 5036–44. doi:10.1093/nar/gkh834. PMC 521650. PMID 15448185.
  7. Friedman, Keith A.; Heller, Adam (2001). "On the Non-Uniform Distribution of Guanine in Introns of Human Genes: Possible Protection of Exons against Oxidation by Proximal Intron Poly-G Sequences". The Journal of Physical Chemistry B. 105 (47): 11859–65. doi:10.1021/jp012043n.
  8. Koulouras, Grigorios; Frith, Martin C (2021-04-06). "Significant non-existence of sequences in genomes and proteomes". Nucleic Acids Research. 49 (6): 3139–3155. doi:10.1093/nar/gkab139. ISSN 0305-1048. PMC 8034619. PMID 33693858.