Nullomers are short sequences of DNA that do not occur in the genome of a species (for example, humans), even though they are theoretically possible. [1] [2] Nullomers are thought to be under selective pressure; for example, they may be toxic to the cell. [2] Some nullomers have been shown to be useful in treating leukemia, breast cancer, and prostate cancer. Healthy cells are less affected, because normal cells adapt and become less sensitive to them over time. [2] Nullomers are also being developed for use as DNA tags to prevent cross-contamination when analyzing crime scene material. [3]
Nullomers are theoretically possible but apparently unused sequences of DNA. Determining these "forbidden" sequences can improve the understanding of the basic rules that govern sequence evolution. [4] Sequencing entire genomes has shown that there is a high level of non-uniformity in genomic sequences. When a codon is artificially substituted with a synonymous codon, the change can be lethal and result in cell death. This is believed to be due to ribosomal stalling and early termination of protein synthesis. For example, both AGA and CGA code for arginine in bacteria; however, bacteria almost never use AGA, and substituting it proves lethal. [5] Such codon biases have been observed in all species, [6] and are examples of constraints on sequence evolution. Other sequences may be under selective pressure; for example, GG-rich sequences act as sacrificial sinks for oxidative damage, because oxidizing agents are attracted to GG-rich regions and then induce strand breakage. [7] Moreover, it has been shown that statistically significant nullomers (i.e. absent short sequences that would be highly expected to exist) in virus genomes are restriction recognition sites, indicating that viruses have probably eliminated these motifs to facilitate invasion of bacterial hosts. [8] The Nullomers Database provides a comprehensive collection of minimal absent sequences from hundreds of species and viruses, as well as from the human and mouse proteomes.
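The codon bias described above can be made concrete by counting how often each synonymous codon is used in a coding sequence. The following is a minimal sketch, not a published method: the function names and the toy sequence are hypothetical, and the six arginine codons are taken from the standard genetic code.

```python
from collections import Counter

# The six arginine codons of the standard genetic code (DNA alphabet).
ARGININE_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def codon_counts(cds):
    """Count codons in a coding sequence read in frame from position 0."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def relative_usage(cds, synonymous_codons):
    """Return the fraction of each synonymous codon among all of them."""
    counts = codon_counts(cds)
    total = sum(counts[codon] for codon in synonymous_codons)
    if total == 0:
        return {codon: 0.0 for codon in synonymous_codons}
    return {codon: counts[codon] / total for codon in synonymous_codons}


# Hypothetical toy coding sequence: ATG CGT CGT CGC AGA TAA
usage = relative_usage("ATGCGTCGTCGCAGATAA", ARGININE_CODONS)
print(usage)
```

On a real genome the same count would be run over all annotated coding sequences; a codon whose fraction is close to zero, as AGA is reported to be in bacteria, is a candidate for strong bias.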
Occurrences in the human genome | 11-bp sequences |
---|---|
None | CGCTCGACGTA, GTCCGAGCGTA, CGACGAACGGT, CCGATACGTCG |
One | TACGCGCGACA, CGCGACGCATA, TCGGTACGCTA, TCGCGACCGTA, CGATCGTGCGA, CGCGTATCGGT |
Two | CGTCGCTCGAA, TCGCGCGAATA, TCGACGCGATA, ATCGTCGACGA, CTACGCGTCGA, CGTATACGCGA, CGATTACGCGA, CGATTCGGCGA, CGACGTACCGT, CGACGAACGAG, CGCGTAATACG, CGCGCTATACG |
Three | CGCGCATAATA, CGACGGCAGTA, CGAATCGCGTA, CGGTCGTACGA, GCGCGTACCGA, CGCGTAATCGA, CGTCGTTCGAC, CCGTCGAACGC, ACGCGCGATAT, CGAACGGTCGT, CGCGTAACGCG, CCGAATACGCG, CATATCGCGCG |

Number of nullomers of each length, by organism:

Organism | 10 bp | 11 bp | 12 bp | 13 bp |
---|---|---|---|---|
Arabidopsis | 107 | 23646 | 1167012 | 20237388 |
C. elegans | 2 | 7686 | 1152038 | 23339534 |
Chicken | 2 | 590 | 131515 | 4722702 |
Chimpanzee | 0 | 136 | 45938 | 2426474 |
Cow | 0 | 96 | 45060 | 2432554 |
Dog | 0 | 40 | 25217 | 1868964 |
Fruit fly | 0 | 206 | 221616 | 12399300 |
Human | 0 | 80 | 39852 | 2232448 |
Mouse | 0 | 178 | 54383 | 2625646 |
Rat | 0 | 50 | 30708 | 1933220 |
Zebrafish | 0 | 2 | 15561 | 2469558 |
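Counts like those above come from enumerating every possible sequence of a given length and checking which ones never appear in the genome. The sketch below illustrates the idea on a toy string. It is a brute-force illustration under stated assumptions, not the pipeline used to produce the table: the function names are hypothetical, and counting both strands is an assumption that can be switched off.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_nullomers(sequence, k, both_strands=True):
    """Return the sorted k-mers over ACGT that never occur in `sequence`."""
    sequence = sequence.upper()
    seen = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Skip windows containing N or any other non-ACGT symbol.
        if set(kmer) <= set("ACGT"):
            seen.add(kmer)
            if both_strands:
                seen.add(reverse_complement(kmer))
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in seen)


# Hypothetical toy "genome"; real genomes need far more memory-efficient tools.
toy_genome = "ACGTACGGT"
absent_2mers = find_nullomers(toy_genome, k=2)
print(len(absent_2mers), absent_2mers)
```

The search space grows as 4^k (4,194,304 candidates at k = 11), so a set-based scan like this is only practical for short lengths; larger analyses typically rely on dedicated k-mer counting software.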
Nullomers have also been used as an approach to drug discovery and development. Nullomer peptides were screened for anti-cancer activity. Short polyarginine tails were added to the absent sequences to increase solubility and uptake into the cell, producing peptides called PolyArgNulloPs. One successful sequence, RRRRRNWMWC, was demonstrated to have lethal effects on breast and prostate cancer cells. It damaged mitochondria by increasing production of reactive oxygen species (ROS), which reduced ATP production and led to inhibition of cell growth and to cell death. Normal cells show decreased sensitivity to PolyArgNulloPs over time. [2]
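The construction of such a peptide can be sketched in a few lines. This is an illustration only: reading RRRRRNWMWC as five arginines (R) followed by the five-residue core NWMWC is an assumption based on the sequence itself, and the function names and the toy protein list are hypothetical.

```python
def make_polyarg_nullop(nullomer_peptide, n_arg=5):
    """Prepend a polyarginine tail (one-letter code R) to a peptide."""
    return "R" * n_arg + nullomer_peptide


def is_absent(peptide, proteins):
    """Return True if `peptide` occurs in none of the given protein sequences."""
    return all(peptide not in protein for protein in proteins)


# Hypothetical toy "proteome" used only to exercise the absence check.
toy_proteins = ["MKTAYIAKQRQISFVKSHFSRQ", "MNWMAGCLLVVA"]

core = "NWMWC"
print(is_absent(core, toy_proteins))   # True for this toy list
print(make_polyarg_nullop(core))       # RRRRRNWMWC
```

A real screen would check absence against a full reference proteome rather than a short list of strings.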
Accidental transfer of biological material containing DNA can produce misleading results. This is a particularly important consideration in forensic and crime labs, where mistakes can cause an innocent person to be convicted of a crime. Previously, there was no way to detect whether a reference sample had been mislabeled as evidence or whether a forensic sample had been contaminated, but a nullomer barcode can be added to reference samples to distinguish them from evidence on analysis. Tagging can be carried out during sample collection without affecting genotype or quantification results. Filter paper impregnated with various nullomers can be used to soak up and store DNA samples from a crime scene, making the technology simple and effective. [3] The tags remain clearly detectable even when diluted a million-fold and spilled on evidence. [3] Tagging in this way supports the National Research Council's recommendations on quality control to reduce fraud and mistakes. [3]
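On the analysis side, checking for a tag amounts to searching the sequenced material for the barcode on either strand. The sketch below assumes the sample is available as a list of read strings and uses one of the 11-bp sequences with no human occurrence from the table above as a stand-in tag; the function names and reads are hypothetical, and actual forensic tags and detection assays may differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tagged_reads(reads, tags):
    """Map each tag to the indices of reads containing it on either strand."""
    hits = {tag: [] for tag in tags}
    for index, read in enumerate(reads):
        read = read.upper()
        for tag in tags:
            if tag in read or reverse_complement(tag) in read:
                hits[tag].append(index)
    return hits


# Stand-in tag and hypothetical reads: read 0 carries the tag directly,
# read 2 carries its reverse complement, read 1 carries neither.
tag = "CGCTCGACGTA"
reads = ["TTTCGCTCGACGTAAAA", "ACGTACGTACGT", "GGTACGTCGAGCGCC"]
hits = find_tagged_reads(reads, [tag])
print(hits)
```

Because the tag is absent from the human genome, any hit in an evidence sample points to carry-over from a tagged reference sample rather than to human DNA.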
The genetic code is the set of rules used by living cells to translate information encoded within genetic material into proteins. Translation is accomplished by the ribosome, which links proteinogenic amino acids in an order specified by messenger RNA (mRNA), using transfer RNA (tRNA) molecules to carry amino acids and to read the mRNA three nucleotides at a time. The genetic code is highly similar among all organisms and can be expressed in a simple table with 64 entries.
The proteome is the entire set of proteins that is, or can be, expressed by a genome, cell, tissue, or organism at a certain time. It is the set of expressed proteins in a given type of cell or organism, at a given time, under defined conditions. Proteomics is the study of the proteome.
The central dogma of molecular biology deals with the flow of genetic information within a biological system. It is often stated as "DNA makes RNA, and RNA makes protein", although this is not its original meaning. It was first stated by Francis Crick in 1957, then published in 1958:
The Central Dogma. This states that once "information" has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information here means the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product, either a protein or a non-coding RNA, and ultimately affects a phenotype. These products are often proteins, but in non-protein-coding genes, such as those for transfer RNA (tRNA) and small nuclear RNA (snRNA), the product is a functional non-coding RNA. The process of gene expression is used by all known life (eukaryotes and prokaryotes, and it is utilized by viruses) to generate the macromolecular machinery for life.
The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for a protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.
In biology, translation is the process in living cells in which proteins are produced using RNA molecules as templates. The generated protein is a sequence of amino acids. This sequence is determined by the sequence of nucleotides in the RNA. The nucleotides are considered three at a time. Each such triple results in the addition of one specific amino acid to the protein being generated. The matching from nucleotide triple to amino acid is called the genetic code. Translation is performed by ribosomes, large complexes of functional RNA and proteins. The overall process of which translation is a part is called gene expression.
In genetics, an expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and were instrumental in gene discovery and in gene-sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs now available in public databases. EST approaches have largely been superseded by whole genome and transcriptome sequencing and metagenome sequencing.
Transfer RNA is an adaptor molecule composed of RNA, typically 76 to 90 nucleotides in length. In a cell, it provides the physical link between the genetic code in messenger RNA (mRNA) and the amino acid sequence of proteins, carrying the correct sequence of amino acids to be combined by the protein-synthesizing machinery, the ribosome. Each three-nucleotide codon in mRNA is complemented by a three-nucleotide anticodon in tRNA. As such, tRNAs are a necessary component of translation, the biological synthesis of new proteins in accordance with the genetic code.
In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.
In molecular biology, a library is a collection of genetic material fragments that are stored and propagated in a population of microbes through the process of molecular cloning. There are different types of DNA libraries, including cDNA libraries, genomic libraries and randomized mutant libraries. DNA library technology is a mainstay of current molecular biology, genetic engineering, and protein engineering, and the applications of these libraries depend on the source of the original DNA fragments. There are differences in the cloning vectors and techniques used in library preparation, but in general each DNA fragment is uniquely inserted into a cloning vector and the pool of recombinant DNA molecules is then transferred into a population of bacteria or yeast such that each organism contains on average one construct. As the population of organisms is grown in culture, the DNA molecules contained within them are copied and propagated.
In molecular biology, an amplicon is a piece of DNA or RNA that is the source and/or product of amplification or replication events. It can be formed artificially, using various methods including polymerase chain reactions (PCR) or ligase chain reactions (LCR), or naturally through gene duplication. In this context, amplification refers to the production of one or more copies of a genetic fragment or target sequence, specifically the amplicon. As it refers to the product of an amplification reaction, amplicon is used interchangeably with common laboratory terms, such as "PCR product."
In molecular biology, open reading frames (ORFs) are defined as spans of DNA sequence between the start and stop codons. Usually, this is considered within a studied region of a prokaryotic DNA sequence, where only one of the six possible reading frames will be "open". Such an ORF may contain a start codon and by definition cannot extend beyond a stop codon. That start codon indicates where translation may start. The transcription termination site is located after the ORF, beyond the translation stop codon. If transcription were to cease before the stop codon, an incomplete protein would be made during translation.
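Scanning the six reading frames for ORFs can be sketched as follows. This is a simplified illustration, assuming ATG as the only start codon and TAA, TAG, and TGA as stop codons; the function names and the `min_codons` parameter are hypothetical, and real gene finders apply many further criteria.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame):
    """Yield (start, end) spans from an ATG to the next in-frame stop codon.

    `end` is exclusive and the span includes the stop codon.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None


def find_orfs(seq, min_codons=2):
    """Return (strand, frame, orf_sequence) for all six reading frames."""
    seq = seq.upper()
    results = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(strand_seq, frame):
                if (end - start) // 3 >= min_codons:
                    results.append((strand, frame, strand_seq[start:end]))
    return results


# Hypothetical toy sequence with a single ORF in forward frame 2.
orfs = find_orfs("CCATGAAATAGCC")
print(orfs)
```

An ATG with no downstream in-frame stop codon is not reported, which matches the requirement that an ORF ends at a stop codon.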
Xenobiology (XB) is a subfield of synthetic biology, the study of synthesizing and manipulating biological devices and systems. The name "xenobiology" derives from the Greek word xenos, which means "stranger, alien". Xenobiology is a form of biology that is not (yet) familiar to science and is not found in nature. In practice, it describes novel biological systems and biochemistries that differ from the canonical DNA–RNA-20 amino acid system. For example, instead of DNA or RNA, XB explores nucleic acid analogues, termed xeno nucleic acid (XNA) as information carriers. It also focuses on an expanded genetic code and the incorporation of non-proteinogenic amino acids, or “xeno amino acids” into proteins.
Triple-stranded DNA is a DNA structure in which three oligonucleotides wind around each other and form a triple helix. In triple-stranded DNA, the third strand binds to a B-form DNA double helix by forming Hoogsteen base pairs or reversed Hoogsteen hydrogen bonds.
Oncogenomics is a sub-field of genomics that characterizes cancer-associated genes. It focuses on genomic, epigenomic and transcript alterations in cancer.
In molecular genetics, an untranslated region refers to either of two sections, one on each side of a coding sequence on a strand of mRNA. If it is found on the 5' side, it is called the 5' UTR, or if it is found on the 3' side, it is called the 3' UTR. mRNA is RNA that carries information from DNA to the ribosome, the site of protein synthesis (translation) within a cell. The mRNA is initially transcribed from the corresponding DNA sequence and then translated into protein. However, several regions of the mRNA are usually not translated into protein, including the 5' and 3' UTRs.
A codon table can be used to translate a genetic code into a sequence of amino acids. The standard genetic code is traditionally represented as an RNA codon table, because when proteins are made in a cell by ribosomes, it is messenger RNA (mRNA) that directs protein synthesis. The mRNA sequence is determined by the sequence of genomic DNA. In this context, the standard genetic code is referred to as translation table 1. It can also be represented in a DNA codon table. The DNA codons in such tables occur on the sense DNA strand and are arranged in a 5′-to-3′ direction. Different tables with alternate codons are used depending on the source of the genetic code, such as from a cell nucleus, mitochondrion, plastid, or hydrogenosome.
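As an illustration, the standard genetic code (translation table 1) can be written as a 64-entry DNA codon table and used to translate a sense-strand sequence. This is a minimal sketch with hypothetical names: the table is built from the conventional T, C, A, G ordering of bases, and `*` marks a stop codon.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, to_stop=True):
    """Translate a 5'-to-3' sense-strand DNA (or mRNA) sequence in frame 0."""
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))            # 64
print(translate("ATGAGACGATAA"))   # MRR: AGA and CGA both encode arginine
```

Codons containing symbols other than A, C, G, T, or U raise a `KeyError` in this sketch; alternative tables, such as mitochondrial codes, would need a different `AMINO_ACIDS` string.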
In the fields of geometry and biochemistry, a triple helix is a set of three congruent geometrical helices with the same axis, differing by a translation along the axis. This means that each of the helices keeps the same distance from the central axis. As with a single helix, a triple helix may be characterized by its pitch, diameter, and handedness. Examples of triple helices include triplex DNA, triplex RNA, the collagen helix, and collagen-like proteins.
Ribosome profiling, or Ribo-Seq, uses specialized messenger RNA (mRNA) sequencing to determine which mRNAs are being actively translated. It is an adaptation of a technique developed by Joan Steitz and Marilyn Kozak almost 50 years ago, which Nicholas Ingolia and Jonathan Weissman adapted to work with next-generation sequencing. A related technique that can also be used to determine which mRNAs are being actively translated is the Translating Ribosome Affinity Purification (TRAP) methodology, which was developed by Nathaniel Heintz at Rockefeller University. TRAP does not involve ribosome footprinting but provides cell type-specific information.
The ascidian mitochondrial code is a genetic code found in the mitochondria of Ascidia.