An ultra-conserved element (UCE) is a region of DNA that is identical in at least two different species. One of the first studies of UCEs showed that certain human DNA sequences of 200 nucleotides or longer were entirely conserved (identical nucleic acid sequence) in humans, rats, and mice. Despite often being noncoding DNA, some ultra-conserved elements have been found to be transcriptionally active, producing non-coding RNA molecules.
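The definition above can be illustrated with a minimal sketch that scans sequences which are assumed to be already aligned, column by column, and reports stretches where every species carries the same base. The function name `conserved_runs`, the toy sequences, and the requirement that the inputs be pre-aligned to equal length are assumptions made for illustration; this is not the pipeline used in the original studies.

```python
from typing import List, Tuple


def conserved_runs(seqs: List[str], min_len: int = 200) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of columns that are
    identical in every aligned sequence and at least min_len long."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    runs = []
    start = None  # start of the current identical stretch, if any
    for i in range(length):
        column = {s[i].upper() for s in seqs}
        # A column counts only if all sequences share one unambiguous base;
        # gaps ("-") and ambiguity codes such as "N" end the stretch.
        identical = len(column) == 1 and column <= set("ACGT")
        if identical:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a stretch that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs


# Toy alignment (hypothetical data); a small threshold stands in for 200.
human = "ACGTACGTAC" + "T" + "GGGCCC"
mouse = "ACGTACGTAC" + "A" + "GGGCCC"
rat = "ACGTACGTAC" + "T" + "GGGCCC"
print(conserved_runs([human, mouse, rat], min_len=5))
```

With the default threshold of 200 the toy alignment yields no hits, since none of its identical stretches is long enough.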
Perfect conservation of these long stretches of DNA is thought to imply evolutionary importance, as these regions appear to have experienced strong negative (purifying) selection for 300-400 million years. The probability of finding ultra-conserved elements by chance (under neutral evolution) has been estimated at less than 10⁻²² in 2.9 billion bases.
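Why a chance occurrence is so unlikely can be seen from a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each aligned site remains identical independently with probability `p_identical`. The probability of at least one run of `run_len` identical sites is then bounded above by the number of possible start positions times `p_identical ** run_len` (a union bound). The function name and the value 0.7 are hypothetical and are not taken from the published estimate.

```python
import math


def chance_run_upper_bound(genome_len: int, run_len: int, p_identical: float) -> float:
    """Union-bound estimate of the probability that at least one window of
    run_len consecutive sites is fully identical, assuming independent sites."""
    windows = max(genome_len - run_len + 1, 0)
    # Work in log space so that very small powers do not lose precision.
    log_p_window = run_len * math.log(p_identical)
    return min(1.0, windows * math.exp(log_p_window))


# Hypothetical per-site identity probability; 2.9 billion bases and a
# 200-base run length are the figures quoted in the text.
bound = chance_run_upper_bound(2_900_000_000, 200, 0.7)
print(f"upper bound on chance occurrence: {bound:.2e}")
```

Because the bound shrinks geometrically with run length, even a modest per-site chance of divergence makes a 200-base perfect match vanishingly unlikely under this simplified model.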
A total of 481 ultra-conserved elements have been identified in the human genome. A database collecting genomic information about ultra-conserved elements that share 100% identity among human, mouse and rat (UCbase) is available at http://ucbase.unimore.it. A small number of the transcribed elements have been connected with human carcinomas and leukemias. For example, TUC338 is strongly upregulated in human hepatocellular carcinoma cells. UCEs are also affected by copy number variation far more often in cancer cells than in healthy contexts, suggesting that altering the copy number of ultra-conserved elements may be deleterious and associated with cancer. A study comparing ultra-conserved elements between humans and the Japanese puffer fish Takifugu rubripes proposed that they are important in vertebrate development. Several ultra-conserved elements are located near transcriptional regulators or developmental genes. Other functions include enhancer activity and splicing regulation. Double knockouts of UCEs near the ARX gene in mice caused a shrunken hippocampus in the brain. The knockout effects are not lethal in laboratory mice, but could be in the wild.
In the fields of molecular biology and genetics, a genome is all genetic material of an organism. It consists of DNA. The genome includes both the genes and the noncoding DNA, as well as mitochondrial DNA and chloroplast DNA. The study of the genome is called genomics.
The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA genes and noncoding DNA. Haploid human genomes, which are contained in germ cells, consist of three billion DNA base pairs, while diploid genomes have twice the DNA content. While there are significant differences among the genomes of human individuals, these are considerably smaller than the differences between humans and their closest living relatives, the bonobos and chimpanzees.
Non-coding DNA sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules. Other functions of non-coding DNA include the transcriptional and translational regulation of protein-coding sequences, scaffold attachment regions, origins of DNA replication, centromeres and telomeres. Its RNA counterpart is non-coding RNA.
A non-coding RNA (ncRNA) is an RNA molecule that is not translated into a protein. The DNA sequence from which a functional non-coding RNA is transcribed is often called an RNA gene. Abundant and functionally important types of non-coding RNAs include transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), as well as small RNAs such as microRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs, scaRNAs and the long ncRNAs such as Xist and HOTAIR.
The coding region of a gene, also known as the CDS, is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.
Pseudogenes are nonfunctional segments of DNA that resemble functional genes. Most arise as superfluous copies of functional genes, either directly by DNA duplication or indirectly by reverse transcription of an mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons.
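The two sequence defects mentioned above can be illustrated with a toy check on a candidate coding sequence. This is a minimal sketch under simplifying assumptions: the input is a DNA string read in frame from its first base, the standard stop codons apply, and a length that is not a multiple of three is treated as a hint of a frameshift. The function name `cds_defects` is hypothetical, and real pseudogene annotation relies on alignment to the parent gene rather than on a check this simple.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def cds_defects(cds: str) -> List[str]:
    """List simple defects in a candidate coding sequence read in frame."""
    cds = cds.upper()
    defects = []
    if len(cds) % 3 != 0:
        defects.append("length not a multiple of 3 (possible frameshift)")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon anywhere before the final codon is premature.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            defects.append(f"premature stop codon {codon} at codon {index}")
            break
    return defects


print(cds_defects("ATGGCCTAA"))     # intact toy sequence
print(cds_defects("ATGTAGGCCTAA"))  # toy sequence with an internal stop
```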
An Alu element is a short stretch of DNA originally characterized by the action of the Arthrobacter luteus (Alu) restriction endonuclease. Alu elements are the most abundant transposable elements, with over one million copies dispersed throughout the human genome. Alu elements were thought to be selfish or parasitic DNA, because their sole known function is self-reproduction. However, they are likely to play a role in evolution and have been used as genetic markers. They are derived from the small cytoplasmic 7SL RNA, a component of the signal recognition particle. Alu elements are highly conserved within primate genomes and originated in the genome of an ancestor of Supraprimates.
Retrotransposons are a type of genetic component (transposon) that copy and paste themselves into different genomic locations by converting RNA back into DNA through the process of reverse transcription, using an RNA transposition intermediate.
In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids or proteins across species, or within a genome, or between donor and receptor taxa. Conservation indicates that a sequence has been maintained by natural selection.
In biology, a gene is a sequence of nucleotides in DNA or RNA that encodes the synthesis of a gene product, either RNA or protein.
The miR-16 microRNA precursor family is a group of related small non-coding RNA genes that regulate gene expression. miR-16, miR-15, miR-195 and miR-497 are related microRNA precursor sequences from the mir-15 gene family. This microRNA family appears to be vertebrate specific and its members have been predicted or experimentally validated in a wide range of vertebrate species.
In molecular biology, the miR-181 microRNA precursor is a small non-coding RNA molecule. MicroRNAs (miRNAs) are transcribed as ~70 nucleotide precursors and subsequently processed by the RNase III-type enzyme Dicer to give a ~22 nucleotide mature product. In this case, the mature sequence comes from the 5' arm of the precursor. miRNAs target and modulate protein expression by inhibiting translation and/or inducing degradation of target messenger RNAs. This class of genes has been shown to play a central role in malignant transformation. miRNAs are downregulated in many tumors and thus appear to function as tumor suppressor genes. The mature products miR-181a, miR-181b, miR-181c and miR-181d are thought to have regulatory roles at the posttranscriptional level, through complementarity to target mRNAs. miR-181 has been predicted or experimentally confirmed in a wide range of vertebrate species, including rat, zebrafish, and pufferfish.
Poly(rC)-binding protein 2 is a protein that in humans is encoded by the PCBP2 gene.
Long non-coding RNAs are a type of RNA, defined as being transcripts with lengths exceeding 200 nucleotides that are not translated into protein. This somewhat arbitrary limit distinguishes long ncRNAs from small non-coding RNAs such as microRNAs (miRNAs), small interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs. Long intervening/intergenic noncoding RNAs (lincRNAs) are sequences of lncRNA which do not overlap protein-coding genes.
UCbase is a database of ultraconserved sequences that were first described by Bejerano, G. et al. in 2004. These ultraconserved regions (UCRs) are highly conserved genome regions that share 100% identity among human, mouse and rat. The UCRs comprise 481 sequences longer than 200 bases. They are frequently located at genomic regions involved in cancer, differentially expressed in human leukemias and carcinomas, and in some instances regulated by microRNAs. The first release of UCbase was published by Taccioli, C. et al. in 2009. Recent updates include new annotation based on the hg19 human genome assembly, information about disorders related to the chromosome coordinates using the SNOMED CT classification, a query tool to search for SNPs, and a new text box to directly interrogate the database using a MySQL interface. Moreover, a sequence comparison tool allows researchers to match selected sequences against ultraconserved elements located in genomic regions involved in specific disorders. To facilitate the interactive, visual interpretation of UCR chromosomal coordinates, the authors implemented a graph visualization feature in UCbase that links to the UCSC Genome Browser. UCbase 2.0 no longer provides microRNA (miRNA) information and focuses only on UCRs. The official release of UCbase 2.0 was published in 2014 and is accessible at http://ucbase.unimore.it.
A conserved non-coding sequence (CNS) is a DNA sequence of noncoding DNA that is evolutionarily conserved. These sequences are of interest for their potential to regulate gene production.
TUC338 is an ultra-conserved element which is transcribed to give a non-coding RNA. The TUC338 gene was first identified as uc.338, along with 480 other ultra-conserved elements in the human genome. Expression of this RNA gene has been found to dramatically increase in hepatocellular carcinoma (HCC) cells.
Arginine and glutamate-rich protein 1 is a protein that in humans is encoded by the ARGLU1 gene located at 13q33.3.
Chao-ting Wu is an American molecular biologist. After training at Harvard Medical School in genetics with William Gelbart, at Stanford Medical School with David Hogness, and in a fellowship at Massachusetts General Hospital in molecular biology, Wu began her independent academic career as an assistant professor in Anatomy and Cellular Biology and then Genetics at Harvard Medical School in 1993. After a period as Professor of Pediatrics in the Division of Molecular Medicine at the Boston Children's Hospital, she returned to the Department of Genetics at Harvard Medical School as a full professor in 2007.
Long interspersed nuclear elements (LINEs) are a group of non-LTR retrotransposons that are widespread in the genome of many eukaryotes. They make up around 21.1% of the human genome. LINEs make up a family of transposons, where each LINE is about 7,000 base pairs long. LINEs are transcribed into mRNA and translated into protein that acts as a reverse transcriptase. The reverse transcriptase makes a DNA copy of the LINE RNA that can be integrated into the genome at a new site.
Ryu et al. BMC Evolutionary Biology 2012 http://www.biomedcentral.com/1471-2148/12/236