Ultra-conserved element

An ultra-conserved element (UCE) is a region of DNA that is identical in at least two different species. [1] One of the first studies of UCEs showed that certain human DNA sequences of 200 nucleotides or longer were entirely conserved (identical in nucleic acid sequence) in humans, rats, and mice. [2] Although UCEs are often noncoding DNA, [3] some have been found to be transcriptionally active, producing non-coding RNA molecules. [4]
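The definition above (perfect identity across species over a minimum length) can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned to equal length, with "-" marking gaps; the function name `ultraconserved_runs` and the toy sequences are hypothetical and are not taken from the cited studies.

```python
def ultraconserved_runs(aligned, min_len=200):
    """Return half-open (start, end) column ranges in which every aligned
    sequence carries the same nucleotide for at least min_len columns.

    Assumption: `aligned` is a list of equal-length strings from an existing
    multiple alignment; gaps ("-") and ambiguous bases break a run.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None  # first column of the run currently being extended
    for i in range(length):
        column = {seq[i].upper() for seq in aligned}
        conserved = len(column) == 1 and column <= set("ACGT")
        if conserved:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs


# Toy example with a short threshold; real analyses use min_len=200.
human = "ACGTACGTAA"
mouse = "ACGTACGTCA"
rat = "ACGTACGTAA"
print(ultraconserved_runs([human, mouse, rat], min_len=5))
```

The 200-nucleotide default mirrors the threshold mentioned in the text; the scan itself is a simple linear pass and makes no attempt to build the alignment.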

Evolution

Perfect conservation of these long stretches of DNA is thought to imply evolutionary importance, as these regions appear to have experienced strong negative (purifying) selection for 300–400 million years. [2] [3] [5] The probability of finding ultra-conserved elements by chance (under neutral evolution) has been estimated at less than 10⁻²² in 2.9 billion bases. [2]
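To give a feel for why such a probability is so small, the following Python sketch computes a rough upper bound under a deliberately simple model: each aligned site is assumed to stay identical independently with probability `p_identical`. The expected number of positions that start a fully identical run is then about `genome_size * p_identical ** run_length`, which bounds the chance of seeing at least one such run. The function name and the value 0.7 are hypothetical illustrations, not the model or parameters used in the cited estimate.

```python
import math


def log10_expected_runs(p_identical, run_length, genome_size):
    """log10 of the expected number of start positions of a perfectly
    identical run of `run_length` sites, assuming independent sites.

    Working in log space avoids floating-point underflow for long runs.
    """
    return math.log10(genome_size) + run_length * math.log10(p_identical)


# Hypothetical per-site identity probability; chosen only for illustration.
log10_bound = log10_expected_runs(p_identical=0.7, run_length=200, genome_size=2.9e9)
print(f"expected runs is about 10^{log10_bound:.1f}")
```

Because the bound shrinks geometrically with run length, even modest per-site divergence makes a 200-nucleotide perfect match extremely unlikely under neutrality.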

Functions

A total of 481 ultra-conserved elements have been identified in the human genome. [1] [2] UCbase, a database of genomic information about ultra-conserved elements that share 100% identity among human, mouse and rat, is available at http://ucbase.unimore.it. [6]

A small number of the transcribed elements have been linked to human carcinomas and leukemias. [4] For example, TUC338 is strongly upregulated in human hepatocellular carcinoma cells. [7] UCEs are also affected by copy number variation much more often in cancer cells than in healthy cells, [8] [9] [10] suggesting that altering the copy number of ultra-conserved elements may be deleterious and associated with cancer.

A study comparing ultra-conserved elements between humans and the Japanese pufferfish Takifugu rubripes proposed that they are important in vertebrate development. [11] Several ultra-conserved elements are located near transcriptional regulators or developmental genes. [2] [12] Other proposed functions include acting as enhancers and regulating splicing. [1] Double knockouts of UCEs near the ARX gene in mice caused a shrunken hippocampus in the brain. [13] These knockouts are not lethal in laboratory mice, but they could be in the wild.

References

  1. Reneker J, Lyons E, Conant GC, Pires JC, Freeling M, Shyu CR, Korkin D (2012). "Long identical multispecies elements in plant and animal genomes". Proceedings of the National Academy of Sciences. 109 (19): E1183–E1191. doi:10.1073/pnas.1121356109. ISSN 0027-8424. PMC 3358895. PMID 22496592.
  2. Bejerano, G; Pheasant, M; Makunin, I; Stephen, S; Kent, WJ; Mattick, JS; Haussler, D (2004-05-28). "Ultraconserved elements in the human genome". Science. 304 (5675): 1321–5. CiteSeerX 10.1.1.380.9305. doi:10.1126/science.1098119. PMID 15131266.
  3. Katzman, S; Kern, AD; Bejerano, G; Fewell, G; Fulton, L; Wilson, RK; Salama, SR; Haussler, D (2007-08-17). "Human genome ultraconserved elements are ultraselected". Science. 317 (5840): 915. doi:10.1126/science.1142430. PMID 17702936.
  4. Calin GA, Liu CG, Ferracin M, Hyslop T, Spizzo R, Sevignani C, Fabbri M, Cimmino A, Lee EJ, Wojcik SE, Shimizu M, Tili E, Rossi S, Taccioli C, Pichiorri F, Liu X, Zupo S, Herlea V, Gramantieri L, Lanza G, Alder H, Rassenti L, Volinia S, Schmittgen TD, Kipps TJ, Negrini M, Croce CM (Sep 2007). "Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas". Cancer Cell. 12 (3): 215–29. doi:10.1016/j.ccr.2007.07.027. PMID 17785203.
  5. Sathirapongsasuti JF, Sathira N, Suzuki Y, Huttenhower C, Sugano S (2011). "Ultraconserved cDNA segments in the human transcriptome exhibit resistance to folding and implicate function in translation and alternative splicing". Nucleic Acids Res. 39 (6): 1967–79. doi:10.1093/nar/gkq949. PMC 3064809. PMID 21062826.
  6. Taccioli C, Fabbri E, Visone R, Volinia S, Calin GA, Fong LY, Gambari R, Bottoni A, Acunzo M, Hagan J, Iorio MV, Piovan C, Romano G, Croce CM (Jan 2009). "UCbase & miRfunc: a database of ultraconserved sequences and microRNA function". Nucleic Acids Res. 37 (Database issue): D41–8. doi:10.1093/nar/gkn702. PMC 2686429. PMID 18945703.
  7. Braconi C, Valeri N, Kogure T, Gasparini P, Huang N, Nuovo GJ, Terracciano L, Croce CM, Patel T (2011-01-11). "Expression and functional role of a transcribed noncoding RNA with an ultraconserved element in hepatocellular carcinoma". Proceedings of the National Academy of Sciences of the United States of America. 108 (2): 786–91. doi:10.1073/pnas.1011098108. PMC 3021052. PMID 21187392.
  8. McCole, Ruth B.; Fonseka, Chamith Y.; Koren, Amnon; Wu, C.-ting (2014-10-23). "Abnormal Dosage of Ultraconserved Elements Is Highly Disfavored in Healthy Cells but Not Cancer Cells". PLOS Genetics. 10 (10): e1004646. doi:10.1371/journal.pgen.1004646. ISSN 1553-7404. PMC 4207606. PMID 25340765.
  9. Derti, Adnan; Roth, Frederick P; Church, George M; Wu, C-ting (2006). "Mammalian ultraconserved elements are strongly depleted among segmental duplications and copy number variants". Nature Genetics. 38 (10): 1216–1220. doi:10.1038/ng1888. PMID 16998490.
  10. Chiang, Charleston W. K.; Derti, Adnan; Schwartz, Daniel; Chou, Michael F.; Hirschhorn, Joel N.; Wu, C.-ting (2008-12-01). "Ultraconserved Elements: Analyses of Dosage Sensitivity, Motifs and Boundaries". Genetics. 180 (4): 2277–2293. doi:10.1534/genetics.108.096537. ISSN 0016-6731. PMC 2600958. PMID 18957701.
  11. Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, Walter K, Abnizova I, Gilks W, Edwards YJ, Cooke JE, Elgar G (Jan 2005). "Highly conserved non-coding sequences are associated with vertebrate development". PLoS Biology. 3 (1): e7. doi:10.1371/journal.pbio.0030007. PMC 526512. PMID 15630479.
  12. "Unexpressed but Indispensable—The DNA Sequences That Control Development". PLoS Biology. 3 (1): e19. Jan 2005. doi:10.1371/journal.pbio.0030019. PMC 544543.
  13. Pennisi, Elizabeth (2017). "Mysterious unchanging DNA finds a purpose in life". Science. 2 June 2017.

Ryu et al. BMC Evolutionary Biology 2012. http://www.biomedcentral.com/1471-2148/12/236