Mammalian-wide interspersed repeats (MIRs) are transposable elements in the genomes of some organisms and belong to the group of short interspersed nuclear elements (SINEs).
MIRs are found in all mammals (including marsupials). [1]
It is estimated that there are around 368,000 MIRs in the human genome. [2]
The MIR consensus sequence is 260 base pairs long and has an A/T-rich 3' end. [1]
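As a rough sense of scale, the copy number and consensus length given above can be combined into an upper bound on how much sequence MIRs could occupy. The sketch below is illustrative only: the genome size of about 3.1 billion base pairs is an assumption that does not come from this article, and the function name is hypothetical. Individual genomic copies may be shorter than the consensus, so the result should be read as a ceiling rather than a measurement.

```python
# Back-of-the-envelope upper bound on the sequence occupied by MIRs.
MIR_COPIES = 368_000               # estimated copies in the human genome (from the text)
MIR_CONSENSUS_BP = 260             # consensus length in base pairs (from the text)
ASSUMED_GENOME_BP = 3_100_000_000  # assumption: approximate haploid human genome size


def max_mir_fraction(copies: int, length_bp: int, genome_bp: int) -> float:
    """Return the genome fraction covered if every copy were full length."""
    return copies * length_bp / genome_bp


total_bp = MIR_COPIES * MIR_CONSENSUS_BP
fraction = max_mir_fraction(MIR_COPIES, MIR_CONSENSUS_BP, ASSUMED_GENOME_BP)
print(f"at most {total_bp:,} bp, about {fraction:.1%} of the assumed genome")
```

Under these assumptions the bound comes to roughly 96 million base pairs, or about 3% of the genome.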
Like other SINEs, MIR elements used the machinery of LINE elements to propagate in the genome, which took place around 130 million years ago. They can no longer retrotranspose because the required reverse transcriptase has lost its activity. [3]
MIR elements were first described in the human genome in 1989-1991 [4] [5] [6] and were initially referred to as MB1 family repeats (mirror to sequences of the mouse B1 repeat). Repeats of this family were then found in other mammalian genomes. [7] The family was renamed "Mammalian interspersed repeats" in 1992. [8] It was later shown to be common to vertebrate genomes. [9]
In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA. The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as regulatory sequences, and often a substantial fraction of junk DNA with no evident function. Almost all eukaryotes have mitochondria and a small mitochondrial genome. Algae and plants also contain chloroplasts with a chloroplast genome.
A transposable element is a nucleic acid sequence in DNA that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposition often results in duplication of the same genetic material. In the human genome, L1 and Alu elements are two examples. Barbara McClintock's discovery of transposable elements earned her a Nobel Prize in 1983. Transposable elements are becoming increasingly relevant in personalized medicine and are gaining more attention in data analytics, given the difficulty of analysis in very high dimensional spaces.
Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules. Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses.
An Alu element is a short stretch of DNA originally characterized by the action of the Arthrobacter luteus (Alu) restriction endonuclease. Alu elements are the most abundant transposable elements, with over one million copies dispersed throughout the human genome. Alu elements were thought to be selfish or parasitic DNA, because their sole known function is self-reproduction. However, they are likely to play a role in evolution and have been used as genetic markers. They are derived from the small cytoplasmic 7SL RNA, a component of the signal recognition particle. Alu elements are highly conserved within primate genomes and originated in the genome of an ancestor of Supraprimates.
Repeated sequences are short or long patterns of nucleic acids that occur in multiple copies throughout the genome. In many organisms, a significant fraction of the genomic DNA is repetitive, with over two-thirds of the sequence consisting of repetitive elements in humans. Some of these repeated sequences are necessary for maintaining important genome structures such as telomeres or centromeres.
Retrotransposons are genetic elements that copy and paste themselves into different genomic locations by way of an RNA intermediate, which is converted back into DNA through reverse transcription.
Interspersed repetitive DNA is found in all eukaryotic genomes. It differs from tandem repeat DNA in that the repeat sequences do not follow one another directly but are dispersed throughout the genome and nonadjacent. The sequence that repeats can vary depending on the type of organism and many other factors. Certain classes of interspersed repeat sequences propagate themselves by RNA-mediated transposition; they are called retrotransposons, and they constitute 25–40% of most mammalian genomes. Some types of interspersed repetitive DNA elements allow new genes to evolve by uncoupling similar DNA sequences from gene conversion during meiosis.
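The distinction between tandem and interspersed repeats can be illustrated with a minimal sketch. It assumes exact copies of a short motif in a made-up toy sequence, which is a strong simplification: real interspersed repeats are typically diverged copies that are found by similarity search rather than exact matching. The function name, motif, and sequence are all hypothetical.

```python
import re


def classify_repeats(sequence: str, motif: str) -> dict:
    """Split exact motif occurrences into tandem arrays and isolated copies."""
    # Non-overlapping start positions of the motif, scanning left to right.
    starts = [m.start() for m in re.finditer(re.escape(motif), sequence)]

    runs = []  # each run is a list of start positions of directly adjacent copies
    for pos in starts:
        if runs and pos == runs[-1][-1] + len(motif):
            runs[-1].append(pos)  # copy begins right where the previous one ended
        else:
            runs.append([pos])    # separated by other sequence: start a new run

    return {
        "tandem": [run for run in runs if len(run) > 1],
        "interspersed": [run[0] for run in runs if len(run) == 1],
    }


# Toy sequence (hypothetical): three adjacent copies, then two isolated ones.
toy = "TT" + "ACGT" * 3 + "GGGG" + "ACGT" + "CCCCC" + "ACGT" + "AA"
result = classify_repeats(toy, "ACGT")
print(result)
```

Here the three back-to-back copies are reported as one tandem array, while the two copies separated by unrelated sequence are reported as interspersed.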
60S ribosomal protein L40 (RPL40) is a protein that in humans is encoded by the UBA52 gene.
CUG triplet repeat, RNA binding protein 1, also known as CUGBP1, is a protein which in humans is encoded by the CUGBP1 gene.
Gamma-crystallin B is a protein that in humans is encoded by the CRYGB gene.
Three prime repair exonuclease 2 is an enzyme that in humans is encoded by the TREX2 gene.
PAX-interacting protein 1 is a protein that in humans is encoded by the PAXIP1 gene.
39S ribosomal protein L18, mitochondrial is a protein that in humans is encoded by the MRPL18 gene.
Homeobox protein SIX4 is a protein that in humans is encoded by the SIX4 gene.
39S ribosomal protein L10, mitochondrial is a protein that in humans is encoded by the MRPL10 gene.
L1Base is a database of functional annotations and predictions of active LINE1 elements.
TRANSFAC is a manually curated database of eukaryotic transcription factors, their genomic binding sites and DNA binding profiles. The contents of the database can be used to predict potential transcription factor binding sites.
Long interspersed nuclear elements (LINEs) are a group of non-LTR retrotransposons that are widespread in the genome of many eukaryotes. LINEs contain an internal Pol II promoter to initiate transcription into mRNA, and encode one or two proteins, ORF1 and ORF2. The functional domains present within ORF1 vary greatly among LINEs, but often exhibit RNA/DNA binding activity. ORF2 is essential to successful retrotransposition, and encodes a protein with both reverse transcriptase and endonuclease activity.
Short interspersed nuclear elements (SINEs) are non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. SINEs compose about 13% of the mammalian genome.