An orphon is a gene located outside the main chromosomal locus; that is, it may be dispersed to an unconnected genomic location. [1] [2]
Orphons have been found in both protein-coding and non-protein-coding gene families, which suggests that most gene transcription processes do not restrict the development of orphons. Extensive polymorphism in this feature has been shown between individuals of the same species. The gene class was first discovered in yeast, sea urchins, and fruit flies, [1] and has since been reported from the genomes of many other eukaryote groups, including molluscs, [3] amphibians, [4] and mammals, including humans. [5]
Repeated sequences are short or long patterns of nucleic acids that occur in multiple copies throughout the genome. In many organisms, a significant fraction of the genomic DNA is repetitive, with over two-thirds of the sequence consisting of repetitive elements in humans. Some of these repeated sequences are necessary for maintaining important genome structures such as telomeres or centromeres.
Polyadenylation is the addition of a poly(A) tail to an RNA transcript, typically a messenger RNA (mRNA). The poly(A) tail consists of multiple adenosine monophosphates; in other words, it is a stretch of RNA that has only adenine bases. In eukaryotes, polyadenylation is part of the process that produces mature mRNA for translation. In many bacteria, the poly(A) tail promotes degradation of the mRNA. Polyadenylation therefore forms part of the larger process of gene expression.
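Because a poly(A) tail is simply a run of adenine bases at the 3' end of the transcript, its length can be read directly from a sequence string. The following is a minimal sketch, assuming the RNA is given as a plain string of A, C, G, and U characters written 5' to 3'; the function name `poly_a_tail_length` and the example sequence are hypothetical and chosen only for illustration.

```python
def poly_a_tail_length(rna: str) -> int:
    """Return the number of consecutive adenine bases at the 3' end.

    Assumes `rna` is written 5' to 3' using the letters A, C, G, U
    (an illustrative convention, not taken from the article).
    """
    seq = rna.strip().upper()
    # Removing the trailing run of 'A' and comparing lengths gives the tail length.
    return len(seq) - len(seq.rstrip("A"))


if __name__ == "__main__":
    # Toy transcript: a short body followed by eight adenines.
    toy = "AUGGCCU" + "A" * 8
    print(poly_a_tail_length(toy))
```

A sequence with no trailing adenine yields 0, and a sequence made up entirely of adenine yields its full length.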
Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of one of three phenomena: a speciation event (orthologs), a duplication event (paralogs), or a horizontal gene transfer event (xenologs).
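The relationship between the originating event and the homology term can be written as a simple lookup. This is only an illustrative sketch: the names `Event`, `HOMOLOGY_TERM`, and `homology_term` are hypothetical, and the code encodes nothing beyond the three cases listed above.

```python
from enum import Enum


class Event(Enum):
    """Evolutionary event that gave rise to two homologous sequences."""
    SPECIATION = "speciation"
    DUPLICATION = "duplication"
    HORIZONTAL_TRANSFER = "horizontal gene transfer"


# Mapping from the originating event to the term used for the homologs.
HOMOLOGY_TERM = {
    Event.SPECIATION: "ortholog",
    Event.DUPLICATION: "paralog",
    Event.HORIZONTAL_TRANSFER: "xenolog",
}


def homology_term(event: Event) -> str:
    """Return the homology term for sequences separated by `event`."""
    return HOMOLOGY_TERM[event]


if __name__ == "__main__":
    for event in Event:
        print(f"{event.value}: {homology_term(event)}")
```

In practice, deciding which event separated two sequences requires phylogenetic analysis; the lookup only names the result once that event is known.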
Amos Bairoch is a Swiss bioinformatician and Professor of Bioinformatics at the Department of Human Protein Sciences of the University of Geneva, where he leads the CALIPHO group at the Swiss Institute of Bioinformatics (SIB), which combines bioinformatics, curation, and experimental efforts to functionally characterize human proteins.
The U7 small nuclear RNA is an RNA molecule and a component of the small nuclear ribonucleoprotein complex. The U7 snRNA is required for histone pre-mRNA processing.
High-mobility group protein HMG-I/HMG-Y is a protein that in humans is encoded by the HMGA1 gene.
Histone H3.1 is a protein that in humans is encoded by the H3C2 gene.
DNA (cytosine-5)-methyltransferase 3A (DNMT3A) is an enzyme that catalyzes the transfer of methyl groups to specific CpG structures in DNA, a process called DNA methylation. The enzyme is encoded in humans by the DNMT3A gene.
Ubiquitin is a protein that in humans is encoded by the UBB gene.
YY1 is a transcriptional repressor protein in humans that is encoded by the YY1 gene.
High-mobility group protein B2 also known as high-mobility group protein 2 (HMG-2) is a protein that in humans is encoded by the HMGB2 gene.
Ig mu chain C region is a protein that in humans is encoded by the IGHM gene.
C-terminal-binding protein 2 also known as CtBP2 is a protein that in humans is encoded by the CTBP2 gene.
DNA (cytosine-5)-methyltransferase 3-like is an enzyme that in humans is encoded by the DNMT3L gene.
Spermatid nuclear transition protein 1 is a protein that in humans is encoded by the TNP1 gene.
DNA polymerase epsilon subunit 3 is an enzyme that in humans is encoded by the POLE3 gene.
The Reference Sequence (RefSeq) database is an open-access, annotated and curated collection of publicly available nucleotide sequences and their protein products. RefSeq was introduced in 2000. The database is built by the National Center for Biotechnology Information (NCBI) and, unlike GenBank, provides only a single record for each natural biological molecule for major organisms ranging from viruses to bacteria to eukaryotes.
In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.
Tc1/mariner is a class and superfamily of interspersed-repeat DNA transposons. Elements of this class are found in all animals, including humans. They can also be found in protists and bacteria.