EVI5L identifiers | |
---|---|
Aliases | EVI5L, ecotropic viral integration site 5 like |
External IDs | HomoloGene: 71934, GeneCards: EVI5L |
EVI5L (Ecotropic Viral Integration Site 5-Like) is a protein that in humans is encoded by the EVI5L gene. [3] EVI5L is a member of the Ras superfamily of monomeric guanine nucleotide-binding (G) proteins, and functions as a GTPase-activating protein (GAP) with a broad specificity. [4] [5] Measurement of in vitro Rab-GAP activity has shown that EVI5L has significant Rab2A- and Rab10-GAP activity. [6]
The EVI5L gene is 34,701 base pairs long and has an mRNA transcript that is 3,756 nucleotides in length. It consists of 19 exons that encode an 805 amino acid protein. [7]
EVI5L is located on the short arm (p) of chromosome 19 in region 1, band 3, sub-band 2 (19p13.2), starting at 7,830,275 base pairs and ending at 7,864,976 base pairs. It is encoded on the plus strand. It lies near the CLEC4M (C-type lectin domain family 4, member M) gene, which is involved in peptide antigen transport. [8]
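The 34,701 base pair gene length follows directly from the coordinates given for 19p13.2. A minimal sketch in Python, assuming the start and end positions quoted in the text; the helper name `span_length` is hypothetical, and whether the end base is counted (inclusive) is a convention that differs between databases:

```python
def span_length(start: int, end: int, inclusive: bool = False) -> int:
    """Length in base pairs of a genomic interval on one chromosome."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    # Counting the end base adds one position to the plain difference.
    return end - start + (1 if inclusive else 0)


# Coordinates as stated in the text (chromosome 19, plus strand).
EVI5L_START = 7_830_275
EVI5L_END = 7_864_976

print(span_length(EVI5L_START, EVI5L_END))                  # 34701
print(span_length(EVI5L_START, EVI5L_END, inclusive=True))  # 34702
```

The quoted gene length matches the plain difference between the two coordinates; an inclusive count would be one base pair longer.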
EVI5L contains a RAB-GAP TBC domain, which regulates membrane trafficking by promoting the switch of Rab GTPases from their active (GTP-bound) to their inactive (GDP-bound) conformation. [9] It also has apolipophorin-III and tetratricopeptide repeat (TPR) domains. Apolipophorin-III plays a vital role in lipid transport and lipoprotein metabolism, [10] while TPR mediates protein-protein interactions and the assembly of multiprotein complexes. [11] These three domains are highly conserved in EVI5L orthologs.
There are 7 moderately related proteins in humans that are paralogous to the RAB-GAP TBC domain of EVI5L. All of these proteins are in the guanosine nucleotide-binding protein family. [12]
Paralogous Protein | Protein Name | Sequence Length (amino acids) | Amino Acid Identity |
---|---|---|---|
EVI5 | Ecotropic viral integration site 5 | 810 aa | 51% |
TBC1D14 | TBC1 domain family, member 14 | 693 aa | 19% |
RABGAP1 | RAB GTPase activating protein 1 | 1069 aa | 18% |
RABGAP1L | RAB GTPase activating protein 1-like | 815 aa | 18% |
TBC1D12 | TBC1 domain family, member 12 | 775 aa | 17% |
TBC1D1 | TBC1 (Tre-2/USP6, BUB2, Cdc16) domain family, member 1 | 1168 aa | 15% |
TBC1D4 | TBC1 domain family, member 4 | 1298 aa | 13% |
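The amino acid identity column reports the share of aligned positions at which a paralog carries the same residue as EVI5L. A minimal sketch of that calculation, assuming two sequences that have already been aligned (gaps written as `-`) and assuming identity is taken over non-gap columns only; alignment tools differ in the denominator they use, the function name `percent_identity` is hypothetical, and the toy peptides are invented for illustration rather than taken from EVI5L:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of non-gap alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Invented toy alignment: 4 identical residues out of 6 compared columns.
print(round(percent_identity("MKT-LLV", "MRTALLI"), 1))  # 66.7
```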
There are 63 [13] orthologs of EVI5L that have been identified, in species including mammals, birds, reptiles, and fish. [14] EVI5L is highly conserved among its orthologs but is not present in insects, plants, bacteria, archaea or protists.
The following table lists selected orthologs of EVI5L:
Genus and Species | Common Name | Accession Number | Seq. Length | Seq. Identity | Seq. Similarity | Time of Divergence |
---|---|---|---|---|---|---|
Homo sapiens | Human | NM_001159944.2 | 3756 bp | - | - | - |
Pan troglodytes | Common Chimpanzee | XM_003316056.2 | 3874 bp | 99% | 99% | 6.3 mya |
Canis familiaris | Dog | XM_003432793.1 | 2430 bp | 98% | 99% | 94.2 mya |
Sus scrofa | Wild Boar | XM_003123194 | 3673 bp | 95% | 99% | 94.2 mya |
Chelonia mydas | Sea Turtle | EMP36617 | 3436 bp | 94% | 99% | 294.5 mya |
Alligator sinensis | Chinese Alligator | XM_006036467.1 | 6780 bp | 82% | 91% | 296.4 mya |
Ficedula albicollis | Collared Flycatcher | XM_005062373.1 | 2090 bp | 79% | 88% | 324.2 mya |
Haplochromis burtoni | Cichlid | XM_005934450.1 | 6638 bp | 70% | 84% | 400.1 mya |
Danio rerio | Zebrafish | XM_689590 | 2856 bp | 69% | 82% | 400.1 mya |
Oreochromis niloticus | Nile Tilapia | XM_003447957 | 6757 bp | 68% | 84% | 400.1 mya |
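The table suggests that sequence identity to the human gene falls as the time since divergence grows. A minimal sketch that checks this on the nine non-human rows, using only the identity and divergence values listed above; the `pearson` helper is written out by hand for illustration, and a correlation over so few rows is a descriptive summary, not an estimate of evolutionary rate:

```python
from math import sqrt

# (species, time of divergence in mya, sequence identity in %), from the table.
orthologs = [
    ("Pan troglodytes", 6.3, 99),
    ("Canis familiaris", 94.2, 98),
    ("Sus scrofa", 94.2, 95),
    ("Chelonia mydas", 294.5, 94),
    ("Alligator sinensis", 296.4, 82),
    ("Ficedula albicollis", 324.2, 79),
    ("Haplochromis burtoni", 400.1, 70),
    ("Danio rerio", 400.1, 69),
    ("Oreochromis niloticus", 400.1, 68),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


times = [t for _, t, _ in orthologs]
identities = [i for _, _, i in orthologs]

r = pearson(times, identities)
print(f"Pearson r = {r:.2f}")  # Pearson r = -0.91

# Rows are listed by increasing divergence time; identity never rises.
never_rises = all(a >= b for a, b in zip(identities, identities[1:]))
print(never_rises)  # True
```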
The EVI5L protein consists of 805 [15] amino acid residues. The molecular weight of the mature protein is 92.5 kDa, with an isoelectric point of 5.05. EVI5L has an unusually large number of glutamic acid residues compared to similar proteins. Most of the protein is neutral, with no positive, negative, or mixed charge clusters. [16] Its hydrophobicity score is slightly negative (-0.597019). EVI5L is a soluble protein [17] that localizes to the nucleus. [18] It contains no signal peptide, no mitochondrial targeting motif, and no C-terminal peroxisomal targeting signal. There is no transmembrane domain in EVI5L. [19]
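A single hydrophobicity figure of this kind is often a grand average of hydropathy (GRAVY), the mean per-residue hydropathy over the whole sequence. The text does not name the scale or tool behind the value, so the following is only a sketch under the assumption of the Kyte-Doolittle scale; the function name `gravy` is hypothetical and the peptide is an invented glutamate-rich example, not the EVI5L sequence:

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy of a protein sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a non-standard residue code.
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence.upper()) / len(sequence)


# Invented peptide: acidic and basic residues pull the average below zero.
print(round(gravy("EEKLE"), 2))  # -2.12
```

Under this scale, negative averages indicate an overall hydrophilic sequence, which is consistent with a soluble protein that lacks a transmembrane domain.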
EVI5L has two isoforms produced by alternative splicing. Isoform 2 lacks the in-frame exon 11, making it shorter (794 amino acids). [20]
Post-translational modifications of EVI5L that are evolutionarily conserved in the majority of the orthologs include glycosylation (C-mannosylation), [21] glycation, [22] phosphorylation (non-kinase and kinase specific), [23] [24] and sumoylation. [25] There is also one leucine-rich nuclear export signal. [26]
The entire secondary structure of EVI5L is made up of alpha helices, with no beta sheets present. [27] [28] This is also true for EVI5L's closest structural paralog, RABGAP1L. [29]
The predicted promoter for the EVI5L gene spans 664 base pairs, from 7,910,867 to 7,911,530, and contains a predicted transcriptional start site region of 114 base pairs, from 7,911,346 to 7,911,460. [30] The promoter region and the beginning of the EVI5L gene (7,910,997 to 7,911,843) are not conserved beyond primates. This region was used to predict transcription factor binding.
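The promoter and start site figures can be checked against their coordinates, and the check shows that the two lengths use different counting conventions. A minimal sketch, assuming the coordinates quoted in the text; the `Interval` class is a hypothetical helper written for this illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval given by its first and last quoted positions."""
    start: int
    end: int

    def length(self, inclusive: bool = True) -> int:
        # Inclusive counting includes both end positions.
        return self.end - self.start + (1 if inclusive else 0)

    def contains(self, other: "Interval") -> bool:
        return self.start <= other.start and other.end <= self.end


promoter = Interval(7_910_867, 7_911_530)
start_site = Interval(7_911_346, 7_911_460)

print(promoter.length())                   # 664
print(start_site.length(inclusive=False))  # 114
print(promoter.contains(start_site))       # True
```

The 664 base pair promoter length corresponds to an inclusive count, while the 114 base pair start site length corresponds to the plain difference between its coordinates. The start site region lies entirely within the predicted promoter.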
The main transcription factors predicted to bind to the promoter include: the activator-, mediator- and TBP-dependent core promoter element for RNA polymerase II transcription from TATA-less promoters; the p53 tumor suppressor; brachyury gene, mesoderm developmental factor; EGR/nerve growth factor induced protein C and related factors; and the GLI zinc finger family. [31]
Expression data from expressed sequence tag mapping, microarray and in situ hybridization show that EVI5L has ubiquitously low expression. [32] [33] [34] However, expression is slightly higher in the testis and fetal brain.
The exact function of EVI5L is unknown. However, paralogs of the gene are associated with starvation-induced autophagosome formation and with the trafficking and translocation of GLUT4-containing vesicles. [35] [36] EVI5L is therefore predicted to target endocytic vesicles.
EVI5L has been shown to interact with NUDT18 (nudix (nucleoside diphosphate linked moiety X)-type motif 18) [37] and SRPK2 (serine/arginine-rich protein-specific kinase 2). [38] NUDT18 is a member of the Nudix hydrolase family. Nudix hydrolases eliminate potentially toxic nucleotide metabolites from the cell and regulate the concentrations and availability of many different nucleotide substrates, cofactors, and signaling molecules. [39] SRPK2 specifically phosphorylates its substrates at serine residues located in regions rich in arginine/serine dipeptides, known as RS domains, and is involved in the phosphorylation of SR splicing factors and the regulation of splicing. [40]
Zebrafish deficient in Rab23 or its GTPase-activating protein, EVI5L, exhibit abnormal heart formation. This is attributed to the requirement for Rab23 in the differentiation of cardiac progenitor cells. Rab23 is required for normal development of the brain, spinal cord and heart, and without EVI5L to regulate Rab23, these organs form abnormally. [41]