KIAA2013 | |
---|---|
Aliases | KIAA2013 |
External IDs | MGI: 1924284; HomoloGene: 12668; GeneCards: KIAA2013; OMA: KIAA2013 - orthologs |
KIAA2013, also known as Q8IYS2 [5] or MGC33867, [6] is a single-pass transmembrane protein encoded by the KIAA2013 gene in humans. [5] The function of KIAA2013 has not yet been fully elucidated.
The KIAA2013 gene is located on the short arm of chromosome 1, at band 36.22 (1p36.22). [7] It lies on the minus strand, running from position 11,986,485 to 11,979,643. [8] The gene contains 3 exons and 2 introns and is 6,838 base pairs long. [7]
There are two alternative splice variants. One has a transcript length of 2,539 bp and the other a transcript length of 2,170 bp. [9]
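The coordinate arithmetic behind a minus-strand interval can be sketched as follows. This is a minimal illustration that assumes the two positions quoted above are 1-based and inclusive; the function names are hypothetical. The span it computes need not match the quoted gene length exactly, since length figures can depend on the annotation release and coordinate convention used.

```python
# Coordinates quoted in the text; assumed here to be 1-based and inclusive.
# On the minus strand the gene "starts" at the larger coordinate.
GENE_START = 11_986_485  # 5' end as read along the minus strand
GENE_END = 11_979_643    # 3' end as read along the minus strand


def genomic_span(start: int, end: int) -> int:
    """Return the inclusive length in bp of an interval on either strand."""
    return abs(start - end) + 1


def strand(start: int, end: int) -> str:
    """Infer the strand from coordinate order: descending means minus."""
    return "-" if start > end else "+"


span = genomic_span(GENE_START, GENE_END)
print(strand(GENE_START, GENE_END), span)
```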
The protein encoded by the longest mRNA splice variant of KIAA2013 contains 634 amino acid residues. The predicted molecular weight of the protein is 69.2 kDa [10] and its isoelectric point is 8.44. There is also a lysine multiplet of six consecutive residues beginning at position 28. [11] This sequence, however, is located within the cleavable signal peptide and will most likely not remain part of the mature protein.
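As a rough sanity check on the predicted weight, the mass of a protein can be approximated from its residue count alone. The sketch below assumes a rule-of-thumb average residue mass of about 110 Da, which is not a value taken from the cited prediction; a sequence-based calculation would be more accurate.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein.
# This is an assumption for illustration, not the method behind the cited value.
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int, avg_residue_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate protein mass in kDa from the number of residues."""
    return n_residues * avg_residue_da / 1000.0


estimate = estimate_mass_kda(634)  # 634 residues, as stated in the text
print(f"{estimate:.1f} kDa")
```

The estimate lands within about 1 kDa of the 69.2 kDa figure quoted above, which is as close as this kind of approximation can be expected to get.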
KIAA2013 contains one conserved protein domain of unknown function by the name of DUF2152, or pfam10222. This protein has remained conserved from mammals to invertebrates. [12] The conserved domain extends from amino acid position 6 to 629.
Secondary structure as analyzed via GOR4: [13]
Structure | Percentage |
---|---|
Alpha helix | 38% |
Beta sheet | 61.2% |
An AlphaFold prediction has been generated and further analyzed with iCn3D. [14] The model highlights the transmembrane regions of the KIAA2013 protein, as well as three disulfide bridges that are predicted to form.
The single human KIAA2013 promoter is a 1,194 bp sequence that precedes the gene. [15]
Hundreds of possible transcription factor binding sites are predicted on the promoter sequence of KIAA2013, a number of them with high matrix similarity.
The KIAA2013 protein is expressed ubiquitously across many human tissues. However, studies suggest that the small intestine, most specifically the duodenum, as well as the colon and kidneys express higher levels of this protein. [16] RNA-seq data indicate that the gene is also expressed in the intestine of 20-week-old fetuses. [7] NCBI GEO data from preimplantation embryos indicate that KIAA2013 begins to be expressed at high levels after the 4-cell stage. [17]
The final portion of the KIAA2013 3' UTR contains the poly-A signal as well as multiple binding sites for ELAVL1. ELAVL1 is an RNA-binding protein that is necessary for placental branching and general embryonic development. After birth, ELAVL1 promotes angiogenesis, the formation of new blood vessels. [18]
The 5' UTR has two main conserved regions, located at the very beginning and very end of the sequence. It also contains two sequences coding for stop codons. Most predicted binding sites cluster around the two conserved regions. EIF4B, eukaryotic translation initiation factor 4B, is needed to bind mRNAs to ribosomes and assists with the translation of longer 5' UTRs. [19] It binds to the mRNA in the presence of ATP. FUS mediates gene silencing [20] and has been clinically linked to amyotrophic lateral sclerosis (ALS). [20] Finally, RBM4 helps to control translation as well as alternative splicing events. Reduced expression of RBM4 has been linked to Down syndrome. [21]
KIAA2013 localizes intracellularly to the Golgi apparatus and endoplasmic reticulum. This has been validated through GFP fusion and antibody-based experiments. [22] DeepLoc analysis indicates an 81.94% probability that the protein is found in the Golgi apparatus and a 16.77% probability that it is localized to the endoplasmic reticulum. [23] The predicted likelihood that KIAA2013 is a membrane protein is 99.98%. [23]
There is a predicted signal peptide spanning amino acids 1-40. [24] The cleavage site for this signal peptide is located between amino acid positions 40 and 41. There is also a collection of post-translational modifications associated with KIAA2013. They include:
Modification | Location |
---|---|
Glycosylation | T224 [25] |
Glycosylation | N363 [25] |
Phosphorylation | S159 [26] |
Phosphorylation | S381 [26] |
Ubiquitylation | K629 [27] |
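The relationship between the signal peptide, the lysine multiplet, and the modification sites can be made concrete with a little position arithmetic. The sketch below assumes that cleavage occurs exactly as predicted (after residue 40) and that the positions in the table are numbered on the 634-residue precursor; the mature-chain numbering it produces is illustrative only, and the function name is hypothetical.

```python
from typing import Optional

PRECURSOR_LENGTH = 634   # residues in the longest isoform
SIGNAL_PEPTIDE_END = 40  # predicted cleavage between positions 40 and 41
LYSINE_RUN = range(28, 28 + 6)  # six consecutive lysines starting at position 28

# Modification sites from the table above, keyed by precursor position.
PTM_SITES = {
    224: "Glycosylation",
    363: "Glycosylation",
    159: "Phosphorylation",
    381: "Phosphorylation",
    629: "Ubiquitylation",
}


def mature_position(precursor_pos: int, cleaved_after: int = SIGNAL_PEPTIDE_END) -> Optional[int]:
    """Map a precursor position to the mature chain; None if it is cleaved off."""
    if precursor_pos <= cleaved_after:
        return None
    return precursor_pos - cleaved_after


mature_length = PRECURSOR_LENGTH - SIGNAL_PEPTIDE_END
lysines_removed = all(mature_position(p) is None for p in LYSINE_RUN)

print("mature length:", mature_length)
print("lysine run removed with signal peptide:", lysines_removed)
for pos, kind in sorted(PTM_SITES.items()):
    print(kind, pos, "->", mature_position(pos))
```

Under these assumptions every listed modification site lies downstream of the cleavage site, while the whole lysine run falls inside the signal peptide.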
There are currently no known paralogs of KIAA2013.
KIAA2013 has one pseudogene in Homo sapiens, named LOC728138. The pseudogene sequence is 633 amino acid residues long and shares 96.8% sequence identity with KIAA2013. [28]
Orthologs of KIAA2013 are found in organisms ranging from mammals to invertebrates. Currently, 419 organisms are known to contain orthologs of this gene. [29]
Taxonomic group | Genus, species | Common name | Divergence date (MYA) | Accession number | Protein length (aa) | Seq. identity | Seq. similarity |
---|---|---|---|---|---|---|---|
Mammalia | Homo sapiens | Human | 0 | NP_612355.1 | 634 | 100% | 100% |
Mammalia | Mus caroli | Ryukyu mouse | 90 | XP_021016690.1 | 634 | 91.8% | 95.9% |
Mammalia | Mirounga leonina | Southern elephant seal | 94 | XP_034875674.1 | 629 | 94.8% | 96.8% |
Mammalia | Felis catus | Cat | 96 | XP_003989629.3 | 634 | 94.8% | 97.1% |
Aves | Falco rusticolus | Gyrfalcon | 312 | XP_037236550.1 | 612 | 61.7% | 71% |
Reptilia | Gopherus evgoodei | Goode's thornscrub tortoise | 312 | XP_030393408.1 | 623 | 64.9% | 74.9% |
Amphibia | Xenopus laevis | African clawed frog | 352 | XP_018083185.1 | 614 | 55.5% | 69.1% |
Amphibia | Microcaecilia unicolor | Tiny Cayenne caecilian | 352 | XP_030078049.1 | 623 | 53% | 68.4% |
Fish | Acipenser ruthenus | Sterlet | 435 | XP_033899255.2 | 610 | 56.5% | 69.3% |
Fish | Lepisosteus oculatus | Spotted gar | 435 | XP_006642029.2 | 623 | 55.1% | 68.1% |
Invertebrates | Anopheles merus | Mosquito | 797 | XP_041777166.1 | 625 | 27.5% | 44.5% |
Invertebrates | Pollicipes pollicipes | Goose neck barnacle | 797 | XP_037086897.1 | 639 | 27.6% | 44.3% |
Invertebrates | Drosophila subpulchrella | Fly | 797 | XP_037708712.1 | 637 | 26% | 43% |
Invertebrates | Limulus polyphemus | Atlantic horseshoe crab | 797 | XP_013773544.2 | 516 | 21.8% | 36.8% |
The rate of divergence of the KIAA2013 protein has been compared with that of cytochrome c and fibrinogen alpha using a molecular clock approach. Cytochrome c has a much slower rate of divergence than fibrinogen alpha, while KIAA2013 lies between the two. [30]
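A molecular clock comparison of this kind typically plots a corrected sequence divergence against divergence time. The sketch below uses one common correction for multiple substitutions, m = -ln(1 - d), where d is the observed fraction of differing residues; the source does not state which formula was used, so this choice is an assumption. The data points are a subset of the ortholog table above, and the function name is hypothetical.

```python
import math

# (species, divergence date in MYA, percent identity to human KIAA2013),
# taken from the ortholog table in this article.
ORTHOLOGS = [
    ("Mus caroli", 90, 91.8),
    ("Felis catus", 96, 94.8),
    ("Falco rusticolus", 312, 61.7),
    ("Xenopus laevis", 352, 55.5),
    ("Acipenser ruthenus", 435, 56.5),
    ("Drosophila subpulchrella", 797, 26.0),
]


def corrected_divergence(identity_percent: float) -> float:
    """Corrected divergence m = -ln(1 - d), with d the observed difference fraction."""
    if not 0.0 < identity_percent <= 100.0:
        raise ValueError("identity must be in (0, 100]")
    d = 1.0 - identity_percent / 100.0
    return -math.log(1.0 - d)


for species, mya, identity in ORTHOLOGS:
    m = corrected_divergence(identity)
    print(f"{species:26s} {mya:4d} MYA  m = {m:.3f}")
```

Plotting m against the divergence date for each protein gives one line per protein; a steeper slope corresponds to a faster-diverging protein.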
KIAA2013 has been found to interact with two proteins, TMEM60 and IBP5, in a validated two-hybrid array. [31]
KIAA2013 has been found to play a role in the endocannabinoid system. This system is made up of cannabinoid receptors 1 and 2 (CB1 and CB2) as well as the various ligands and enzymes that interact with them. KIAA2013 has been found to be expressed in CB2-expressing cells. [32] Both cannabinoid receptors are class A G protein-coupled receptors, and CB2 is highly expressed in the human spleen and leukocytes. CB2, and by extension KIAA2013, are therefore targets of interest for therapeutic studies of diseases such as inflammatory bowel disease and rheumatoid arthritis. [33]