SANBR | |
---|---|
Aliases | SANBR, SANT and BTB domain regulator of CSR, KIAA1841 |
External IDs | MGI: 1918925, HomoloGene: 19038, GeneCards: SANBR |
KIAA1841 is a gene in humans that encodes a protein known as KIAA1841 (uncharacterized protein KIAA1841). The protein is targeted to the nucleus and is predicted to play a role in regulating transcription.
KIAA1841 is located on the short arm of chromosome 2 (2p15), starting at position 61297486 and ending at position 61349294. The KIAA1841 gene spans 52809 base pairs and is oriented on the plus strand. The coding region is made up of 4292 base pairs, and the protein sequence is 718 amino acids long. [5]
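The span implied by the two coordinates can be checked with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive (the text does not state the convention); the function name `gene_span` is purely illustrative.

```python
def gene_span(start: int, end: int, inclusive: bool = True) -> int:
    """Return the number of base pairs covered by a genomic interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    # With inclusive coordinates both endpoints count as bases.
    return end - start + (1 if inclusive else 0)


# Start and end coordinates quoted in the text for KIAA1841 on chromosome 2.
span = gene_span(61297486, 61349294)
print(span)
```

Under this assumption the coordinates give 51,809 base pairs, which differs from the quoted span of 52,809 by exactly 1,000, so one of the figures may have been transcribed incorrectly.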
Genes PEX13 and C2orf74 neighbor KIAA1841 on chromosome 2. [6]
KIAA1841 is highly expressed in reproductive structures and nervous tissue, including the brain, prostate, cervix and ear. It is expressed at intermediate levels in the lungs and spinal cord. [7] [8] KIAA1841 is also expressed at low levels in a wide range of other tissues throughout the human body.
In humans, the KIAA1841 gene produces 18 alternatively spliced transcript variants as well as 3 unspliced forms. Of the 18 spliced variants, 4 form a protein product. The main transcript in humans is transcript ID ENST00000402291, or OTTHUMT00000325477. [9] [10] [11]
Below is a table of orthologs of human KIAA1841. The table includes closely, intermediately and distantly related orthologs.
Species | NCBI accession # | Sequence length | Protein identity | mRNA identity |
---|---|---|---|---|
Homo sapiens (Human) | NP_001123465.1 | 718 | 100% | 100% |
Odobenus rosmarus divergens (Walrus) | XP_004397774.1 | 718 | 94% | 97% |
Canis lupus familiaris (Dog) | XP_538505 | 718 | 92% | 96% |
Equus caballus (Horse) | XP_001495879.1 | 765 | 93% | 96% |
Mus musculus (Mouse) | NP_082136.2 | 718 | 89% | 94% |
Echinops telfairi (Lesser hedgehog tenrec) | XP_004710320.1 | 718 | 86% | 92% |
Pelodiscus sinensis (Soft-shelled turtle) | XP_006122225.1 | 718 | 78% | 88% |
Anas platyrhynchos (Mallard duck) | XP_005016968.1 | 719 | 78% | 87% |
Gallus gallus (Red junglefowl) | NP_001186348.1 | 718 | 76% | 87% |
Xenopus tropicalis (Western clawed frog) | XP_004914757.1 | 715 | 71% | 83% |
Danio rerio (Zebrafish) | XP_001333668.2 | 735 | 60% | 73% |
Drosophila melanogaster (Fruit fly) | NP_648346.1 | 889 | 38% | 62% |
Apis mellifera (Western honeybee) | XP_006559923.1 | 849 | 40% | 62% |
Anopheles gambiae (Mosquito) | XP_558222.3 | 806 | 40% | 61% |
Orthologs of the human protein KIAA1841 are listed above by date of divergence from humans, from most recent to most ancient, and then by percent identity. KIAA1841 is well conserved across all listed orthologs: even the most distant ones still share about 40% protein identity with the human sequence. KIAA1841 has evolved slowly and evenly over time. [13] [14]
The domain of unknown function 3342 (DUF3342) is conserved in all orthologs and is the most highly conserved region of the protein. Conservation of this domain was traced all the way back to the fungus Batrachochytrium dendrobatidis, which diverged from humans 1216 million years ago. [15]
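The identity percentages in the table come from pairwise sequence alignments. As a rough illustration of how such a figure can be derived, here is a minimal sketch that counts matching positions in two pre-aligned sequences. The function name and the toy sequences are hypothetical (they are not real KIAA1841 fragments), and the choice to ignore gapped columns is an assumption; alignment tools differ in which denominator they use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Made-up aligned fragments, used only to exercise the function.
human_fragment = "MKT-LLVAGC"
other_fragment = "MKTALIVA-C"
print(round(percent_identity(human_fragment, other_fragment), 1))
```

In this toy case 7 of the 8 gap-free columns match. Dividing by the full alignment length or by the shorter sequence length instead would give a different percentage for the same alignment.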
The molecular weight of KIAA1841 is 82 kilodaltons and its isoelectric point is 6.5. The protein sequence is neither rich nor poor in any particular amino acid. There are two non-polar stretches that are capable of being transmembrane regions: a stretch of 21 0's (uncharged residues) at positions 254–275 and a stretch of 24 0's at positions 420–444. The DUF3342 domain stretches from position 147 to 449. [16]
Property | KIAA1841 | DUF3342 | N terminus | C terminus |
---|---|---|---|---|
Isoelectric point | 6.5 | 6.74 | 5.5 | 8.2 |
Positive charge (%) | 13.8 | 13.2 | 11.7 | 15.9 |
Negative charge (%) | 14.4 | 13.5 | 14.2 | 14.5 |
Net charge | -0.6 | -0.3 | -2.5 | 1.4 |
Major hydrophobics (%) | 24.8 | 28.7 | 27.2 | 22.3 |
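The net charge row appears to be the percentage of positively charged residues minus the percentage of negatively charged residues. The sketch below recomputes it under that assumption, reading the C-terminus positive charge entry as 15.9; the dictionary layout and the function name are illustrative only.

```python
# (positive %, negative %) for each region, copied from the table.
charges = {
    "KIAA1841": (13.8, 14.4),
    "DUF3342": (13.2, 13.5),
    "N terminus": (11.7, 14.2),
    "C terminus": (15.9, 14.5),
}


def net_charge(positive_pct: float, negative_pct: float) -> float:
    """Net charge as positive minus negative percentage, to one decimal."""
    return round(positive_pct - negative_pct, 1)


net = {region: net_charge(pos, neg) for region, (pos, neg) in charges.items()}
for region, value in net.items():
    print(f"{region}: {value:+.1f}")
```

The recomputed values agree with the net charge row: the whole protein and the N terminus carry a slight excess of negatively charged residues, while the C terminus is slightly positive.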
Amino acids are evenly distributed throughout KIAA1841, and the percent composition of each amino acid is fairly consistent across the orthologs of the protein. The most distant ortholog displays the most variance in amino acid composition, with a higher percent composition of alanine, histidine and leucine and a lower composition of lysine.
The protein sequence of KIAA1841 is neither rich nor poor in any particular amino acid. The same is true in Mus musculus, Danio rerio and Drosophila melanogaster, but not in the most distantly related ortholog, Batrachochytrium dendrobatidis, which is rich in histidine. Humans and closely related orthologs are composed of 2.2% to 3.8% histidine, compared to 5% in Batrachochytrium dendrobatidis.
The DUF3342 domain stretches from position 147 to 449 of KIAA1841 and has a molecular weight of 35.7 kDa. The domain is low in glycine (G, 2%) and rich in cysteine (C, 6.3%). Both of the non-polar stretches in the protein are located within the domain, one at the beginning and one at the end.
The domain of unknown function (DUF3342) is part of the pfam11822 family. This family of proteins has yet to be functionally characterized, and it is found in bacteria. The domain is usually between 170 and 303 amino acids in length. The N-terminal half of this protein family is a BTB-like domain. The BTB domain is a multifunctional protein-protein interaction motif involved in a number of different cellular functions, including regulation of transcription, cytoskeleton dynamics, gating and assembly of ion channels, and ubiquitination of proteins. BTB domain structures are highly conserved and are found on proteins that have only one or two other types of domains.
KIAA1841 is predicted to be heavily phosphorylated after translation, with 37 predicted phosphorylation sites. There is one leucine-rich nuclear export signal toward the end of the protein. There is one sulfated tyrosine, a modification that strengthens protein-protein interactions. Two motifs with a high probability of being sumoylation sites were also found. Sumoylation is involved in a number of cellular processes, including nuclear-cytosolic transport, transcriptional regulation and protein stability.
KIAA1841 is primarily composed of alpha helices and beta sheets. Alpha helices make up the majority of the protein, and this is true for the DUF domain and both termini as well. The DUF domain has slightly fewer beta sheets than the protein as a whole, and the C terminus has an even smaller proportion of beta sheets in its secondary structure. [17] [18]
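As a rough plausibility check on the reported masses, a protein's molecular weight can be approximated from its residue count. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value taken from the source; the exact mass depends on the actual amino acid composition. The constant and function names are illustrative.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def estimate_mass_kda(n_residues: int) -> float:
    """Approximate molecular weight in kilodaltons from a residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


full_length = 718              # residues in human KIAA1841
domain_length = 449 - 147 + 1  # DUF3342, positions 147 to 449 inclusive

print(round(estimate_mass_kda(full_length), 1))
print(round(estimate_mass_kda(domain_length), 1))
```

The estimates, roughly 79 kDa for the full protein and 33 kDa for the domain, are in the same range as the reported 82 kDa and 35.7 kDa.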
Structure | KIAA1841 | DUF3342 | N terminus | C terminus |
---|---|---|---|---|
Alpha helix (%) | 68 | 68 | 63.9 | 70.1 |
Beta sheet (%) | 61 | 49.2 | 60.8 | 35.8 |
Beta turn (%) | 14.1 | 9.9 | 12.8 | 15.4 |
KIAA1841 was found to interact with SRPK1 (serine/arginine-rich protein-specific kinase 1). [20] The interaction was detected via a protein kinase assay. SRPK1 localizes to the nucleus and the cytoplasm. By regulating the intracellular localization of splicing factors, it is thought to play a role in regulating both constitutive and alternative splicing. KIAA1841 is also found in the nucleus and is thought to play a role in regulating transcription.
Diseases associated with this gene are Crohn’s disease, celiac disease and inflammatory bowel disease. [21] [22]