LOC100287387

LOC100287387 is a protein that in humans is encoded by the LOC100287387 gene. The function of the protein is not yet understood. The gene is located on the q arm of chromosome 2. [1]

Gene

Figure: Chromosome 2 of the human genome (ideogram).

The human LOC100287387 gene is located on the minus strand of the q arm of chromosome 2 at 2q37.3. [1] It overlaps the TWIST2 gene family on the plus strand of chromosome 2. [2] The gene is formed by three exons, with two introns near the start codon. [2]
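The cytogenetic location 2q37.3 quoted above packs several fields into one string: chromosome, arm, region, band, and sub-band. The sketch below shows one way to split such a string. The function name `parse_cytoband` and the regular expression are illustrative assumptions, not part of any genomics library, and the pattern only covers simple locations of this shape.

```python
import re


def parse_cytoband(location: str) -> dict:
    """Split a location such as '2q37.3' into its parts.

    Hypothetical helper for illustration only; it handles simple
    single-band locations and nothing more elaborate.
    """
    pattern = r"(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?"
    match = re.fullmatch(pattern, location)
    if match is None:
        raise ValueError(f"unrecognised cytogenetic location: {location!r}")
    chromosome, arm, region, band, sub_band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short (p)" if arm == "p" else "long (q)",
        "region": int(region),
        "band": int(band),
        "sub_band": sub_band,  # None when no sub-band is given
    }


locus = parse_cytoband("2q37.3")
print(locus)
```

Read this way, 2q37.3 is chromosome 2, long arm, region 3, band 7, sub-band 3.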

mRNA

No alternatively spliced isoforms of the LOC100287387 gene are known. [2]

Protein

Figure: LOC100287387 is located at 2q37.3.

Figure: Human LOC100287387 predicted protein modification sites from MotifScan. Diagram created using IBS 1.0.3 from GPS. "CK2P", "CampP", and "PKC" are phosphorylation sites. "M" marks myristoylation sites. "SUMO" is a sumoylation site.

The LOC100287387 protein is a 423-amino-acid polypeptide. Its molecular mass is 44.4 kDa, [5] and its isoelectric point is 10.77. [6] The sequence contains a G-patch domain and a short domain of unknown function. Many modification sites are predicted within the amino acid sequence, including cAMP-dependent (CampP), casein kinase 2 (CK2), and protein kinase C (PKC) phosphorylation sites, O-linked beta-N-acetylglucosamine sites, and a sumoylation site. [3] [7] The predicted secondary structure of the protein includes 8 short alpha-helices (15.6% of the protein) and 14 short extended strands (12.1%), with the remainder (72%) predicted as random coil. [8]
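As a rough consistency check on the figures above, the reported mass and length give a mean residue mass, and the secondary-structure percentages convert to approximate residue counts. This is a minimal sketch using only the numbers quoted in the text; the variable names are illustrative, and because the percentages are rounded the counts need not sum to exactly 423.

```python
# Values as reported in the text above.
LENGTH_AA = 423
MASS_KDA = 44.4
FRACTIONS = {
    "alpha_helix": 0.156,
    "extended_strand": 0.121,
    "random_coil": 0.72,
}

# Mean mass per residue in daltons (1 kDa = 1000 Da).
mean_residue_mass_da = MASS_KDA * 1000 / LENGTH_AA

# Approximate number of residues in each predicted state.
residues_per_state = {
    state: round(fraction * LENGTH_AA)
    for state, fraction in FRACTIONS.items()
}

print(f"mean residue mass: {mean_residue_mass_da:.1f} Da")
for state, count in residues_per_state.items():
    print(f"{state}: about {count} residues")
```

The mean comes out near 105 Da per residue, somewhat below the commonly quoted rule of thumb of roughly 110 Da.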

Expression

In humans, LOC100287387 is expressed at low levels in all tissues. Expression is highest in the skin and in nervous system tissue such as the pons, superior cervical ganglion, trigeminal ganglion, and globus pallidus. However, expression was inconsistent among patients. [9]

Regulation

The promoter region of the LOC100287387 gene contains binding sites for many transcription factors that affect transcription levels of the gene. These include three TFIIB binding sites (TFIIB takes part in transcription initiation), a cysteine-serine-rich nuclear protein 1 site (an activator), a Kruppel-like zinc finger protein 219 site (a repressor), and a stimulating protein 1 site (an activator), among others. [10]

Homology

Orthologs of the human LOC100287387 gene are found only in mammals, and the protein sequence is not highly conserved. Conservation is highest in primates and falls sharply among other mammals. [11] Between species, conservation is highest at the nuclear localization signal and towards the end of the coding sequence, at the G-patch domain and DUF308, which suggests that these are the most functionally important parts of the sequence. [11]

Orthologs of Human LOC100287387

| Genus and Species | Common Name | Divergence from Homo sapiens (Million Years) [12] | Polypeptide Length [11] | Sequence Identity (%) [11] |
|---|---|---|---|---|
| Homo sapiens | Human | 0 | 423 | 100 |
| Pan paniscus | Bonobo | 6.4 | 325 | 93 |
| Nomascus leucogenys | Gibbon | 19.4 | 307 | 90 |
| Oryctolagus cuniculus | European Rabbit | 88 | 241 | 57 |
| Tursiops truncatus | Bottlenose Dolphin | 94 | 300 | 64 |
| Orcinus orca | Killer Whale | 94 | 300 | 64 |
| Delphinapterus leucas | Beluga Whale | 94 | 338 | 57 |
| Mustela putorius furo | Ferret | 94 | 303 | 57 |
| Canis lupus | Dog | 94 | 184 & 63 (No continuous reading frame) | 38 |
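The drop in conservation outside primates can be made concrete by averaging the sequence identities listed for the orthologs. The sketch below copies the values from the ortholog table, leaves out the dog entry because it has no continuous reading frame, and groups species with a hand-written set of primate names. That grouping and all identifiers are assumptions made for illustration.

```python
# (species, divergence from human in million years, length, identity %)
# Values copied from the ortholog table; the dog entry is omitted
# because it lacks a continuous reading frame.
ORTHOLOGS = [
    ("Pan paniscus", 6.4, 325, 93),
    ("Nomascus leucogenys", 19.4, 307, 90),
    ("Oryctolagus cuniculus", 88.0, 241, 57),
    ("Tursiops truncatus", 94.0, 300, 64),
    ("Orcinus orca", 94.0, 300, 64),
    ("Delphinapterus leucas", 94.0, 338, 57),
    ("Mustela putorius furo", 94.0, 303, 57),
]

# Assumption for this sketch: the primates among the listed orthologs.
PRIMATES = {"Pan paniscus", "Nomascus leucogenys"}


def mean_identity(rows):
    """Average the identity column (last field) over the given rows."""
    return sum(row[3] for row in rows) / len(rows)


primate_mean = mean_identity([r for r in ORTHOLOGS if r[0] in PRIMATES])
other_mean = mean_identity([r for r in ORTHOLOGS if r[0] not in PRIMATES])

print(f"primates: {primate_mean:.1f}% mean identity")
print(f"other mammals: {other_mean:.1f}% mean identity")
```

With so few species, these averages only summarise the listed values and are not a statistical estimate.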

There are no paralogs of the human gene LOC100287387. [13]

Function

The protein contains a nuclear localization signal and most likely acts in the nucleus. [14] There are no confirmed protein interactions or associations with diseases.

References

  1. "LOC100287387 Gene (Protein Coding)". genecards.org. Retrieved February 4, 2018.
  2. "LOC100287387 uncharacterized LOC100287387 [Homo sapiens (humans)]". September 3, 2017. Retrieved April 26, 2018.
  3. Pagni, Marco; Ioannidis, Vassilios; Cerutti, Lorenzo; Zahn-Zabal, Monique; Jongeneel, C. Victor; Hau, Jörg; Martin, Olivier; Kuznetsov, Dmitri; Falquet, Laurent (2007). "MyHits: improvements to an interactive resource for analyzing protein sequences". Nucleic Acids Research. 35 (Web Server issue): W433–W437. doi:10.1093/nar/gkm352. ISSN 0305-1048. PMC 1933190. PMID 17545200.
  4. Liu, Wenzhong; Xie, Yubin; Ma, Jiyong; Luo, Xiaotong; Nie, Peng; Zuo, Zhixiang; Lahrmann, Urs; Zhao, Qi; Zheng, Yueyuan (2015-06-10). "IBS: an illustrator for the presentation and visualization of biological sequences: Fig. 1". Bioinformatics. 31 (20): 3359–3361. doi:10.1093/bioinformatics/btv362. ISSN 1367-4803. PMC 4595897. PMID 26069263.
  5. Brendel, V.; Bucher, P.; Nourbakhsh, I. R.; Blaisdell, B. E.; Karlin, S. (1992-03-15). "Methods and algorithms for statistical analysis of protein sequences". Proceedings of the National Academy of Sciences. 89 (6): 2002–2006. Bibcode:1992PNAS...89.2002B. doi:10.1073/pnas.89.6.2002. ISSN 0027-8424. PMC 48584. PMID 1549558.
  6. Kozlowski, Lukasz P. (2016-10-21). "IPC – Isoelectric Point Calculator". Biology Direct. 11 (1): 55. doi:10.1186/s13062-016-0159-9. ISSN 1745-6150. PMC 5075173. PMID 27769290.
  7. Xue, Yu; Ren, Jian; Gao, Xinjiao; Jin, Changjiang; Wen, Longping; Yao, Xuebiao (2008-09-01). "GPS 2.0, a Tool to Predict Kinase-specific Phosphorylation Sites in Hierarchy". Molecular & Cellular Proteomics. 7 (9): 1598–1608. doi:10.1074/mcp.M700574-MCP200. ISSN 1535-9476. PMC 2528073. PMID 18463090.
  8. Garnier, J.; Gibrat, J. F.; Robson, B. (1996). "GOR method for predicting protein secondary structure from amino acid sequence". Computer Methods for Macromolecular Sequence Analysis. Methods in Enzymology. Vol. 266. pp. 540–553. doi:10.1016/S0076-6879(96)66034-0. ISBN 978-0-12-182167-8. ISSN 0076-6879. PMID 8743705.
  9. Al, Sue; Wiltshire, T (April 20, 2004). "Large-scale analysis of the human transcriptome (HG-U133A)".
  10. "Gene TF Analysis". genomatix.de.
  11. Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. (1997-09-01). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research. 25 (17): 3389–3402. doi:10.1093/nar/25.17.3389. ISSN 0305-1048. PMC 146917. PMID 9254694.
  12. "Timetree: The Timescale of Life". Institute for Genomics and Evolutionary Medicine, Temple University.
  13. Kent, W. James (2002-04-01). "BLAT—The BLAST-Like Alignment Tool". Genome Research. 12 (4): 656–664. doi:10.1101/gr.229202. ISSN 1088-9051. PMC 187518. PMID 11932250.
  14. Horton, Paul (1999). "PSORT: Protein Subcellular Localization Prediction Tool". www.genscript.com. Retrieved 2018-04-23.