LINC02915 | |
---|---|
Identifiers | |
Aliases | LINC02915, chromosome 15 open reading frame 54, chromosome 15 open reading frame 54 (putative), chromosome 15 putative open reading frame 54, long intergenic non-protein coding RNA 2915, C15orf54 |
External IDs | HomoloGene: 131352; GeneCards: LINC02915; OMA: LINC02915 - orthologs |
C15orf54 (Chromosome 15 Open Reading Frame 54) is a protein that in humans is encoded by the C15orf54 gene. The gene is conserved mainly in mammals, particularly primates. While its function is currently unknown, the gene is highly expressed in the prostate, thymus, appendix, bone marrow, and lungs. [3]
C15orf54 is located on chromosome 15 from 39542870 to 39547048 on the direct strand. This gene is 4,180 bases in length. The gene is otherwise known as LOC400360 or FLJ39531. The gene contains 2 distinct gt-ag introns and two exons with two alternatively spliced mRNAs, both encoding the same protein. [3] The NCBI accession number is NC_000015.10. [4]
C15orf54 has a total of 2 isoforms: variant 1 and variant 2. Variant 1 represents the longer transcript and variant 2 uses an alternate splice site in the 3' exon compared to variant 1. [3]
The complete mRNA is 3095 bp long and contains 2 exons. The 5' UTR spans 383 bp and contains an in-frame stop codon 48 bp upstream of the start methionine. The 3' UTR spans 2160 bp and is followed by the poly(A) tail. The standard AATAAA polyadenylation signal lies about 23 bp before the poly(A) tail. The predicted protein product has 183 aa. [3]
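The transcript figures above can be cross-checked with simple arithmetic. The sketch below assumes the coding sequence is 183 codons plus one stop codon, which the source does not state explicitly; the function and variable names are illustrative only.

```python
# Cross-check of the reported transcript layout (all lengths in base pairs).
FIVE_PRIME_UTR_BP = 383
THREE_PRIME_UTR_BP = 2160
PROTEIN_LENGTH_AA = 183


def cds_length_bp(protein_length_aa, include_stop=True):
    """Coding-sequence length: three bases per codon, plus an optional stop codon.

    Counting the stop codon as part of the CDS is an assumption of this sketch.
    """
    codons = protein_length_aa + (1 if include_stop else 0)
    return 3 * codons


cds_bp = cds_length_bp(PROTEIN_LENGTH_AA)
mrna_bp = FIVE_PRIME_UTR_BP + cds_bp + THREE_PRIME_UTR_BP
print(cds_bp, mrna_bp)
```

Under that assumption the two UTRs and the coding sequence add up to the reported 3095 bp.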
The sequence for the C15orf54 protein is as follows: [3]
MEVKFITGKHGGRRPQRAEPQRICRALWLTPWPSLILKLLSWIILSNLFLHLRATHHMTE
LPLRFLYIALSEMTFREQTSHQIIQQMSLSNKLEQNQLYGEVINKETDNPVISSGLTLLF
AQKPQSPGWKNMSSTKRVCTILADSCRAQAHAADRGERGHFGVQILHHFIEVFNVMAVRS
NPF
The dominant protein product is 183 amino acids long and has a predicted molecular weight of 21 kDa. The isoelectric point is 9.87. [3] C15orf54 has a relatively high frequency of leucine at 12.0% and a relatively low frequency of tyrosine at 1.1%. [5] The number of multiplets in this sequence is 12. There are no unusual spacings in this protein. [5]
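The residue frequencies quoted above can be reproduced directly from the listed sequence. The following is a minimal sketch that counts residues with the standard library; it is not the composition tool used in the cited analysis, and the names are illustrative.

```python
from collections import Counter

# Protein sequence as listed above, joined into a single string.
C15ORF54 = (
    "MEVKFITGKHGGRRPQRAEPQRICRALWLTPWPSLILKLLSWIILSNLFLHLRATHHMTE"
    "LPLRFLYIALSEMTFREQTSHQIIQQMSLSNKLEQNQLYGEVINKETDNPVISSGLTLLF"
    "AQKPQSPGWKNMSSTKRVCTILADSCRAQAHAADRGERGHFGVQILHHFIEVFNVMAVRS"
    "NPF"
)


def residue_percentages(sequence):
    """Return a dict mapping each amino acid letter to its percentage of the sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: 100.0 * n / total for aa, n in counts.items()}


percent = residue_percentages(C15ORF54)
print(len(C15ORF54))
print(f"Leu: {percent['L']:.1f}%  Tyr: {percent['Y']:.1f}%")
```

The sequence contains 22 leucines and 2 tyrosines out of 183 residues, which matches the 12.0% and 1.1% figures.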
Analysis of C15orf54 showed a globular domain containing several functional motif sites. One is a MAPK-docking motif, which consists of one or more basic residues and two to four hydrophobic residues in adjacent groups; such motifs regulate specific interactions in the MAPK cascade. Another is an LIR motif, which is found in ligands of the Atg8 protein family and plays a role in selective autophagy by recruiting specific adaptors bound to ubiquitylated proteins, organelles, or pathogens for degradation. [6]
C15orf54 is non-myristoylated. No sulfinated sites were found in this protein. One motif with a high probability of sumoylation, a post-translational modification, was found. Sumoylation is involved in nuclear-cytosolic transport, transcriptional regulation, and protein stability. [7]
C15orf54 is composed of both alpha helices and beta sheets, as well as turns and some coils. Alpha helices constituted the majority of the protein. [8]
The membrane topology was determined to be type 1b with a cytoplasmic tail from 34 to 183, indicating that the C-terminal side will be inside. There was a transmembrane region located from 34 to 50. There were dileucine motifs found in the tail at 39 and 118. [9]
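The reported dileucine positions can be checked against the sequence with a plain scan for adjacent leucines. This sketch assumes the positions are 1-based and refer to the first leucine of each pair; it does not reproduce the topology prediction itself, and the function name is illustrative.

```python
# Protein sequence as listed earlier in the article, joined into a single string.
C15ORF54 = (
    "MEVKFITGKHGGRRPQRAEPQRICRALWLTPWPSLILKLLSWIILSNLFLHLRATHHMTE"
    "LPLRFLYIALSEMTFREQTSHQIIQQMSLSNKLEQNQLYGEVINKETDNPVISSGLTLLF"
    "AQKPQSPGWKNMSSTKRVCTILADSCRAQAHAADRGERGHFGVQILHHFIEVFNVMAVRS"
    "NPF"
)


def dileucine_positions(sequence):
    """Return 1-based start positions of every 'LL' pair in the sequence."""
    return [i + 1 for i in range(len(sequence) - 1) if sequence[i:i + 2] == "LL"]


print(dileucine_positions(C15ORF54))  # prints [39, 118]
```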
Two interacting proteins were found, lsd2_drome and npfr_drome. Lsd2_drome is a lipid storage droplet surface binding protein and npfr_drome is a neuropeptide F receptor.
C15orf54 has one predicted promoter sequence. GXP_6084 is located from 39249718 to 39250757 on the plus strand of chromosome 15 and is composed of 1040 bp. [10]
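The promoter length follows from its coordinates if both endpoints are counted. The sketch below assumes a closed, 1-based interval, which is the usual convention for such coordinates but is not stated in the source.

```python
def interval_length(start, end):
    """Number of bases in a closed, 1-based genomic interval [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


promoter_bp = interval_length(39249718, 39250757)
print(promoter_bp)  # prints 1040
```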
The following table displays the transcription factors most likely to bind to the GXP_6084 promoter for C15orf54. [10]
Matrix Family | Detailed Family Information |
---|---|
TALE | TG-interacting factor belonging to TALE class of homeodomain factors |
CART | Binding site for S8 type homeodomains |
HAND | T-cell acute lymphocytic leukemia 1, SCL |
ZFHX | AREB6 (Atp1a1 regulatory element binding factor 6) |
TZAP | Zinc finger and BTB domain containing 48 |
SAL4 | Spalt like 4, DRRS, HSAL4, ZNF797 |
TEAF | TEA domain family member 4, TEF-3 |
RUSH | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 3 |
EGRF | Wilms Tumor Suppressor |
The gene has shown high expression in the prostate, thymus, appendix, bone marrow, and lungs. NCBI AceView shows that the gene is moderately expressed. [3]
TargetScan showed that miRNA hsa-miR-375 was highly conserved across various organisms. This miRNA is specifically expressed in the pancreatic islets, brain, and spinal cord. This miRNA has also been shown to be associated with different cancers, including breast and gastric cancer. [11]
No paralogs of C15orf54 have been detected in the human genome.
Orthologs were primarily found in primates, although many different mammals also exhibited sizeable sequence similarity to the human C15orf54 sequence. Below is a table of selected orthologs of the C15orf54 gene, sorted by date of divergence and including both closely and distantly related orthologs. [12] [13] C15orf54 was shown to evolve relatively quickly and evenly over time, at a faster rate than both cytochrome c and fibrinogen alpha.
Genus and species | Common Name | Taxonomic Group | Date of Divergence - Est. Time (MYA) | Accession Number | Sequence length (aa) | Sequence Identity (%) | Sequence Similarity (%) | n | m |
---|---|---|---|---|---|---|---|---|---|
Homo sapiens | Humans | Primates | 0 | NP_001027544.1 | 803 | 100 | 100 | 0 | 0.0 |
Macaca mulatta | Rhesus Macaque | Primates | 29.44 | AFE75666.1 (extended) | 767 | 91.9 | 93.9 | 8.1 | 8.4 |
Fukomys damarensis | Damara Mole Rat | Rodentia | 90 | XP_010621546.2 | 753 | 73.6 | 79.3 | 26.4 | 30.7 |
Camelus ferus | Wild Bactrian Camel | Artiodactyla | 94 | XP_006175095.2 | 802 | 81.8 | 88.6 | 18.2 | 20.1 |
Odobenus rosmarus divergens | Pacific Walrus | Carnivora | 96 | XP_012418040.1 | 806 | 84.5 | 90.3 | 15.5 | 16.8 |
Mirounga leonina | Southern Elephant Seal | Carnivora | 96 | XP_034842573.1 | 806 | 83.7 | 89.8 | 16.3 | 17.8 |
Manis javanica | Malayan Pangolin | Pholidota | 96 | XP_017502667.1 | 584 | 49.4 | 53.0 | 50.6 | 70.5 |
Echinops telfairi | Lesser Hedgehog Tenrec | Afrosoricida | 102 | XP_030742207.1 | 419 | 31.9 | 38.9 | 68.1 | 114.3 |
Denticeps clupeoides | Denticle herring | Actinopterygii/Clupeiformes | 435 | XP_028809248.1 | 3037 | 12.4 | 16.7 | 87.6 | 208.7 |
Beroe forskalii | Cigar comb jellies | Ctenophora/Beroida | 540 | AHA51259.1 | 212 | 12.0 | 18.2 | 88.0 | 212.0 |
Araneus ventricosus | Orb weaving Spider | Araneae | 736 | GBN07005.1 | 543 | 30.2 | 38.9 | 69.8 | 119.7 |
Capitella teleta | Segmented annelid worm | Annelida | 797 | ELT92884.1 | 537 | 31.3 | 43.3 | 68.7 | 116.2 |
Thelazia callipaeda | Parasitic nematode | Nematoda/Rhabditida | 797 | VDN04867.1 | 418 | 25.2 | 32.9 | 74.8 | 137.8 |
Drosophila melanogaster | Fruit flies | Diptera | 797 | NP_650197.1 | 501 | 24.4 | 34.8 | 75.6 | 141.1 |
Octopus sinensis | Common octopus | Octopoda/Mollusca | 797 | XP_029652221.1 | 5045 | 6.8 | 9.2 | 93.2 | 268.8 |
Nematostella vectensis | Starlet Sea Anemone | Cnidaria/Anthozoa | 824 | EDO31838.1 | 482 | 30.7 | 42.6 | 69.3 | 118.1 |
Macrostomum lignano | Flatworm | Platyhelminthes/Macrostomida | 824 | PAA81016.1 | 477 | 27.0 | 35.6 | 73.0 | 130.9 |
Salpingoeca rosetta | Choanoflagellates | Choanoflagellates | 1023 | XP_004989424.1 | 480 | 20.5 | 28.3 | 79.5 | 158.5 |
Rhizophagus clarus | Arbuscular mycorrhizal fungi | Fungi/Glomerales | 1105 | GBB86324.1 | 717 | 30.1 | 41.8 | 69.9 | 120.1 |
Salmonella enterica | Gram Negative Bacteria | Salmonella/Enterobacterales | 4290 | EDQ2188565.1 | 310 | 12.8 | 18.2 | 87.2 | 205.6 |
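The n and m columns are not defined in the text. Their values are consistent with n being the percentage of differing residues (100 minus the sequence identity) and m being a divergence corrected for multiple substitutions, m = -100 × ln(1 - n/100). This reading is inferred from the numbers rather than stated in the source, so the sketch below is illustrative.

```python
import math


def percent_difference(identity_pct):
    """Assumed meaning of column n: percentage of residues that differ."""
    return 100.0 - identity_pct


def corrected_divergence(identity_pct):
    """Assumed meaning of column m: -100 * ln(1 - n/100)."""
    n = percent_difference(identity_pct)
    return -100.0 * math.log(1.0 - n / 100.0)


# Sequence identities (%) taken from three rows of the table above.
identities = {
    "Macaca mulatta": 91.9,
    "Fukomys damarensis": 73.6,
    "Manis javanica": 49.4,
}
for species, identity in identities.items():
    n = percent_difference(identity)
    m = corrected_divergence(identity)
    print(f"{species}: n = {n:.1f}, m = {m:.1f}")
```

For these three rows the formula gives m values of 8.4, 30.7, and 70.5, matching the table.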
C15orf54 has been associated with hypertrophy-associated polymorphisms in heart failure risk [14] and with atherosclerosis risk. [15] C15orf54 expression was also positively correlated with higher survival rates in patients with gastric cancer. [16] It was also identified as a locus of interest in determining the glomerular filtration rate in a pool of individuals with Mongolian ancestry. [17]