C1orf185 | |
---|---|
Aliases | C1orf185, chromosome 1 open reading frame 185 |
External IDs | MGI: 1914896, HomoloGene: 49856, GeneCards: C1orf185 |
Chromosome 1 open reading frame 185, also known as C1orf185, is a protein that in humans is encoded by the C1orf185 gene. In humans, C1orf185 is expressed at low levels, and expression has occasionally been detected in the circulatory system. [5] [6]
C1orf185 is located on the positive strand of human chromosome 1, between bases 51,102,221 and 51,148,086. [7] The main splice isoform has 5 exons; the number and selection of exons vary between isoforms. [7]
C1orf185 has 5 different splice isoforms in humans. [7]
Isoform | mRNA Accession | Protein Accession | Transcript Length (bp) | Protein Length (AA) |
---|---|---|---|---|
uncharacterized protein C1orf185 | NM_001136508.2 | NP_001129980.1 | 921 | 199 |
uncharacterized protein C1orf185 isoform X1 | XM_011541282.2 | XP_011539584.1 | 787 | 195 |
uncharacterized protein C1orf185 isoform X2 | XM_024446525.1 | XP_024302293.1 | 586 | 116 |
uncharacterized protein C1orf185 isoform X3 | XM_024446528.1 | XP_024302296.1 | 420 | 116 |
uncharacterized protein C1orf185 isoform X4 | XM_024446529.1 | XP_024302297.1 | 367 | 107 |
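The coordinates and lengths quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the genomic coordinates are 1-based and inclusive and that each coding sequence spans the protein length plus one stop codon; the function and variable names are illustrative and do not come from any cited tool.

```python
# Gene span on chromosome 1 (assumed 1-based, inclusive coordinates).
GENE_START = 51_102_221
GENE_END = 51_148_086

# (isoform label, transcript length in bp, protein length in aa), from the isoform table.
ISOFORMS = [
    ("main", 921, 199),
    ("X1", 787, 195),
    ("X2", 586, 116),
    ("X3", 420, 116),
    ("X4", 367, 107),
]


def gene_span(start: int, end: int) -> int:
    """Length of an inclusive genomic interval in base pairs."""
    return end - start + 1


def cds_length(protein_aa: int) -> int:
    """Coding sequence length in nucleotides, counting the stop codon."""
    return (protein_aa + 1) * 3


def untranslated_length(transcript_bp: int, protein_aa: int) -> int:
    """Transcript bases remaining for the 5' and 3' UTRs combined."""
    return transcript_bp - cds_length(protein_aa)


if __name__ == "__main__":
    print(f"gene span: {gene_span(GENE_START, GENE_END):,} bp")
    for label, transcript_bp, protein_aa in ISOFORMS:
        utr = untranslated_length(transcript_bp, protein_aa)
        print(f"{label}: CDS {cds_length(protein_aa)} nt, UTRs {utr} nt")
```

Under these assumptions the gene spans 45,866 bp, while the main isoform's coding sequence accounts for only 600 of the 921 transcript bases.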
C1orf185 is a member of the pfam15842 protein family, containing a domain of unknown function, DUF4718. [10] Proteins in this family are between 130 and 224 amino acids long and are found only in eukaryotes.
The main splice isoform of C1orf185 has a molecular weight of 22.4 kDa [11] and an isoelectric point of 7.67. [12] It contains a transmembrane domain spanning positions 15 to 37. [7] There is also a conserved serine-rich region from S123 to S142, which may indicate a function as a "splicing activator". [13]
C1orf185 contains three primary topological regions: an extracellular domain spanning positions 1–14, a transmembrane domain spanning positions 15–37, and a large intracellular domain spanning positions 38–199. [14]
Below are predicted secondary and tertiary structures of C1orf185, modeled using the Chou-Fasman [15] secondary structure prediction tool and the I-TASSER [16] protein structure and function prediction tool. Chou-Fasman predicts a mixture of α-helices, β-sheets, and other structural turns and coils, which can be seen modeled on the I-TASSER prediction.
Below is a diagram showing the locations of predicted transcription factor binding sites in the C1orf185 promoter, along with a table describing the attributes of each individual binding site. The transcription factors were found and analyzed using the ElDorado tool from Genomatix. [17]
Transcription Factor | Detailed matrix info | Matrix similarity | Sequence | Strand (+/-) |
---|---|---|---|---|
VTATA.02 | Mammalian C-type LTR TATA box | 0.91 | tgtcaTAAAaacattcc | + |
NKX25.05 | Homeodomain factor Nkx-2.5/Csx | 0.986 | tttttTGAGtgaagtcttg | - |
CDX1.01 | Intestine specific homeodomain factor CDX-1 | 0.988 | ttgccctTTTAtgaaaaaa | + |
VTATA.02 | Mammalian C-type LTR TATA box | 0.914 | tacttTAAAaataagca | - |
ERG.02 | v-ets erythroblastosis virus E26 oncogene homolog | 0.942 | gtctcaaaGGAAaataaaaag | - |
SPI1.02 | SPI-1 proto-oncogene; hematopoietic transcription factor PU.1 | 0.992 | attaaagaGGAAgtctcaaag | - |
FHXB.01 | Fork head homologous X binds DNA with a dual sequence specificity (FHXA and FHXB) | 0.831 | ttctaaATAAcacattt | - |
TGIF.01 | TG-interacting factor belonging to TALE class of homeodomain factors | 1 | tctataaatGTCAatta | + |
ZNF219.01 | Kruppel-like zinc finger protein 219 | 0.913 | ctccaCCCCcgtcagcccaaagg | + |
ZBP89.01 | Zinc finger transcription factor ZBP-89 | 0.956 | catctccaCCCCcgtcagcccaa | + |
CREB.02 | cAMP-responsive element binding protein | 0.922 | cctttgggcTGACgggggtgg | - |
FOXP1_ES.01 | Alternative splicing variant of FOXP1, activated in ESCs | 1 | tcataaaAACAttccag | - |
VTATA.02 | Mammalian C-type LTR TATA box | 0.895 | tgtcaTAAAaacattcc | - |
CREB1.02 | cAMP-responsive element binding protein 1 | 0.949 | tggaaGTGAtgtcataaaaac | - |
SPI1.02 | SPI-1 proto-oncogene; hematopoietic transcription factor PU.1 | 0.979 | atttgagtGGAAgtgatgtca | - |
NKX25.05 | Homeodomain factor Nkx-2.5/Csx | 0.994 | gaattTGAGtggaagtgat | - |
MESP1_2.01 | Mesoderm posterior 1 and 2 | 0.917 | cagtCATAtggct | + |
MESP1_2.01 | Mesoderm posterior 1 and 2 | 0.929 | aagcCATAtgact | - |
DELTAEF1.01 | deltaEF1 | 0.99 | gcttcACCTaaag | + |
ERG.02 | v-ets erythroblastosis virus E26 oncogene homolog | 0.93 | gaagaagaGGAAaatatattt | + |
Matrix similarity reflects the confidence in the prediction of each individual binding site. The +/- column indicates whether the site lies on the positive or negative strand. The transcription factors are listed in order of appearance from the beginning to the end of the promoter.
C1orf185 is expressed at very low levels, and the circulatory system is the only site in the body showing any sign of expression. Two NCBI GEO profiles show that C1orf185 was consistently overexpressed in whole blood samples from a group of postmenopausal women, [18] and somewhat overexpressed in the peripheral blood of Parkinson's patients compared to controls. [19]
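One way to read the binding-site table is to keep only the highest-scoring predictions and count sites per strand. The sketch below transcribes the table rows into plain tuples; the 0.98 cutoff is an arbitrary illustrative threshold, not one recommended by Genomatix, and the helper names are hypothetical.

```python
from collections import Counter

# (matrix name, matrix similarity, strand), in promoter order, from the table above.
SITES = [
    ("VTATA.02", 0.91, "+"), ("NKX25.05", 0.986, "-"), ("CDX1.01", 0.988, "+"),
    ("VTATA.02", 0.914, "-"), ("ERG.02", 0.942, "-"), ("SPI1.02", 0.992, "-"),
    ("FHXB.01", 0.831, "-"), ("TGIF.01", 1.0, "+"), ("ZNF219.01", 0.913, "+"),
    ("ZBP89.01", 0.956, "+"), ("CREB.02", 0.922, "-"), ("FOXP1_ES.01", 1.0, "-"),
    ("VTATA.02", 0.895, "-"), ("CREB1.02", 0.949, "-"), ("SPI1.02", 0.979, "-"),
    ("NKX25.05", 0.994, "-"), ("MESP1_2.01", 0.917, "+"), ("MESP1_2.01", 0.929, "-"),
    ("DELTAEF1.01", 0.99, "+"), ("ERG.02", 0.93, "+"),
]


def high_confidence(sites, threshold=0.98):
    """Return sites whose matrix similarity meets the (assumed) threshold."""
    return [site for site in sites if site[1] >= threshold]


def strand_counts(sites):
    """Count predicted sites on each strand."""
    return Counter(strand for _, _, strand in sites)


if __name__ == "__main__":
    for name, similarity, strand in high_confidence(SITES):
        print(f"{name}\t{similarity}\t{strand}")
    print(dict(strand_counts(SITES)))
```

With this cutoff, seven of the twenty predicted sites remain, including both Nkx-2.5 sites.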
Below is a figure produced by mfold [20] showing predicted mRNA structure of the 3' UTR of C1orf185.
C1orf185 has one conserved miRNA binding site of type 7mer-A1 among several orthologs. [21] The presence of a 7mer-A1 binding site indicates that C1orf185 is likely to be post-transcriptionally repressed. [22]
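A 7mer-A1 site is a match to positions 2–7 of the mature miRNA (the seed) followed by an adenine opposite miRNA position 1. The sketch below derives that target motif from a miRNA sequence and scans an mRNA for it. The source does not name the miRNA involved, so hsa-let-7a-5p is used purely for illustration, and the UTR fragment is an invented toy sequence, not the C1orf185 3' UTR.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seven_mer_a1_site(mirna: str) -> str:
    """Target motif on the mRNA (5'->3') for a 7mer-A1 site."""
    seed = mirna[1:7]  # miRNA positions 2-7 (1-based)
    return reverse_complement(seed) + "A"


def find_sites(utr: str, mirna: str) -> list[int]:
    """0-based start offsets of the motif in the UTR.

    Note: this sketch does not exclude 8mer sites, which also contain the motif.
    """
    motif = seven_mer_a1_site(mirna)
    positions = []
    start = utr.find(motif)
    while start != -1:
        positions.append(start)
        start = utr.find(motif, start + 1)
    return positions


# Illustrative inputs only (assumptions, not taken from the article).
LET_7A = "UGAGGUAGUAGGUUGUAUAGUU"
TOY_UTR = "GGCAUACCUCAGGUUUACCUCAAA"

if __name__ == "__main__":
    print(seven_mer_a1_site(LET_7A))
    print(find_sites(TOY_UTR, LET_7A))
```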
Below is a figure and table showing predicted post-translational modification sites for C1orf185.
Type of Modification | Tool | Positions in Homo sapiens |
---|---|---|
Phosphorylation | NetPhos [23] | S61, S69, S104, S130, S142, S147, S165, S186 |
Glycation | NetGlycate, [24] NetNGlyc [25] | K5, K50, K98, K113 |
O-GlcNAc | YinOYang [26] | T121, S122, S130 |
The presence of multiple lysine glycation sites indicates that there may be ways to impair the function of the protein, as glycation has been associated with the loss of protein function in blood vessels. [27]
C1orf185 may play a role in the circulatory system, possibly a reactive one, as it is expressed at low levels across many species. It appears in studies of atrial fibrillation [6] and abnormal QRS duration, [5] which suggests it may be involved in those circulatory conditions.
Below is a table showing C1orf185 orthologs across a variety of species. Orthologs were found using NCBI BLAST, [28] the dates of divergence were found using TimeTree, [29] and the global sequence identities and similarities were found using the Clustal Omega multiple sequence alignment tool. [30]
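The predicted modification sites can be placed in the context of the protein's topology by mapping each position onto the extracellular (1–14), transmembrane (15–37), and intracellular (38–199) regions given earlier. The sketch below does this with plain dictionaries; the helper names are hypothetical, and the serine-rich window (123–142) is taken from the text above.

```python
# Topological regions of the main isoform: (name, first residue, last residue).
DOMAINS = [
    ("extracellular", 1, 14),
    ("transmembrane", 15, 37),
    ("intracellular", 38, 199),
]

# Predicted sites from the post-translational modification table.
PTM_SITES = {
    "phosphorylation": ["S61", "S69", "S104", "S130", "S142", "S147", "S165", "S186"],
    "glycation": ["K5", "K50", "K98", "K113"],
    "O-GlcNAc": ["T121", "S122", "S130"],
}


def position(site: str) -> int:
    """Residue number from a label such as 'S130'."""
    return int(site[1:])


def domain_of(pos: int) -> str:
    """Name of the topological region containing a residue position."""
    for name, first, last in DOMAINS:
        if first <= pos <= last:
            return name
    raise ValueError(f"position {pos} is outside the 199-residue protein")


def sites_by_domain(ptm_sites):
    """Group (modification, site) pairs by topological region."""
    grouped = {name: [] for name, _, _ in DOMAINS}
    for modification, sites in ptm_sites.items():
        for site in sites:
            grouped[domain_of(position(site))].append((modification, site))
    return grouped


def shared_sites(ptm_sites, first: str, second: str) -> set[str]:
    """Residues predicted to carry both modifications."""
    return set(ptm_sites[first]) & set(ptm_sites[second])


def in_serine_rich_region(site: str, start: int = 123, end: int = 142) -> bool:
    """True if the site lies in the conserved serine-rich region."""
    return start <= position(site) <= end


if __name__ == "__main__":
    for domain, entries in sites_by_domain(PTM_SITES).items():
        print(domain, entries)
    print(shared_sites(PTM_SITES, "phosphorylation", "O-GlcNAc"))
```

On these inputs, K5 is the only predicted site outside the intracellular region, and S130 is the only residue predicted to be both phosphorylated and O-GlcNAcylated.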
Genus and Species | Common Name | Taxonomic Group | Date of Divergence (MYA) | Accession Number | Sequence Length (aa) | Sequence Identity (Global) | Sequence Similarity (Global) |
---|---|---|---|---|---|---|---|
Homo sapiens | Human | Primates | 0 | NP_001129980.1 | 199 | 100% | 100% |
Pongo abelii | Sumatran orangutan | Primates | 15.76 | PNJ53823.1 | 195 | 93.50% | 95.50% |
Cebus capucinus imitator | Capuchin | Primates | 43.2 | XP_017404303.1 | 229 | 77.00% | 79.60% |
Galeopterus variegatus | Sunda flying lemur | Dermoptera | 76 | XP_008578352.1 | 203 | 73.70% | 77.90% |
Oryctolagus cuniculus | Rabbit | Lagomorpha | 90 | XP_008263491.1 | 225 | 69.90% | 76.40% |
Dipodomys ordii | Ord's kangaroo rat | Rodentia | 90 | XP_012877642.1 | 188 | 52.20% | 59.40% |
Mastomys coucha | Southern multimammate mouse | Rodentia | 90 | XP_031234037 | 263 | 51.50% | 61.50% |
Mus musculus | House mouse | Rodentia | 90 | NP_001186019.1 | 226 | 47.40% | 59.50% |
Peromyscus leucopus | White-footed mouse | Rodentia | 90 | XP_028745885.1 | 295 | 41% | 48.20% |
Phyllostomus discolor | Pale spear-nosed bat | Chiroptera | 96 | XP_028367083.1 | 191 | 73.40% | 80.40% |
Myotis davidii | David's myotis | Chiroptera | 96 | XP_006768446.1 | 196 | 71.40% | 78.40% |
Equus caballus | Horse | Perissodactyla | 96 | XP_023485921.1 | 243 | 63.80% | 68.30% |
Muntiacus muntjak | Indian muntjac | Artiodactyla | 96 | KAB0362285.1 | 200 | 59.40% | 65.90% |
Hipposideros armiger | Great roundleaf bat | Chiroptera | 96 | XP_019487867.1 | 157 | 54.90% | 59.20% |
Tursiops truncatus | Bottlenose dolphin | Artiodactyla | 96 | XP_033708766.1 | 189 | 54.10% | 59.00% |
Sarcophilus harrisii | Tasmanian devil | Dasyuromorphia | 159 | XP_031825005.1 | 333 | 18.20% | 27.70% |
Ornithorhynchus anatinus | Platypus | Monotremata | 180 | XP_028902271 | 309 | 26.80% | 37.40% |
Pelodiscus sinensis | Chinese softshell turtle | Reptilia | 312 | XP_025042106.1 | 890 | 7.40% | 11.40% |
Gopherus evgoodei | Sinaloan thornscrub tortoise | Reptilia | 312 | XP_030429802.1 | 777 | 4.00% | 6.30% |
Chrysemys picta bellii | Western painted turtle | Reptilia | 312 | XP_023960730.1 | 748 | 3.70% | 5.80% |
Compared to other genes, C1orf185 appears to be evolving relatively quickly: it is conserved only in mammals and a few turtles, and sequence similarity falls off sharply in more distantly related mammals. Primates are the only taxonomic group that strongly conserves the full human sequence, while other mammals and turtles strongly conserve only the transmembrane domain (positions 15–37). As mammals are warm-blooded, this may lend further support to a possible role in the circulatory system.
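The conservation pattern described above can be summarised by averaging global sequence identity within each taxonomic group. The sketch below transcribes the ortholog table (excluding the human reference row) and uses only the Python standard library; it is a descriptive summary of the table's values under that assumption, not a phylogenetic analysis.

```python
from collections import defaultdict
from statistics import mean

# (taxonomic group, global identity to human in %), from the ortholog table.
ORTHOLOGS = [
    ("Primates", 93.5), ("Primates", 77.0),
    ("Dermoptera", 73.7), ("Lagomorpha", 69.9),
    ("Rodentia", 52.2), ("Rodentia", 51.5), ("Rodentia", 47.4), ("Rodentia", 41.0),
    ("Chiroptera", 73.4), ("Chiroptera", 71.4), ("Chiroptera", 54.9),
    ("Perissodactyla", 63.8),
    ("Artiodactyla", 59.4), ("Artiodactyla", 54.1),
    ("Dasyuromorphia", 18.2), ("Monotremata", 26.8),
    ("Reptilia", 7.4), ("Reptilia", 4.0), ("Reptilia", 3.7),
]


def mean_identity_by_group(rows):
    """Mean global identity (%) for each taxonomic group."""
    by_group = defaultdict(list)
    for group, identity in rows:
        by_group[group].append(identity)
    return {group: mean(values) for group, values in by_group.items()}


if __name__ == "__main__":
    means = mean_identity_by_group(ORTHOLOGS)
    # Print groups from most to least conserved relative to human.
    for group, value in sorted(means.items(), key=lambda item: item[1], reverse=True):
        print(f"{group:<16}{value:6.2f}%")
```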