| C1orf131 | |
|---|---|
| Identifiers | |
| Aliases | C1orf131, chromosome 1 open reading frame 131 |
| External IDs | MGI: 1913773; HomoloGene: 11982; GeneCards: C1orf131 |
Uncharacterized protein C1orf131 is a protein that in humans is encoded by the gene C1orf131. The protein was first identified in humans. [5] [6] Homologs of C1orf131 have since been identified computationally, using bioinformatics algorithms, in numerous species; as a result, most proteins in this family are named Uncharacterized protein C1orf131 homolog.
In humans, C1orf131 is located on the minus strand of chromosome 1, in cytogenetic band 1q42.2, which it shares with 193 other genes. [7] The gene upstream of C1orf131 is GNPAT, and the gene downstream is TRIM67. In humans, C1orf131 is most often transcribed into an mRNA 1,458 base pairs long composed of seven exons. At least nine other alternatively spliced forms in humans produce proteins; they range in size from 129 base pairs (2 exons) to 1,458 base pairs (7 exons). [8]
In the C1orf131 protein family, proteins are 93 to 450 amino acids long, although most are 160 to 295 amino acids long. Their molecular weights range from 10.6 to 49.0 kDa, mostly between 18.6 and 32.7 kDa, and their isoelectric points range from 9.6 to 11.2. [9] Over 30 orthologs from mammals, birds, and lizards have been identified as having a poly(A) RNA binding site. [10] All orthologs in this family contain a domain of unknown function, DUF4602. [10] [11] The human protein has been shown to be both phosphorylated and acetylated. [12] [13] [14] [15] [16] [17] These proteins are rich in lysine, in charged amino acids (D, E, H, K, R), and in basic amino acids (H, K, R). [18] Their secondary structure consists primarily of alpha helices and coils, with a small percentage of beta strands. [19] C1orf131 has been shown to interact with ubiquitin [20] by affinity capture followed by mass spectrometry, and with APP (amyloid beta (A4) precursor protein) [21] in a reconstituted complex.
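The composition and size figures above can be checked for any ortholog sequence with a few lines of code. The following Python sketch is illustrative only: the function names are hypothetical, residue fractions are simple counts over a one-letter sequence, and the mass estimate assumes the common rule of thumb of about 110 Da per residue rather than summing exact residue masses, so it gives approximate values only.

```python
# Hypothetical helpers for illustration; not from any published C1orf131 analysis.
CHARGED = set("DEHKR")  # acidic (D, E) and basic (H, K, R) residues
BASIC = set("HKR")      # basic residues only
AVG_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average mass per residue


def composition_summary(seq: str) -> dict:
    """Fractions of lysine, charged, and basic residues in a one-letter protein sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    return {
        "length": n,
        "lysine": seq.count("K") / n,
        "charged": sum(aa in CHARGED for aa in seq) / n,
        "basic": sum(aa in BASIC for aa in seq) / n,
    }


def approx_mass_kda(length: int) -> float:
    """Rough molecular weight in kDa estimated from sequence length alone."""
    return length * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # Length bounds quoted for the family (93 and 450 residues)
    for n in (93, 450):
        print(f"{n} aa -> about {approx_mass_kda(n):.1f} kDa")
```

For 93 and 450 residues this rule of thumb gives roughly 10.2 and 49.5 kDa, close to the reported 10.6 to 49.0 kDa range; exact weights depend on each sequence's actual residue composition.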
DUF4602 (PF15375) is generally at least 120 amino acids long. [22] Typically only one gene per species contains this domain; however, in several species DUF4602 has been identified in two different proteins. In Trichuris suis it is found in both hypothetical protein M5114_09117 and tRNA pseudouridine synthase D, and in Echinococcus granulosus it has been found in hypothetical protein EGR 05135 and an expressed conserved protein. DUF4602 occurs primarily in eukaryotes, but it has also been identified in the virus DRHN1, Bacillus sp. UNC41MFS5, Enterococcus faecalis, and Enterococcus faecalis 13-SD-W-01. In larger C1orf131 orthologs (250+ residues), the DUF domain typically lies in the middle of the protein toward the C-terminus; in smaller orthologs (160 to 250 residues), it lies near the N-terminus. Larger orthologs also contain regions of low complexity, which could indicate that these proteins are intrinsically disordered.
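Low-complexity regions are usually found by scanning a sequence for windows whose residue composition is strongly skewed. Dedicated tools such as SEG use their own complexity measures; the Python sketch below substitutes plain Shannon entropy as a simplified stand-in. The function names, the 12-residue window, and the 2.2-bit threshold are hypothetical choices for illustration, not the parameters of any published tool. A flagged region is only a hint of possible disorder, which dedicated disorder predictors assess separately.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged (start, end) spans, 0-based and end-exclusive, covering
    every window whose entropy falls below the threshold.
    Window size and threshold are illustrative assumptions, not SEG settings."""
    seq = seq.upper()
    flagged = [
        (i, i + window)
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) < threshold
    ]
    merged = []
    for start, end in flagged:
        # Windows arrive sorted by start, so overlapping ones extend the last span.
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

A window of twelve different residues has the maximum entropy of log2(12), about 3.6 bits, while a lysine run like those common in lysine-rich proteins scores near zero, so only the skewed stretch is reported.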
This gene family exists only in eukaryotes. C1orf131 has no paralogs, but a few pseudogenes have been found, so far only in orangutans, mouse lemurs, and sloths. [11] Compared with cytochrome c, a slowly evolving gene, [23] and the fibrinogen gamma chain, a rapidly evolving gene, [24] this gene family evolves faster than even fibrinogen.
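Comparisons of this kind are commonly made by plotting, for each gene, the corrected sequence distance between human and each ortholog against the species' divergence time; the steeper the line, the faster the gene evolves. The source does not state which correction or fit was used, so the Python sketch below is an assumption: it applies the Poisson correction d = -ln(1 - p), where p is the proportion of differing sites, and fits a least-squares line through the origin. The function names are hypothetical.

```python
import math


def poisson_distance(percent_identity: float) -> float:
    """Poisson-corrected amino acid distance d = -ln(1 - p), where p is the
    proportion of differing sites (1 - identity)."""
    p = 1.0 - percent_identity / 100.0
    if not 0.0 <= p < 1.0:
        raise ValueError("percent identity must be in (0, 100]")
    return -math.log(1.0 - p)


def rate_through_origin(times_mya, identities):
    """Least-squares slope of corrected distance against divergence time,
    forcing the line through the origin (no divergence at time zero).
    Returns substitutions per site per million years."""
    d = [poisson_distance(x) for x in identities]
    num = sum(t * di for t, di in zip(times_mya, d))
    den = sum(t * t for t in times_mya)
    return num / den
```

Running rate_through_origin separately on C1orf131, fibrinogen gamma chain, and cytochrome c orthologs (percent identity to human paired with published divergence-time estimates) would yield one slope per gene. The Poisson correction assumes equal substitution rates across sites, so it is a first approximation rather than a full evolutionary model.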
TCAIM is a protein that in humans is encoded by the TCAIM gene.
Uncharacterized protein C14orf80 is a protein which in humans is encoded by the C14orf80 (chromosome 14 open reading frame 80) gene.
PRR29 is a protein encoded by the PRR29 gene located in humans on chromosome 17 at 17q23.
Chromosome 10 open reading frame 67 (C10orf67), also known as C10orf115, LINC01552, and BA215C7.4, is an uncharacterized human protein-coding gene. Several studies indicate a possible link between polymorphisms in this and several other genes and chronic inflammatory barrier diseases such as Crohn's disease and sarcoidosis.
SHLD1 (shieldin complex subunit 1), also known as C20orf196, is a gene on chromosome 20. It encodes an mRNA 1,763 base pairs long and a protein 205 amino acids long.
Protein FAM208B is a protein that in humans is encoded by the FAM208B gene. The gene is also known as "chromosome 10 open reading frame 18" (C10orf18). FAM208B is expressed throughout the body; however, its function has not been established. FAM208B has been observed to be differentially regulated in various cancers and during development. Although the protein's exact role is unknown, its widespread expression in humans and its conservation across the phylogenetic tree suggest that the gene is important for normal function.
Chromosome 19 open reading frame 44 is a protein that in humans is encoded by the C19orf44 gene. C19orf44 is an uncharacterized protein of unknown function in humans. It is not limited to humans: orthologs of the protein exist in other species. The protein contains one domain of unknown function (DUF) that is highly conserved across its orthologs. It is most highly expressed in the testis and ovary, but also shows significant expression in the thyroid and parathyroid. Another name for this protein is LOC84167.
C2orf16 is a protein that in humans is encoded by the C2orf16 gene. Isoform 2 of this protein is 1,984 amino acids long. The gene contains 1 exon and is located at 2p23.3. Aliases for C2orf16 include Open Reading Frame 16 on Chromosome 2 and P-S-E-R-S-H-H-S Repeats Containing Sequence.
C22orf31 is a protein which in humans is encoded by the C22orf31 gene. The C22orf31 mRNA transcript has an upstream in-frame stop codon, and the protein has a domain of unknown function (DUF4662) spanning most of the protein-coding region. The protein has highly similar orthologs in mammals. The most distant orthologs are found in bony fish, but C22orf31 is not found in any bird or amphibian species.
Serum amyloid A-like 1 is a protein in humans encoded by the SAAL1 gene.
C2orf74, also known as LOC339804, is a protein-coding gene located on the short arm of chromosome 2 at band 15 (2p15). Isoform 1 of the gene is 19,713 base pairs long. C2orf74 has orthologs in 135 species, primarily placental mammals along with some marsupials.
C6orf136 is a protein that in humans is encoded by the C6orf136 gene. The gene is conserved in mammals, mollusks, and some sponges (Porifera). While its function is currently unknown, C6orf136 has been shown to be hypermethylated in response to FOXM1 expression in head and neck squamous cell carcinoma (HNSCC) cells. Elevated expression of C6orf136 has also been associated with improved survival in patients with bladder cancer. C6orf136 has three known isoforms.
Transmembrane protein 101 (TMEM101) is a protein that in humans is encoded by the TMEM101 gene. The TMEM101 protein has been demonstrated to activate the NF-κB signaling pathway. High levels of expression of TMEM101 have been linked to breast cancer.
Family with sequence similarity 98, member C (FAM98C) is a gene that encodes the FAM98C protein. FAM98C has two aliases, FLJ44669 and hypothetical protein LOC147965, and two paralogs in humans, FAM98A and FAM98B. The protein is characterized by being leucine-rich. Its function has not yet been defined. FAM98C has orthologs in mammals, reptiles, and amphibians, and has distant orthologs in Rhinatrema bivittatum and Nanorana parkeri.
Coiled-coil domain containing 190 (CCDC190), also known as C1orf110 (chromosome 1 open reading frame 110) and MGC48998, is a protein-coding gene widely expressed in vertebrates. RNA-seq expression profiles show that the gene is selectively expressed in human organs such as the lung, brain, and heart. The encoded protein, often called coiled-coil domain-containing protein 190, is 302 amino acids long; its name likely reflects the coiled-coil domain at positions 14 to 72. Alternative splicing in humans yields at least six mRNA splice variants and three protein isoforms.
Transmembrane epididymal protein 1 is a transmembrane protein encoded by the TEDDM1 gene. TEDDM1, also commonly known as TMEM45C, encodes a 273-amino-acid protein that contains six alpha-helical transmembrane regions. The protein contains a 118-amino-acid region belonging to a family of unknown function. While the exact function of TEDDM1 is not understood, it is predicted to be an integral component of the plasma membrane.
C1orf159 is a protein that in humans is encoded by the C1orf159 gene located on chromosome 1. The gene has also been identified as an unfavorable prognostic marker for renal and liver cancer and a favorable prognostic marker for urothelial cancer.
Chromosome 4 open reading frame 45 (C4orf45) is a protein which in humans is encoded by the C4orf45 gene. It is predicted to localize to the cytoplasm and nucleus of the cell.
C13orf42 is a protein which, in humans, is encoded by the gene chromosome 13 open reading frame 42 (C13orf42). RNA sequencing data show low expression of the C13orf42 gene in a variety of tissues. The C13orf42 protein is predicted to localize to the mitochondria, nucleus, and cytosol. Tertiary structure predictions for C13orf42 indicate multiple alpha helices.
C10orf53 is a protein that in humans is encoded by the C10orf53 gene. The gene is located on the plus strand of the DNA, is 30,611 nucleotides long, and has 3 exons; the encoded protein is 157 amino acids long. C10orf53 orthologs are found in mammals, birds, reptiles, amphibians, fish, and invertebrates. The gene is expressed primarily in the testes, with very low levels in the cerebellum, liver, placenta, and trachea.