CXorf49 is a protein that in humans is encoded by the gene chromosome X open reading frame 49 (CXorf49).
The CXorf49 gene has one alias, CXorf49B. [1] The recname A8MYA2 also refers to the protein coded by CXorf49 or CXorf49B. [2]
CXorf49 is located on the X chromosome at Xq13.1. It is 3912 base pairs long and the gene sequence has 6 exons. [3] CXorf49 has one protein-coding transcript. [4]
The protein has 514 amino acids and a molecular mass of 54.4 kDa. [5] The isoelectric point is 9.3. Compared with other human proteins, CXorf49 is rich in glycine and proline but has lower levels of asparagine, isoleucine, tyrosine and threonine (Statistical Analysis of Protein Sequences, SAPS [6]).
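As a rough illustration of what a compositional comparison starts from, the sketch below computes the fraction of each residue in a protein sequence. It is not the SAPS method, which additionally tests the fractions against a reference protein set. The function name `composition` and the toy peptide are hypothetical and are not taken from the CXorf49 sequence.

```python
from collections import Counter


def composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each residue (one-letter code) in a sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    total = len(seq)
    # An empty sequence yields an empty dict, so no division by zero occurs.
    return {residue: n / total for residue, n in counts.items()}


# Hypothetical toy peptide for illustration only; NOT the CXorf49 sequence.
toy = "GPGPAGRK"
fractions = composition(toy)

# List residues from most to least frequent.
for residue, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{residue}: {frac:.1%}")
```

Calling the same function on the full protein sequence and on the DUF4641 region alone would give the raw fractions that a tool such as SAPS then evaluates statistically.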
The domain of unknown function DUF4641 spans almost the entire protein. It is 433 amino acids long, from residue 80 to residue 512. [7] DUF4641 is a part of pfam15483. [8] The domain is rich in proline and arginine but has lower levels of isoleucine, tyrosine and threonine than other human proteins (Statistical Analysis of Protein Sequences, SAPS [6]). DUF4641 has an unusual spacing between lysine residues and positively charged amino acids (Statistical Analysis of Protein Sequences, SAPS [6]).
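The domain length follows from the residue coordinates when both endpoints are counted. The short sketch below checks that arithmetic and the share of the protein covered by the domain, using only the figures quoted above. The helper name `inclusive_length` is an assumption made for illustration.

```python
def inclusive_length(start: int, end: int) -> int:
    """Length of a 1-based interval that includes both endpoints."""
    return end - start + 1


PROTEIN_LENGTH = 514               # amino acids, as stated for CXorf49
DOMAIN_START, DOMAIN_END = 80, 512  # DUF4641 residue coordinates

domain_length = inclusive_length(DOMAIN_START, DOMAIN_END)
coverage = domain_length / PROTEIN_LENGTH

print(domain_length)       # 433
print(f"{coverage:.1%}")   # 84.2%
```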
CXorf49 is predicted to have several post-translational modification sites. These include sites for N-acetyltransferase (NetAcet 1.0 [9]), glycation of ε amino groups of lysines (NetGlycate 1.0 [10]), mucin-type GalNAc O-glycosylation (NetOglyc 4.0 [11]), phosphorylation (NetPhos 2.0 [12]), sumoylation (SUMOplot Analysis Program [13]) and O-β-GlcNAc attachment (YinOYang WWW [14]).
The CXorf49 protein has been predicted to be located in the cell nucleus (PSORT II [15] ).
The promoter region of CXorf49 is located between base pairs 71718051 and 71718785 on the minus strand of the X chromosome and is 735 bp long (Genomatix’s ElDorado program [16]). Among the most frequent transcription factor binding sites in the promoter region are sites for Y-box binding factors.
Though expression of CXorf49 is very low in human cells, it is somewhat higher in connective tissues, testis and uterus (NCBI-Unigene [17]).
The protein CXorf49 has not yet been shown to interact with other proteins (PSICQUIC [18] ).
CXorf49 was found to belong to a small group of proteins in the HL-60 cell proteome that were most prone to form 4-hydroxy-2-nonenal (HNE) adducts upon exposure to nontoxic (10 μM) HNE concentrations, along with heat shock 60 kDa protein 1. [19]
A BLAST [20] search finds no orthologs of CXorf49 in single-celled organisms, fungi or plants whose genomes have been sequenced. Among multicellular organisms, orthologs are found in mammals. The table below shows a selection of the mammalian orthologs, listed by time of divergence from humans.
Genus and species name | Common name | Accession Number | Sequence length | Identity to human protein |
---|---|---|---|---|
Pan troglodytes | Chimpanzee | XP_001137982 | 514 aa | 98 % |
Callithrix jacchus | Common marmoset | XP_008987719 | 487 aa | 65 % |
Galeopterus variegatus | Malayan flying lemur | XP_008574823 | 525 aa | 54 % |
Tupaia chinensis | Chinese tree shrew | XP_006168003 | 527 aa | 35 % |
Chinchilla lanigera | Long-tailed chinchilla | XP_013358263 | 307 aa | 49 % |
Mus musculus | House mouse | NP_081944 | 513 aa | 36 % |
Canis lupus familiaris | Dog | XP_850392 | 526 aa | 54 % |
Odobenus rosmarus divergens | Pacific walrus | XP_012422579 | 530 aa | 51 % |
Mustela putorius furo | Ferret | XP_004777306 | 544 aa | 50 % |
Lipotes vexillifer | Chinese river dolphin | XP_007452050 | 529 aa | 45 % |
Ovis aries | Sheep | XP_004022229 | 536 aa | 45 % |
Capra hircus | Goat | XP_005700711 | 538 aa | 44 % |
Myotis lucifugus | Little brown bat | XP_006083036 | 500 aa | 42 % |
Myotis davidii | David's myotis | XP_006759573 | 495 aa | 42 % |
Bos taurus | Cattle | NP_001092664 | 534 aa | 42 % |
Equus asinus | Donkey | XP_014707878 | 723 aa | 42 % |
Trichechus manatus latirostris | Florida manatee | XP_012415455 | 505 aa | 44 % |
Dasypus novemcinctus | Nine-banded armadillo | XP_004475873 | 497 aa | 44 % |
Orycteropus afer afer | Aardvark | XP_007957133 | 477 aa | 38 % |
The most distantly related ortholog in the table is from the aardvark, whose lineage diverged from the human lineage about 105 million years ago.
PRR29 is a protein encoded by the PRR29 gene located in humans on chromosome 17 at 17q23.
Coiled-coil domain containing protein 180 (CCDC180) is a protein that in humans is encoded by the CCDC180 gene. This protein is known to localize to the nucleus and is thought to be involved in regulation of transcription as are many proteins containing coiled-coil domains. As it is expressed most highly in the testes and is regulated by SRY and SOX transcription factors, it could be involved in sex determination.
Leukocyte Receptor Cluster Member 9 is an uncharacterized protein encoded by the LENG9 gene. In humans, LENG9 is predicted to play a role in fertility and reproductive disorders associated with female endometrium structures.
Chromosome 16 open reading frame 46 is a protein of yet to be determined function in Homo sapiens. It is encoded by the C16orf46 gene with NCBI accession number of NM_001100873. It is a protein-coding gene with an overlapping locus.
TMEM44 is a protein that in humans is encoded by the TMEM44 gene. DKFZp686O18124 is a synonym of TMEM44.
LOC101059915 is a protein, which in humans is encoded by the LOC101059915 gene. It is located on the X chromosome and has restricted expression in the testis.
Testis-expressed protein 9 is a protein that in humans is encoded by the TEX9 gene. TEX9 encodes a 391-amino-acid protein containing two coiled-coil regions. The gene is conserved in many species and encodes orthologous proteins in eukarya, archaea, and one species of bacteria. The function of TEX9 is not yet fully understood, but it is suggested to have ATP-binding capabilities.
Chromosome 1 open reading frame 141, or C1orf141, is a protein that in humans is encoded by the C1orf141 gene. It is a precursor protein that becomes active after cleavage. The function is not yet well understood, but it is suggested to be active during development.
c7orf26 is a gene in humans that encodes a protein known as c7orf26. Based on the properties of c7orf26 and its conservation over a long period of time, the protein is predicted to be targeted to the cytoplasm and to play a role in regulating transcription.
Chromosome 1 open reading frame 167 (C1orf167) is a protein that in humans is encoded by the C1orf167 gene. The NCBI accession number is NP_001010881. The protein is 1468 amino acids in length with a molecular weight of 162.42 kDa. The mRNA sequence was found to be 4689 base pairs in length.
Single-pass membrane and coiled-coil domain-containing protein 3 is a protein that is encoded in humans by the SMCO3 gene.
Proline-rich protein 16 (PRR16) is a protein coding gene in Homo sapiens. The protein is known by the alias Largen.
C5orf46 is a protein-coding gene located on chromosome 5 in humans. It is also known as sssp1, or skin and saliva secreted protein 1. There are two known isoforms in humans, with isoform 2 being the longer of the two. The encoded protein is predicted to have one transmembrane domain, a predicted molecular weight of 9,692 Da, and a basal isoelectric point of 4.67.
C16orf90, or chromosome 16 open reading frame 90, produces the uncharacterized protein C16orf90 in Homo sapiens. The C16orf90 protein has four predicted alpha-helix domains and is moderately expressed in the testes and expressed at low levels throughout the rest of the body. While the function of C16orf90 is not yet well understood, it is suspected to be involved in the biological stress response and apoptosis, based on expression data from microarrays and post-translational modification data.
C20orf202 is a protein that in humans is encoded by the C20orf202 gene. In humans, this gene encodes a nuclear protein that is primarily expressed in the lung and placenta.
C1orf122 is a gene in the human genome that encodes the cytosolic protein ALAESM. ALAESM is present in all tissue cells and highly up-regulated in the brain, spinal cord, adrenal gland and kidney. This gene can be expressed up to 2.5 times the average gene in its highly expressed tissues. Although the function of C1orf122 is unknown, it is predicted to be used for mitochondria localization.
C12orf24 is a gene in humans that encodes a protein known as FAM216A. This gene is primarily expressed in the testis and brain, but has constitutive expression in 25 other tissues. FAM216A is an intracellular protein that has been predicted to reside within the nucleus of cells. The exact function of C12orf24 is unknown. FAM216A is highly expressed in Sertoli cells of the testis as well as different stage spermatids.
Protein FAM89A is a protein that in humans is encoded by the FAM89A gene. It is also known as chromosome 1 open reading frame 153 (C1orf153). The highest FAM89A gene expression is observed in the placenta and adipose tissue. Though its function is largely unknown, FAM89A is found to be differentially expressed in response to interleukin exposure, and it is implicated in immune response pathways and various pathologies such as atherosclerosis and glioma cell expression.
Leucine rich single-pass membrane protein 2 is a single-pass membrane protein rich in leucine, that in humans is encoded by the LSMEM2 gene. The LSMEM2 protein is conserved in mammals, birds, and reptiles. In humans, LSMEM2 is found to be highly expressed in the heart, skeletal muscle and tongue.
C12orf54 is a protein in humans that is encoded by the C12orf54 gene.