EPCIP identifiers | |
---|---|
Aliases | EPCIP, B37, C21orf120, PRED81, chromosome 21 open reading frame 62, C21orf62 |
External IDs | MGI: 1921637; HomoloGene: 49594; GeneCards: EPCIP; OMA: EPCIP - orthologs |
Exosomal polycystin-1-interacting protein is a protein that, in humans, is encoded by the EPCIP gene. [6] EPCIP is found on human chromosome 21 and is thought to be expressed in tissues of the brain and reproductive organs. [7] Additionally, EPCIP is highly expressed in ovarian surface epithelial cells under normal regulation, but is not expressed in cancerous ovarian surface epithelial cells. [7]
Common aliases of EPCIP are C21orf62, C21orf120, PRED81, and B37. [6] In humans, the gene is located on chromosome 21 at position q22.11. [8] The EPCIP gene is 4132 base pairs in length and contains five exons. [6]
The mRNA sequence of EPCIP in humans has one known isoform, called uncharacterized protein C21orf62 isoform X1. This isoform is 458 base pairs (104 amino acids) in length and is significantly shorter than the most commonly observed EPCIP sequence in humans. EPCIP also has splice variants. All splice variants encode the same protein; they differ only in the 5' untranslated region of the mRNA sequence. [6]
The EPCIP protein in humans is 219 amino acids in length. [9] Its primary sequence has a molecular weight of 24.9 kDa and an isoelectric point of 8. [10] [11] When its cleavable signal peptide, which spans amino acids 1–19, is removed, the protein has a molecular weight of 22.8 kDa and an isoelectric point of 7.8. [10] [11] [12] [13]
EPCIP in humans has a higher cysteine content and a lower valine content than expected compared to other human proteins. As shown in Table 1, this trend largely holds for the other mammals listed, although the sperm whale's valine content falls within the expected range. It does not occur in the taxa outside Mammalia. [14]
Genus and Species | Common Name | Organism Clade | % Cysteine | Amino Acid Concentration of Cysteine Compared to Expected | % Valine | Amino Acid Concentration of Valine Compared to Expected | Other Amino Acids with High or Low Concentration Compared to Expected |
---|---|---|---|---|---|---|---|
Homo sapiens | Human | Mammalia | 4.6% | High | 3.2% | Low | - |
Mus musculus | House Mouse | Mammalia | 4.3% | High | 3.5% | Low | Glutamic Acid (1.7%, low) |
Canis lupus familiaris | Dog | Mammalia | 4.1% | High | 2.7% | Low | Leucine (14.2%, high) |
Physeter catodon | Sperm Whale | Mammalia | 4.6% | High | 4.1% | Expected | Serine (11.9%, high) |
Gallus gallus | Chicken | Aves | 3.1% | Expected | 6.7% | Expected | Alanine (2.2%, low); Glycine (3.1%, low); Proline (1.8%, low); Phenylalanine (7.1%, high); Serine (12.4%, high); Threonine (9.8%) |
Chelonia mydas | Green Sea Turtle | Reptilia | 3.6% | Expected | 5.8% | Expected | Alanine (1.8%, low); Serine (11.2%, high) |
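The percentage columns in Table 1 are simple residue frequencies, which can be sketched as below. This is a minimal illustration only: the function name `residue_percentages` and the toy peptide are hypothetical, the peptide is not the EPCIP sequence, and the "High", "Low", and "Expected" calls in the table come from the cited analysis and are not reproduced here.

```python
from collections import Counter


def residue_percentages(sequence: str) -> dict[str, float]:
    """Return the percentage of each one-letter residue in a protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in counts.items()}


# Hypothetical 20-residue toy peptide for illustration; NOT the EPCIP sequence.
toy = "MCCAVLKGSTCPELRDNQHW"
composition = residue_percentages(toy)

# Report only the two residues highlighted in Table 1.
print(f"Cys: {composition.get('C', 0.0):.1f}%  Val: {composition.get('V', 0.0):.1f}%")
```

Running the same function on a real protein sequence, for example one retrieved by the accession numbers cited in the article, would give the raw percentages that a composition analysis then compares against a reference distribution.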
The protein structure of EPCIP in humans consists of a combination of alpha helices and beta sheets. [15] [16] Figure 1 shows a predicted structure of the protein. [5]
EPCIP has a myristoylation site spanning amino acids 26–31. [17] It has a sumoylation site spanning amino acids 132–135. [17] [18] Additionally, it has a nuclear export signal spanning amino acids 98–104. [19]
EPCIP is expressed in human tissues of the brain and reproductive organs. [6]
EPCIP in humans is moderately expressed in the brain, kidneys, pancreas, prostate, testes, and ovaries. [6] [20] [21]
EPCIP is expressed during the blastocyst, fetal, and adult stages of human development. [20] It is overexpressed in some tumor states, including pancreatic, gastrointestinal, germ cell, and glioma tumors. [20]
The specific function of EPCIP in humans is not yet well understood. [6]
EPCIP is predicted to interact with nine other proteins. [22] These interactions, which were identified through text mining, are shown in Table 2.
Protein Full Name | Protein Name Symbol | Brief Protein Description [6] |
---|---|---|
BCL2 Interacting Protein Like | BNIPL | May function as a bridge molecule that promotes cell death. |
Thymosin Beta 4, X-linked Pseudogene 4 | TMSB4XP4 | Potentially influences actin polymerization. |
Synovial Sarcoma X Family Member 4 | SSX4 | May function as a repressor of transcription, and may be a useful target in cancer vaccine-based immunotherapy. |
Crystallin Beta A2 | CRYBA2 | A major protein in vertebrate eyes that maintains lens transparency and refractive index. |
Oral Cancer Overexpressed 1 | ORAOV1 | A gene that is frequently overexpressed in esophageal squamous cell cancer. |
Oligodendrocyte Transcription Factor 1 | OLIG1 | May be expressed during the time from process extension through membrane maintenance in oligodendrocytes. |
PAX3 and PAX7 Binding Protein 1 | GCFC1 (PAXBP1) | The encoded protein potentially binds to GC-rich DNA sequences. It is suggested that this gene is involved in the regulation of transcription. |
Relaxin/Insulin Like Family Peptide Receptor 1 and 2 | RXFP1 and RXFP2 | Encoded protein is a receptor for the protein hormone relaxin that influences sperm motility and pregnancy. |
Over- or underexpression of EPCIP is linked to some types of cancerous cells and tumors. [7] [20]
There are no known paralogs of EPCIP in humans at this time. [6]
EPCIP orthologs are currently known in 193 organisms. [6] All of them are deuterostome animals in the clade Chordata. [6] Table 3 shows a range of EPCIP orthologs with their NCBI accession numbers, sequence lengths, and sequence identity to the human EPCIP protein. At this time, EPCIP is not known to have any protostome or invertebrate orthologs. [6]
Genus and Species | Common Name | Organism Clade | Estimated Date of Divergence from Humans (Millions of Years Ago) [23] | Accession Number [9] | Amino Acid Sequence Length [9] | Corrected Sequence Identity to Human Protein [24] [25] |
---|---|---|---|---|---|---|
Homo sapiens | Human | Mammalia | 0 | NP_001155967.2 | 219 | 100% |
Mus musculus | House Mouse | Mammalia | 90 | NP_083181.1 | 230 | 68.2% |
Meleagris gallopavo | Wild Turkey | Aves | 312 | XP_010721230.1 | 225 | 56.4% |
Chelonia mydas | Green Sea Turtle | Reptilia | 312 | XP_007063646.1 | 224 | 60.8% |
Xenopus tropicalis | Western Clawed Frog | Tetrapoda | 352 | NP_001004889.1 | 207 | 48.9% |
Latimeria chalumnae | West Indian Ocean Coelacanth | Sarcopterygii | 413 | XP_005993681.2 | 237 | 45.0% |
Ictalurus punctatus | Channel Catfish | Actinopterygii | 435 | XP_017326002.1 | 214 | 29.6% |
Callorhinchus milii | Australian Ghostshark | Chondrichthyes | 473 | XP_007904174.1 | 222 | 40.4% |
EPCIP evolves faster than both cytochrome c and fibrinogen. Figure 2 shows the rate of evolution of the EPCIP gene over the past 473 million years.
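As a rough illustration of how a divergence rate can be read off Table 3, the sketch below divides each ortholog's percent dissimilarity to the human protein (100 minus the listed identity) by its estimated divergence time. This is a simplification under stated assumptions: the helper name `divergence_rates` is hypothetical, the listed identities are used exactly as given in the table, and the comparison with cytochrome c and fibrinogen is not reproduced because their values do not appear in the text.

```python
# (common name, million years since divergence from humans, % identity to human)
# Values copied from Table 3.
ORTHOLOGS = [
    ("House Mouse", 90, 68.2),
    ("Wild Turkey", 312, 56.4),
    ("Green Sea Turtle", 312, 60.8),
    ("Western Clawed Frog", 352, 48.9),
    ("West Indian Ocean Coelacanth", 413, 45.0),
    ("Channel Catfish", 435, 29.6),
    ("Australian Ghostshark", 473, 40.4),
]


def divergence_rates(rows):
    """Map each common name to percent dissimilarity per million years."""
    return {name: (100.0 - identity) / mya for name, mya, identity in rows}


rates = divergence_rates(ORTHOLOGS)

# List lineages from the fastest to the slowest apparent rate.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} {rate:.3f} % per million years")
```

A per-lineage ratio like this is only a first approximation; a plot such as the one described for Figure 2 would typically fit a trend across all orthologs rather than treat each one separately.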