EPCIP (gene)

EPCIP
Identifiers
Aliases: EPCIP, B37, C21orf120, PRED81, chromosome 21 open reading frame 62, C21orf62
External IDs: MGI: 1921637; HomoloGene: 49594; GeneCards: EPCIP; OMA: EPCIP - orthologs
Orthologs

| Species | Human | Mouse |
| --- | --- | --- |
| Entrez | | |
| Ensembl | | |
| UniProt | | |
| RefSeq (mRNA) | NM_001162495, NM_001162496, NM_019596 | NM_001163695, NM_028905 |
| RefSeq (protein) | NP_001155967, NP_001155968, NP_062542 | NP_001157167, NP_083181 |
| Location (UCSC) | Chr 21: 32.79 – 32.81 Mb | Chr 16: 91.05 – 91.1 Mb |
| PubMed search | [3] | [4] |
Figure 1. Predicted structure of the C21orf62 (EPCIP) protein.

Exosomal polycystin-1-interacting protein is a protein that, in humans, is encoded by the EPCIP gene. [6] EPCIP is found on human chromosome 21, and it is thought to be expressed in tissues of the brain and reproductive organs. [7] Additionally, EPCIP is highly expressed in normal ovarian surface epithelial cells but is not expressed in cancerous ovarian surface epithelial cells. [7]

Gene

Common aliases of EPCIP are C21orf62, C21orf120, PRED81, and B37. [6] In humans, EPCIP is located on chromosome 21 at band q22.11 (21q22.11). [8] The EPCIP gene is 4132 base pairs in length and contains five exons. [6]

mRNA

The human EPCIP mRNA has one known isoform, named uncharacterized protein C21orf62 isoform X1. This isoform is 458 base pairs long and encodes 104 amino acids, making it significantly shorter than the most commonly observed EPCIP sequence in humans. EPCIP also has splice variants. All of them encode the same protein; the variants differ only in the 5' untranslated region of the mRNA. [6]

Protein

General protein characteristics

The human EPCIP protein is 219 amino acids long. [9] Its primary sequence has a molecular weight of 24.9 kDa and an isoelectric point of 8. [10] [11] When its cleavable signal peptide, which spans amino acids 1–19, is removed, the protein has a molecular weight of 22.8 kDa and an isoelectric point of 7.8. [10] [11] [12] [13]
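The figures above can be sanity-checked with simple arithmetic. The sketch below is only an illustration, not the method used by the cited tools: it derives the mature protein length from the signal peptide span and the average mass per removed residue from the two reported molecular weights. The function name is hypothetical. A result near 110 Da per residue matches the common rule of thumb for the average mass of an amino acid residue.

```python
def signal_peptide_check(full_len, signal_end, mass_full_kda, mass_mature_kda):
    """Return (mature length, average mass in Da of each removed residue).

    Assumes the signal peptide starts at residue 1 and ends at `signal_end`.
    """
    mature_len = full_len - signal_end
    removed_da = (mass_full_kda - mass_mature_kda) * 1000.0
    return mature_len, removed_da / signal_end


# Values reported in the text: 219 residues, signal peptide 1-19,
# 24.9 kDa before cleavage and 22.8 kDa after.
mature_len, da_per_residue = signal_peptide_check(219, 19, 24.9, 22.8)
print(mature_len)                 # 200
print(round(da_per_residue, 1))   # 110.5
```

The 2.1 kDa difference spread over 19 residues is therefore consistent with removal of a 19-residue peptide.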

Protein composition

Human EPCIP has a higher cysteine content and a lower valine content than expected for human proteins. As shown in Table 1, this trend largely holds for other mammals but does not occur in taxa outside Mammalia. [14]

Table 1. [14] Unusual amino acid concentrations of EPCIP in humans and orthologs.

| Genus and Species | Common Name | Organism Clade | % Cysteine | Cysteine Concentration Compared to Expected | % Valine | Valine Concentration Compared to Expected | Other Amino Acids with High or Low Concentration Compared to Expected |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | Human | Mammalia | 4.6% | High | 3.2% | Low | - |
| Mus musculus | House Mouse | Mammalia | 4.3% | High | 3.5% | Low | Glutamic Acid (1.7%, low) |
| Canis lupus familiaris | Dog | Mammalia | 4.1% | High | 2.7% | Low | Leucine (14.2%, high) |
| Physeter catodon | Sperm Whale | Mammalia | 4.6% | High | 4.1% | Expected | Serine (11.9%, high) |
| Gallus gallus | Chicken | Aves | 3.1% | Expected | 6.7% | Expected | Alanine (2.2%, low); Glycine (3.1%, low); Proline (1.8%, low); Phenylalanine (7.1%, high); Serine (12.4%, high); Threonine (9.8%) |
| Chelonia mydas | Green Sea Turtle | Reptilia | 3.6% | Expected | 5.8% | Expected | Alanine (1.8%, low); Serine (11.2%, high) |
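As a minimal sketch of how percentages like those in Table 1 are obtained, the snippet below counts residues in a one-letter amino acid string. The peptide used is a made-up 10-residue example, not the EPCIP sequence, and both function names are hypothetical. The second helper converts a reported percentage back into an approximate residue count, assuming the percentage refers to the full 219-residue human protein.

```python
from collections import Counter


def composition_percent(sequence):
    """Percent composition of each residue in a one-letter amino acid string."""
    seq = sequence.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def implied_count(percent, length):
    """Approximate residue count implied by a percentage and a sequence length."""
    return round(percent / 100.0 * length)


toy_peptide = "MCCVLSACKE"  # hypothetical sequence, for illustration only
comp = composition_percent(toy_peptide)
print(round(comp["C"], 1))  # 30.0 (3 of 10 residues are cysteine)

# Human EPCIP values from Table 1, assuming a 219-residue sequence.
print(implied_count(4.6, 219))  # 10 cysteines
print(implied_count(3.2, 219))  # 7 valines
```

Whether a percentage counts as "high" or "low" depends on the reference distribution used by the analysis tool, which this sketch does not attempt to reproduce.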

Protein structure

The protein structure of EPCIP in humans consists of a combination of alpha helices and beta sheets. [15] [16] Figure 1 shows a predicted structure of the protein. [5]

Post-translational modifications

EPCIP has a predicted myristoylation site spanning amino acids 26–31. [17] It has a predicted sumoylation site spanning amino acids 132–135. [17] [18] Additionally, it has a predicted nuclear export signal spanning amino acids 98–104. [19]

Expression

Tissue expression

EPCIP is expressed in human tissues of the brain and reproductive organs. [6]

Expression level

EPCIP in humans is moderately expressed in the brain, kidneys, pancreas, prostate, testes, and ovaries. [6] [20] [21]

Regulation of expression

EPCIP is expressed during the blastocyst, fetal, and adult stages of human development. [20] It is overexpressed in some tumor types, including pancreatic, gastrointestinal, germ cell, and glioma tumors. [20]

Function

The specific function of EPCIP in humans is not yet well understood. [6]

Interacting proteins

EPCIP is predicted to interact with nine other proteins. [22] These candidate interactions, listed in Table 2, were identified through text mining.

Table 2. [22] Proteins with Evidence of Interaction with EPCIP

| Protein Full Name | Protein Name Symbol | Brief Protein Description [6] |
| --- | --- | --- |
| BCL2 Interacting Protein Like | BNIPL | May function as a bridge molecule that promotes cell death. |
| Thymosin Beta 4, X-linked Pseudogene 4 | TMSB4XP4 | Potentially influences actin polymerization. |
| Synovial Sarcoma X Family Member 4 | SSX4 | May function as a repressor of transcription, and may be a useful target in cancer vaccine-based immunotherapy. |
| Crystallin Beta A2 | CRYBA2 | A major protein in vertebrate eyes that maintains lens transparency and refractive index. |
| Oral Cancer Overexpressed 1 | ORAOV1 | Encoded by a gene that is frequently overexpressed in esophageal squamous cell cancer. |
| Oligodendrocyte Transcription Factor 1 | OLIG1 | May be expressed during the time from process extension through membrane maintenance in oligodendrocytes. |
| PAX3 and PAX7 Binding Protein 1 | GCFC1 (PAXBP1) | The encoded protein potentially binds to GC-rich DNA sequences. It is suggested that this gene is involved in the regulation of transcription. |
| Relaxin/Insulin Like Family Peptide Receptor 1 and 2 | RXFP1 and RXFP2 | The encoded proteins are receptors for the protein hormone relaxin, which influences sperm motility and pregnancy. |

Clinical significance

Over- or underexpression of EPCIP has been linked to some types of cancerous cells and tumors. [7] [20]

Homology

Paralogs

There are no known paralogs of EPCIP in humans at this time. [6]

Orthologs

Orthologs of EPCIP are currently known in 193 organisms. [6] All are deuterostome animals in the clade Chordata. [6] Table 3 lists a range of EPCIP orthologs with their NCBI accession numbers, sequence lengths, and sequence identity to the human EPCIP protein. At this time, EPCIP is not known to have any protostome or invertebrate orthologs. [6]

Table 3. Orthologs of Human Protein EPCIP

| Genus and Species | Common Name | Organism Clade | Estimated Date of Divergence from Humans (Millions of Years Ago) [23] | Accession Number [9] | Amino Acid Sequence Length [9] | Corrected Sequence Identity to Human Protein [24] [25] |
| --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | Human | Mammalia | 0 | NP_001155967.2 | 219 | 100% |
| Mus musculus | House Mouse | Mammalia | 90 | NP_083181.1 | 230 | 68.2% |
| Meleagris gallopavo | Wild Turkey | Aves | 312 | XP_010721230.1 | 225 | 56.4% |
| Chelonia mydas | Green Sea Turtle | Reptilia | 312 | XP_007063646.1 | 224 | 60.8% |
| Xenopus tropicalis | Western Clawed Frog | Tetrapoda | 352 | NP_001004889.1 | 207 | 48.9% |
| Latimeria chalumnae | West Indian Ocean Coelacanth | Sarcopterygii | 413 | XP_005993681.2 | 237 | 45.0% |
| Ictalurus punctatus | Channel Catfish | Actinopterygii | 435 | XP_017326002.1 | 214 | 29.6% |
| Callorhinchus milii | Australian Ghostshark | Chondrichthyes | 473 | XP_007904174.1 | 222 | 40.4% |
Figure 2. Evolution of EPCIP in humans over time.

Evolution rate

EPCIP evolves faster than both cytochrome c and fibrinogen. Figure 2 shows the rate of evolution of the EPCIP gene over the past 473 million years.
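One rough way to put a number on this rate is to fit a line through the origin relating sequence divergence (100% minus identity) to divergence time, using the values in Table 3. The sketch below is an assumption about how such an estimate could be made and is not necessarily the method behind Figure 2. It also does not compare against cytochrome c or fibrinogen, since no data for those proteins are given here. The function name is hypothetical.

```python
# (divergence time in millions of years, % identity to human), from Table 3
TABLE_3 = [
    (0, 100.0),   # Homo sapiens
    (90, 68.2),   # Mus musculus
    (312, 56.4),  # Meleagris gallopavo
    (312, 60.8),  # Chelonia mydas
    (352, 48.9),  # Xenopus tropicalis
    (413, 45.0),  # Latimeria chalumnae
    (435, 29.6),  # Ictalurus punctatus
    (473, 40.4),  # Callorhinchus milii
]


def divergence_rate(points):
    """Least-squares slope through the origin of (100 - identity) vs. time.

    Returns percent sequence divergence per million years.
    """
    numerator = sum(t * (100.0 - identity) for t, identity in points)
    denominator = sum(t * t for t, _ in points)
    return numerator / denominator


rate = divergence_rate(TABLE_3)
print(f"{rate:.3f} % sequence divergence per million years")
```

With the Table 3 values this gives roughly 0.14% divergence per million years. A straight-line fit ignores saturation from multiple substitutions at the same site, so it is best read as a coarse summary.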

References

  1. ENSG00000205929 GRCh38: Ensembl release 89: ENSG00000262938, ENSG00000205929. Ensembl, May 2017.
  2. GRCm38: Ensembl release 89: ENSMUSG00000039851. Ensembl, May 2017.
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. Kelley L. "PHYRE2 Protein Fold Recognition Server". www.sbg.bio.ic.ac.uk. Retrieved 2017-05-07.
  6. "EPCIP exosomal polycystin 1 interacting protein [ Homo sapiens (human) ]". www.ncbi.nlm.nih.gov. Retrieved 2024-05-15.
  7. "Home - GEO Profiles - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2017-05-07.
  8. Database GH. "C21orf62 Gene - GeneCards | CU062 Protein | CU062 Antibody". www.genecards.org. Retrieved 2017-05-07.
  9. "Protein". www.ncbi.nlm.nih.gov. Retrieved 2017-05-07.
  10. Kramer J (1990). "AASTATS". Biology Workbench.
  11. Toldo L. "PI Isoelectric Point Determination Program". Biology Workbench.
  12. "PSORT II server - GenScript". www.genscript.com. Retrieved 2017-05-07.
  13. Charpilloz JL. "TERMINUS - Welcome to terminus". terminus.unige.ch. Retrieved 2017-05-07.
  14. Brendel V (1992). "Statistical Analysis of PS". Biology Workbench. Archived from the original on 2003-08-11. Retrieved 2017-02-06.
  15. Pearson WR (September 1998). "CHOFAS Analysis". Biology Workbench. Archived from the original on 2003-08-11. Retrieved 2017-02-06.
  16. Pappas GJ Jr (1974–1996). "PELE: Protein Structure Prediction". Biology Workbench. Archived from the original on 2003-08-11. Retrieved 2017-02-06.
  17. "Motif Scan". myhits.isb-sib.ch. Retrieved 2017-05-07.
  18. The Cucko Workgroup (2017-05-01). "GPS-SUMO 2.0 Online Service". sumosp.biocuckoo.org/online.php. Archived from the original on 2019-02-17. Retrieved 2017-05-05.
  19. la Cour T, Kiemer L, Mølgaard A, Gupta R, Skriver K, Brunak S (2004). "Analysis and prediction of leucine-rich nuclear export signals". Protein Eng. Des. Sel. 17 (6): 527–36. doi: 10.1093/protein/gzh062. PMID 15314210.
  20. "Home - UniGene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2017-05-07.
  21. "The Human Protein Atlas". www.proteinatlas.org. Retrieved 2017-05-07.
  22. "STRING: functional protein association networks". string-db.org. Retrieved 2017-05-07.
  23. "TimeTree :: The Timescale of Life". timetree.org. Retrieved 2017-05-07.
  24. "BLAST: Basic Local Alignment Search Tool". blast.ncbi.nlm.nih.gov. Retrieved 2017-05-07.
  25. Myers EW, Miller W (March 1988). "Optimal alignments in linear space". Computer Applications in the Biosciences. 4 (1): 11–17. doi: 10.1093/bioinformatics/4.1.11. S2CID 8140207.