Glutamate-rich protein 4 is encoded by the gene ERICH4 and is also known as chromosome 19 open reading frame 69 (C19orf69). [1] ERICH4 is highly conserved in mammals and is highly expressed in the kidneys, terminal ileum, and duodenum. [1] [2] The function of ERICH4 is not yet well understood, but it is suggested to contribute to immune inflammatory responses.
ERICH4 is located on the sense strand at 19q13.2 in humans, spans 2,340 base pairs, and contains 2 exons. [3] It lies within DMAC2 and next to PCAT19 and B3GNT8, all of which are on the antisense strand. [1]
The promoter is predicted to begin 1,806 bp upstream of the 5' UTR and to span 1,819 bp, overlapping the coding sequence by 13 bp. [4]
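The 13 bp overlap follows directly from the two figures quoted for the promoter. A minimal sketch of that arithmetic, in which the function name and the choice of the 5' UTR start as the reference point are illustrative assumptions rather than part of the cited prediction:

```python
def promoter_overlap_bp(upstream_start_bp: int, promoter_length_bp: int) -> int:
    """Bases by which a promoter extends past its reference point.

    upstream_start_bp: distance from the promoter's first base to the
        reference point (assumed here to be the start of the 5' UTR).
    promoter_length_bp: total predicted promoter length.
    """
    # A promoter shorter than its upstream offset does not overlap at all.
    return max(0, promoter_length_bp - upstream_start_bp)


# Figures quoted in the text: begins 1,806 bp upstream, spans 1,819 bp.
print(promoter_overlap_bp(1806, 1819))  # 13
```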
Matrix ID | TF Name | Genomic Position within Human ERICH4 Promoter | Strand of 19q | Matrix Similarity | Literature-Supported Function |
---|---|---|---|---|---|
V$CEBPA.01 | CCAAT/Enhancer-binding Protein Alpha | 41,441,752-41,441,766 | Sense (+) | 0.962 | Recruit co-activators that in turn can open up chromatin structure or recruit basal transcription factors. [5] [6] |
O$VTATA.01 | Vertebrate TATA-binding Factor | 41,442,466-41,442,482 | Antisense (-) | 0.915 | Required for initiation of transcription and is associated with a variety of different transcription factors. [7] |
V$SOX1.04 | SRY (Sex Determining Region)-Box 1 | 41,442,480-41,442,502 | Sense (+) | 0.801 | Involved in the regulation of embryonic development and in the determination of the cell fate. [8] |
V$HSF1.04 | Heat Shock Factor 1 | 41,442,536-41,442,560 | Sense (+) | 0.769 | Activation in cellular stress. [9] |
V$BTEB3.01 | Krueppel-like Factor 13 (KLF13) | 41,442,243-41,442,261 | Sense (+) | 0.934 | KLF13 knock-out mice show a defect in lymphocyte survival as KLF13 is a regulator of Bcl-xL expression. [10] |
V$PAX5.01 | B-cell Specific Activator Protein | 41,442,955-41,442,983 | Sense (+) | 0.796 | Key role in B-lymphocyte development. [11] |
V$GLI3.02 | GLI-Kruppel Family Member GLI3 | 41,443,115-41,443,131 | Sense (+) | 0.915 | Thought to play a role during embryogenesis. [12] |
V$NR2F6.01 | Nuclear Receptor subfamily 2 group F member 6 (NR2F6) | 41,442,507-41,442,531 | Sense (+) | 0.851 | Transcriptional repressor of IL17 expression in Th17-differentiated CD4-positive T cells in vitro and in vivo. [13] |
V$MAFB.01 | MAFB/Leucine Zipper Transcription Factor | 41,442,601-41,442,625 | Sense (+) | 0.923 | Regulation of lineage-specific hematopoiesis. Represses ETS1-mediated transcription of erythroid-specific genes in myeloid cells. [14] |
V$AP4.03 | Activating Enhancer Binding Protein 4 (TFAP4) | 41,443,003-41,443,019 | Sense (+) | 0.993 | Regulates the expression of genes involved in the regulation of cellular proliferation, stemness, and epithelial-mesenchymal transition. [15] |
V$EVI1.05 | Ecotropic Viral Integration Site 1 (EVI1) Encoded Factor | 41,442,583-41,442,599 | Sense (+) | 0.821 | Regulation of hematopoietic stem cell renewal. Controls several aspects of embryonic development. [16] |
V$MRE.01 | Mineralocorticoid Receptor Response Element | 41,442,844-41,442,862 | Sense (+) | 0.939 | Involved in water and electrolyte homeostasis, blood pressure regulation, inflammation, and fibrosis in the renocardiovascular system. [17] |
V$ARE.03 | Androgen Receptor Binding Site, IR3 Sites | 41,442,844-41,442,862 | Antisense (-) | 0.946 | Ligand-dependent transcription factor that controls the expression of specific genes. Binding of the androgen receptor (AR) to its native ligands 5α-dihydrotestosterone (DHT) and testosterone initiates male sexual development and differentiation. [18] |
V$ZNF217.01 | Zinc Finger Protein 217 | 41,443,023-41,443,035 | Sense (+) | 0.911 | Promotes cell proliferation and antagonizes cell death. [19] |
V$RORA.02 | RAR-related Orphan Receptor Alpha, Homodimer DR5 Binding Site | 41,442,783-41,442,807 | Sense (+) | 0.831 | Possible role in lymphocyte development. May negatively regulate inflammation, based on a reported positive relationship with the expression of IκBα, a negative regulator of the NF-κB signaling pathway. [20] |
V$STAT6.01 | Signal Transducer and Activator of Transcription 6 (STAT6) | 41,443,042-41,443,060 | Sense (+) | 0.961 | Plays a central role in exerting IL4-mediated responses. [21] |
V$ZF5.01 | Zinc Finger/POZ Transcription Factor | 41,442,874-41,442,888 | Sense (+) | 0.957 | Role in development, oncogenesis, apoptosis, and transcription repression. [22] |
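Several of the predicted sites in the table share genomic coordinates, which is easier to see programmatically than by eye. The following is a minimal sketch, assuming the positions are closed intervals on one shared coordinate system; the tuple layout and the helper name `overlapping_pairs` are illustrative choices, not part of the prediction tool's output.

```python
from itertools import combinations

# (matrix ID, start, end, strand), copied from the table above.
SITES = [
    ("V$CEBPA.01", 41441752, 41441766, "+"),
    ("O$VTATA.01", 41442466, 41442482, "-"),
    ("V$SOX1.04", 41442480, 41442502, "+"),
    ("V$HSF1.04", 41442536, 41442560, "+"),
    ("V$BTEB3.01", 41442243, 41442261, "+"),
    ("V$PAX5.01", 41442955, 41442983, "+"),
    ("V$GLI3.02", 41443115, 41443131, "+"),
    ("V$NR2F6.01", 41442507, 41442531, "+"),
    ("V$MAFB.01", 41442601, 41442625, "+"),
    ("V$AP4.03", 41443003, 41443019, "+"),
    ("V$EVI1.05", 41442583, 41442599, "+"),
    ("V$MRE.01", 41442844, 41442862, "+"),
    ("V$ARE.03", 41442844, 41442862, "-"),
    ("V$ZNF217.01", 41443023, 41443035, "+"),
    ("V$RORA.02", 41442783, 41442807, "+"),
    ("V$STAT6.01", 41443042, 41443060, "+"),
    ("V$ZF5.01", 41442874, 41442888, "+"),
]


def overlapping_pairs(sites):
    """Return (id_a, id_b) for every pair of sites whose intervals overlap."""
    by_start = sorted(sites, key=lambda site: site[1])
    pairs = []
    for a, b in combinations(by_start, 2):
        # a starts no later than b, so they overlap when b starts before a ends.
        if b[1] <= a[2]:
            pairs.append((a[0], b[0]))
    return pairs


for id_a, id_b in overlapping_pairs(SITES):
    print(id_a, "overlaps", id_b)
```

With the coordinates as listed, the sketch reports two overlapping pairs: the TATA-binding factor site with the SOX1 site, and the mineralocorticoid and androgen receptor elements, which occupy the same interval on opposite strands.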
The ERICH4 mRNA sequence is 955 nucleotides long, with a predicted folding energy of -139.80 kcal/mol (-0.258 energy/base). [23]
ERICH4 has one additional protein-encoding transcript variant, or isoform. [1]
Name | mRNA Length (bp) | Protein Length (aa) | Mass (Da) |
---|---|---|---|
Glutamate-rich protein 4 | 955 | 130 | 14,447 |
Glutamate-rich protein 4 isoform X1 | 1741 | 155 | N/A |
The primary encoded protein consists of 130 amino acids and has a predicted molecular mass of 14.5 kDa and a predicted isoelectric point (pI) of 4. [24] As its name suggests, glutamate-rich protein 4 is richest in glutamic acid, which makes up 17.7% of the protein, followed by leucine at 14.6% and proline at 9.2%. [25] ERICH4 has no positive or negative charge clusters. [25] The human protein has one identifiable mixed charge cluster, from amino acid 91 to 116, with 3 positively charged, 15 negatively charged, and 8 neutral amino acids. [25] The corresponding region is frequently negatively charged in ERICH4's orthologous proteins. [25] The protein contains no significant hydrophobic or transmembrane segments, a finding supported by comparison with five of ERICH4's orthologs (gray mouse lemur, sheep, house mouse, African elephant, and opossum). [25]
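The composition figures can be reproduced from any protein sequence with a simple residue count. The sketch below is illustrative only: the toy peptide is hypothetical and is not the ERICH4 sequence, and the residue counts for ERICH4 are back-calculated from the percentages quoted above rather than taken from the cited analysis.

```python
from collections import Counter


def composition_percent(sequence: str) -> dict:
    """Percent of the sequence made up by each residue, most common first."""
    counts = Counter(sequence.upper())
    total = len(sequence)
    return {aa: round(100 * n / total, 1) for aa, n in counts.most_common()}


def implied_count(percent: float, length: int) -> int:
    """Residue count implied by a reported percentage of a given length."""
    return round(percent * length / 100)


# Hypothetical toy peptide, NOT the ERICH4 sequence.
print(composition_percent("MEELLPEEAL"))

# Counts implied by the reported ERICH4 percentages (130 residues).
for residue, percent in [("Glu", 17.7), ("Leu", 14.6), ("Pro", 9.2)]:
    print(residue, implied_count(percent, 130))

# The mixed cluster spans residues 91-116 inclusive; its length should
# equal the sum of positive, negative, and neutral residues reported.
cluster_length = 116 - 91 + 1
print(cluster_length == 3 + 15 + 8)
```

Under these assumptions the percentages correspond to 23 glutamic acid, 19 leucine, and 12 proline residues, and the 26-residue cluster length matches the reported charge breakdown.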
ERICH4 has one identified domain of unknown function, DUF4530, which is found in eukaryotes. [26] Proteins in this family are typically 140 amino acids in length and ERICH4 is a known human member of this family. [27]
A cross-program analysis predicts that the ERICH4 protein is composed of five separate alpha helices and five interspersed coils. The alpha helix segments span amino acids 2-9, 21-24, 47-58, 61-94, and 104-111 of the protein sequence. ERICH4 is not predicted to contain beta sheets. [28] [29] [30] [31]
SWISS-MODEL proposes a tertiary structure for ERICH4 by modeling the protein on an NLRP6 template, with a sequence identity of 25.79%, sequence similarity of 0.30, and coverage of 0.43, for amino acids 43-92 of ERICH4. [32]
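The helix coordinates above are enough to estimate how much of the protein is predicted to be helical. A minimal sketch, assuming the segment boundaries are inclusive and using the 130-residue length of the primary protein; the function name `helix_stats` is an illustrative choice:

```python
PROTEIN_LENGTH = 130  # residues in the primary ERICH4 protein

# Predicted alpha helix segments (start, end), inclusive, from the text.
HELICES = [(2, 9), (21, 24), (47, 58), (61, 94), (104, 111)]


def helix_stats(helices, protein_length):
    """Return per-helix lengths, total helical residues, and percent helical."""
    lengths = [end - start + 1 for start, end in helices]
    total = sum(lengths)
    percent = round(100 * total / protein_length, 1)
    return lengths, total, percent


lengths, total, percent = helix_stats(HELICES, PROTEIN_LENGTH)
print(lengths)  # [8, 4, 12, 34, 8]
print(total)    # 66
print(percent)  # 50.8
```

By this count roughly half of the residues fall within predicted helices, with the 61-94 segment contributing about half of that total.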
ERICH4 is predicted to be phosphorylated at serines 28 and 96 by casein kinase II and at threonine 36 by protein kinase C. [33] [34] ERICH4 is not predicted to undergo methionine cleavage or acetylation. [35]
This protein is predicted to be intracellular, with no transmembrane regions. It is predicted to localize mostly to the cytoplasm, with a reliability score of 70.6 by Reinhardt's method. [36] No significant O-GlcNAc or N-myristoylation sites are predicted. [37]
In humans, ERICH4 expression is highest in the duodenum and small intestine, followed by the kidneys. [1] [38] Notably, expression in the small intestine is highest in the twentieth week of fetal development. [1] [39] In a representative set of mouse (Mus musculus) tissues, Erich4 is most highly expressed in the kidneys, followed in decreasing order by the large intestine, adult duodenum, and adult small intestine. [40] Staining with the rabbit-derived Sigma-Aldrich antibody HPA042632 shows strong granular cytoplasmic positivity in glandular cells (goblet cells) of the rectum. [41]
ERICH4 has high expression in normal tissue and low-to-medium expression in renal cell carcinoma tissue. [42]
ERICH4 expression was examined in ileum and colon tissue that was either normal or affected by Crohn's disease or ulcerative colitis. ERICH4 had high (~90%) expression in the ileum in all states (normal/control, Crohn's disease, and ulcerative colitis). [43] ERICH4 expression was also higher in Crohn's disease than in either normal tissue or ulcerative colitis.
The function of ERICH4 is not yet well understood and requires further research.
According to STRING analysis, ERICH4 has multiple predicted protein interactions, identified by text mining, including with proteins that have immune functions or are expressed in the gastrointestinal tract or testes. [44] No protein interactions have yet been confirmed experimentally.
Predicted Partner Protein | Score | Associated Functions |
---|---|---|
Tetratricopeptide Repeat Domain 29 (TTC29) | 0.680 | Shown to be significantly upregulated during wound healing of human masticatory mucosa. [45] |
Transmembrane Protein 184A (TMEM184A) | 0.552 | Functions as a heparin receptor and mediates anti-inflammatory responses of endothelial cells (ECs) involving decreased JNK and p38 activity. [46] |
Insulin-like Growth Factor binding protein Acid Labile Subunit (IGFALS) | 0.509 | Serum protein that binds insulin-like growth factors, increasing their half-life and their vascular localization. [47] |
Serine Peptidase Inhibitor, Kazal type 4 (SPINK4) | 0.500 | Has been shown to exhibit Celiac disease pathology-related differential gene expression, likely derived from altered goblet cell activity. [48] |
Protein Disulfide-Isomerase-Like protein of the Testis (PDILT) | 0.497 | Catalyzes protein folding and thiol-disulfide interchange reactions. This protein lacks oxidoreductase activity in vitro and is suspected to function as a chaperone. [49] |
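The scores in the table can be read against the confidence bands that STRING conventionally uses for its combined score (0.150 low, 0.400 medium, 0.700 high, 0.900 highest). The sketch below is a hedged illustration: the dictionary simply restates the table, and the function name and the label for scores under the lowest cutoff are illustrative assumptions.

```python
# Combined scores copied from the table above.
PREDICTED_PARTNERS = {
    "TTC29": 0.680,
    "TMEM184A": 0.552,
    "IGFALS": 0.509,
    "SPINK4": 0.500,
    "PDILT": 0.497,
}

# Lower bounds of STRING's conventional confidence bands, highest first.
CUTOFFS = [(0.900, "highest"), (0.700, "high"), (0.400, "medium"), (0.150, "low")]


def confidence_band(score: float) -> str:
    """Map a combined score to the first band whose cutoff it meets."""
    for cutoff, label in CUTOFFS:
        if score >= cutoff:
            return label
    return "below low-confidence cutoff"


for partner, score in PREDICTED_PARTNERS.items():
    print(f"{partner}: {score:.3f} ({confidence_band(score)})")
```

All five predicted partners fall in the medium band under these cutoffs, which is consistent with the absence of experimentally confirmed interactions.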
Orthologs have been identified in most mammals for which complete genome data are available. Notably, ERICH4 orthologs are present in placental and marsupial mammals but absent in monotremes. [2] The most distant ortholog was identified in the gray short-tailed opossum, a marsupial. [2]
No significant similarity was found in the other vertebrate groups: Aves, Reptilia, Amphibia, Chondrichthyes, Osteichthyes, or Agnatha. BLAST and BLAT searches that excluded vertebrates found no significant orthologs in invertebrates, fungi, or bacteria. [2]
Species | Common Name | NCBI Accession Number | Sequence Length (AA) | Millions of Years since LCA | % Identity | % Similarity | Taxonomic Group |
---|---|---|---|---|---|---|---|
Homo sapiens | Human | NP_001123986.1 | 130 | --- | 100 | 100 | Primates |
Microcebus murinus | Gray mouse lemur | XP_012616209.1 | 179 | 73 | 72 | 80 | Primates |
Tupaia chinensis | Northern treeshrew | XP_006165343.1 | 137 | 85 | 63 | 72 | Scandentia |
Mus pahari | Gairdner's shrewmouse | XP_021075502.1 | 141 | 88 | 57 | 63 | Rodentia |
Meriones unguiculatus | Mongolian gerbil | XP_021519873.1 | 141 | 88 | 60 | 64 | Rodentia |
Rattus norvegicus | Brown rat | NP_001102923.1 | 147 | 88 | 61 | 65 | Rodentia |
Mus musculus | House mouse | NP_001034332.2 | 140 | 88 | 62 | 71 | Rodentia |
Microtus ochrogaster | Prairie vole | XP_005361243.1 | 140 | 88 | 62 | 71 | Rodentia |
Erinaceus europaeus | European hedgehog | XP_007536664.1 | 129 | 94 | 61 | 68 | Eulipotyphla |
Orcinus orca | Killer whale | XP_004271419.2 | 121 | 94 | 62 | 72 | Cetacea |
Physeter catodon | Sperm whale | XP_007128192.1 | 121 | 94 | 64 | 72 | Cetacea |
Desmodus rotundus | Common vampire bat | XP_024433457.1 | 180 | 94 | 66 | 73 | Chiroptera |
Ovis aries | Sheep | XP_012045823.2 | 131 | 94 | 69 | 75 | Artiodactyla |
Bos taurus | Cattle | XP_002695042.1 | 131 | 94 | 69 | 75 | Artiodactyla |
Pteropus alecto | Black flying fox | XP_006910763.1 | 133 | 94 | 71 | 78 | Chiroptera |
Hipposideros armiger | Great roundleaf bat | XP_019488166.1 | 134 | 94 | 71 | 79 | Chiroptera |
Loxodonta africana | African bush elephant | XP_003420798.1 | 127 | 102 | 58 | 66 | Proboscidea |
Monodelphis domestica | Gray short-tailed opossum | XP_007492011.1 | 106 | 160 | 45 | 61 | Didelphimorphia |
Phascolarctos cinereus | Koala | XP_020834126.1 | 109 | 160 | 48 | 63 | Diprotodontia |
Vombatus ursinus | Common wombat | XP_027701859.1 | 109 | 160 | 49 | 64 | Diprotodontia |
The m value, the corrected number of amino acid changes per 100 residues, was plotted for ERICH4 against species divergence time in millions of years. Compared with hemoglobin, fibrinogen alpha chain, and cytochrome c, ERICH4 most closely tracks fibrinogen alpha chain, suggesting a relatively rapid pace of evolution. The m values were derived from the percent identity between each species' protein sequence and the human sequence, using the formula derived from the molecular clock hypothesis.
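The text does not state the formula itself. A commonly used form of the correction is m = -100 ln(1 - n/100), where n is the observed number of differences per 100 residues (here taken as 100 minus percent identity). The sketch below assumes that form, so its outputs are illustrative and may not match the values actually plotted; the species subset and the function name are also illustrative choices.

```python
import math


def m_value(percent_identity: float) -> float:
    """Corrected changes per 100 residues, assuming m = -100 * ln(1 - n/100)."""
    if not 0 < percent_identity <= 100:
        raise ValueError("percent identity must be in (0, 100]")
    n = 100.0 - percent_identity  # observed differences per 100 residues
    return -100.0 * math.log(1.0 - n / 100.0)


# (millions of years since LCA, % identity to human), from the ortholog table.
ORTHOLOGS = {
    "Gray mouse lemur": (73, 72),
    "House mouse": (88, 62),
    "African bush elephant": (102, 58),
    "Gray short-tailed opossum": (160, 45),
}

for species, (mya, identity) in ORTHOLOGS.items():
    print(f"{species}: {mya} MYA, m = {m_value(identity):.1f}")
```

Because the correction grows faster than the raw difference count, distant orthologs such as the opossum receive a proportionally larger m than the uncorrected 55 differences per 100 residues would suggest.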