CLIP4 identifiers | |
---|---|
Aliases | CLIP4, RSNL2, CAP-Gly domain containing linker protein family member 4 |
External IDs | MGI: 1919100; HomoloGene: 11662; GeneCards: CLIP4; OMA: CLIP4 orthologs |
CAP-Gly Domain Containing Linker Protein Family Member 4 is a protein that in humans is encoded by the CLIP4 gene. [6] Its conserved domains are primarily ankyrin repeats and the eponymous CAP-Gly domains. [6] The CLIP4 protein consists largely of random coil, with alpha helices making up most of the remainder. [7] CLIP4 mRNA is expressed mainly in the adrenal cortex and the atrioventricular node. [8] Studies of CLIP4's conserved domains and paralogs suggest microtubule regulation as a possible function of CLIP4.
The human CLIP4 gene, also known as Restin-Like Protein 2 (RSNL2), [9] is located on the plus strand of the short (p) arm of chromosome 2 at 2p23 (region 2, band 3), [9] spanning base pairs 29,096,676 to 29,189,643. The gene is 92,968 base pairs long and consists of 23 exons. [9]
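The stated gene length follows from the genomic coordinates if both endpoints are counted, that is, if the coordinates are 1-based and inclusive (an assumption about the convention used by the source). A minimal Python check, where `gene_length` is a hypothetical helper written for illustration:

```python
def gene_length(start: int, end: int) -> int:
    """Length of a genomic interval given 1-based, inclusive start and end coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    # Both endpoints belong to the gene, so add 1 to the difference.
    return end - start + 1


# CLIP4 coordinates quoted in the text
print(gene_length(29_096_676, 29_189_643))  # 92968
```

With half-open (0-based, end-exclusive) coordinates the same two numbers would give 92,967, so the match with 92,968 is consistent with inclusive coordinates.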
Transcript | mRNA size (nucleotides) |
---|---|
CLIP4 transcript variant 1 [10] | 4299 |
CLIP4 transcript variant 2 [11] | 4295 |
CLIP4 transcript variant 3 [12] | 2353 |
The human CLIP4 protein is 705 amino acids long and contains two main types of conserved domains: two CAP-Gly domains and numerous ankyrin repeats. [9] Its secondary structure consists largely of random coil, with alpha helices the second most abundant element and beta sheets the third. [7]
The isoelectric point of the unprocessed CLIP4 protein is slightly basic (pI 8.62), indicating a slight excess of basic over acidic amino acids. [13] Its molecular weight is about 65 kDa. [13] The most abundant amino acid in CLIP4 is serine, which makes up 10.7% of the protein. [14] Aligned matching blocks of separated, tandem, and periodic repeats are found at positions 340-345 and 542-547, as well as 447-547 and 564-568. [14] An unusual nine-residue periodic element, a single lysine followed by eight other amino acids, occurs five times within the protein, which is unusual relative to the swp23s.q reference dataset. [14] Another unusual feature is a seven-residue periodic element, a negatively charged amino acid followed by six hydrophobic amino acids, which occurs six times within the protein relative to the same dataset. [14] In addition, two spacings between serine residues and two spacings between phenylalanine residues are unusually large compared with the swp23s.q dataset. [14]
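Composition figures such as the serine percentage above are the count of a residue divided by the sequence length. A minimal sketch of that calculation; the function name and the toy peptide are hypothetical, and the toy sequence is not part of CLIP4:

```python
from collections import Counter


def residue_percentages(sequence: str) -> dict[str, float]:
    """Percentage of each amino acid (one-letter code) in a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)  # residue -> number of occurrences
    total = len(seq)
    return {aa: 100.0 * n / total for aa, n in counts.items()}


# Hypothetical toy peptide, not the CLIP4 sequence
toy = "MSSKLSEAS"
pct = residue_percentages(toy)
print(round(pct["S"], 1))  # 44.4 (4 serines out of 9 residues)
```

The result depends on which isoform is analyzed, since the denominator is the length of the sequence supplied.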
Isoform | Protein size (amino acids) |
---|---|
CLIP4 isoform 1 [15] | 705 |
CLIP4 isoform 2 [16] | 599 |
CLIP4 RNA is consistently measured at high levels in the thyroid. [6] High transcription levels also occur in the adrenal cortex and the atrioventricular node. [8] The Human Protein Atlas reports high RNA expression in muscle tissues, with some expression in the skin, endocrine tissues, and proximal digestive tract. [17] Protein expression is also greatest in muscle tissues, with some expression in the lung, gastrointestinal tract, liver and gallbladder, and bone marrow and lymphoid tissues. [17]
CLIP4 appears to be highly expressed in Ada3 deficiency. [18] There is also a trend toward higher CLIP4 expression in the absence of U28. [18]
The transcription factor binding sites below were selected and ordered by proximity to the promoter and by matrix similarity. [19]
Transcription Factor | Detailed Matrix Info | Anchor Base | Matrix Similarity | Sequence |
---|---|---|---|---|
NOLF | Early B-cell factor 1 | 17 | 0.98 | taagagTCCCcagggcagaaaca |
PAX2 | Zebrafish PAX2 paired domain protein | 18 | 0.8 | aagagtccccagggcagAAACaa |
AP2F | Transcription factor AP-2, alpha | 16 | 0.98 | ctgcCCTGgggactc |
AP2F | Transcription factor AP-2, beta | 16 | 0.899 | gagTCCCcagggcag |
SORY | SRY (sex-determining region Y) box 9, dimeric binding sites | 35 | 0.768 | aAACAaaatccagtgagggagag |
HNF6 | CUT-homeodomain transcription factor Onecut-2 | 32 | 0.827 | aaacaaAATCcagtgag |
PAX5 | B-cell-specific activator protein | 40 | 0.815 | acaaaaTCCAgtgagggagagatgcaggg |
ZF16 | PR/SET domain 15 | 36 | 0.852 | aaatccagtgaGGGA |
SORY | HMGI(Y) high-mobility-group protein I (Y), architectural transcription factor organizing the framework of a nuclear protein-DNA transcriptional complex | 78 | 0.945 | tggaAATTttctaccttaggagc |
NFAT | Nuclear factor of activated T-cells 5 | 83 | 0.955 | ttttGGAAattttctacct |
NFAT | Nuclear factor of activated T-cells 5 | 83 | 0.871 | aggtAGAAaatttccaaaa |
CEBP | CCAAT/enhancer binding protein (C/EBP), epsilon | 89 | 0.975 | agccttttGGAAatt |
CAAT | Cellular and viral CCAAT box | 110 | 0.91 | gcagCCATttaatct |
CAAT | Avian C-type LTR CCAAT box | 165 | 0.875 | cccaCCAAgcagtgg |
CEBP | CCAAT/enhancer binding protein (C/EBP), gamma | 650 | 0.866 | ctaaTTGCtcaacgt |
CEBP | CCAAT/enhancer binding protein alpha | 651 | 0.971 | cacgttgaGCAAtta |
VTBP | Mammalian C-type LTR TATA box | 680 | 0.903 | tgctgTAAAaggcctaa |
TF2B | Transcription factor II B (TFIIB) recognition element | 983 | 1 | ccgCGCC |
TF2B | Transcription factor II B (TFIIB) recognition element | 1157 | 1 | ccgCGCC |
TF2B | Transcription factor II B (TFIIB) recognition element | 1228 | 1 | ccgCGCC |
The human CLIP4 mRNA has 12 stem-loop structures in its 5' UTR and 13 stem-loop structures in its 3' UTR. Of these, all 12 stem-loops in the 5' UTR and 1 stem-loop in the 3' UTR are conserved. [20]
The human CLIP4 protein is localized to the nuclear membrane. [21] Consistent with its intracellular localization, CLIP4 lacks a signal peptide [22] and likewise has no N-linked glycosylation sites. [23] CLIP4 is not cleaved. [24] However, numerous O-linked glycosylation sites are present. [25] Phosphorylation sites are particularly dense at amino acid positions 400-599, although many are also present throughout the rest of the protein. [26]
CAP-Gly domains are often associated with microtubule regulation. [27] Ankyrin repeats are known to mediate protein-protein interactions. [28] Furthermore, CLIP1, a human paralog of CLIP4, binds microtubules and regulates the microtubule cytoskeleton. [29] CLIP4 is also predicted to interact with various microtubule-associated proteins. [30] Taken together, these observations suggest that CLIP4, although uncharacterized, is involved in microtubule regulation.
The CLIP4 protein is predicted to interact with several microtubule-associated proteins, namely MAPRE1, MAPRE2, and MAPRE3. It is also predicted to interact with CKAP5 and DCTN1, a cytoskeleton-associated protein and a dynactin subunit, respectively. [30]
CLIP4 activity is correlated with the spread of renal cell carcinoma (RCC) within the host and could therefore serve as a biomarker for RCC metastasis in cancer patients. [31] Additionally, measurement of CLIP4 promoter methylation using a Global Methylation DNA Index shows that higher CLIP4 methylation is associated with increasing severity of gastric lesions, from gastritis to possibly gastric cancer. [32] This suggests that CLIP4 methylation could be used for early detection of gastric cancer. [33] A similar finding was documented for prostate cancer, in which CLIP4 was hypermethylated in patients with the disease. [34]
CLIP4 levels were markedly increased in samples predicted to have severe fibrosis caused by chronic hepatitis C virus (HCV) infection. [35] Additionally, the identification of CLIP4 as a novel self-antigen in systemic lupus erythematosus suggests a potential role in the disease mechanism. [36]
The orthologs below were selected and ordered by estimated date of divergence (DoD) from the human protein and by global sequence identity. [37]
Binomial Nomenclature | Common Name | Taxonomic Group | Estimated DoD from Human (MYA) | Accession Number | Sequence Length (AA) | Global Sequence Identity to Human Protein (%) | Global Sequence Similarity to Human Protein (%) |
---|---|---|---|---|---|---|---|
Homo sapiens (Hsa) | Human | Primate | 0 | AAP97312 | 601 | 100 | 100 |
Aotus nancymaae (Ana) | Ma's night monkey | Primate | 43.2 | XP_012330895 | 704 | 83.5 | 83.7 |
Sorex araneus (Sar) | Common shrew | Eulipotyphla | 96 | XP_004620056 | 707 | 74 | 78.5 |
Antrostomus carolinensis (Aca) | Chuck-will's-widow | Aves | 312 | XP_028942997 | 702 | 66.5 | 75.4 |
Gekko japonicus (Gja) | Schlegel's Japanese gecko | Reptilia | 312 | XP_015270366 | 702 | 63.8 | 73.1 |
Rhinatrema bivittatum (Rbi) | Two-lined caecilian | Amphibians | 351.8 | XP_029448862 | 707 | 59.5 | 70.5 |
Callorhinchus milii (Cmi) | Elephant shark | Chondrichthyes | 473 | XP_007895016 | 715 | 52.5 | 65.6 |
Branchiostoma floridae (Bfl) | Florida lancelet | Leptocardii | 684 | XP_002606824 | 481 | 40.4 | 52.8 |
Saccoglossus kowalevskii (Sko) | Acorn worm | Enteropneusta | 684 | XP_006822686 | 648 | 35.7 | 47.5 |
Ixodes scapularis (Isc) | Black-legged tick | Arachnid | 797 | XP_029831090 | 527 | 38.9 | 53 |
Limulus polyphemus (Lpo) | Atlantic horseshoe crab | Arachnid | 797 | XP_013786376 | 462 | 38 | 51.6 |
Lottia gigantea (Lgi) | Owl limpet | Gastropods | 797 | XP_009046843 | 669 | 36.3 | 49.3 |
Mizuhopecten yessoensis (Mye) | Yesso scallop | Bivalvia | 797 | XP_021359747 | 633 | 35.4 | 47.2 |
Parasteatoda tepidariorum (Pte) | Common house spider | Arachnid | 797 | XP_015914966 | 616 | 34.7 | 47.6 |
Aplysia californica (Aca) | California sea hare | Gastropods | 797 | XP_012945346 | 653 | 33.7 | 45.7 |
Crassostrea virginica (Cvi) | Eastern oyster | Bivalvia | 797 | XP_022315879 | 646 | 32.7 | 45.1 |
Tetranychus urticae (Tur) | Two-spotted spider mite | Arachnid | 797 | XP_015790536 | 652 | 31.9 | 43.5 |
Centruroides sculpturatus (Csc) | Bark scorpion | Arachnid | 797 | XP_023229484 | 605 | 30.6 | 43.4 |
Penaeus vannamei (Pva) | Pacific white shrimp | Malacostracans | 797 | XP_027206746 | 681 | 22.9 | 34 |
Monosiga brevicollis (Mbr) | Choanoflagellate | Choanoflagellatea | 1023 | XP_001748580 | 576 | 25.3 | 40.8 |
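Global sequence identity is computed from a pairwise global alignment. The source does not state the exact definition used; the sketch below assumes the convention reported in the Identity field of EMBOSS Needle, where identical, non-gap columns are divided by the full alignment length including gap columns. The function and the toy aligned strings are hypothetical; similarity would additionally require a substitution matrix and is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over all alignment columns, gap columns included (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    # Count columns where both sequences carry the same residue (gaps never count as matches).
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * identical / len(aligned_a)


# Toy alignment, not real CLIP4 data: 5 identical columns out of 7
print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))  # 71.4
```

Because gap columns stay in the denominator, a short ortholog aligned against a longer human sequence scores lower than it would if identity were measured only over aligned residues.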
Ankyrin repeat domain-containing protein 24 is a protein in humans that is encoded by the ANKRD24 gene. The gene is also known as KIAA1981. The protein's function in humans is currently unknown. ANKRD24 belongs to the family of proteins that contain ankyrin repeat domains.
Chromosome 16 open reading frame 95 (C16orf95) is a gene which in humans encodes the protein C16orf95. It has orthologs in mammals, and is expressed at a low level in many tissues. C16orf95 evolves quickly compared to other proteins.
Zinc finger protein 684 is a protein that in humans is encoded by the ZNF684 gene.
The coiled-coil domain containing 142 (CCDC142) is a gene which in humans encodes the CCDC142 protein. The CCDC142 gene is located on chromosome 2, spans 4339 base pairs, and contains 9 exons. The gene codes for the coiled-coil domain containing protein 142 (CCDC142), whose function is not yet well understood. There are two known isoforms of CCDC142. The CCDC142 proteins produced from these transcripts range in size from 665 to 743 amino acids and contain signals suggesting protein movement between the cytosol and the nucleus. Homologous CCDC142 genes are found in many animals, including vertebrates and invertebrates, but not in fungi, plants, protists, archaea, or bacteria. Although the function of this protein is not well understood, it contains a coiled-coil domain and a RINT1_TIP1 motif located within the coiled-coil domain.
PRR29 is a protein encoded by the PRR29 gene located in humans on chromosome 17 at 17q23.
Glutamate Rich Protein 2 is a protein in humans encoded by the gene ERICH2. It is highly expressed in male tissues, particularly the testes, where the protein is found in the nucleolar fibrillar center and in vesicles of testicular cells. The protein has multiple protein interactions which indicate that it may play a role in histone modification and proper histone function.
Sperm microtubule associated protein 1 is a protein which in humans is encoded by the SPMAP1 gene (previously named C17orf98), located on human chromosome 17. The C17orf98 gene consists of a 6,302-base sequence. Its mRNA has three exons and no alternative splice sites. The protein has 154 amino acids, with no abnormal amino acid levels. C17orf98 contains a domain of unknown function (DUF4542) and has a molecular weight of 17.6 kDa. C17orf98 does not belong to any other protein family, nor does it have any isoforms. The protein has orthologs with high percent similarity in mammals and reptiles, as well as more distantly related orthologs across the animal kingdom, extending as far as sponges.
Zinc finger CCHC-type containing 18 (ZCCHC18) is a protein that in humans is encoded by the ZCCHC18 gene. It is also known as Smad-interacting zinc finger protein 2 (SIZN2), paraneoplastic Ma antigen family member 7b (PNMA7B), and LOC644353. Other names, such as zinc finger, CCHC domain containing 12 pseudogene 1, P0CG32, and ZCC18_HUMAN, have also been used to describe this protein.
TMEM44 is a protein that in humans is encoded by the TMEM44 gene. DKFZp686O18124 is a synonym of TMEM44.
Uncharacterized protein C16orf86 is a protein in humans that is encoded by the C16orf86 gene. It is composed mostly of alpha helices and is expressed in the testes, as well as in other tissues such as the kidney, colon, brain, fat, spleen, and liver. The function of C16orf86 is not well understood; however, DNA microarray data suggest that it could be a nuclear transcription factor that regulates the G0/G1 phase of the cell cycle in tissues such as the kidney, brain, and skeletal muscle.
Single-pass membrane and coiled-coil domain-containing protein 3 is a protein that is encoded in humans by the SMCO3 gene.
Uncharacterized Protein C15orf32 is a protein which in humans is encoded by the C15orf32 gene and is located on chromosome 15, location 15q26.1. Variants of C15orf32 have been linked to bipolar disorder, alcohol use disorder, and acute myeloid leukemia.
C20orf202 is a protein that in humans is encoded by the C20orf202 gene. It is a nuclear protein that is primarily expressed in the lung and placenta.
C1orf122 is a gene in the human genome that encodes the cytosolic protein ALAESM. ALAESM is present in all tissues and is highly up-regulated in the brain, spinal cord, adrenal gland, and kidney, where the gene can be expressed at up to 2.5 times the level of the average gene. Although the function of C1orf122 is unknown, it is predicted to be involved in mitochondrial localization.
C7orf50 is a gene in humans that encodes a protein also known as C7orf50. The gene is ubiquitously expressed, including in the kidneys, brain, fat, prostate, and spleen among 22 other tissues, and shows low tissue specificity. C7orf50 is conserved in chimpanzees, rhesus monkeys, dogs, cows, mice, rats, and chickens, along with 307 other organisms ranging from mammals to fungi. The protein is predicted to be involved in importing ribosomal proteins into the nucleus for assembly into ribosomal subunits as part of rRNA processing. Additionally, the gene is predicted to be a protein-coding microRNA (miRNA) host gene, meaning that it may contain miRNA genes in its introns and/or exons.
SMIM15 (small integral membrane protein 15) is a protein in humans that is encoded by the SMIM15 gene. It is a transmembrane protein that interacts with PBX4. Deletions of the chromosomal region containing SMIM15 have been associated with intellectual disability and physical deformities. The gene shows ubiquitous but variable expression in many tissues throughout the body.
TMEM275 is a protein that in humans is encoded by the TMEM275 gene. TMEM275 has two highly conserved helical transmembrane regions. It is predicted to reside within the plasma membrane or the endoplasmic reticulum membrane.
Family with Sequence Similarity 166, member C (FAM166C) is a protein encoded by the FAM166C gene. FAM166C is localized in the nucleus and has a calculated molecular weight of 23.29 kDa. It contains DUF2475, a domain of unknown function spanning amino acids 19–85. The FAM166C protein is nominally expressed in the testis, stomach, and thyroid.
Ankyrin Repeat And MYND Domain Containing 1 (ANKMY1) is a protein that in humans is encoded by the ANKMY1 gene. Known aliases of ANKMY1 include Zinc Finger Myeloid, Nervy and DEAF-1 or ZMYND13.
Proline-Rich Protein 23A is a protein that is encoded by the Proline-Rich 23A (PRR23A) gene.