CHID1 identifiers | |
---|---|
Aliases | CHID1, SI-CLP, SICLP, GL008, chitinase domain containing 1 |
External IDs | OMIM: 615692, MGI: 1915288, HomoloGene: 12229, GeneCards: CHID1 |
Chitinase domain-containing protein 1 (CHID1) is a highly conserved protein of unknown function. In humans it is encoded by the CHID1 gene, located on the short (p) arm of chromosome 11 near the telomere. [5] The gene has 27 introns, which allows for many isoforms. It has several aliases, the most common of which is Stabilin-1 interacting chitinase-like protein (SI-CLP). As the alias indicates, CHID1 is known to interact with the protein STAB1. [6] CHID1 is expressed ubiquitously at nearly 6 times the level of the average gene, [7] and it is conserved as far back as Caenorhabditis elegans and possibly some prokaryotes. The protein has known carbohydrate binding sites, which could be involved in carbohydrate catabolism.
CHID1 is located on chromosome 11 at p15.5. [8] It lies just downstream of TSPAN4 and upstream of AP2A2. [9] CHID1 is ubiquitously expressed at high levels: microarray analysis shows that it is generally expressed at 5.7 times the level of the average gene. [7] CHID1 has many known variants, which is attributed to its 37 exons. No inherent repeats or hairpin structures are found within the coding region of CHID1. The encoded protein is a member of the GH18 superfamily, which dictates some of its structure. The gene also has several aliases, the most common of which is Stabilin-1 interacting chitinase-like protein, or SI-CLP, which reflects its known interaction with STAB1. [10]
Due to its large size and many exons, CHID1 has many transcript variants that have been identified through mRNA sequencing. CHID1 has 27 exons and 22 known splice forms. [7] These forms indicate that multiple promoter regions and transcription start sites may be used for this gene. The most commonly found transcripts each translate to a protein of about 400 amino acids.
CHID1 shows a very high level of conservation. It has been identified in many model systems, including Drosophila melanogaster, Caenorhabditis elegans, and Oryza sativa. [11] When conservation is examined across a wide range of species, it becomes clear which regions are most important to CHID1. By far, the two best conserved features are the exon junctions and several known carbohydrate binding sites. Another important region appears to be the last 15 amino acids, which retain a high level of conservation even in insect sequences.
Species | Common name | Divergence in millions of years | Accession number | Similarity to human sequence |
---|---|---|---|---|
Gorilla gorilla gorilla | Gorilla | 8.8 | XP_004050445.1 | 100% |
Macaca mulatta | Rhesus macaque | 29.0 | AFI38071.1 | 94% |
Equus caballus | Horse | 94.2 | XP_003362692.1 | 91% |
Felis catus | House cat | 94.2 | XP_003993852.1 | 90% |
Taeniopygia guttata | Finch | 296 | XP_002199280.2 | 78% |
Ictalurus punctatus | Catfish | 400.1 | NP_001187344.1 | 70% |
Apis florea | Honey bee | 782.7 | XP_003691984.1 | 61% |
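The relationship between divergence time and sequence similarity in the table above can be summarized numerically. The following is a minimal sketch, not part of the cited analyses: it copies the table values into a list and computes a Pearson correlation and a least-squares slope. The names `ORTHOLOGS`, `pearson`, and `slope` are illustrative, and the linear fit is only a rough description, since similarity is not expected to fall linearly with time.

```python
from statistics import mean

# (species, divergence in millions of years, % similarity to human CHID1),
# copied from the ortholog table above.
ORTHOLOGS = [
    ("Gorilla gorilla gorilla", 8.8, 100),
    ("Macaca mulatta", 29.0, 94),
    ("Equus caballus", 94.2, 91),
    ("Felis catus", 94.2, 90),
    ("Taeniopygia guttata", 296.0, 78),
    ("Ictalurus punctatus", 400.1, 70),
    ("Apis florea", 782.7, 61),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov / var_x


times = [t for _, t, _ in ORTHOLOGS]
sims = [s for _, _, s in ORTHOLOGS]

r = pearson(times, sims)
b = slope(times, sims)

print(f"Pearson r between divergence time and similarity: {r:.3f}")
print(f"Approximate change in similarity per million years: {b:.4f} %")
```

A strongly negative correlation here only restates what the table shows: more distantly related species have less similar CHID1 sequences, while even the honey bee sequence retains substantial similarity.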
There are also several proposed paralogs of CHID1: di-N-acetylchitobiase, Chitinase-3-like protein 2, and chitotriosidase 1. Each has roughly 40% similarity to CHID1 and matches human CHID1 transcripts over 40-50% of the gene. [12] Based on this level of conservation, the paralogs likely all diverged at a time close to the split from bacteria.
Name | Accession number | Length (amino acids) | Similarity to CHID1 in humans |
---|---|---|---|
di-N-acetylchitobiase | NP_004379.1 | 385 | 38% |
Chitinase-3-like protein 2 | NP_001020368.1 | 380 | 39% |
Chitotriosidase-1 | NP_001243054.2 | 447 | 40% |
The CHID1 protein is typically about 400 amino acids long (though this varies among the many known forms) and has few post-translational modifications. [13] In most forms and orthologs, CHID1 is predicted to have a signal peptide whose length varies by sequence. [14] Structure predictions and the X-ray crystallography structure of SI-CLP indicate that its secondary structure is rich in alpha helices, with some beta strands also present. CHID1 is also a member of the GH18 superfamily of proteins, which has a unique conserved structure. [15] Members of this family contain an eight-stranded beta/alpha barrel. The protein has also been shown to interact with other copies of itself as a homodimer, in which it interacts with sulfate ions for an unknown purpose.
The expression of CHID1 is consistently shown to be ubiquitous and higher than average in all human tissues. [16] [17] [18] Analysis of the human CHID1 promoter region explains this expression by the large number of transcription factor binding sites that are active in a wide range of tissues or are even ubiquitous. [19] In Drosophila melanogaster, tissue expression varies widely between larval and adult tissue. [20] While expression is extremely widespread in adults, it is not at such a high level as in humans. In larvae, CHID1 is expressed only in specific tissues and is up-regulated in very different tissues than in adults. In larval tissues, expression is highest in the salivary gland, but in adults the highest expression by far is in the male accessory gland. [20] Though expression is generally high in some species, it is not necessarily abnormal. In situ hybridization of gene transcripts in mouse brains shows that CHID1 is expressed in a relatively normal pattern. [21] The pattern of high and low expression within the brain is very similar to that of another widely expressed gene, beta actin. [22] This is one indication that CHID1 may show normal expression patterning in mammals, despite its general upregulation.
Although the function of CHID1 is not known for certain, several activities have been proposed based on current knowledge. The protein may participate in metabolic processes such as chitin catabolism or carbohydrate metabolism. It may localize to various cellular compartments, such as the cytoplasm or lysosomes, depending on certain post-translational modifications. [23] Another study proposed that CHID1 may have a role in pathogen sensing. [15] The function of CHID1 may also depend on its putative binding partner, STAB1, which is proposed to participate in cell signaling and defense against bacteria. [24]
CHID1 is strongly suggested to interact with only one other protein. The transmembrane protein stabilin-1 (STAB1) has been detected as an interaction partner by in vitro, in vivo, and yeast two-hybrid assays. [10] STAB1 is a large transmembrane receptor protein that may function in processes such as lymphocyte homing and angiogenesis. It is expressed at over twice the level of the average gene and is expected to play a role in cell defense against bacteria. [25] This interaction may give insight into the function of both proteins.
Overall, the clinical significance of CHID1 is unknown. Many expression profiles of CHID1 show that its expression does not change in any of the common diseases or conditions studied so far. [26] Given the lack of knowledge about the function of CHID1, it is difficult to study its role in human health and disease further.