Hematopoietic SH2 Domain Containing (HSH2D) protein is a protein encoded by the hematopoietic SH2 domain containing (HSH2D) gene.
HSH2D is located on chromosome 19 at 19p13.11. Common aliases of the gene include HSH2 (Hematopoietic SH2 Protein) and ALX (Adaptor in Lymphocytes of Unknown Function X). The gene spans positions 16134028 to 16158575. Its mRNA encodes two main isoforms; isoform 1, which yields the longer protein, contains seven exons.
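As a quick check on the coordinates quoted above, the length of the gene can be worked out directly. The sketch below assumes the two positions are 1-based and inclusive, which the text does not state; the helper name `span_length` is illustrative only.

```python
# Coordinates are taken from the text above; treating them as 1-based and
# inclusive is an assumption.
GENE_START = 16_134_028
GENE_END = 16_158_575


def span_length(start: int, end: int, inclusive: bool = True) -> int:
    """Return the number of bases covered by a genomic interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + (1 if inclusive else 0)


gene_length = span_length(GENE_START, GENE_END)
print(f"HSH2D spans about {gene_length / 1000:.1f} kb")
```

Under that assumption the gene covers roughly 24.5 kb, about ten times the length of either mRNA isoform.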
Two main isoforms of HSH2D exist. Isoform 1 has seven exons and is 2,403 bp in length. Isoform 2 has six exons and is 2,936 bp long. Although the isoform 2 mRNA is longer, it encodes the smaller protein. Isoform 2 has a variant 5' UTR, a different start codon, and a shorter N-terminus. [1] The mRNA has a short 5' UTR and a long 3' UTR.
The protein has a molecular weight of 39.0 kilodaltons (kDa) and an isoelectric point (pI) of 6.678. [2] The main feature of the protein is the SH2 (Src homology 2) domain, a region that binds phosphotyrosine-containing sequences and is found in many signaling molecules. [3] This domain spans residues 26-127.
The secondary structure of the protein contains a helix around residues 40-50, a sheet at 60-70, helices at 100-110, 135-145, 175-180, and 200-225, and additional sheets at 235-240 and 295-300, as shown in the figure at the bottom of the section (helices are purple arrows and sheets are red arrows). The protein has several sites of post-translational modification, especially phosphorylation and GalNAc O-glycosylation; the latter has been shown to play a role in cancers. [4]
The tertiary structure of the protein has not been confirmed experimentally; however, predictions made with I-TASSER [5] software are useful for visualizing the protein.
Based on NCBI GEO [6] expression profiles and EST analyses, the protein appears to be narrowly expressed throughout human tissues. It is highly expressed in bone marrow, CD4+ and CD8+ T cells, lymph node, mammary gland, spleen, stomach, thyroid, and small intestine tissue. Expression is elevated in cases of early T-cell precursor acute lymphoblastic leukemia [6] and lowered in breast cancer cells that are treated with estrogen, suggesting an interaction between the protein and estrogen.
The function of the HSH2D protein is still not fully understood; however, it has been shown to play a role in various cellular functions such as apoptosis, wound healing, vascular endothelial growth factors, membrane-associated intracellular trafficking, biogenesis of lipid droplets, and collagen remodeling. [7] It is also thought to play a role in T-cell activation. [8]
HSH2D interacts with several proto-oncogenes, including FES proto-oncogene (FES) and CRK proto-oncogene (CRK). It also has suspected interactions with other proteins such as tyrosine kinase non-receptor 2 (TNK2), PTEN-induced putative kinase (PINK1), and Interleukin 2 (IL2). [9] A summary of these proteins is shown below with their suspected functions.
Name | NCBI Accession Number | Function |
---|---|---|
FES proto-oncogene (FES) | NP_001996.1 | Hematopoiesis, growth factor and cytokine receptor signaling. |
CRK proto-oncogene (CRK) | NP_058431.2 | Adaptor that binds to tyrosine-phosphorylated proteins. Has SH2 and SH3 domains. |
Tyrosine kinase non-receptor 2 (TNK2) | NP_005772.3 | Tyrosine kinase which may be linked to tyrosine phosphorylation signal transduction pathways. [10] |
PTEN-induced putative kinase (PINK1) | NP_115785.1 | Serine/threonine protein kinase |
Interleukin 2 (IL2) | NP_000577.2 | Cytokine important for T- and B- cell proliferation [11] |
The HSH2D protein has been studied alongside other human genes predicted to be involved in the human immune system. HSH2D was found to be highly expressed in patients with ulcerative colitis. [12] The protein is also associated with alpha-interferon activity. [13]
HSH2D has four distant paralogs and several orthologs in other species that have high levels of conservation.
The four paralogs of HSH2D in humans are other proteins containing SH2 domains. Outside this domain, they are not highly conserved. All paralogs were found through GeneCards. [14]
Name | NCBI Accession Number | Sequence Length (Amino Acids) | Sequence Similarity | Sequence Identity |
---|---|---|---|---|
SH2D2A | NP_001154913.1 | 399 | 29% | 21.7% |
SH2D7 | NP_001094874.1 | 451 | 33.1% | 25.9% |
SH2D4A | NP_001167630.1 | 454 | 12% | 17.4% |
SH2D4B | NP_997255.2 | 357 | 33.7% | 18.1% |
HSH2D has orthologous proteins in species spanning several orders. The protein is well conserved across mammals and is also found in a few reptiles, amphibians, and invertebrates. The following list is not exhaustive; rather, it shows the wide range of organisms in which the protein may be found. All orthologous proteins were found with the BLAST [15] or BLAT [16] programs.
Scientific Name | Common Name | Order | NCBI Accession Number | Sequence Length (Amino Acids) | Sequence Identity | Sequence Similarity |
---|---|---|---|---|---|---|
Pan troglodytes | Chimpanzee | Primates | NP_001229302.1 | 352 | 99% | 99.70% |
Heterocephalus glaber | Naked Mole Rat | Rodentia | EHB15865.1 | 324 | 54% | 63.40% |
Hipposideros armiger | Great roundleaf bat | Chiroptera | XP_019497370.1 | 360 | 67% | 76.10% |
Condylura cristata | Star-nosed mole | Soricomorpha | XP_004688256.1 | 355 | 62% | 72.10% |
Camelus dromedarius | Dromedary camel | Cetartiodactyla | XP_010993355.1 | 360 | 66% | 75.70% |
Panthera pardus | Leopard | Carnivora | XP_019271712.1 | 360 | 62% | 71.40% |
Meleagris gallopavo | Wild Turkey | Bird | XP_010723595.1 | 326 | 23% | 28.80% |
Anolis carolinensis | Carolina anole | Reptile | XP_016854511.1 | 567 | 22% | 31.50% |
Xenopus tropicalis | Western clawed frog | Amphibian | XP_012809627.1 | 363 | 31% | 42.80% |
Callorhinchus milii | Australian Ghostshark | Fish | XP_007899329.1 | 500 | 26% | 33.10% |
Lingula anatina | Lingula | Invertebrate | XP_013404014.1 | 187 | 18% | 23.40% |
Biomphalaria glabrata | N/A | Invertebrate | XP_013080865.1 | 818 | 12% | 10.10% |
Salpingoeca rosetta | N/A | Protista | XP_004995081.1 | 481 | 17.70% | 10.90% |
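The identity and similarity columns in the tables above come from pairwise alignments. The following is a minimal sketch of how such percentages can be computed from two already-aligned sequences. It is not the procedure used by BLAST or BLAT: the residue groups in `SIMILAR_GROUPS` are a hypothetical simplification (real tools score similarity with a substitution matrix such as BLOSUM62), the choice to divide by the full alignment length is an assumption, and the example sequences are toy strings rather than real HSH2D sequence.

```python
# Hypothetical grouping of residues with similar side-chain chemistry.
# Real tools score similarity with a substitution matrix such as BLOSUM62.
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "ST", "NQ"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_and_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) over all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length but never match
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif a in GROUP_OF and GROUP_OF.get(b) == GROUP_OF[a]:
            similar += 1
    length = len(aligned_a)
    return 100 * identical / length, 100 * similar / length


# Toy alignment, not real HSH2D sequence.
pid, psim = identity_and_similarity("MKT-LLVAE", "MRTALIV-D")
print(f"identity {pid:.1f}%, similarity {psim:.1f}%")
```

With this definition, similarity can never be lower than identity, because every identical column is also counted as similar. Values produced by other conventions, for example dividing by the length of the shorter sequence or of the aligned region only, will differ.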