BEND2
BEND2 is a protein that in humans is encoded by the BEND2 gene. [1] It is also found in other vertebrates, including mammals, birds, and reptiles. [1] Expression of BEND2 in Homo sapiens is tissue-restricted, with high levels in the testis and the bone marrow and additional expression in skeletal muscle. [2] [3] [4] The presence of two BEN domains in the BEND2 protein suggests that it may be involved in chromatin modification and regulation. [5]
BEND2 stands for BEN domain containing 2 and is also known as CXorf20 (HGNC ID: 28509). [1] [6] [7]
The locus for BEND2 is on the minus strand of the X chromosome at Xp22.13. The gene is approximately 58 kilobases in length. [1]
BEND2 contains 14 exons, which are alternatively spliced into five transcript variants whose mature mRNAs range from 2,144 to 4,720 base pairs (bp). [1] [7] [3] The longest and most complete transcript, variant 1, encodes isoform 1 of the BEND2 protein (NP_699177.2). [1]
The untranslated regions (UTRs) flanking the coding sequence at the 5' and 3' ends of the mature BEND2 mRNA contain binding sites for RNA-binding proteins, including RBMX, PUM2, and EIF4B, as well as microRNA binding sites. The 5'UTR also contains an upstream in-frame stop codon, and the 3'UTR contains a polyadenylation signal sequence.
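Both UTR features named above are simple sequence patterns. The following is a minimal sketch of how they can be located, not the tool used in the original analysis. It assumes the mRNA is written in the DNA alphabet with a known 0-based start-codon position, and it searches the 3'UTR for the canonical AATAAA hexamer and its common ATTAAA variant; the function names and toy sequences are hypothetical.

```python
from typing import List, Optional, Tuple

# Stop codons of the standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Canonical polyadenylation signal and its most common variant.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def upstream_in_frame_stop(mrna: str, cds_start: int) -> Optional[int]:
    """Return the 0-based position of the nearest stop codon that lies upstream
    of the start codon at cds_start and in the same reading frame, or None."""
    seq = mrna.upper()
    pos = cds_start - 3  # step back one codon at a time, keeping the frame
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos
        pos -= 3
    return None


def polya_signals(utr3: str) -> List[Tuple[int, str]]:
    """Return (position, hexamer) for every polyadenylation signal in a 3'UTR."""
    seq = utr3.upper()
    return [
        (i, seq[i:i + 6])
        for i in range(len(seq) - 5)
        if seq[i:i + 6] in POLYA_SIGNALS
    ]


# Toy sequences for illustration, not BEND2 data.
print(upstream_in_frame_stop("GCTAAGCCATGGCC", 8))  # 2
print(polya_signals("CCAATAAAGG"))  # [(2, 'AATAAA')]
```

An upstream in-frame stop codon matters because it shows that the annotated start codon cannot be preceded by a longer open reading frame in the same frame.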
The predicted molecular weight is 87.9 kDa. [10] [11]
The predicted isoelectric point is pH 5.07. [12]
The amino acid composition is enriched in serine residues. [10]
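The predicted molecular weight, isoelectric point, and serine enrichment are all derived from the primary sequence alone. The following sketch shows one common way to compute them. It assumes standard average residue masses and an EMBOSS-style pKa set; the cited tools may use different constants, so their values can differ slightly from what this code returns. The function names are hypothetical.

```python
# Average residue masses (Da) of amino acids inside a polypeptide chain,
# i.e. the free amino acid mass minus one water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# EMBOSS-style pKa values (an assumption; other tools use other sets).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight in daltons: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and carboxyl end
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH of zero net charge; the charge falls as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


def fraction(seq: str, residue: str) -> float:
    """Fraction of positions occupied by one residue type, e.g. 'S' for serine."""
    seq = seq.upper()
    return seq.count(residue) / len(seq)


print(round(molecular_weight("GGG"), 2))  # 189.17
```

An acidic pI such as the 5.07 quoted above means that acidic residues (D, E) outweigh basic ones (K, R, H) in the sequence.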
Corresponding to the five alternative transcripts of BEND2, the encoded protein exists as two isoforms (1 and 2) and three predicted isoforms (X1, X2, and X3). These range from 645 to 813 amino acids in length. [1] Isoform 1 is 799 amino acids long. [13]
The presence of nuclear localization signals in the primary structure of BEND2 leads to a predicted subcellular localization in the nucleus. [14] A pat7 signal (a proline, one to three residues of any kind, and then a basic segment in which at least three of four residues are K or R) and a bipartite nuclear localization signal are both found near the N-terminus of the protein. [14] [15]
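Both signals are short sequence patterns, so a simple scan can find them. The following sketch follows the pattern definitions as commonly given for PSORT II. It treats pat7 as a proline, one to three arbitrary residues, and then a four-residue window with at least three K/R. It treats the bipartite signal as two basic residues, a spacer of 10–12 residues, and at least three K/R among the next five residues. These definitions, including the spacer range, are assumptions: the predictors cited above may apply further rules or scoring. The function names are hypothetical.

```python
BASIC = set("KR")


def find_pat7(seq: str) -> list[int]:
    """Start positions of pat7-like matches: P, then 1-3 residues of any kind,
    then a 4-residue window containing at least 3 K/R residues."""
    seq = seq.upper()
    hits = []
    for i, aa in enumerate(seq):
        if aa != "P":
            continue
        for gap in range(1, 4):
            window = seq[i + 1 + gap:i + 5 + gap]
            if len(window) == 4 and sum(r in BASIC for r in window) >= 3:
                hits.append(i)
                break  # report each proline only once
    return hits


def find_bipartite(seq: str, spacers=range(10, 13)) -> list[int]:
    """Start positions of bipartite-like matches: two K/R, a spacer, then a
    5-residue window containing at least 3 K/R residues."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 1):
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            for spacer in spacers:
                start = i + 2 + spacer
                window = seq[start:start + 5]
                if len(window) == 5 and sum(r in BASIC for r in window) >= 3:
                    hits.append(i)
                    break
    return hits


print(find_pat7("AAPAAKKRKAA"))  # [2]
```

Pattern matches like these are only candidate signals. Experimental localization data would be needed to confirm that BEND2 enters the nucleus.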
The secondary structure of BEND2 is uncertain, particularly at the N-terminus, which is poorly conserved between orthologs. The C-terminus contains two BEN domains, which are predicted to form a series of alpha helices. [1] [5]
Based on its primary structure, BEND2 is predicted to undergo N-terminal acetylation, glycation of several lysine residues, SUMOylation, S-palmitoylation, and extensive phosphorylation, and to contain a SUMO-interaction site at the N-terminus. [16]
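Some of these predictions start from short consensus motifs. For SUMOylation, the classic consensus is ΨKxE: a large hydrophobic residue, the acceptor lysine, any residue, and then glutamate. The following sketch scans a sequence for that consensus. The hydrophobic set used for Ψ (I, L, V, M, F) is an assumption, and real predictors such as the one cited above add scoring beyond this motif. The function name is hypothetical.

```python
import re

# PsiKxE consensus; the lookahead lets overlapping matches be reported.
SUMO_CONSENSUS = re.compile(r"(?=([ILVMF]K.E))")


def sumo_sites(seq: str) -> list[int]:
    """0-based positions of the acceptor lysine in every PsiKxE match."""
    return [m.start() + 1 for m in SUMO_CONSENSUS.finditer(seq.upper())]


print(sumo_sites("AAVKTEAA"))  # [3]
```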
BEND2 has been shown to interact with the following proteins in yeast two-hybrid screens or pull-down assays.
Experiment type | Protein | Protein Function | Associated diseases |
---|---|---|---|
Two-hybrid screen | Ataxin 1 (ATXN1) [17] | Chromatin-binding factor; RNA metabolism | Spinocerebellar ataxia type 1; spinocerebellar degeneration |
Two-hybrid screen | Splicing factor 3A subunit 2 (SF3A2) [18] | Activation of U2 snRNP; microtubule-binding protein | |
Two-hybrid screen | LIM homeobox 2 (LHX2) [18] | Transcriptional regulator for cell differentiation; sequence-specific DNA binding | Schizencephaly |
Two-hybrid screen | Proline rich 20D (PRR20D) [18] | Unknown function | |
Pull-down assay | Amyloid precursor protein (APP) [19] | Cell surface receptor in neurons; cleaved to form transcriptional activators | Cerebral amyloid angiopathy; Alzheimer's disease |
BEND2 has two BEN domains at its C-terminus. [1] BEN domains are found in a diverse array of proteins and are predicted to be important for chromatin remodeling and for recruiting chromatin-modifying factors during transcriptional regulation. [5] BEN domains are predicted to form four alpha helices that allow the domain to interact with its DNA target. [5] [20]
Dai et al. (2013) showed that the Drosophila melanogaster protein Insensitive (Insv), which has a single BEN domain and no other domains of known function, acts in the transcriptional regulation of genes. They also obtained a crystal structure of two Insv BEN domains bound to their DNA target site. [20]
BEND2 is not ubiquitously expressed in the human body. Its expression is regulated, with high levels in the testis and the bone marrow. [21] The NCBI EST profile for this gene shows expression only in the testis and muscle. [22]
The promoter of BEND2 (GXP_2567556) is 1,255 base pairs long and lies directly upstream of the gene. It drives transcription of all five transcript variants. [23] Genomatix's MatInspector program predicted 418 transcription factor binding sites within the promoter, including sites for SRY, neurogenin, interferon regulatory factor 3 (IRF-3), Ikaros2, and TCF/LEF-1.
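MatInspector scores candidate sites with proprietary position weight matrices, which are not reproduced here. The following sketch uses a simpler and cruder stand-in: it scans both strands of a promoter for an IUPAC consensus string. The motif in the usage line is hypothetical and is not a validated binding site for any factor named above. The function names are also hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def matches(site: str, motif: str) -> bool:
    """True if every base of site is allowed by the IUPAC code at that position."""
    return all(base in IUPAC[code] for base, code in zip(site, motif))


def scan_promoter(promoter: str, motif: str) -> list[tuple[int, str]]:
    """Return (position, strand) for every motif match on either strand.
    Positions are 0-based starts of the window on the given (+) strand."""
    seq = promoter.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if matches(window, motif):
            hits.append((i, "+"))
        # Reverse complement of the window = the same site read on the (-) strand.
        if matches(window.translate(COMPLEMENT)[::-1], motif):
            hits.append((i, "-"))
    return hits


print(scan_promoter("GGTAACAATGG", "WAACAAW"))  # [(2, '+')]
```

Scanning a 1,255 bp promoter this way tends to report many short matches by chance. This is one reason counts such as 418 predicted sites should be read as candidates rather than confirmed binding events.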
The BEND2 protein has no known paralogs within the human genome. [24]
The BEND2 gene belongs to a family of human genes known as "BEN domain containing". This includes BANP (BEND1), BEND3, BEND4, BEND5, BEND6, BEND7, NACC1 (BEND8), and NACC2 (BEND9). The loci of these genes are spread throughout the human genome. [25] Each of these genes contains between one and four BEN domains. Outside these motifs, the genes of the BEN family do not share sequence similarity.
The BEND2 gene is conserved across evolutionary time, with 114 known orthologs in a wide range of vertebrate species including mammals, birds, crocodilians, and amphibians. [26] The BEND2 protein has 42 known orthologs. [27] The C-terminus of the protein, where the BEN domains are located, is highly conserved; however, the N-terminus is not well conserved, even within the order Primates.
Genus/species | Common name | Order | Date of divergence from H. sapiens (mya) | Accession number | Sequence length (aa) | Whole-sequence identity | C-terminus identity |
---|---|---|---|---|---|---|---|
Homo sapiens | Human | Primates | 0 | NP_699177.2 | 799 | 1.000 | 1.000 |
Pongo abelii | Orangutan | Primates | 15.76 | -- | 784 | 0.921 | 0.854 |
Macaca nemestrina | Southern pig-tailed macaque | Primates | 29.44 | XP_011733709.1 | 823 | 0.694 | 0.828 |
Vicugna pacos | Alpaca | Artiodactyla | 96 | XP_015106214.1 | 740 | 0.433 | 0.512 |
Ceratotherium simum simum | White rhinoceros | Perissodactyla | 96 | XP_014646569.1 | 864 | 0.412 | 0.527 |
Loxodonta africana | African bush elephant | Proboscidea | 105 | XP_010594135.1 | 829 | 0.382 | 0.489 |
Canis lupus familiaris | Dog | Carnivora | 96 | XP_013967473.1 | 900 | 0.362 | 0.445 |
Ailuropoda melanoleuca | Giant panda | Carnivora | 96 | XP_019665441.1 | 852 | 0.353 | 0.460 |
Rhinolophus sinicus | Chinese horseshoe bat | Chiroptera | 96 | XP_019610944.1 | 808 | 0.345 | 0.459 |
Dasypus novemcinctus | Nine-banded armadillo | Cingulata | 105 | XP_012377569.1 | 886 | 0.342 | 0.500 |
Trichechus manatus latirostris | Manatee | Sirenia | 105 | XP_012412857.1 | 950 | 0.335 | 0.475 |
Chrysochloris asiatica | Cape golden mole | Afrosoricida | 105 | XP_006835746.1 | 683 | 0.330 | 0.443 |
Oryctolagus cuniculus | European rabbit | Lagomorpha | 90 | XP_017205124.1 | 811 | 0.305 | 0.438 |
Monodelphis domestica | Gray short-tailed opossum | Didelphimorphia | 159 | XP_007500895.1 | 728 | 0.303 | 0.443 |
Ornithorhynchus anatinus | Platypus | Monotremata | 177 | XP_007668655.1 | 715 | 0.302 | 0.429 |
Gavialis gangeticus | Fish-eating crocodile | Crocodilia | 312 | XP_019380828.1 | 697 | 0.309 | 0.458 |
Chelonia mydas | Green sea turtle | Testudines | 312 | XP_007070584.1 | 749 | 0.297 | 0.453 |
Apteryx australis mantelli | North Island brown kiwi | Apterygiformes | 312 | XP_013807123.1 | 647 | 0.295 | 0.444 |
Columba livia | Rock dove | Columbiformes | 312 | XP_005509980.1 | 668 | 0.287 | 0.442 |
Pygoscelis adeliae | Adelie penguin | Sphenisciformes | 312 | XP_009323754.1 | 657 | 0.282 | 0.458 |
Nanorana parkeri | Tibet frog | Anura | 352 | XP_018417228.1 | 586 | 0.260 | 0.376 |
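The table does not state how its identity values were computed. A common approach, sketched here as an assumption, is to align each ortholog globally to the human sequence and divide the number of identical positions by the alignment length. This sketch uses Needleman–Wunsch with a toy scoring scheme (match +1, mismatch -1, linear gap -2); real tools usually use a substitution matrix such as BLOSUM62 with affine gap penalties. Other denominators, such as the shorter sequence or aligned columns only, give different numbers. The function names are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, with '-' marking gaps."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])  # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")  # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap positions divided by alignment length (gaps included)."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return same / len(aligned_a)


print(global_align("MKTAYIAK", "MKAYIAK"))  # ('MKTAYIAK', 'MK-AYIAK')
```

Running the same comparison on only the C-terminal region of each sequence would give a separate score of the kind shown in the last column.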
BEND2 is predicted to be a DNA-binding protein because of the BEN domains at its C-terminus. This hypothesis is supported by its predicted nuclear localization, the transcription factor binding sites in its promoter region, and the nature of the proteins it interacts with. Although the precise function of BEND2 is not yet well understood, BEN domains have been found to be important regulators of transcription. [20]
The diseases that have been linked to BEND2 affect the central nervous system, although the gene is not highly expressed in these tissues.
NBEAL1 is a protein that in humans is encoded by the NBEAL1 gene, located at 2q33.2 on chromosome 2 of Homo sapiens.
WD repeat-containing protein 90 is a protein that, in humans, is encoded by the WDR90 gene (16p13.3). The human protein is 1,750 amino acids long and has a molecular weight of 187.7 kDa. It contains multiple WD40 repeat domains and one domain of unknown function. The protein is conserved as far back as invertebrates. Proteins containing WD (transducin-like) repeat domains have been found to play roles in a variety of functions, ranging from signal transduction and transcriptional regulation to cell cycle control, autophagy, and apoptosis.
Ankyrin repeat domain-containing protein 24 is a protein that in humans is encoded by the ANKRD24 gene, also known as KIAA1981. The protein's function in humans is currently unknown. ANKRD24 belongs to the family of proteins that contain ankyrin repeat domains.
Coiled-coil domain containing 142 (CCDC142) is a gene that in humans encodes the CCDC142 protein. The gene is located on chromosome 2, spans 4,339 base pairs, and contains 9 exons. There are two known isoforms of CCDC142; the proteins produced from these transcripts range in size from 665 to 743 amino acids and contain signals suggesting movement between the cytosol and the nucleus. Homologous CCDC142 genes are found in many animals, including vertebrates and invertebrates, but not in fungi, plants, protists, archaea, or bacteria. Although the function of this protein is not well understood, it contains a coiled-coil domain with a RINT1_TIP1 motif located within it.
Glutamate-rich protein 2 is a protein that in humans is encoded by the ERICH2 gene. It is highly expressed in male tissues, specifically the testes, where the protein is found in the nucleolar fibrillar center and in vesicles of testicular cells. The protein has multiple interaction partners, which suggest that it may play a role in histone modification and proper histone function.
Leukocyte Receptor Cluster Member 9 is an uncharacterized protein encoded by the LENG9 gene. In humans, LENG9 is predicted to play a role in fertility and reproductive disorders associated with female endometrium structures.
Uncharacterized protein C2orf73 is a protein that in humans is encoded by the C2orf73 gene. The protein is predicted to be localized to the nucleus.
CRACD-like protein, previously known as KIAA1211L, is a protein that in humans is encoded by the CRACDL gene. It is highly expressed in the cerebral cortex of the brain. It localizes to microtubules and centrosomes, as well as to the nucleus. CRACDL is associated with certain mental disorders and various cancers.
Family with sequence similarity 149 member B1 is an uncharacterized protein encoded by the human FAM149B1 gene, also known as KIAA0974. The protein resides in the nucleus of the cell. Its predicted secondary structure contains multiple alpha helices and a few beta sheets. The gene is conserved in mammals, birds, reptiles, fish, and some invertebrates. The protein contains a DUF3719 domain, which is conserved across its orthologs. It is expressed at slightly below-average levels in most human tissues, with high expression in the brain, kidney, and testes and relatively low expression in the pancreas.
Chromosome 21 Open Reading Frame 58 (C21orf58) is a protein that in humans is encoded by the C21orf58 gene.
WD repeat-containing protein 53 (WDR53) is a protein encoded by the WDR53 gene, which was identified in the human genome by the Human Genome Project but whose function has not yet been studied experimentally. The gene is located on chromosome 3 at 3q29 in Homo sapiens. Its transcript has short upstream and downstream untranslated regions, and the protein contains WD40 repeat regions, which have been linked to various functions.
TMEM128, also known as transmembrane protein 128, is a protein that in humans is encoded by the TMEM128 gene. TMEM128 has three transcript variants that differ in their 5' UTRs and start codon locations. The protein contains four transmembrane domains and is localized in the endoplasmic reticulum membrane. TMEM128 is regulated at the gene, transcript, and protein levels. While its function is poorly understood, it interacts with several proteins associated with the cell cycle, signal transduction, and memory.
TMEM275 is a protein that in humans is encoded by the TMEM275 gene. TMEM275 has two highly conserved helical transmembrane regions. It is predicted to reside within the plasma membrane or the endoplasmic reticulum membrane.
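Hydropathy analysis is one classic way to flag helical transmembrane regions like those described for TMEM128 and TMEM275. The following minimal sketch averages the Kyte–Doolittle scale over a 19-residue window and marks windows above the commonly used cut-off of 1.6 as candidate membrane-spanning segments. The predictions cited for these proteins may come from other methods, such as hidden Markov model-based tools. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy index for each amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge overlapping above-threshold windows into (start, end) spans, end exclusive."""
    spans = []
    for i, score in enumerate(hydropathy(seq, window)):
        if score <= threshold:
            continue
        end = i + window
        if spans and i < spans[-1][1]:
            spans[-1] = (spans[-1][0], end)  # overlaps the previous span: extend it
        else:
            spans.append((i, end))
    return spans


# Toy sequence: a 25-residue leucine stretch flanked by lysines.
print(candidate_tm_segments("K" * 10 + "L" * 25 + "K" * 10))  # [(5, 40)]
```

A 19-residue window roughly matches the length of an alpha helix spanning a lipid bilayer. The reported span boundaries are therefore approximate, extending a few residues past the hydrophobic core.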
C6orf136 is a protein that in humans is encoded by the C6orf136 gene. The gene is conserved in mammals, mollusks, and some sponges (Porifera). While the function of the gene is currently unknown, C6orf136 has been shown to be hypermethylated in response to FOXM1 expression in head and neck squamous cell carcinoma (HNSCC) cells. Additionally, elevated expression of C6orf136 has been associated with improved survival in patients with bladder cancer. C6orf136 has three known isoforms.
FAM120AOS (family with sequence similarity 120A opposite strand) encodes the uncharacterized protein FAM120AOS, which currently has no known function. Gene Ontology annotates the protein with the term "protein binding". Across most datasets, the thyroid and the placenta are the two tissues with the highest expression of FAM120AOS.
Family with sequence similarity 166 member C (FAM166C) is a protein encoded by the FAM166C gene. The FAM166C protein is localized in the nucleus. It has a calculated molecular weight of 23.29 kDa. It also contains DUF2475, a domain of unknown function, spanning amino acids 19–85. The FAM166C protein is nominally expressed in the testis, stomach, and thyroid.
Chromosome 5 open reading frame 22 (C5orf22) is a protein-coding gene of poorly characterized function in Homo sapiens. Its primary alias is uncharacterized protein family 0489 (UPF0489).
THAP domain-containing protein 3 (THAP3) is a protein that, in Homo sapiens (humans), is encoded by the THAP3 gene. The THAP3 protein is also known as MGC33488, LOC90326, and THAP domain-containing, apoptosis associated protein 3. This protein contains the Thanatos-associated protein (THAP) domain and a host cell factor C1 (HCFC1) binding motif. These domains allow THAP3 to influence a variety of processes, including transcription and neuronal development. THAP3 is ubiquitously expressed in H. sapiens, though expression is highest in the kidneys.
Chromosome 13 open reading frame 46 is a protein that in humans is encoded by the C13orf46 gene. In humans, C13orf46 is ubiquitously expressed at low levels in tissues including the lungs, stomach, prostate, spleen, and thymus. The gene encodes eight alternatively spliced mRNA transcripts, which produce five different protein isoforms.