TSBP1 | Identifiers |
---|---|
Aliases | TSBP1, TSBP, chromosome 6 open reading frame 10, testis expressed basic protein 1, C6orf10 |
External IDs | OMIM: 618151; HomoloGene: 81753; GeneCards: TSBP1; OMA: TSBP1 - orthologs |
TSBP1 is a protein that in humans is encoded by the TSBP1 gene. [3] [4] TSBP1 was previously known as C6orf10, an open reading frame on chromosome 6. The encoded protein is ubiquitously expressed at low levels in adult tissues and may play a role during fetal development. C6orf10 has been linked to both neurodegenerative and autoimmune diseases in adults. Expression of this gene is highest in the testis but is also seen in other tissues such as the brain, the lens of the eye and the medulla. [5] [6] [7]
The C6orf10 gene has seven human mRNA splice variants (a, b, c, X1, X2, X3 and X4).
The TSBP1 mRNA contains a highly conserved stem-loop structure in the 3' UTR, spanning bases 100–124.
C6orf10 isoform a is rich in lysine (K), glutamine (Q) and glutamic acid (E) and poor in histidine (H) and phenylalanine (F). [11] Isoform a is a basic protein with an isoelectric point of 9 and a molecular weight of about 62 kDa (62,000 Da). [12]
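Values such as the isoelectric point and molecular weight are computed from the amino acid sequence. The following is a minimal Python sketch of one common approach, assuming average residue masses and an EMBOSS-style pKa table. Both sets of constants are assumptions: the tool cited in [12] may use different values (for example position-dependent pKa values), so its results can differ slightly. The demo sequence is hypothetical and is not the TSBP1 sequence.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Assumed pKa set (EMBOSS-style values); other tools use different tables.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq):
    """Average molecular weight in Da: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of the chain at the given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """Bisection on pH: the net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still net positive, so the pI is at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def composition(seq):
    """Percentage of each residue type, to spot over- or under-represented residues."""
    return {aa: 100 * seq.count(aa) / len(seq) for aa in sorted(set(seq))}


# Hypothetical lysine/glutamine/glutamate-rich fragment, NOT the real TSBP1 sequence.
demo = "MKKQEEKQKEELKKQ"
print(f"MW = {molecular_weight(demo) / 1000:.2f} kDa")
print(f"pI = {isoelectric_point(demo):.2f}")
print(composition(demo))
```

A basic protein such as isoform a has a surplus of K and R over D and E, which pushes the computed pI above 7.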
This isoform contains two transmembrane regions near the N-terminus. [13] The first spans residues 6 to 25 (20 residues) and has an isoelectric point of 5. The second spans residues 100 to 119 (20 residues) and has an isoelectric point of 8. Isoform a also contains a PTZ00121 domain [14] that extends from residue 221 to the C-terminus and includes several repetitive sequences.
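Transmembrane regions are often first spotted as stretches of strongly hydrophobic residues. Below is a minimal sketch of the classic Kyte-Doolittle hydropathy approach; it is not necessarily the method used by the tool cited in [13], and the 19-residue window and 1.6 threshold are the values commonly quoted for this scale. Segment boundaries are 1-based and inclusive, so a span from residue i to residue j covers j - i + 1 residues. The test sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; entry i covers residues i+1..i+window (1-based)."""
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def predict_tm_segments(seq, window=19, threshold=1.6):
    """Merge all windows scoring at or above `threshold` into 1-based inclusive (start, end) spans."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            # Overlaps or touches the previous span: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Hypothetical sequence: a 20-residue leucine stretch between lysine-rich flanks.
demo = "K" * 15 + "L" * 20 + "K" * 15
for start, end in predict_tm_segments(demo):
    print(f"hydrophobic segment {start}-{end} ({end - start + 1} residues)")
```

Because every qualifying window is merged, the reported span can extend several residues past the hydrophobic core; dedicated predictors refine these boundaries.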
TSBP1 is predicted to consist mostly of alpha helices and random coils, [15] with only a few regions containing beta sheets. [16] [17]
TSBP1 is predicted to localize to the nucleus and the endoplasmic reticulum. [13] A signal peptide cleavage site lies between amino acids 30 and 31, [18] so the signal peptide includes the first transmembrane region. This N-terminal region of C6orf10 is likely localized to the endoplasmic reticulum. The C-terminal region of the protein contains two nuclear localization signals, at amino acids 489-505 and 513-529, indicating that the portion of the protein after the signal peptide cleavage site is localized to the nucleus.
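Classical bipartite nuclear localization signals are often described by a simplified consensus: two adjacent basic residues, a linker of about 10 to 12 residues, and a downstream cluster in which at least three of five residues are basic. The sketch below scans for that consensus. It is an illustration under that assumption, not the predictor behind the positions above, and the test sequence is hypothetical. With a 10-residue linker a match spans 17 residues.

```python
BASIC = set("KR")


def find_bipartite_nls(seq, min_linker=10, max_linker=12):
    """Return 1-based inclusive spans matching a simplified bipartite NLS consensus.

    Pattern: two adjacent basic residues (K/R), a linker of min_linker..max_linker
    residues, then a 5-residue cluster containing at least three basic residues.
    """
    hits = []
    for i in range(len(seq) - 1):
        if seq[i] not in BASIC or seq[i + 1] not in BASIC:
            continue
        for linker in range(min_linker, max_linker + 1):
            cluster_start = i + 2 + linker
            cluster = seq[cluster_start:cluster_start + 5]
            if len(cluster) == 5 and sum(aa in BASIC for aa in cluster) >= 3:
                hits.append((i + 1, cluster_start + 5))
                break  # report only the shortest qualifying linker for this start
    return hits


# Hypothetical sequence, NOT TSBP1: "KR", a 10-residue linker, then "KKK".
demo = "AA" + "KR" + "A" * 10 + "KKK" + "AAA"
print(find_bipartite_nls(demo))  # [(3, 19)]
```

In basic-rich stretches several overlapping starts can match; a fuller scan would merge them and also apply position-specific preferences.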
TSBP1 is ubiquitously expressed at low levels in adult human tissues, with the highest adult expression in the testis. [19] C6orf10 is expressed at higher levels in fetal and embryonic tissues, which suggests that it may play a role in development.
TSBP1 has a promoter that is 1206 bases long. [20] This promoter overlaps with the 5' UTR but ends before the first codon. The promoter is fairly well conserved across primates except in two regions: a 136-nucleotide region midway through and the end of the promoter. [21] Other primates have insertions in these two regions that are absent in humans, which may suggest that these regions of the promoter are not essential in humans.
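Insertions of this kind can be located in a pairwise alignment by finding alignment columns where the human sequence has a gap and the other primate has a base. A minimal sketch, assuming the two sequences are already aligned as equal-length strings with "-" for gaps; the alignment strings below are hypothetical, not the real promoter.

```python
def insertions_relative_to_reference(ref_aln, other_aln):
    """Return 1-based inclusive alignment-column spans where `other` has bases and `ref` has gaps."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    spans, run_start = [], None
    for col, (r, o) in enumerate(zip(ref_aln, other_aln), start=1):
        is_insertion = r == "-" and o != "-"
        if is_insertion and run_start is None:
            run_start = col  # a new insertion run begins
        elif not is_insertion and run_start is not None:
            spans.append((run_start, col - 1))  # the run ended at the previous column
            run_start = None
    if run_start is not None:
        spans.append((run_start, len(ref_aln)))  # run reaches the end of the alignment
    return spans


# Hypothetical aligned promoter fragments: human (reference) vs another primate.
human_aln = "ACGTAC----GTTA--CG"
primate_aln = "ACGTACAGGTGTTAATCG"
for start, end in insertions_relative_to_reference(human_aln, primate_aln):
    print(f"insertion in primate at columns {start}-{end} ({end - start + 1} nt)")
```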
TSBP1 transcription is regulated by many transcription factors that bind the promoter region. The CCAAT box and the TATA box are highly conserved elements that are important for the initiation of transcription. Several of these transcription factors, including EH1, NACA, NKX5-2, SIX4 and VCR, are involved in developmental pathways. [20]
Abbreviation | Transcription Factor Full Name | Matrix score | Strand |
---|---|---|---|
CSRNP-1 | Cysteine- and serine-rich nuclear protein 1 (AXUD1, AXIN1 up-regulated 1) | 1.0 | + |
CCAAT Box | CCAAT/enhancer binding protein (C/EBP), gamma | 0.923 | + |
EH1 | Engrailed Homeobox 1 | 0.862 | + |
Cart-1 | Cartilage homeoprotein 1 | 0.997 | - |
ZFP 263 | Zinc finger protein 263, ZKSCAN12 (Zinc finger protein with KRAB and SCAN domains 12) | 0.921 | + |
SWI/SNF | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 3 | 0.999 | - |
TATA Box | Vertebrate TATA binding protein factor | 0.899 | + |
HSF2 | Heat shock factor 2 | 0.974 | - |
Hmx2/Nkx5-2 | Hmx2/Nkx5-2 homeodomain transcription factor | 0.933 | + |
Pdx1 | Insulin promoter factor 1; pancreatic and duodenal homeobox 1 (Pdx1) | 0.924 | + |
LMX1A | LIM homeobox transcription factor 1 alpha | 1.0 | + |
NACA | Nascent polypeptide-associated complex subunit alpha 1 | 1.0 | - |
Oct1 | Octamer binding factor 1 | 0.921 | + |
POU6F1 | POU class 6 homeobox 1 | 0.973 | + |
STAT 5B | Signal transducer and activator of transcription 5B | 0.973 | + |
SIX 4 | Sine oculis homeobox homolog 4 | 0.96 | - |
NMP4 | Nuclear matrix protein 4 | 0.971 | + |
MSX | Homeodomain proteins MSX-1 and MSX-2 | 0.989 | - |
AREB6 | Atp1a1 regulatory element binding factor 6 | 1.0 | - |
VCR | Vertebrate caudal related homeodomain protein | 0.963 | + |
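Matrix scores like those in the table are normalized so that 1.0 is the best possible match to a transcription factor's position weight matrix. The tool behind [20] is not named here, so the following is a simplified sketch of a normalized matrix-similarity score, scanning both strands. Tools such as MatInspector additionally weight positions by their conservation; this sketch only allows that through an optional `weights` argument. The toy matrix and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def make_pwm(consensus, p_match=0.7):
    """Hypothetical toy matrix: p_match for the consensus base, the rest split evenly."""
    other = (1 - p_match) / 3
    return [{b: (p_match if b == c else other) for b in "ACGT"} for c in consensus]


def matrix_similarity(site, pwm, weights=None):
    """Normalized score: 1.0 = best possible match to the matrix, 0.0 = worst possible."""
    if weights is None:
        weights = [1.0] * len(pwm)  # uniform weights (simplification)
    score = sum(w * col[base] for w, col, base in zip(weights, pwm, site))
    best = sum(w * max(col.values()) for w, col in zip(weights, pwm))
    worst = sum(w * min(col.values()) for w, col in zip(weights, pwm))
    if best == worst:
        raise ValueError("matrix carries no information (all columns uniform)")
    return (score - worst) / (best - worst)


def scan(seq, pwm, threshold=0.85):
    """Score every window on both strands; return (1-based start on + strand, strand, score)."""
    k = len(pwm)
    hits = []
    for i in range(len(seq) - k + 1):
        site = seq[i:i + k]
        for strand, s in (("+", site), ("-", reverse_complement(site))):
            score = matrix_similarity(s, pwm)
            if score >= threshold:
                hits.append((i + 1, strand, round(score, 3)))
    return hits


# Hypothetical CCAAT-like matrix and sequences, for illustration only.
pwm = make_pwm("CCAAT")
print(scan("GGCCAATGG", pwm))  # site on the + strand
print(scan("CCATTGGCC", pwm))  # same site on the - strand
```

The strand column in the table corresponds to whether the best-scoring orientation of the site reads on the given strand or its reverse complement.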
Most of the predicted protein interactions with C6orf10 are based solely on text mining and on data from genome-wide association studies. The two proteins with the highest interaction scores were butyrophilin-like protein 2 (BTNL2) and tetratricopeptide repeat domain-containing protein 32 (TTC32). [22] BTNL2 is a negative regulator of T-cell activity and a member of the immunoglobulin superfamily; its gene is located in the C6orf10 gene neighborhood. TTC32 belongs to a protein family defined by tetratricopeptide repeats, structural motifs that mediate protein-protein interactions in the formation of protein complexes. [23] This may indicate that C6orf10 has the potential to interact with another protein to form a complex.
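Interaction databases commonly report one score per evidence channel (for example text mining or co-expression) and combine them into an overall score. The source does not say which database produced the scores above. Assuming a STRING-style scheme, channels are treated as independent and combined with a noisy-OR after a background prior is removed from each. The prior of 0.041 and the channel scores below are assumptions for illustration.

```python
from math import prod

PRIOR = 0.041  # assumed background probability that a random protein pair interacts


def remove_prior(score, prior=PRIOR):
    """Strip the background prior from one channel score (floored at 0)."""
    return max(0.0, (score - prior) / (1 - prior))


def combined_score(channel_scores, prior=PRIOR):
    """Noisy-OR over independent evidence channels, then add the prior back once."""
    p_no_support = prod(1 - remove_prior(s, prior) for s in channel_scores)
    combined = 1 - p_no_support
    return combined + prior * (1 - combined)


# Hypothetical channel scores for one protein pair.
scores = {"text mining": 0.6, "co-expression": 0.3}
print(round(combined_score(list(scores.values())), 3))
```

Two weak channels that agree yield a combined score higher than either alone, while a single channel passes through unchanged.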
C6orf10 has been associated with both neurodegenerative and autoimmune diseases, mostly through genome-wide association studies. Neurodegenerative diseases associated with C6orf10 include frontotemporal dementia, Parkinson's disease [24] and Alzheimer's disease. [24] Autoimmune diseases associated with C6orf10 include rheumatoid arthritis, [25] [26] [24] [27] psoriasis, [28] [27] multiple sclerosis, [29] [24] Graves' disease [24] and lupus. [citation needed]
Searches of the NCBI BLAST database [30] for sequences similar to C6orf10 showed that the protein is found only in mammals. The highest numbers of homologs were found in Primates, Artiodactyla and Carnivora, with only a couple in Rodentia, Chiroptera and Perissodactyla. In Scandentia, Eulipotyphla, Tubulidentata and Sirenia there was only one complete homolog, although a few partial sequences exist. Lagomorpha, Dermoptera and Macroscelidea had only partial protein sequences, and no orthologs were found in Diprotodontia, Didelphimorphia, Cetacea, Dasyuromorphia, Pilosa, Monotremata or Proboscidea. No homologs were found outside of mammals.
Order | Latin Name | Common Name | Identity | Median Date of Divergence (MYA) |
---|---|---|---|---|
Primates | Homo sapiens | Human | 100% | 0 |
Primates | Gorilla gorilla gorilla | Gorilla | 83.36% | 8.61 |
Primates | Pongo abelii | Sumatran Orangutan | 82.81% | 15.2 |
Carnivora | Canis lupus familiaris | Dog | 49.79% | 94 |
Carnivora | Canis lupus dingo | Dingo | 49.33% | 94 |
Artiodactyla | Bubalus bubalis | Water Buffalo | 49.16% | 94 |
Perissodactyla | Equus caballus | Horse | 49.00% | 94 |
Artiodactyla | Odocoileus virginianus texanus | White-tailed Deer | 44.59% | 94 |
Rodentia | Chinchilla lanigera | Chinchilla | 41.99% | 88 |
Rodentia | Mus caroli | Ryukyu Mouse | 41.06% | 88 |
Carnivora | Felis catus | Cat | 40.77% | 94 |
Rodentia | Rattus norvegicus | Brown Rat | 37.45% | 88 |
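Percent identity values like those in the table are usually the fraction of identical aligned positions. A minimal sketch, assuming a pairwise alignment given as equal-length strings and counting gap columns in the denominator, as BLAST does when it reports identities over the alignment length. The table's values may come from a different alignment tool, and the example strings are hypothetical fragments, not real TSBP1 orthologs.

```python
def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns divided by alignment length (gap columns included)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(a == b and a != "-" for a, b in zip(aln_a, aln_b))
    return 100 * identical / len(aln_a)


# Hypothetical aligned protein fragments.
human = "MKKQE-EKQL"
other = "MKRQEDEKHL"
print(f"{percent_identity(human, other):.2f}%")  # 70.00%
```

Whether gaps count in the denominator changes the result noticeably for divergent orthologs, so identities from different tools are not always directly comparable.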
C6orf10 has one paralog, thioredoxin domain-containing protein 2 (TXNDC2), which diverged from it about 135.6 million years ago. [30]
[1] BLAST: Basic Local Alignment Search Tool. National Center for Biotechnology Information. Available at: https://blast.ncbi.nlm.nih.gov/Blast.cgi. Accessed 4 March 2019.