SHLD1 | |
---|---|
Identifiers | |
Aliases | SHLD1, chromosome 20 open reading frame 196, shieldin complex subunit 1, RINN3, C20orf196 |
External IDs | OMIM: 618028; MGI: 1920997; HomoloGene: 51865; GeneCards: SHLD1 |
SHLD1, or shieldin complex subunit 1 (also known as C20orf196), is a gene on human chromosome 20. [5] The C20orf196 gene encodes an mRNA that is 1,763 base pairs long and a protein that is 205 amino acids long. [5]
C20orf196 is involved in the DNA repair network. Gupta et al. identified C20orf196 as part of a vertebrate-specific protein complex called shieldin. [6] Shieldin is recruited to double-strand breaks (DSBs) to promote non-homologous end joining (NHEJ)-dependent repair, immunoglobulin class-switch recombination (CSR), and fusion of unprotected telomeres. [6] Analysis suggests that SHLD1 associates with the rest of the shieldin complex sub-stoichiometrically or with relatively weak affinity. [6]
C20orf196 is located on the short arm of chromosome 20 at 20p12.3, from base pairs 5,750,286 to 5,864,407 on the direct strand. [5] It contains 11 exons. [7]
Its aliases are RINN3 [6] and SHLD1.
C20orf196 produces 9 different mRNAs, with 7 alternatively spliced variants and 2 unspliced forms. [7] There are 3 probable alternative promoters, 3 non-overlapping alternative last exons, and 2 alternative polyadenylation sites. [7] The mRNAs differ by the truncation of the 5' end, truncation of the 3' end, presence or absence of 2 cassette exons, and overlapping exons with different boundaries. [7]
The promoter region spans bases 5,749,286 to 5,750,555, a total of 1,270 base pairs. [5] The transcription start site lies between bases 5,750,382 and 5,750,409, a window of 28 base pairs. [5]
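The lengths quoted for the gene, promoter, and transcription start site follow from treating the coordinates as inclusive, 1-based positions, so each length is end - start + 1. A minimal Python sketch, assuming that convention (the helper name `inclusive_span` is hypothetical), reproduces them:

```python
def inclusive_span(start: int, end: int) -> int:
    """Length of a closed interval [start, end] in 1-based genomic coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1

# Coordinates quoted in the text (chromosome 20)
features = {
    "gene": (5_750_286, 5_864_407),
    "promoter": (5_749_286, 5_750_555),
    "transcription start site window": (5_750_382, 5_750_409),
}

for name, (start, end) in features.items():
    print(f"{name}: {inclusive_span(start, end):,} bp")
```

If the coordinates were instead 0-based and half-open (as in BED files), the + 1 would be dropped.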
RNA-Seq analysis has shown ubiquitous expression of C20orf196 in 27 human tissues: adrenal, appendix, bone marrow, brain, colon, duodenum, endometrium, esophagus, fat, gallbladder, heart, kidney, liver, lung, lymph node, ovary, pancreas, placenta, prostate, salivary gland, skin, small intestine, spleen, stomach, testis, thyroid, and urinary bladder. [5] The highest C20orf196 mRNA levels were found in the lymph node, tonsil, thyroid, adrenal gland, prostate, pharynx, parathyroid, connective tissue, and bone marrow. [8]
C20orf196 expression has been detected in soft tissue/muscle tissue tumors, lymphoma tumors, and pancreatic tumors. [9] Its representation in expression data is biased toward the fetal developmental stage. [9] EBI expression data show high expression of C20orf196 in the diencephalon and cerebral cortex of the developing brain. [9]
The most common transcript encodes a protein that is 205 amino acids long with a molecular mass of 23 kDa. [10] It has a predicted isoelectric point of 4.72. [11] It is predicted to have a half-life around 30 hours. [12] C20orf196 contains 19 positive residues (9.3%), 32 negative residues (15.6%), and 46 hydrophobic residues (22.4%). [13]
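The composition percentages follow directly from the residue counts over the 205-residue protein, and the excess of negative over positive residues is consistent with the acidic predicted isoelectric point. The minimal Python sketch below checks this; its mass estimate uses the common rule of thumb of about 110 Da per residue, which is an assumption and not necessarily the method of the cited tool.

```python
LENGTH = 205  # residues in the most common isoform

counts = {"positive": 19, "negative": 32, "hydrophobic": 46}

def percent(count: int, total: int = LENGTH) -> float:
    """Share of residues, rounded to one decimal place as in the text."""
    return round(100 * count / total, 1)

for label, n in counts.items():
    print(f"{label}: {n} residues = {percent(n)}%")

# Rough net charge implied by the charged-residue counts
# (ignores the termini and partially charged residues such as histidine).
net_charge = counts["positive"] - counts["negative"]
print("rough net charge:", net_charge)

# Rule-of-thumb mass estimate: ~110 Da per residue (assumption).
approx_kda = LENGTH * 110 / 1000
print(f"approximate mass: about {round(approx_kda)} kDa")
```

A negative net charge from the side chains is what pushes the predicted isoelectric point below neutral.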
C20orf196 contains one domain, DUF4521, which arose in Amniota. [5] DUF4521 spans amino acids 3 to 201. [5] Several regions of this domain are conserved in C20orf196 orthologs found in mammals, amphibians, and fish. The proteins of this family are functionally uncharacterized.
C20orf196 is predicted to have many phosphorylation sites targeted by unspecified serine kinases. [14] It is also predicted to have one SUMOylation site at amino acid 203 and one N-glycosylation site at amino acid 69. [15] [16] In addition, it is predicted to have two ubiquitination sites, at amino acids 84 and 139. [17]
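N-glycosylation predictions generally start from the N-X-S/T sequon, in which X is any residue except proline. The cited predictor's exact method is not described here, so the following is only a minimal Python sketch of the sequon scan itself; the function name `find_sequons` and the peptide are hypothetical, and the peptide is not the C20orf196 sequence.

```python
import re

# N-X-[S/T] sequon with X != proline; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

# Hypothetical toy peptide, NOT the real C20orf196 sequence.
toy = "MSNKTAANPSGNGSWNLT"
print(find_sequons(toy))
```

A sequon is necessary but not sufficient for glycosylation, which is why dedicated predictors add further scoring on top of a scan like this.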
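Several modeling programs predict a secondary structure containing alpha-helix, beta-sheet, and coil regions. [18] [19] CFSSP predicts that the C20orf196 secondary structure is 57.1% alpha helix, 48.8% beta strand, and 16.6% beta turn; because the Chou-Fasman method assigns each state independently, a residue can fall into more than one category, so these figures sum to more than 100%. [20]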
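Converting the CFSSP percentages back into residue counts shows why they cannot be read as a partition of the sequence. This minimal Python sketch assumes the percentages were computed over all 205 residues:

```python
LENGTH = 205  # residues in the predicted protein

cfssp_percent = {"helix": 57.1, "strand": 48.8, "turn": 16.6}

# Back-calculate the number of residues assigned to each state.
residues = {state: round(pct * LENGTH / 100) for state, pct in cfssp_percent.items()}
print(residues)

total = sum(residues.values())
print(f"{total} assignments for {LENGTH} residues")
if total > LENGTH:
    print("categories overlap: some residues are counted in more than one state")
```

Each percentage maps cleanly to a whole number of residues, but the assignments add up to more residues than the protein contains.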
Interaction databases drawing on yeast two-hybrid screens report that C20orf196 interacts with PRMT1, QARS, MAD2L2, and CUL3. [21] [22] [23] [24] Within the DNA repair network, C20orf196 functionally interacts with REV7 (MAD2L2), SHLD2, and SHLD3 in the shieldin complex. [6]
C20orf196 gene orthologs are found in species including mammals, birds, reptiles, and amphibians. [6] [25] C20orf196 has distant orthologs in bony fish and cartilaginous fish. [6] [25] There are no invertebrate orthologs. [6] Orthologs are found in 163 organisms. [5]
Class | Species | Common name | Divergence from humans (MYA) | Accession number | Sequence identity (%) | Sequence similarity (%) |
---|---|---|---|---|---|---|
Mammalia (Marsupialia) | Sarcophilus harrisii | Tasmanian devil | 159 | XP_012395605.1 | 55 | 68 |
Mammalia (Marsupialia) | Phascolarctos cinereus | Koala | 159 | XP_020841153.1 | 54 | 67 |
Aves | Gallus gallus | Red junglefowl | 312 | XP_015139412.1 | 33 | 49 |
Aves | Aptenodytes forsteri | Emperor penguin | 312 | XP_009280865.1 | 35 | 47 |
Reptilia | Crocodylus porosus | Saltwater crocodile | 312 | XP_019404613.1 | 36 | 50 |
Reptilia | Pogona vitticeps | Central bearded dragon | 312 | XP_020649300.1 | 30 | 46 |
Reptilia | Thamnophis sirtalis | Common garter snake | 312 | XP_013911941.1 | 33 | 51 |
Amphibia | Nanorana parkeri | High Himalaya frog | 352 | XP_018422019.1 | 39 | 57 |
Osteichthyes | Monopterus albus | Asian swamp eel | 435 | XP_020455013.1 | 46 | 73 |
Chondrichthyes | Rhincodon typus | Whale shark | 473 | XP_020391945.1 | 30 | 55 |
C20orf196 has a high rate of protein sequence divergence, making it a fast-evolving protein; it evolves faster than fibrinogen, as seen in the accompanying figure.
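A common way to compare evolutionary rates is to plot corrected sequence divergence against divergence time: percent identity is converted to a Poisson-corrected distance, d = -ln(identity), and the slope of d against time gives a relative rate. The Python sketch below applies that technique to the ortholog table; it is an illustration under those assumptions, not necessarily how the figure was produced, and it does not reproduce the fibrinogen comparison.

```python
import math

# (species, divergence from humans in MYA, % identity) from the ortholog table
orthologs = [
    ("Sarcophilus harrisii", 159, 55),
    ("Phascolarctos cinereus", 159, 54),
    ("Gallus gallus", 312, 33),
    ("Aptenodytes forsteri", 312, 35),
    ("Crocodylus porosus", 312, 36),
    ("Pogona vitticeps", 312, 30),
    ("Thamnophis sirtalis", 312, 33),
    ("Nanorana parkeri", 352, 39),
    ("Monopterus albus", 435, 46),
    ("Rhincodon typus", 473, 30),
]

def poisson_distance(identity_pct: float) -> float:
    """Poisson-corrected substitutions per site: d = -ln(1 - p), with p = 1 - identity."""
    return -math.log(identity_pct / 100)

times = [t for _, t, _ in orthologs]
dists = [poisson_distance(pid) for _, _, pid in orthologs]

# Least-squares slope through the origin (d is assumed to be 0 at t = 0).
slope = sum(t * d for t, d in zip(times, dists)) / sum(t * t for t in times)
print(f"~{slope * 100:.2f} corrected substitutions per site per 100 MY")
```

A faster-evolving protein gives a steeper slope; comparing against fibrinogen would mean repeating the calculation on fibrinogen orthologs from the same species. The Poisson correction also assumes equal substitution rates across sites.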
Genome-wide association studies have identified SNPs found in the C20orf196 gene that are associated with parental longevity, information processing speed, and breast carcinoma occurrence. [26]
Protein YIF1A is a Yip1 domain family protein that in humans is encoded by the YIF1A gene.
UPF0687 protein C20orf27 is a protein that in humans is encoded by the C20orf27 gene. It is expressed in most human tissues. One study reported that it regulates the cell cycle, apoptosis, and tumorigenesis by promoting activation of the NF-κB pathway.
ARMH3, or Armadillo Like Helical Domain Containing 3, also known as UPF0668 and C10orf76, is a protein that in humans is encoded by the ARMH3 gene. Its function is not currently known, but experimental evidence has suggested that it may be involved in transcriptional regulation. The protein contains a conserved proline-rich motif, suggesting that it may participate in protein-protein interactions via an SH3-binding motif, although no such interactions have been experimentally verified. The well-conserved gene appears to have emerged in Fungi approximately 1.2 billion years ago. The locus is alternatively spliced and predicted to yield five protein variants, three of which contain a protein domain of unknown function, DUF1741.
DEP domain-containing protein 1B (DEPDC1B), also known as XTP1, XTP8, and HBV XAg-transactivated protein 8 (formerly referred to as BRCC3), is a human protein encoded by a gene of the same name located on chromosome 5.
Family with sequence similarity 98, member A, or FAM98A, is a gene that in the human genome encodes the FAM98A protein. FAM98A has two paralogs in humans, FAM98B and FAM98C. All three are characterized by DUF2465, a conserved domain shown to bind to RNA. FAM98A is also characterized by a glycine-rich C-terminal domain. FAM98A also has homologs in vertebrates and invertebrates and has distant homologs in choanoflagellates and green algae.
Uncharacterized protein C14orf80 is a protein that in humans is encoded by the C14orf80 (chromosome 14 open reading frame 80) gene.
C6orf222 is a protein that in humans is encoded by the C6orf222 gene (6p21.31). C6orf222 is conserved in mammals, birds and reptiles with the most distant ortholog being the green sea turtle, Chelonia mydas. The C6orf222 protein contains one mammalian conserved domain: DUF3293. The protein is also predicted to contain a BH3 domain, which has predicted conservation in distant orthologs from the clade Aves.
Proline-rich protein 30 is a protein that in humans is encoded by the PRR30 gene. PRR30 is a member of the family of proline-rich proteins, which are characterized by their intrinsic lack of structure. Copy number variations in the PRR30 gene have been associated with an increased risk of neurofibromatosis.
Transmembrane protein 171 (TMEM171) is a protein that in humans is encoded by the TMEM171 gene.
Chromosome 1 open reading frame 198 (C1orf198) is a protein that in humans is encoded by the C1orf198 gene. This gene has no paralogs in Homo sapiens, but many orthologs have been found throughout the domain Eukarya. C1orf198 is expressed at high levels in all tissues throughout the human body, but most highly in lung, brain, and spinal cord tissues. It is most likely involved in lung development and in hypoxia-associated events in the mitochondria, which are major consumers of oxygen in cells and are severely affected by decreases in available cellular oxygen.
C2orf16 is a protein that in humans is encoded by the C2orf16 gene. Isoform 2 of this protein is 1,984 amino acids long. The gene contains 1 exon and is located at 2p23.3. Aliases for C2orf16 include Open Reading Frame 16 on Chromosome 2 and P-S-E-R-S-H-H-S Repeats Containing Sequence.
C7orf50 is a gene in humans that encodes a protein known as C7orf50. This gene is expressed ubiquitously, including in the kidneys, brain, fat, prostate, and spleen among 22 other tissues, and shows low tissue specificity. C7orf50 is conserved in chimpanzees, rhesus monkeys, dogs, cows, mice, rats, and chickens, along with 307 other organisms from mammals to fungi. This protein is predicted to be involved in the import of ribosomal proteins into the nucleus for assembly into ribosomal subunits as part of rRNA processing. Additionally, this gene is predicted to be a protein-coding microRNA (miRNA) host gene, meaning that it may contain miRNA genes in its introns and/or exons.
MIF4GD, or MIF4G domain-containing protein, is a protein that in humans is encoded by the MIF4GD gene. It is also known as SLIP1, SLBP-interacting protein 1, AD023, and MIFD. MIF4GD is expressed ubiquitously in humans, has been found to be involved in activating proteins for histone mRNA translation and in alternative splicing and translation of mRNAs, and is a factor in the regulation of cell proliferation.
Serum amyloid A-like 1 is a protein in humans encoded by the SAAL1 gene.
OCEL1, also called Occludin/ELL Domain Containing 1, is a protein-coding gene located at chromosome 19p13.11 in the human genome. Other aliases for the gene include FLJ22709, FWP009, and S863-9. The function of OCEL1 has not yet been identified.
Family with sequence similarity 98, member C, or FAM98C, is a gene that encodes the FAM98C protein. FAM98C has two aliases, FLJ44669 and hypothetical protein LOC147965, and two paralogs in humans, FAM98A and FAM98B. FAM98C is characterized as a leucine-rich protein. Its function is still not defined. FAM98C has orthologs in mammals, reptiles, and amphibians, and has distant orthologs in Rhinatrema bivittatum and Nanorana parkeri.
Zinc Finger Protein 821, also known as ZNF821, is a protein encoded by the ZNF821 gene. This gene is located on chromosome 16 and is highly expressed in the testes, moderately expressed in the brain, and expressed at low levels in 23 other tissues. The encoded protein is 412 amino acids long, with two zinc finger motifs and a 23-amino-acid STPR domain.
Zinc Finger Protein 548 (ZNF548) is a human protein encoded by the ZNF548 gene which is located on chromosome 19. It is found in the nucleus and is hypothesized to play a role in the regulation of transcription by RNA Polymerase II. It belongs to the Krüppel C2H2-type zinc-finger protein family as it contains many zinc-finger repeats.
KIAA2013, also known as Q8IYS2 or MGC33867, is a single-pass transmembrane protein encoded by the KIAA2013 gene in humans. Its function has not yet been fully elucidated.
C13orf42 is a protein which, in humans, is encoded by the gene chromosome 13 open reading frame 42 (C13orf42). RNA sequencing data shows low expression of the C13orf42 gene in a variety of tissues. The C13orf42 protein is predicted to be localized in the mitochondria, nucleus, and cytosol. Tertiary structure predictions for C13orf42 indicate multiple alpha helices.