SHLD1

Identifiers
Aliases: SHLD1, chromosome 20 open reading frame 196, shieldin complex subunit 1, RINN3, C20orf196
External IDs: OMIM: 618028; MGI: 1920997; HomoloGene: 51865; GeneCards: SHLD1
Orthologs (human and mouse)
Ensembl
Human: ENSG00000171984 [1]
Mouse: ENSMUSG00000044991 [2]
RefSeq (mRNA)
Human: NM_001303477, NM_001303478, NM_001303479, NM_152504
Mouse: NM_028637, NM_001358260, NM_001358261
RefSeq (protein)
Human: NP_001290406, NP_001290407, NP_001290408, NP_689717
Mouse: NP_082913, NP_001345189, NP_001345190
Location (UCSC)
Human: Chr 20: 5.75 – 5.86 Mb
Mouse: Chr 2: 132.53 – 132.59 Mb
PubMed search
Human: [3]
Mouse: [4]

SHLD1 (shieldin complex subunit 1), also known as C20orf196, is a human gene on chromosome 20. [5] The gene encodes an mRNA that is 1,763 base pairs long and a protein that is 205 amino acids long. [5]

Function

C20orf196 is involved in the DNA repair network. Gupta et al. identified C20orf196 as part of a vertebrate-specific protein complex called shieldin. [6] Shieldin is recruited to DNA double-strand breaks (DSBs) to promote nonhomologous end joining (NHEJ)-dependent repair, immunoglobulin class-switch recombination (CSR), and fusion of unprotected telomeres. [6] The same analysis indicated that SHLD1 interacts with the shieldin complex sub-stoichiometrically or with weaker affinity. [6]

Gene

Locus

C20orf196 is located on the short arm of chromosome 20 at 20p12.3, from base pairs 5,750,286 to 5,864,407 on the direct strand. [5] It contains 11 exons. [7]

Aliases

The gene's official symbol is SHLD1; its aliases are C20orf196 (chromosome 20 open reading frame 196) and RINN3. [6]

Expression

mRNA

Alternative Splicing

C20orf196 produces 9 different mRNAs, with 7 alternatively spliced variants and 2 unspliced forms. [7] There are 3 probable alternative promoters, 3 non-overlapping alternative last exons, and 2 alternative polyadenylation sites. [7] The mRNAs differ by the truncation of the 5' end, truncation of the 3' end, presence or absence of 2 cassette exons, and overlapping exons with different boundaries. [7]

Isoforms

C20orf196 has six splice isoforms. [7]

Promoter

The promoter region spans bases 5,749,286 to 5,750,555 on chromosome 20, totaling 1,270 base pairs. [5] The transcription start site lies between bases 5,750,382 and 5,750,409, a window of 28 base pairs. [5]
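The lengths quoted for the promoter and the transcription start site window follow from inclusive coordinate arithmetic (end minus start plus one). The short Python sketch below reproduces them; the helper name `span_length` is chosen here for illustration, and the coordinates are assumed to be 1-based and inclusive, which is consistent with the totals given.

```python
def span_length(start: int, end: int) -> int:
    """Length in base pairs of an inclusive interval [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates as quoted in this article (chromosome 20).
# Assumption: both endpoints are counted (1-based, inclusive).
promoter = (5_749_286, 5_750_555)
tss_window = (5_750_382, 5_750_409)

print(span_length(*promoter))    # 1270
print(span_length(*tss_window))  # 28
```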

Expression

Figure: NCBI GEO human tissue expression profile for C20orf196.

RNA-Seq analysis has shown ubiquitous expression of C20orf196 in 27 human tissues: adrenal, appendix, bone marrow, brain, colon, duodenum, endometrium, esophagus, fat, gall bladder, heart, kidney, liver, lung, lymph node, ovary, pancreas, placenta, prostate, salivary gland, skin, small intestine, spleen, stomach, testis, thyroid, and urinary bladder. [5] The highest C20orf196 mRNA levels were found in the lymph node, tonsil, thyroid, adrenal gland, prostate, pharynx, parathyroid, connective tissue, and bone marrow. [8]

C20orf196 was found to be expressed in soft tissue/muscle tissue tumors, lymphoma tumors, and pancreatic tumors. [9] C20orf196 representation was biased toward the fetal developmental stage. [9] EBI expression data showed high expression of C20orf196 in the diencephalon and cerebral cortex in the developing brain. [9]

Protein

General Features

The most common transcript encodes a protein that is 205 amino acids long with a molecular mass of 23 kDa. [10] It has a predicted isoelectric point of 4.72. [11] It is predicted to have a half-life around 30 hours. [12] C20orf196 contains 19 positive residues (9.3%), 32 negative residues (15.6%), and 46 hydrophobic residues (22.4%). [13]
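The residue percentages above are simply each count divided by the 205-residue protein length. A minimal Python sketch of that arithmetic follows; the function name `percent_of_protein` is a hypothetical helper, and the counts are the ones stated in this section.

```python
PROTEIN_LENGTH = 205  # amino acids, as stated for the most common transcript

# Residue counts as reported in this section.
residue_counts = {"positive": 19, "negative": 32, "hydrophobic": 46}


def percent_of_protein(count: int, length: int = PROTEIN_LENGTH) -> float:
    """Share of the protein made up by `count` residues, rounded to 0.1%."""
    return round(100 * count / length, 1)


for label, count in residue_counts.items():
    print(f"{label}: {count} residues = {percent_of_protein(count)}%")
```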

Cellular Localization

C20orf196 is predicted to localize in the nucleus. [7]

Domains

C20orf196 contains one domain, DUF4521, which arose in amniotes. [5] DUF4521 spans amino acids 3 to 201. [5] Several regions of this domain are conserved in C20orf196 orthologs found in mammals, amphibians, and fish. The proteins of this family are functionally uncharacterized.

Post-Translational Modifications

C20orf196 is predicted to contain many phosphorylation sites targeted by unspecified serine kinases. [14] It is also predicted to have one SUMOylation site at amino acid 203, one N-glycosylation site at amino acid 69, [15] [16] and two ubiquitination sites at amino acids 84 and 139. [17]

Secondary Structure

Several modeling programs predicted a secondary structure containing alpha helix, beta sheet, and coil regions. [18] [19] CFSSP predicted that the C20orf196 secondary structure is 57.1% alpha helices, 48.8% beta strands, and 16.6% beta turns. [20] These percentages sum to more than 100%, so the predicted categories evidently overlap.

Protein Interactions

Several databases citing yeast two-hybrid screenings have found C20orf196 to interact with PRMT1, QARS, MAD2L2, and CUL3. [21] [22] [23] [24] C20orf196 functionally interacts with REV7 (also known as MAD2L2), SHLD2, and SHLD3 in the shieldin complex within the DNA repair network. [6]

Homology and Evolution

Orthologs

C20orf196 gene orthologs are found in species including mammals, birds, reptiles, and amphibians. [6] [25] C20orf196 has distant orthologs in bony fish and cartilaginous fish. [6] [25] There are no invertebrate orthologs. [6] Orthologs are found in 163 organisms. [5]

Table of Orthologs for C20orf196
Class | Species | Common Name | Date of Divergence (MYA) | Accession Number | Sequence Identity (%) | Sequence Similarity (%)
Mammalia (Marsupialia) | Sarcophilus harrisii | Tasmanian devil | 159 | XP_012395605.1 | 55 | 68
Mammalia (Marsupialia) | Phascolarctos cinereus | Koala | 159 | XP_020841153.1 | 54 | 67
Aves | Gallus gallus | Red junglefowl | 312 | XP_015139412.1 | 33 | 49
Aves | Aptenodytes forsteri | Emperor penguin | 312 | XP_009280865.1 | 35 | 47
Reptilia | Crocodylus porosus | Saltwater crocodile | 312 | XP_019404613.1 | 36 | 50
Reptilia | Pogona vitticeps | Central bearded dragon | 312 | XP_020649300.1 | 30 | 46
Reptilia | Thamnophis sirtalis | Common garter snake | 312 | XP_013911941.1 | 33 | 51
Amphibia | Nanorana parkeri | High Himalaya frog | 352 | XP_018422019.1 | 39 | 57
Osteichthyes | Monopterus albus | Asian swamp eel | 435 | XP_020455013.1 | 46 | 73
Chondrichthyes | Rhincodon typus | Whale shark | 473 | XP_020391945.1 | 30 | 55

Paralogs

There are no paralogs in humans. [5]

Figure: Rate of evolution of C20orf196 in twenty orthologs, compared with the fast-evolving protein fibrinogen and the slow-evolving protein cytochrome c.

Rate of evolution

C20orf196 is a fast-evolving protein with a high rate of sequence divergence. It evolves faster than fibrinogen, as shown in the rate-of-evolution figure.
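The ortholog table gives a rough sense of this divergence. The Python sketch below is an illustration only and is not the method behind the figure: it takes the divergence dates and identity values from the table and computes observed divergence as 100 minus percent identity, uncorrected for multiple substitutions at the same site. The function names are hypothetical.

```python
# (common name, divergence date in MYA, % sequence identity to the human protein)
# Values copied from the ortholog table in this article.
orthologs = [
    ("Tasmanian devil", 159, 55),
    ("Koala", 159, 54),
    ("Red junglefowl", 312, 33),
    ("Emperor penguin", 312, 35),
    ("Saltwater crocodile", 312, 36),
    ("Central bearded dragon", 312, 30),
    ("Common garter snake", 312, 33),
    ("High Himalaya frog", 352, 39),
    ("Asian swamp eel", 435, 46),
    ("Whale shark", 473, 30),
]


def observed_divergence(identity_percent: int) -> int:
    """Uncorrected divergence: percentage of aligned positions that differ."""
    return 100 - identity_percent


def divergence_per_my(identity_percent: int, mya: int) -> float:
    """Observed divergence per million years since the lineages split."""
    return observed_divergence(identity_percent) / mya


for name, mya, identity in orthologs:
    print(
        f"{name:24s} {mya:4d} MYA  "
        f"{observed_divergence(identity):3d}% divergent  "
        f"{divergence_per_my(identity, mya):.3f} %/MY"
    )
```

Because the observed values are uncorrected, they understate the true amount of change for the most distant orthologs.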

Phenotype

Genome-wide association studies have identified SNPs found in the C20orf196 gene that are associated with parental longevity, information processing speed, and breast carcinoma occurrence. [26]

References

  1. GRCh38: Ensembl release 89: ENSG00000171984 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000044991 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "C20orf196 chromosome 20 open reading frame 196 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2018-02-05.
  6. Gupta R, Somyajit K, Narita T, Maskey E, Stanlie A, Kremer M, Typas D, Lammers M, Mailand N, Nussenzweig A, Lukas J, Choudhary C (May 2018). "DNA Repair Network Analysis Reveals Shieldin as a Key Regulator of NHEJ and PARP Inhibitor Sensitivity". Cell. 173 (4): 972–988.e23. doi:10.1016/j.cell.2018.03.050. PMC 8108093. PMID 29656893. S2CID 4886733.
  7. Thierry-Mieg, Danielle; Thierry-Mieg, Jean. "AceView: Gene:C20orf196, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". www.ncbi.nlm.nih.gov. Retrieved 2018-02-05.
  8. Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson Å, Kampf C, Sjöstedt E, Asplund A, Olsson I, Edlund K, Lundberg E, Navani S, Szigyarto CA, Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T, Edqvist PH, Berling H, Tegel H, Mulder J, Rockberg J, Nilsson P, Schwenk JM, Hamsten M, von Feilitzen K, Forsberg M, Persson L, Johansson F, Zwahlen M, von Heijne G, Nielsen J, Pontén F (January 2015). "Proteomics. Tissue-based map of the human proteome". Science. 347 (6220): 1260419. doi:10.1126/science.1260419. PMID 25613900. S2CID 802377.
  9. "The European Bioinformatics Institute < EMBL-EBI". 2018.
  10. GeneCards Human Gene Database. "C20orf196 Gene - GeneCards | CT196 Protein | CT196 Antibody". www.genecards.org. Retrieved 2018-02-20.
  11. "Compute pI/Mw". ExPASy. 2018.
  12. Bachmair A, Finley D, Varshavsky A (October 1986). "In vivo half-life of a protein is a function of its amino-terminal residue". Science. 234 (4773): 179–86. doi:10.1126/science.3018930. PMID 3018930.
  13. "Statistical Analysis of Protein Sequences". EMBL-EBI. 2018.
  14. Blom N, Gammeltoft S, Brunak S (December 1999). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology. 294 (5): 1351–62. doi:10.1006/jmbi.1999.3310. PMID 10600390.
  15. Zhao Q, Xie Y, Zheng Y, Jiang S, Liu W, Mu W, Liu Z, Zhao Y, Xue Y, Ren J (July 2014). "GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs". Nucleic Acids Research. 42 (Web Server issue): W325-30. doi:10.1093/nar/gku383. PMC 4086084. PMID 24880689.
  16. Gupta R, Jung E, Brunak S. "Prediction of N-glycosylation sites in human proteins". DTU Bioinformatics. 46: 203–206.
  17. Huang CH, Su MG, Kao HJ, Jhong JH, Weng SL, Lee TY (January 2016). "UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines". BMC Systems Biology. 10 Suppl 1 (1): 6. doi:10.1186/s12918-015-0246-z. PMC 4895383. PMID 26818456.
  18. Zhang Y (January 2008). "I-TASSER server for protein 3D structure prediction". BMC Bioinformatics. 9: 40. doi:10.1186/1471-2105-9-40. PMC 2245901. PMID 18215316.
  19. Raghava, G. P. S. (2000). "APSSP: Advanced Protein Secondary Structure Prediction Server".
  20. T, Ashok Kumar (2013-04-01). "CFSSP: Chou and Fasman Secondary Structure Prediction server". Zenodo. 1 (9): 15–19. doi:10.5281/zenodo.50733.
  21. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, Kuhn M, Bork P, Jensen LJ, von Mering C (January 2015). "STRING v10: protein-protein interaction networks, integrated over the tree of life". Nucleic Acids Research. 43 (Database issue): D447-52. doi:10.1093/nar/gku1003. PMC 4383874. PMID 25352553.
  22. Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, Sacco F, Palma A, Nardozza AP, Santonico E, Castagnoli L, Cesareni G (January 2012). "MINT, the molecular interaction database: 2012 update". Nucleic Acids Research. 40 (Database issue): D857-61. doi:10.1093/nar/gkr930. PMC 3244991. PMID 22096227.
  23. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R (January 2004). "IntAct: an open source molecular interaction database". Nucleic Acids Research. 32 (Database issue): D452-5. doi:10.1093/nar/gkh052. PMC 308786. PMID 14681455.
  24. Calderone A, Castagnoli L, Cesareni G (August 2013). "mentha: a resource for browsing integrated protein-interaction networks". Nature Methods. 10 (8): 690–1. doi:10.1038/nmeth.2561. PMID 23900247. S2CID 9733108.
  25. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–10. doi:10.1016/s0022-2836(05)80360-2. PMID 2231712. S2CID 14441902.
  26. "GWAS Catalog". 2018.