C7orf50

Identifiers
Aliases: C7orf50, YCR016W, chromosome 7 open reading frame 50
External IDs: MGI: 1920462; HomoloGene: 49901; GeneCards: C7orf50; OMA: C7orf50 - orthologs
Orthologs
Species: Human, Mouse
RefSeq (mRNA): NM_028469
RefSeq (protein): NP_082745
Location (UCSC): Chr 7: 1 – 1.14 Mb; Chr 5: 139.35 – 139.45 Mb
PubMed search: [3] [4]

C7orf50 (Chromosome 7, Open Reading Frame 50) is a gene in humans (Homo sapiens) that encodes a protein known as C7orf50 (uncharacterized protein C7orf50). The gene is ubiquitously expressed, with expression reported in the kidneys, brain, fat, prostate, spleen, and 22 other tissues, and it demonstrates low tissue specificity. [5] [6] C7orf50 is conserved in chimpanzees, Rhesus monkeys, dogs, cows, mice, rats, and chickens, along with 307 other organisms ranging from mammals to fungi. [7] The protein is predicted to be involved in the import of ribosomal proteins into the nucleus, where they are assembled into ribosomal subunits as part of rRNA processing. [8] [9] Additionally, the gene is predicted to be a microRNA (miRNA) protein-coding host gene, meaning that it may contain miRNA genes in its introns and/or exons. [10] [11]

Gene

Background

C7orf50, also known as YCR016W, MGC11257, and LOC84310, is a poorly characterized protein-coding gene in need of further research. It can be found on NCBI under accession number NC_000007.14, on HGNC under ID 22421, on Ensembl under ID ENSG00000146540, on GeneCards under GCID:GC07M000996, and on UniProtKB under ID Q9BRJ6.

Location

C7orf50 is located on the short arm of chromosome 7 (7p22.3), starting at base pair (bp) 977,964 and ending at bp 1,138,325. The gene spans 160,361 bp on the minus (-) strand and contains a total of 13 exons. [5]

Gene Neighborhood

Genes within the neighborhood of C7orf50 are the following: LOC105375120, GPR146, LOC114004405, LOC107986755, ZFAND2A, LOC102723758, LOC106799841, COX19, ADAP1, CYP2W1, MIR339, GPER1, and LOC101927021. This neighborhood extends from bp 89700 to bp 1165958 on chromosome 7. [5]

mRNA

Alternative Splicing

C7orf50 has a total of 7 experimentally curated mRNA transcripts. [5] These transcripts are maintained independently of annotated genomes and were not generated computationally from a specific genome build such as the GRCh38.p13 primary assembly; therefore, they are typically more reliable. The longest and most complete of these transcripts (transcript 4) is 2138 bp long, consists of 5 exons, and produces a 194 amino acid (aa) protein. [12] Four of the transcripts encode the same 194 aa protein (isoform a), [13] differing only in their 5' and 3' untranslated regions (UTRs). The other three transcripts encode isoforms b, c, and d, respectively. The table below lists these transcripts.

C7orf50 Experimentally Determined NCBI Reference Sequences (RefSeq) mRNA Transcripts

| Name | NCBI Accession # | Transcript Length | # of Exons | Protein Length | Isoform |
|---|---|---|---|---|---|
| Transcript Variant 1 | NM_032350.5 | 1311 bp | 5 | 194 aa | a |
| Transcript Variant 2 | NM_001134395.1 | 1301 bp | 5 | 194 aa | a |
| Transcript Variant 3 | NM_001134396.1 | 1282 bp | 5 | 194 aa | a |
| Transcript Variant 4 | NM_001318252.2 | 2138 bp | 5 | 194 aa | a |
| Transcript Variant 7 | NM_001350968.1 | 1081 bp | 6 | 193 aa | b |
| Transcript Variant 8 | NM_001350969.1 | 1500 bp | 5 | 180 aa | c |
| Transcript Variant 9 | NM_001350970.1 | 1448 bp | 3 | 60 aa | d |

Alternatively, when the primary genomic assembly, GRCh38.p13, is used for annotation (NCBI: NC_000007.14), there are 10 computationally predicted mRNA transcripts. [5] The most complete and best supported of these transcripts (transcript variant X6) is 1896 bp long and produces a 225 aa protein. [14] In total, 6 different isoforms are predicted for C7orf50. Five of the transcripts encode the same isoform (X3). [15] The remaining transcripts encode isoforms X2, X4, X5, X6, and X7, as shown below.

C7orf50 Computationally Determined NCBI Reference Sequences (RefSeq) mRNA Transcripts

| Name | NCBI Accession # | Transcript Length | Protein Length | Isoform |
|---|---|---|---|---|
| Transcript Variant X2 | XM_017012719.1 | 1447 bp | 375 aa | X2 |
| Transcript Variant X3 | XM_011515582.3 | 1192 bp | 225 aa | X3 |
| Transcript Variant X4 | XM_024446977.1 | 1057 bp | 193 aa | X4 |
| Transcript Variant X5 | XM_011515581.3 | 1240 bp | 225 aa | X3 |
| Transcript Variant X6 | XM_011515584.2 | 1896 bp | 225 aa | X3 |
| Transcript Variant X7 | XM_017012720.2 | 1199 bp | 225 aa | X3 |
| Transcript Variant X8 | XM_011515583.2 | 1215 bp | 225 aa | X3 |
| Transcript Variant X9 | XM_017012721.2 | 2121 bp | 211 aa | X5 |
| Transcript Variant X10 | XM_024446978.1 | 2207 bp | 180 aa | X6 |
| Transcript Variant X11 | XM_024446979.1 | 933 bp | 93 aa | X7 |
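
The isoform counts quoted for the predicted transcripts can be rechecked by grouping the table rows by isoform. The following is a minimal Python sketch; the variable and function names (`PREDICTED`, `group_by_isoform`) are illustrative assumptions, and the data are copied by hand from the table above rather than fetched from NCBI.

```python
from collections import defaultdict

# (transcript variant, protein length in aa, isoform), transcribed from the table above
PREDICTED = [
    ("X2", 375, "X2"),
    ("X3", 225, "X3"),
    ("X4", 193, "X4"),
    ("X5", 225, "X3"),
    ("X6", 225, "X3"),
    ("X7", 225, "X3"),
    ("X8", 225, "X3"),
    ("X9", 211, "X5"),
    ("X10", 180, "X6"),
    ("X11", 93, "X7"),
]


def group_by_isoform(rows):
    """Map each isoform label to the list of transcript variants that encode it."""
    groups = defaultdict(list)
    for variant, _length, isoform in rows:
        groups[isoform].append(variant)
    return dict(groups)


groups = group_by_isoform(PREDICTED)
print(len(PREDICTED), "transcripts encode", len(groups), "isoforms")
print("Isoform X3 is encoded by variants:", groups["X3"])
```

Grouping this way makes it easy to see that ten transcripts collapse onto six isoforms, with five variants sharing isoform X3.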

5' and 3' UTR

Based on the experimentally determined C7orf50 mRNA transcript variant 4, the 5' UTR of C7orf50 is 934 nucleotides (nt) long, while the 3' UTR is 619 nt long. The coding sequence (CDS) of this transcript spans nt 935..1519, for a total length of 585 nt (194 codons plus a stop codon), and is encoded in reading frame 2. [12] The 5' UTR also contains an upstream open reading frame (uORF) in need of further study, ranging from nt 599 to nt 871, also in the second reading frame. [16]
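
The coordinate arithmetic in this paragraph can be rechecked with a short Python sketch. It assumes 1-based, inclusive coordinates (the GenBank convention for a span such as 935..1519) and numbers reading frames 1 to 3 from the first nucleotide of the transcript; the helper names are illustrative, not part of any NCBI tool.

```python
TRANSCRIPT_LENGTH = 2138   # transcript variant 4, in nt
CDS = (935, 1519)          # 1-based, inclusive
UORF = (599, 871)          # 1-based, inclusive


def span_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def reading_frame(start):
    """Reading frame (1, 2, or 3) of a feature that starts at a 1-based position."""
    return (start - 1) % 3 + 1


cds_length = span_length(*CDS)
five_prime_utr = CDS[0] - 1                    # nt 1..934
three_prime_utr = TRANSCRIPT_LENGTH - CDS[1]   # nt 1520..2138
protein_length = cds_length // 3 - 1           # subtract the stop codon

print("CDS length:", cds_length, "nt; protein:", protein_length, "aa")
print("5' UTR:", five_prime_utr, "nt; 3' UTR:", three_prime_utr, "nt")
print("CDS frame:", reading_frame(CDS[0]), "uORF frame:", reading_frame(UORF[0]))
print("uORF length:", span_length(*UORF), "nt")
```

Under these conventions the three segments (5' UTR, CDS, 3' UTR) add up to the full 2138 nt transcript, and the CDS length is a multiple of three, as expected for an open reading frame.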

Protein

General Properties

The 194 aa protein sequence of C7orf50 isoform a from NCBI [13] is as follows:

>NP_001127867.1 uncharacterized protein C7orf50 isoform a [Homo sapiens]
MAKQKRKVPEVTEKKNKKLKKASAEGPLLGPEAAPSGEGAGSKGEAVLRPGLDAEPELSPEEQRVLERKL  70
KKERKKEERQRLREAGLVAQHPPARRSGAELALDYLCRWAQKHKNWRFQKTRQTWLLLHMYDSDKVPDEH  140
FSTLLAYLEGLQGRARELTVQKAEALMRELDEEGSDPPLPGRAQRIRQVLQLLS  194

Part of this sequence forms a domain known as DUF2373 ("domain of unknown function 2373"), which is found in isoforms a, b, and c.

C7orf50 has a predicted molecular weight (Mw) of 22 kDa, making it smaller than the average protein (52 kDa). [17] The theoretical isoelectric point (pI) of this isoform is 9.7, meaning that C7orf50 is slightly basic. [18] [19] As for charge runs and patterns within isoform a, there is a significant mixed charge (*) run (-++0++-+++--+) from aa67 to aa79 and an acidic (-) run from aa171 to aa173. It is likely that this mixed charge run forms the protein-protein interaction (PPI) site of C7orf50. [20] [21]
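
The charge runs described above can be read directly off the isoform a sequence. The sketch below is an illustration rather than a reimplementation of SAPS: it assumes the simple convention that K and R are positive, D and E are negative, and every other residue (including histidine) is neutral, and it only prints the pattern for a given window without assessing statistical significance.

```python
# Isoform a sequence (NP_001127867.1), copied from the listing above
SEQ = (
    "MAKQKRKVPEVTEKKNKKLKKASAEGPLLGPEAAPSGEGAGSKGEAVLRPGLDAEPELSPEEQRVLERKL"
    "KKERKKEERQRLREAGLVAQHPPARRSGAELALDYLCRWAQKHKNWRFQKTRQTWLLLHMYDSDKVPDEH"
    "FSTLLAYLEGLQGRARELTVQKAEALMRELDEEGSDPPLPGRAQRIRQVLQLLS"
)


def charge_symbol(residue):
    """Return '+', '-', or '0' for one residue (assumed convention: KR / DE / other)."""
    if residue in "KR":
        return "+"
    if residue in "DE":
        return "-"
    return "0"


def charge_pattern(seq, start, end):
    """Charge pattern for residues start..end (1-based, inclusive)."""
    return "".join(charge_symbol(r) for r in seq[start - 1:end])


print(len(SEQ))                        # 194
print(charge_pattern(SEQ, 67, 79))     # -++0++-+++--+
print(charge_pattern(SEQ, 171, 173))   # ---
```

The window aa67 to aa79 reproduces the mixed charge pattern quoted in the text, and aa171 to aa173 corresponds to the residues D, E, and E.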

Domains and Motifs

DUF2373 is a domain of unknown function found in the C7orf50 protein. It is a highly conserved C-terminal region found from fungi to humans. [22] As for motifs, a bipartite nuclear localization signal (NLS) is predicted from aa6 to aa21, suggesting that C7orf50 is likely localized in the nucleus. [23] A nuclear export signal (NES) is also predicted within the C7orf50 protein at amino acids 150 and 153-155, suggesting that C7orf50 functions both inside and outside the nucleus. [24] [25]

Schematic Model of C7orf50 Protein. Green region is indicative of nuclear localization signal (NLS), blue of the mixed charge run, and orange of the DUF2373. Marked sites are indicative of post-translational modifications. Image made with Prosite MyDomains tool.

Structure

Secondary Structure

The majority of C7orf50 (isoform a) secondary structure is made up of alpha helices, with the remainder being small portions of random coils, beta turns, or extended strands. [26] [27]

Tertiary Structure

The tertiary structure of C7orf50 consists primarily of alpha helices, as predicted by I-TASSER. [9] [28] [29]

Quaternary Structure

The interaction network (quaternary structure) involving the C7orf50 protein has significantly more interactions (p < 1.0e-16) than a randomly selected set of proteins. This indicates that these proteins are at least partially connected biologically as a group and are therefore likely to depend on each other within their biological pathway. [30] Thus, although the function of C7orf50 is uncharacterized, it is most likely associated with the same processes and functions as the proteins within its network.

Functional Enrichments within the C7orf50 Network
Biological Processes: rRNA processing; maturation of 5.8S, LSU, and SSU rRNA
Molecular Functions: catalytic activity, acting on RNA; ATP-dependent RNA helicase activity
Cellular Components: nucleolus; preribosomes
Reactome Pathways: major pathway of rRNA processing in the nucleolus and cytosol; rRNA modification in the nucleus and cytosol
Protein Domains and Motifs: helicase conserved C-terminal domain; DEAD/DEAH box helicase

The closest predicted functional partners of C7orf50 are the following proteins: DDX24, DDX52, PES1, EBNA1BP2, RSL1D1, NOP14, FTSJ3, KRR1, LYAR, and PWP1. These proteins are predicted to be co-expressed with C7orf50 and with each other rather than to bind them directly.

STRING quaternary analysis of C7orf50. Shows protein-protein interactions (direct and indirect) associated with C7orf50. Network nodes (circles) represent proteins. Edges (lines) represent protein-protein associations.

Regulation

Gene Regulation

Promoter

C7orf50 has 6 predicted promoter regions. The promoter with the greatest number of transcripts and CAGE tags overall is promoter set 6 (GXP_6755694), as reported by ElDorado from Genomatix. This promoter region is on the minus (-) strand, with a start position of 1,137,965 and an end position of 1,139,325, making it 1,361 bp long. It has 16 coding transcripts; the transcript with the greatest identity to C7orf50 transcript 4 is GXT_27788039, with 98,746 CAGE tags. [31]

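| Promoter ID | Start Position | End Position | Length | # of Coding Transcripts | Greatest # of CAGE Tags in Transcripts |
|---|---|---|---|---|---|
| GXP_9000582 | 1013063 | 1013163 | 1101 bp | 0 | N/A |
| GXP_6755691 | 1028239 | 1030070 | 1832 bp | 4 | 169233 |
| GXP_6053282 | 1055206 | 1056306 | 1101 bp | 1 | 449 |
| GXP_3207505 | 1127288 | 1128388 | 1101 bp | 1 | 545 |
| GXP_9000584 | 1130541 | 1131641 | 1101 bp | 0 | N/A |
| GXP_6755694 | 1137965 | 1139325 | 1361 bp | 16 | 100,070 |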

The CpG island associated with this promoter is 676 bp long and contains 75 CpGs (22% of the island). The C count plus G count is 471 (70% C or G within the island), and the ratio of observed to expected CpG is 0.91. [32] [33]
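
The CpG island figures above follow from a few simple ratios. The Python sketch below computes them from a DNA sequence using the observed/expected definition of Gardiner-Garden and Frommer [33], (number of CpG × N) / (number of C × number of G). The function name is an illustrative assumption, and because the source reports only the combined C plus G count, the final check assumes C and G are split evenly, which is a guess rather than a reported value.

```python
def cpg_island_stats(seq):
    """Basic CpG statistics for a DNA sequence (one strand, 5' to 3')."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides cannot overlap each other
    return {
        "length": n,
        "cpg_count": cpg,
        "gc_fraction": (c + g) / n,
        "cpg_fraction": 2 * cpg / n,  # each CpG covers two bases
        "obs_exp": cpg * n / (c * g) if c and g else 0.0,
    }


# Toy sequence, for illustration only
toy = cpg_island_stats("ACGCGTTA")
print(toy)

# Rechecking the reported island values from its counts
LENGTH, CPG_COUNT, C_PLUS_G = 676, 75, 471
gc_percent = 100 * C_PLUS_G / LENGTH
cpg_percent = 100 * 2 * CPG_COUNT / LENGTH
# Assumption: C and G counts are equal (only their sum is reported)
obs_exp_even_split = CPG_COUNT * LENGTH / (C_PLUS_G / 2) ** 2
print(round(gc_percent), round(cpg_percent), round(obs_exp_even_split, 2))
```

With an even C/G split the observed/expected ratio comes out close to the reported 0.91; an uneven split would give a slightly higher value.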

C7orf50 with ElDorado-suggested promoters and exons labeled. The gene is on the minus (-) strand, so the transcripts of promoter GXP_6755694 run 5' to 3' on the bottom strand (right to left).

Transcription Factor Binding Sites

As determined by MatInspector at Genomatix, the following transcription factor (TF) families are the most strongly predicted to bind in the promoter region of C7orf50. [31]

| Transcription Factor | Detailed Family Information |
|---|---|
| NR2F | Nuclear receptor subfamily 2 factors |
| PERO | Peroxisome proliferator-activated receptor |
| HOMF | Homeodomain transcription factors |
| PRDM | PR (PRDI-BF1-RIZ1 homologous) domain transcription factor |
| VTBP | Vertebrate TATA binding protein factor |
| HZIP | Homeodomain-leucine zipper transcription factors |
| ZTRE | Zinc transcriptional regulatory element |
| XBBF | X-box binding factors |
| SP1F | GC-Box factors SP1/GC |
| CAAT | CCAAT binding factors |
| ZF57 | KRAB domain zinc finger protein 57 |
| CTCF | CTCF and BORIS gene family, transcriptional regulators with highly conserved zinc finger domains |
| MYOD | Myoblast determining factors |
| KLFS | Krueppel like transcription factors |

Expression Pattern

C7orf50 shows ubiquitous expression in the kidneys, brain, fat, prostate, spleen, and 22 other tissues, with low tissue and immune cell specificity. [5] [6] Its expression is very high, about 4 times that of the average gene; therefore, C7orf50 mRNA is more abundant within a cell than the mRNA of the average gene. [34] There does not appear to be a definitive cell type in which this gene is not expressed. [35]

Transcription Regulation

Splice Enhancers

The mRNA of C7orf50 is predicted to have exonic splicing enhancers, to which SR proteins can bind, at nucleotide positions 45 (SRSF1 (IgM-BRCA1)), 246 (SRSF6), 703 (SRSF5), 1301 (SRSF1), and 1308 (SRSF2). [36] [37]

Stem Loop Prediction

Both the 5' and 3' UTRs of the mRNA of C7orf50 are predicted to fold into structures such as bulge loops, internal loops, multibranch loops, hairpin loops, and double helices. The 5'UTR has a predicted free energy of -416 kcal/mol with an ensemble diversity of 238. The 3' UTR has a predicted free energy of -279 kcal/mol with an ensemble diversity of 121. [38]

miRNA Targeting

There are many poorly conserved miRNA binding sites predicted within the 3' UTR of C7orf50 mRNA. The notable miRNA families predicted to bind C7orf50 mRNA and repress its expression post-transcriptionally are the following: miR-138-5p, miR-18-5p, miR-129-3p, miR-124-3p.1, miR-10-5p, and miR-338-3p. [39] [40] [41]

Protein Regulation

Subcellular Localization

The C7orf50 protein is predicted to localize intracellularly in both the nucleus and cytoplasm, but primarily within the nucleoplasm and nucleoli. [42] [43] [23] [44]

Post-Translational Modification

The C7orf50 protein is predicted to be mucin-type GalNAc O-glycosylated at the following amino acid sites: 12, 23, 36, 42, 59, and 97. [45] [46] Additionally, this protein is predicted to be SUMOylated at aa71, with the SUMO protein binding from aa189 through aa193. [47] [48] [49] C7orf50 is also predicted to be phosphorylated by specific kinases at the following amino acids: 12, 23, 36, 42, 59, 97, 124, 133, 159, and 175. [50] [51] [52] [53] [54] Notably, many of these sites overlap with the O-glycosylation sites. Of these phosphorylation sites, the majority are serines (53%), with the remainder being either tyrosines or threonines. The kinase groups most often associated with these sites are AGC, CAMK, TKL, and STE. Finally, this protein is predicted to have 8 glycations of the ε-amino groups of lysines at the following sites: aa3, 5, 14, 15, 17, 21, 76, and 120. [55] [56]
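
A quick sanity check on predicted modification sites is to confirm that each listed position holds a residue type that can carry the modification. The sketch below does this against the isoform a sequence. The allowed residue sets (S/T for O-GalNAc glycosylation, S/T/Y for phosphorylation, K for SUMOylation and glycation) are the usual textbook acceptor residues and are stated here as assumptions; the code does not reproduce any of the prediction servers cited above.

```python
# Isoform a sequence (NP_001127867.1)
SEQ = (
    "MAKQKRKVPEVTEKKNKKLKKASAEGPLLGPEAAPSGEGAGSKGEAVLRPGLDAEPELSPEEQRVLERKL"
    "KKERKKEERQRLREAGLVAQHPPARRSGAELALDYLCRWAQKHKNWRFQKTRQTWLLLHMYDSDKVPDEH"
    "FSTLLAYLEGLQGRARELTVQKAEALMRELDEEGSDPPLPGRAQRIRQVLQLLS"
)

# Modification -> (allowed acceptor residues, predicted 1-based positions from the text)
PREDICTED_SITES = {
    "O-GalNAc glycosylation": ("ST", [12, 23, 36, 42, 59, 97]),
    "phosphorylation": ("STY", [12, 23, 36, 42, 59, 97, 124, 133, 159, 175]),
    "SUMOylation": ("K", [71]),
    "glycation": ("K", [3, 5, 14, 15, 17, 21, 76, 120]),
}


def residues_at(seq, positions):
    """Label each 1-based position with its residue, e.g. 'T12'."""
    return [f"{seq[p - 1]}{p}" for p in positions]


def mismatches(seq, allowed, positions):
    """Positions whose residue is not one of the allowed acceptor residues."""
    return [p for p in positions if seq[p - 1] not in allowed]


for name, (allowed, positions) in PREDICTED_SITES.items():
    print(name, residues_at(SEQ, positions), "mismatches:", mismatches(SEQ, allowed, positions))
```

Every listed position falls on a compatible residue, which is consistent with the overlap between the O-glycosylation and phosphorylation sites noted in the text.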

Homology

Paralogs

No paralogs of C7orf50 have been detected in the human genome; however, there is slight evidence (58% similarity) of a paralogous DUF2373 domain in the protein of KIDINS220. [57]

Orthologs

Below is a table of selected orthologs of the human C7orf50 gene. [58] [7] The table includes closely, moderately, and distantly related orthologs. C7orf50 is highly evolutionarily conserved from mammals to fungi. When these ortholog sequences are compared, the most conserved portions are those of DUF2373, highlighting this domain's importance in the functioning of C7orf50. C7orf50 has evolved moderately and evenly over time, with a divergence rate greater than that of hemoglobin but less than that of cytochrome c.

Selected Orthologs of C7orf50

| Genus and Species | Common Name | Taxon Class | Date of Divergence (MYA) | Accession # | Length (AA) | % identity w/ human |
|---|---|---|---|---|---|---|
| Homo sapiens | Human | Mammalia | N/A | NM_001318252.2 | 194 aa | 100% |
| Tupaia chinensis | Chinese Tree Shrew | Mammalia | 82 | XP_006167949.1 | 194 aa | 76% |
| Dasypus novemcinctus | Nine-banded Armadillo | Mammalia | 105 | XP_004483895.1 | 198 aa | 70% |
| Miniopterus natalensis | Natal Long-fingered Bat | Mammalia | 96 | XP_016068464.1 | 199 aa | 69% |
| Protobothrops mucrosquamatus | Brown-spotted Pit Viper | Reptilia | 312 | XP_015673296.1 | 196 aa | 64% |
| Balearica regulorum gibbericeps | Grey-crowned Crane | Aves | 312 | XP_010302837.1 | 194 aa | 61% |
| Falco peregrinus | Peregrine Falcon | Aves | 312 | XP_027635198.1 | 193 aa | 59% |
| Xenopus laevis | African Clawed Frog | Amphibia | 352 | XP_018094637.1 | 198 aa | 50% |
| Electrophorus electricus | Electric Eel | Actinopterygii | 435 | XP_026880604.1 | 195 aa | 53% |
| Rhincodon typus | Whale Shark | Chondrichthyes | 465 | XP_020372968.1 | 195 aa | 52% |
| Ciona intestinalis | Sea Vase | Ascidiacea | 676 | XP_026696561.1 | 282 aa | 37% |
| Octopus bimaculoides | California Two-spot Octopus | Cephalopoda | 797 | XP_014772175.1 | 221 aa | 40% |
| Priapulus caudatus | Priapulus | Priapulida | 797 | XP_014663190.1 | 333 aa | 39% |
| Bombus terrestris | Buff-tailed Bumblebee | Insecta | 797 | XP_012171653.1 | 260 aa | 32% |
| Actinia tenebrosa | Australian Red Waratah Sea Anemone | Anthozoa | 824 | XP_031575029.1 | 330 aa | 43% |
| Trichoplax adhaerens | Trichoplax | Trichoplacidae | 948 | XP_002110193.1 | 137 aa | 44% |
| Spizellomyces punctatus | Branching Chytrid Fungi | Fungi | 1105 | XP_016610491.1 | 412 aa | 29% |
| Eremothecium cymbalariae | Fungi | Fungi | 1105 | XP_003644395.1 | 266 aa | 25% |
| Quercus suber | Cork Oak Tree | Plantae | 1496 | XP_023896156.1 | 508 aa | 30% |
| Plasmopara halstedii | Downy Mildew of Sunflower | Oomycetes | 1768 | XP_024580369.1 | 179 aa | 26% |

Rate of C7orf50 divergence compared to divergence rates of Hemoglobin and Cytochrome C.

Function

The consensus prediction of C7orf50 function (GO terms), as determined by I-TASSER, [59] [28] [29] gives the molecular function as protein binding, the biological process as protein import (specifically into the nucleus), and the associated cellular component as a pore complex (specifically of the nuclear envelope). It can therefore be hypothesized that C7orf50 imports ribosomal proteins into the nucleus for assembly into ribosomal subunits, but further research is needed to confirm this function.

Interacting Proteins

Proteins Predicted to Interact with C7orf50 [60] [61]

| Name of Protein | Name of Gene | Function | UniProt Accession # |
|---|---|---|---|
| THAP1 domain-containing protein 1 | THAP1 | DNA-binding transcription regulator that regulates endothelial cell proliferation and G1/S cell-cycle progression. [62] | Q9NVV9 |
| Protein Tax-2 | tax | Transcriptional activator that activates both the viral long terminal repeat (LTR) and cellular promoters via activation of CREB, NF-kappa-B, SRF and AP-1 pathways. [63] | P03410 |
| Major Prion Protein | PRNP | Its primary physiological function is unclear. May play a role in neuronal development and synaptic plasticity. May be required for neuronal myelin sheath maintenance. May promote myelin homeostasis by acting as an agonist for the ADGRG6 receptor. May play a role in iron uptake and iron homeostasis. [64] | P04156 |
| Aldehyde dehydrogenase X, mitochondrial | ALDH1B1 | Plays a major role in the detoxification of alcohol-derived acetaldehyde. Aldehyde dehydrogenases are involved in the metabolism of corticosteroids, biogenic amines, neurotransmitters, and lipid peroxidation. [65] | P30837 |
| Cell growth-regulating nucleolar protein | LYAR | Plays a role in the maintenance of the appropriate processing of 47S/45S pre-rRNA to 32S/30S pre-rRNAs and their subsequent processing to produce 18S and 28S rRNAs. [66] [67] | Q9NX58 |
| Coiled-coil domain-containing protein 85B | CCDC85B | Functions as a transcriptional repressor. [68] [69] | Q15834 |
| Nucleolar protein 56 | NOP56 | Involved in the early to middle stages of 60S ribosomal subunit biogenesis. Core component of box C/D small nucleolar ribonucleoprotein (snoRNP) particles. Required for the biogenesis of box C/D snoRNAs such as U3, U8 and U14 snoRNAs. [70] | O00567 |
| rRNA 2'-O-methyltransferase fibrillarin | FBL | Has the ability to methylate both RNAs and proteins. Involved in pre-rRNA processing by catalyzing the site-specific 2'-hydroxyl methylation of ribose moieties in pre-ribosomal RNA. [71] [72] [73] | P22087 |
| 40S ribosomal protein S6 | RPS6 | May play an important role in controlling cell growth and proliferation through the selective translation of particular classes of mRNA. [74] | P62753 |

Clinical Significance

C7orf50 has been noted in a variety of genome-wide association studies (GWAS). It has been associated with type 2 diabetes among sub-Saharan Africans, [75] daytime sleepiness in African-Americans, [76] prenatal exposure to particulate matter, [77] heritable DNA methylation marks associated with breast cancer, [78] and DNA methylation in relation to plasma carotenoids and lipid profile, [79] and it has been reported to interact significantly with prion proteins. [80]

References

  1. 1 2 3 GRCh38: Ensembl release 89: ENSG00000146540 Ensembl, May 2017
  2. 1 2 3 GRCm38: Ensembl release 89: ENSMUSG00000053553 Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. 1 2 3 4 5 6 "C7orf50 chromosome 7 open reading frame 50 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-04-29.
  6. 1 2 "C7orf50 protein expression summary - The Human Protein Atlas". www.proteinatlas.org. Retrieved 2020-04-29.
  7. 1 2 "C7orf50 orthologs". NCBI. Retrieved 2020-05-02.
  8. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P (2002). "The Transport of Molecules between the Nucleus and the Cytosol". Molecular Biology of the Cell (4th ed.). Garland Science.
  9. 1 2 "I-TASSER server for protein structure and function prediction". zhanglab.ccmb.med.umich.edu. Retrieved 2020-04-29.
  10. Boivin V, Deschamps-Francoeur G, Scott MS (March 2018). "Protein coding genes as hosts for noncoding RNA expression". Seminars in Cell & Developmental Biology. 75: 3–12. doi: 10.1016/j.semcdb.2017.08.016 . PMID   28811264.
  11. HUGO Gene Nomenclature Committee. "MicroRNA protein coding host genes". GeneNames. Archived from the original on 2018-11-21. Retrieved 2020-04-29.
  12. 1 2 "Homo sapiens chromosome 7 open reading frame 50 (C7orf50), transcript variant 4, mRNA". 2020-04-25.{{cite journal}}: Cite journal requires |journal= (help)
  13. 1 2 "uncharacterized protein C7orf50 isoform a [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-04-29.
  14. "PREDICTED: Homo sapiens chromosome 7 open reading frame 50 (C7orf50), transcript variant X6, mRNA". 2020-03-02.{{cite journal}}: Cite journal requires |journal= (help)
  15. "uncharacterized protein C7orf50 isoform X3 [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-04-29.
  16. "ORF Finder". www.bioinformatics.org. Retrieved 2020-05-03.
  17. "Average protein size - Various - BNID 113349". bionumbers.hms.harvard.edu. Retrieved 2020-04-29.
  18. Kozlowski LP. "Proteome-pI - Proteome Isoelectric Point Database statistics". isoelectricpointdb.org. Retrieved 2020-04-29.
  19. "ExPASy - Compute pI/Mw tool". web.expasy.org. Retrieved 2020-04-29.
  20. "SAPS < Sequence Statistics < EMBL-EBI". www.ebi.ac.uk. Retrieved 2020-04-29.
  21. Zhu ZY, Karlin S (August 1996). "Clusters of charged residues in protein three-dimensional structures". Proceedings of the National Academy of Sciences of the United States of America. 93 (16): 8350–5. Bibcode:1996PNAS...93.8350Z. doi: 10.1073/pnas.93.16.8350 . PMC   38674 . PMID   8710874.
  22. "Pfam: Family: DUF2373 (PF10180)". pfam.xfam.org. Archived from the original on 2015-07-02. Retrieved 2020-04-29.
  23. 1 2 "Motif Scan". myhits.isb-sib.ch. Retrieved 2020-04-29.
  24. "NetNES 1.1 Server". www.cbs.dtu.dk. Retrieved 2020-05-02.
  25. la Cour T, Kiemer L, Mølgaard A, Gupta R, Skriver K, Brunak S (June 2004). "Analysis and prediction of leucine-rich nuclear export signals". Protein Engineering, Design & Selection. 17 (6): 527–36. doi: 10.1093/protein/gzh062 . PMID   15314210.
  26. "NPS@ : CONSENSUS secondary structure prediction". npsa-prabi.ibcp.fr. Retrieved 2020-04-29.
  27. "CFSSP: Chou & Fasman Secondary Structure Prediction Server". www.biogem.org. Retrieved 2020-04-29.
  28. Zhang C, Freddolino PL, Zhang Y (July 2017). "COFACTOR: improved protein function prediction by combining structure, sequence and protein-protein interaction information". Nucleic Acids Research. 45 (W1): W291–W299. doi:10.1093/nar/gkx366. PMC 5793808. PMID 28472402.
  29. Yang J, Zhang Y (July 2015). "I-TASSER server: new development for protein structure and function predictions". Nucleic Acids Research. 43 (W1): W174–81. doi:10.1093/nar/gkv342. PMC 4489253. PMID 25883148.
  30. "C7orf50 protein (human) - STRING interaction network". string-db.org. Retrieved 2020-04-29.
  31. "Genomatix - NGS Data Analysis & Personalized Medicine". www.genomatix.de. Retrieved 2020-04-29. [permanent dead link]
  32. "CpG Island Info". genome.ucsc.edu. Retrieved 2020-05-03.
  33. Gardiner-Garden M, Frommer M (July 1987). "CpG islands in vertebrate genomes". Journal of Molecular Biology. 196 (2): 261–82. doi:10.1016/0022-2836(87)90689-9. PMID 3656447.
  34. "AceView: Gene:C7orf50, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". www.ncbi.nlm.nih.gov. Retrieved 2020-04-29.
  35. "2895856 - GEO Profiles - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-04-29.
  36. Smith PJ, Zhang C, Wang J, Chew SL, Zhang MQ, Krainer AR (August 2006). "An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers". Human Molecular Genetics. 15 (16): 2490–508. doi:10.1093/hmg/ddl171. PMID 16825284.
  37. Cartegni L, Wang J, Zhu Z, Zhang MQ, Krainer AR (July 2003). "ESEfinder: A web resource to identify exonic splicing enhancers". Nucleic Acids Research. 31 (13): 3568–71. doi:10.1093/nar/gkg616. PMC 169022. PMID 12824367.
  38. "RNAfold web server". rna.tbi.univie.ac.at. Retrieved 2020-04-30.
  39. "TargetScanHuman 7.2". www.targetscan.org. Retrieved 2020-04-30.
  40. Chipman LB, Pasquinelli AE (March 2019). "miRNA Targeting: Growing beyond the Seed". Trends in Genetics. 35 (3): 215–222. doi:10.1016/j.tig.2018.12.005. PMC 7083087. PMID 30638669.
  41. Friedman RC, Farh KK, Burge CB, Bartel DP (January 2009). "Most mammalian mRNAs are conserved targets of microRNAs". Genome Research. 19 (1): 92–105. doi:10.1101/gr.082701.108. PMC 2612969. PMID 18955434.
  42. "C7orf50 protein expression summary - The Human Protein Atlas". www.proteinatlas.org. Retrieved 2020-05-02.
  43. "PSORT II Prediction". psort.hgc.jp. Retrieved 2020-05-02.
  44. Horton P, Nakai K (1997). "Better prediction of protein cellular localization sites with the k nearest neighbors classifier". Proceedings. International Conference on Intelligent Systems for Molecular Biology. 5: 147–52. PMID 9322029.
  45. "NetOGlyc 4.0 Server". www.cbs.dtu.dk. Retrieved 2020-05-02.
  46. Steentoft C, Vakhrushev SY, Joshi HJ, Kong Y, Vester-Christensen MB, Schjoldager KT, et al. (May 2013). "Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology". The EMBO Journal. 32 (10): 1478–88. doi:10.1038/emboj.2013.79. PMC 3655468. PMID 23584533.
  47. Zhao Q, Xie Y, Zheng Y, Jiang S, Liu W, Mu W, et al. (July 2014). "GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs". Nucleic Acids Research. 42 (Web Server issue): W325–30. doi:10.1093/nar/gku383. PMC 4086084. PMID 24880689.
  48. Ren J, Gao X, Jin C, Zhu M, Wang X, Shaw A, et al. (June 2009). "Systematic study of protein sumoylation: Development of a site-specific predictor of SUMOsp 2.0". Proteomics. 9 (12): 3409–3412. doi:10.1002/pmic.200800646. PMID 29658196. S2CID 4900031.
  49. "GPS-SUMO: Prediction of SUMOylation Sites & SUMO-interaction Motifs". sumosp.biocuckoo.org. Archived from the original on 2013-05-10. Retrieved 2020-05-02.
  50. "GPS 5.0 - Kinase-specific Phosphorylation Site Prediction". gps.biocuckoo.cn. Retrieved 2020-05-02.
  51. "NetPhos 3.1 Server". www.cbs.dtu.dk. Retrieved 2020-05-02.
  52. Blom N, Gammeltoft S, Brunak S (December 1999). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology. 294 (5): 1351–62. doi:10.1006/jmbi.1999.3310. PMID 10600390.
  53. Blom N, Sicheritz-Pontén T, Gupta R, Gammeltoft S, Brunak S (June 2004). "Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence". Proteomics. 4 (6): 1633–49. doi:10.1002/pmic.200300771. PMID 15174133. S2CID 18810164.
  54. Wang C, Xu H, Lin S, Deng W, Zhou J, Zhang Y, et al. (March 2020). "GPS 5.0: An Update on the Prediction of Kinase-specific Phosphorylation Sites in Proteins". Genomics, Proteomics & Bioinformatics. 18 (1): 72–80. doi:10.1016/j.gpb.2020.01.001. PMC 7393560. PMID 32200042.
  55. "NetGlycate 1.0 Server". www.cbs.dtu.dk. Retrieved 2020-05-02.
  56. Johansen MB, Kiemer L, Brunak S (September 2006). "Analysis and prediction of mammalian protein glycation". Glycobiology. 16 (9): 844–53. doi:10.1093/glycob/cwl009. PMID 16762979.
  57. "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2020-05-02.
  58. "BLAST: Basic Local Alignment Search Tool". blast.ncbi.nlm.nih.gov. Retrieved 2020-05-02.
  59. "I-TASSER results". zhanglab.ccmb.med.umich.edu. Retrieved 2020-05-03. [permanent dead link]
  60. "IntAct Portal". www.ebi.ac.uk. Retrieved 2020-05-03.
  61. "CCSB Interactome Database". interactome.dfci.harvard.edu. Retrieved 2020-05-03.
  62. "THAP1 - THAP domain-containing protein 1 - Homo sapiens (Human) - THAP1 gene & protein". www.uniprot.org. Retrieved 2020-05-03.
  63. "tax - Protein Tax-2 - Human T-cell leukemia virus 2 (HTLV-2) - tax gene & protein". www.uniprot.org. Retrieved 2020-05-03.
  64. "PRNP - Major prion protein precursor - Homo sapiens (Human) - PRNP gene & protein". www.uniprot.org. Retrieved 2020-05-03.
  65. "ALDH1B1 - Aldehyde dehydrogenase X, mitochondrial precursor - Homo sapiens (Human) - ALDH1B1 gene & protein". www.uniprot.org. Retrieved 2020-05-03.
  66. "LYAR - Cell growth-regulating nucleolar protein - Homo sapiens (Human) - LYAR gene & protein". www.uniprot.org. Retrieved 2020-05-03.
  67. Miyazawa N, Yoshikawa H, Magae S, Ishikawa H, Izumikawa K, Terukina G, et al. (April 2014). "Human cell growth regulator Ly-1 antibody reactive homologue accelerates processing of preribosomal RNA". Genes to Cells. 19 (4): 273–86. doi:10.1111/gtc.12129. PMID 24495227. S2CID 6143550.
  68. Du X, Wang Q, Hirohashi Y, Greene MI (December 2006). "DIPA, which can localize to the centrosome, associates with p78/MCRS1/MSP58 and acts as a repressor of gene transcription". Experimental and Molecular Pathology. 81 (3): 184–90. doi:10.1016/j.yexmp.2006.07.008. PMID 17014843.
  69. "CCDC85B - Coiled-coil domain-containing protein 85B - Homo sapiens (Human) - CCDC85B gene & protein". www.uniprot.org. Retrieved 2020-05-03.
  70. "NOP56 - Nucleolar protein 56 - Homo sapiens (Human) - NOP56 gene & protein". www.uniprot.org. Retrieved 2020-05-03.
  71. "FBL - rRNA 2'-O-methyltransferase fibrillarin - Homo sapiens (Human) - FBL gene & protein". www.uniprot.org. Retrieved 2020-05-03.
  72. Tessarz P, Santos-Rosa H, Robson SC, Sylvestersen KB, Nelson CJ, Nielsen ML, et al. (January 2014). "Glutamine methylation in histone H2A is an RNA-polymerase-I-dedicated modification". Nature. 505 (7484): 564–8. Bibcode:2014Natur.505..564T. doi:10.1038/nature12819. PMC 3901671. PMID 24352239.
  73. Iyer-Bierhoff A, Krogh N, Tessarz P, Ruppert T, Nielsen H, Grummt I (December 2018). "SIRT7-Dependent Deacetylation of Fibrillarin Controls Histone H2A Methylation and rRNA Synthesis during the Cell Cycle". Cell Reports. 25 (11): 2946–2954.e5. doi:10.1016/j.celrep.2018.11.051. PMID 30540930.
  74. "RPS6 - 40S ribosomal protein S6 - Homo sapiens (Human) - RPS6 gene & protein". www.uniprot.org. Retrieved 2020-05-03.
  75. Meeks KA, Henneman P, Venema A, Addo J, Bahendeka S, Burr T, et al. (February 2019). "Epigenome-wide association study in whole blood on type 2 diabetes among sub-Saharan African individuals: findings from the RODAM study". International Journal of Epidemiology. 48 (1): 58–70. doi:10.1093/ije/dyy171. PMC 6380309. PMID 30107520.
  76. Barfield R, Wang H, Liu Y, Brody JA, Swenson B, Li R, et al. (August 2019). "Epigenome-wide association analysis of daytime sleepiness in the Multi-Ethnic Study of Atherosclerosis reveals African-American-specific associations". Sleep. 42 (8): zsz101. doi:10.1093/sleep/zsz101. PMC 6685317. PMID 31139831.
  77. Gruzieva O, Xu CJ, Yousefi P, Relton C, Merid SK, Breton CV, et al. (May 2019). "Prenatal Particulate Air Pollution and DNA Methylation in Newborns: An Epigenome-Wide Meta-Analysis". Environmental Health Perspectives. 127 (5): 57012. doi:10.1289/EHP4522. PMC 6792178. PMID 31148503.
  78. Joo JE, Dowty JG, Milne RL, Wong EM, Dugué PA, English D, et al. (February 2018). "Heritable DNA methylation marks associated with susceptibility to breast cancer". Nature Communications. 9 (1): 867. Bibcode:2018NatCo...9..867J. doi:10.1038/s41467-018-03058-6. PMC 5830448. PMID 29491469.
  79. Tremblay BL, Guénard F, Lamarche B, Pérusse L, Vohl MC (June 2019). "Network Analysis of the Potential Role of DNA Methylation in the Relationship between Plasma Carotenoids and Lipid Profile". Nutrients. 11 (6): 1265. doi:10.3390/nu11061265. PMC 6628241. PMID 31167428.
  80. Satoh J, Obayashi S, Misawa T, Sumiyoshi K, Oosumi K, Tabunoki H (February 2009). "Protein microarray analysis identifies human cellular prion protein interactors". Neuropathology and Applied Neurobiology. 35 (1): 16–35. doi:10.1111/j.1365-2990.2008.00947.x. PMID 18482256. S2CID 32299311.