INAVA

Identifiers
Aliases: INAVA, chromosome 1 open reading frame 106, C1orf106, innate immunity activator
External IDs: OMIM: 618051; MGI: 1921579; HomoloGene: 10103; GeneCards: INAVA

Orthologs

Species | Human | Mouse
--- | --- | ---
RefSeq (mRNA) | NM_001142569, NM_018265, NM_001367289, NM_001367290 | NM_028872
RefSeq (protein) | NP_001136041, NP_060735, NP_001354218, NP_001354219 | NP_083148, NP_001392081, NP_001392082, NP_001392083
Location (UCSC) | Chr 1: 200.89 – 200.92 Mb | Chr 1: 136.14 – 136.16 Mb
PubMed search | [3] | [4]

INAVA, sometimes referred to as hypothetical protein LOC55765, is a protein of unknown function that in humans is encoded by the INAVA gene. [5] Less common gene aliases include FLJ10901 and MGC125608.

Gene

Location

C1orf106 location on chromosome 1

In humans, INAVA is located on the long arm of chromosome 1 at locus 1q32.1. It spans from 200,891,499 to 200,915,736 (24.238 kb) on the plus strand. [5]
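The quoted span length can be checked from the two coordinates. The sketch below assumes the coordinates are 1-based and inclusive (the usual convention for NCBI gene records, but an assumption here); the function name `span_length` is illustrative only.

```python
# Coordinates quoted in the text; assumed to be 1-based and inclusive.
START = 200_891_499
END = 200_915_736


def span_length(start: int, end: int) -> int:
    """Length in base pairs of the closed interval [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


length_bp = span_length(START, END)
length_kb = length_bp / 1000
print(f"{length_bp} bp = {length_kb:.3f} kb")
```

Under the inclusive convention both endpoints are counted, which is why one is added to the difference.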

Gene neighborhood

Gene neighborhood

INAVA is flanked by G protein-coupled receptor 25 (upstream) and maestro heat-like repeat family member 3 (MROH3P), a predicted downstream pseudogene. Ribosomal protein L34 pseudogene 6 (RPL34P6) is further upstream and kinesin family member 21B is further downstream. [5]

Promoter

Predicted C1orf106 promoter region with putative transcription factor binding sites

There are seven predicted promoters for INAVA, and experimental evidence suggests that isoforms 1 and 2, the most common isoforms, are transcribed from different promoters. [6] MatInspector, a tool available through Genomatix, was used to predict transcription factor binding sites within potential promoter regions. The transcription factors predicted to target the anticipated promoter for isoform 1 are expressed in a range of tissues, most commonly the urogenital system, nervous system and bone marrow. This is consistent with expression data for INAVA, which is highly expressed in the kidney and bone marrow. [7] A diagram of the predicted promoter region, with transcription factor binding sites highlighted, accompanies this section. The factors predicted to bind the promoter region of isoform 2 differ: twelve of the top twenty predicted factors are expressed in blood cells and/or tissues of the cardiovascular system.

Expression

C1orf106 is expressed in a wide range of tissues. Expression data from GEO profiles are shown below, and the sites of highest expression are listed in the table. Expression is moderate in the placenta, prostate, testis, lung, salivary glands and dendritic cells. It is low in the brain, most immune cells, the adrenal gland, uterus, heart and adipocytes. [7] Data from several experiments in GEO profiles suggest that INAVA expression is up-regulated in several cancers, including lung, ovarian, colorectal and breast cancer.

C1orf106 expression data from GEO profiles
Tissue | Percentile rank
--- | ---
B lymphocytes | 90
Trachea | 89
Skin | 88
Human bronchial epithelial cells | 88
Colorectal adenocarcinoma | 87
Kidney | 87
Tongue | 85
Pancreas | 84
Appendix | 82
Bone marrow | 80

mRNA

Isoforms

Nine putative isoforms are produced from the INAVA gene, seven of which are predicted to encode proteins. [8] Isoforms 1 and 2, shown below, are the most common.

Most common C1orf106 isoforms

Isoform 1, which is the longest, is accepted as the canonical isoform. It contains ten exons, which encode a protein reported as 677 amino acids long. Some sources instead report only 663 amino acids, owing to the use of a start codon forty-two nucleotides (fourteen codons) downstream. According to NCBI, this isoform has only been predicted computationally. [5] This may be because the Kozak sequence surrounding the downstream start codon is more similar to the consensus Kozak sequence, as shown in the comparison below. Softberry was used to obtain the sequence of the predicted isoform. [9] Isoform 2 is shorter due to a truncated N-terminus. Both isoforms have an alternative polyadenylation site. [8]

Surrounding sequence of start codons compared to Kozak consensus sequence
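One simple way to compare a start-codon context with the Kozak consensus is to count matching flanking positions. The sketch below is illustrative only: it uses the widely quoted consensus gccRccAUGG (R = purine) written as DNA, and the two example contexts are hypothetical placeholders, not the actual INAVA sequences, which are not reproduced in this article. The function name `kozak_score` is likewise an assumption.

```python
# Consensus for positions -6..+4 around the start codon, written as DNA.
# R denotes a purine (A or G); ATG occupies positions +1..+3.
KOZAK_CONSENSUS = "GCCRCCATGG"
FLANKING_INDEXES = (0, 1, 2, 3, 4, 5, 9)  # every position except the ATG


def base_matches(base: str, symbol: str) -> bool:
    """True if a nucleotide is compatible with a consensus symbol."""
    if symbol == "R":
        return base in "AG"
    return base == symbol


def kozak_score(context: str) -> int:
    """Count flanking positions (0 to 7) that agree with the consensus."""
    seq = context.upper().replace("U", "T")
    if len(seq) != len(KOZAK_CONSENSUS):
        raise ValueError("context must cover positions -6..+4 (10 nt)")
    if seq[6:9] != "ATG":
        raise ValueError("context must contain ATG at positions +1..+3")
    return sum(base_matches(seq[i], KOZAK_CONSENSUS[i]) for i in FLANKING_INDEXES)


# Hypothetical contexts, for illustration only.
weak_context = "TTTCTTATGC"
strong_context = "GCCACCATGG"
print(kozak_score(weak_context), kozak_score(strong_context))
```

A higher score means a context closer to the consensus. Position-weighted schemes that emphasise the -3 and +4 positions are also common, but are not shown here.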

miRNA regulation

Predicted miRNA target sequence

miRNA-24 was identified as a microRNA that could potentially target INAVA mRNA. [10] The binding site, which is located in the 5' untranslated region, is shown.

Protein

General properties

C1orf106 protein (isoform 1)

Isoform 1, shown in the accompanying diagram, contains a DUF3338 domain, two low-complexity regions and a proline-rich region. The protein is rich in arginine and proline, and has a lower than average amount of asparagine and hydrophobic amino acids, specifically phenylalanine and isoleucine. [11] The isoelectric point is 9.58, and the molecular weight of the unmodified protein is 72.9 kDa. [12] The protein is not predicted to have an N-terminal signal peptide, but there are predicted nuclear localization signals (NLS) and a leucine-rich nuclear export signal. [13] [14] [15]
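Statements such as "arginine and proline rich" come from amino acid composition statistics. A minimal sketch of that calculation follows; it is not the SAPS algorithm itself, and the toy sequence is a hypothetical placeholder rather than the INAVA sequence.

```python
from collections import Counter


def composition_percent(sequence: str) -> dict[str, float]:
    """Percentage of each residue in a one-letter protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    # most_common() orders residues from most to least frequent.
    return {aa: 100.0 * n / len(seq) for aa, n in counts.most_common()}


# Hypothetical 20-residue sequence, for illustration only.
toy_sequence = "MPRRPSPLRAPGRPSLPRRA"
for residue, percent in composition_percent(toy_sequence).items():
    print(f"{residue}: {percent:.1f}%")
```

Judging whether a residue is over- or under-represented additionally requires a reference distribution (for example, average proteome frequencies), which this sketch does not include.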

Modifications

INAVA is predicted to be highly phosphorylated. [16] [17] Phosphorylation sites predicted by PROSITE are shown in the table below. NETPhos predictions are illustrated in the diagram. Each line points to a predicted phosphorylation site and connects to a letter representing serine (S), threonine (T) or tyrosine (Y).

Phosphorylation sites predicted by PROSITE
Phosphorylation sites predicted by NETPhos. Letter corresponds to serine (S), threonine (T) or tyrosine (Y).
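Pattern-based predictions of this kind amount to scanning the sequence for short motifs. The sketch below translates two commonly cited PROSITE phosphorylation patterns, [ST]-x-[RK] (protein kinase C) and [ST]-x(2)-[DE] (casein kinase II), into regular expressions. Which patterns produced the sites in the article's table is not stated, so this choice is an assumption, and the toy sequence is hypothetical.

```python
import re

# PROSITE-style patterns rewritten as regular expressions.
MOTIFS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",   # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",  # [ST]-x(2)-[DE]
}


def find_sites(sequence: str, motifs: dict[str, str] = MOTIFS):
    """Return (motif name, 1-based position of S/T, matched text) tuples."""
    seq = sequence.upper()
    hits = []
    for name, pattern in motifs.items():
        # A lookahead reports overlapping matches instead of skipping them.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical sequence, for illustration only.
toy_sequence = "MSPRASAKDETLRE"
for name, position, text in find_sites(toy_sequence):
    print(f"{name}: residue {position} ({text})")
```

Short motifs like these match frequently by chance, so hits are candidates rather than confirmed modification sites.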

Structure

Coiled coils are predicted to span residues 130–160 and 200–260. [18] The secondary structure was predicted to be about 60% random coil, 30% alpha helix and 10% beta sheet. [19]

Interactions

The proteins with which the INAVA protein interacts are not well characterized. Text mining evidence suggests INAVA may interact with DNAJC5G, SLC7A13, PIEZO2 and MUC19. [20] Experimental evidence from a yeast two-hybrid screen suggests the INAVA protein interacts with 14-3-3 protein sigma, an adaptor protein. [21]

Homology

INAVA is well conserved in vertebrates as shown in the table below. Sequences were retrieved from BLAST [22] and BLAT. [23]

Sequence | Genus and species | Common name | NCBI accession | Length (aa) | Sequence identity | Time since divergence (Mya)
--- | --- | --- | --- | --- | --- | ---
*C1orf106 | Homo sapiens | Human | NP_060735.3 | 667 | 100% | NA
*C1orf106 | Macaca fascicularis | Crab-eating macaque | XP_005540414.1 | 703 | 97% | 29.0
*LOC289399 | Rattus norvegicus | Norway rat | NP_001178750.1 | 667 | 86% | 92.3
*Predicted C1orf106 homolog | Odobenus rosmarus divergens | Walrus | XP_004392787.1 | 672 | 85% | 94.2
*C1orf106-like | Loxodonta africana | Elephant | XP_003410255.1 | 663 | 84% | 98.7
*Predicted C1orf106 homolog | Dasypus novemcinctus | Nine-banded armadillo | XP_004478752.1 | 676 | 81% | 104.2
*Predicted C1orf106 homolog | Ochotona princeps | American pika | XP_004578841.1 | 681 | 78% | 92.3
*Predicted C1orf106 homolog | Monodelphis domestica | Gray short-tailed opossum | XP_001367913.2 | 578 | 76% | 162.2
*Predicted C1orf106 homolog | Chrysemys picta bellii | Painted turtle | XP_005313167.1 | 602 | 56% | 296.0
*Predicted C1orf106 homolog | Geospiza fortis | Medium ground finch | XP_005426868.1 | 542 | 50% | 296.0
*Predicted C1orf106 homolog | Alligator mississippiensis | Alligator | XP_006278041.1 | 547 | 49% | 296.0
*Predicted C1orf106 homolog | Ficedula albicollis | Collared flycatcher | XP_005059352.1 | 542 | 49% | 296.0
Predicted C1orf106 homolog | Latimeria chalumnae | West Indian Ocean coelacanth | XP_005988436.1 | 613 | 46% | 414.9
*Predicted C1orf106 homolog | Lepisosteus oculatus | Spotted gar | XP_006628420.1 | 637 | 44% | 400.1
*FERM domain containing 4A | Xenopus (Silurana) tropicalis | Western clawed frog | XP_002935289.2 | 695 | 43% | 371.2
*Predicted C1orf106 homolog | Oreochromis niloticus | Nile tilapia | XP_005478188.1 | 576 | 40% | 400.1
Predicted C1orf106 homolog | Haplochromis burtoni | Astatotilapia burtoni | XP_005914919.1 | 576 | 40% | 400.1
Predicted C1orf106 homolog | Pundamilia nyererei | Haplochromis nyererei | XP_005732720.1 | 577 | 40% | 400.1
*LOC563192 | Danio rerio | Zebrafish | NP_001073474.1 | 612 | 37% | 400.1
LOC101161145 | Oryzias latipes | Japanese rice fish | XP_004069287.1 | 612 | 33% | 400.1

A graph of the sequence identity versus the time since divergence for the asterisked entries is shown below. The colors correspond to degree of relatedness (green = closely related, purple = distantly related).

Percent sequence identity in relation to species relatedness
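The trend in the graph can be summarised numerically. The sketch below computes a Pearson correlation between divergence time and sequence identity for the asterisked non-human entries in the ortholog table (the human entry is omitted because its divergence time is listed as NA). The choice of Pearson's r is an illustrative assumption; the article does not state that any such statistic was calculated.

```python
from math import sqrt

# (time since divergence in Mya, percent identity) for the asterisked
# non-human entries of the ortholog table, in table order.
POINTS = [
    (29.0, 97), (92.3, 86), (94.2, 85), (98.7, 84), (104.2, 81),
    (92.3, 78), (162.2, 76), (296.0, 56), (296.0, 50), (296.0, 49),
    (296.0, 49), (400.1, 44), (371.2, 43), (400.1, 40), (400.1, 37),
]


def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


r = pearson_r(POINTS)
print(f"Pearson r = {r:.2f}")
```

A strongly negative value reflects identity falling as divergence time increases, which is the pattern the graph displays.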

Paralogs

Databases do not agree on which proteins are INAVA paralogs. A multiple sequence alignment (MSA) of potentially paralogous proteins was made to assess the likelihood of a truly paralogous relationship. [24] The sequences were retrieved from a BLAST search of human proteins using the C1orf106 protein as the query. The MSA suggests the proteins share a homologous domain, DUF3338, which is found in eukaryotes. A portion of the multiple sequence alignment is shown below. Apart from the DUF domain (boxed in green), there was little conservation. The DUF3338 domain does not have any unusual physical properties; however, each of the proteins in the MSA is predicted to have two nuclear localization signals and to localize to the nucleus. [13] A comparison of the physical properties of the proteins was also conducted using SAPS and is shown in the table. [11]

Conservation of the DUF3338 domain in humans
Physical properties of potential paralogs

Clinical significance

A total of 556 single nucleotide polymorphisms (SNPs) have been identified in the gene region of INAVA, 96 of which are associated with a clinical source. [25] Rivas et al. [26] identified four SNPs, shown in the table below, that may be associated with inflammatory bowel disease and Crohn's disease. According to GeneCards, other disease associations may include multiple sclerosis and ulcerative colitis. [27]

Residue | Change | Notes
--- | --- | ---
333 (rs41313912) | Tyrosine ⇒ phenylalanine | Phosphorylated, moderate conservation
376 | Arginine ⇒ cysteine | Moderate conservation
397 | Arginine ⇒ threonine | Not conserved
554 (rs61745433) | Arginine ⇒ cysteine | Moderate conservation

Model organisms

Model organisms have been used in the study of INAVA function. A conditional knockout mouse line called 5730559C18Riktm2a(EUCOMM)Wtsi was generated at the Wellcome Trust Sanger Institute. [28] Male and female animals underwent a standardized phenotypic screen [29] to determine the effects of deletion. [30] [31] [32] [33] Additional screens performed:

- In-depth immunological phenotyping [34]
- In-depth bone and cartilage phenotyping [35]

References

  1. GRCh38: Ensembl release 89: ENSG00000163362 - Ensembl, May 2017
  2. GRCm38: Ensembl release 89: ENSMUSG00000041605 - Ensembl, May 2017
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "NCBI Gene 55765". Retrieved 10 February 2014.
  6. "Genomatix: MatInspector". Retrieved 6 March 2014.
  7. "GEO Profiles". Retrieved 6 March 2014.
  8. "Aceview". Retrieved 6 March 2014.
  9. "Softberry". Retrieved 20 April 2014.
  10. "TargetScanHuman 6.2". Retrieved 15 April 2014.
  11. "Statistical Analysis of Protein Sequences". Retrieved 20 April 2014.
  12. "Compute pI/Mw tool". Retrieved 10 April 2014.
  13. "PSORTII". Retrieved 20 April 2014.
  14. "cNLS Mapper". Archived from the original on 22 November 2021. Retrieved 20 April 2014.
  15. "NetNES". Retrieved 20 April 2014.
  16. "NETPhos". Retrieved 20 April 2014.
  17. "Swiss Institute of Bioinformatics: PROSITE".
  18. "ExPASy COILS". Archived from the original on 22 April 2014. Retrieved 20 April 2014.
  19. "SOPMA". Retrieved 27 April 2014.
  20. "STRING". Retrieved 15 April 2014.
  21. "MINT". Retrieved 15 April 2014.
  22. "BLAST". Retrieved 8 March 2014.
  23. "BLAT". Retrieved 8 March 2014.
  24. "SDSC Biology Workbench: ClustalW". Retrieved 12 March 2014.
  25. "dbSNP". Retrieved 22 April 2014.
  26. Rivas MA; et al. (2011). "Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease". Nature Genetics. 43 (11): 1066–1073. doi:10.1038/ng.952. PMC 3378381. PMID 21983784.
  27. "GeneCards". Retrieved 1 May 2014.
  28. Gerdin AK (2010). "The Sanger Mouse Genetics Programme: high throughput characterisation of knockout mice". Acta Ophthalmologica. 88: 925–7. doi:10.1111/j.1755-3768.2010.4142.x. S2CID 85911512.
  29. "International Mouse Phenotyping Consortium".
  30. Skarnes WC, Rosen B, West AP, Koutsourakis M, Bushell W, Iyer V, Mujica AO, Thomas M, Harrow J, Cox T, Jackson D, Severin J, Biggs P, Fu J, Nefedov M, de Jong PJ, Stewart AF, Bradley A (Jun 2011). "A conditional knockout resource for the genome-wide study of mouse gene function". Nature. 474 (7351): 337–42. doi:10.1038/nature10163. PMC 3572410. PMID 21677750.
  31. Dolgin E (Jun 2011). "Mouse library set to be knockout". Nature. 474 (7351): 262–3. doi:10.1038/474262a. PMID 21677718.
  32. Collins FS, Rossant J, Wurst W (Jan 2007). "A mouse for all reasons". Cell. 128 (1): 9–13. doi:10.1016/j.cell.2006.12.018. PMID 17218247. S2CID 18872015.
  33. White JK, Gerdin AK, Karp NA, Ryder E, Buljan M, Bussell JN, Salisbury J, Clare S, Ingham NJ, Podrini C, Houghton R, Estabel J, Bottomley JR, Melvin DG, Sunter D, Adams NC, Sanger Institute Mouse Genetics Project, Tannahill D, Logan DW, Macarthur DG, Flint J, Mahajan VB, Tsang SH, Smyth I, Watt FM, Skarnes WC, Dougan G, Adams DJ, Ramirez-Solis R, Bradley A, Steel KP (2013). "Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes". Cell. 154 (2): 452–64. doi:10.1016/j.cell.2013.06.022. PMC 3717207. PMID 23870131.
  34. "Infection and Immunity Immunophenotyping (3i) Consortium". Archived from the original on 2015-05-21. Retrieved 2015-05-19.
  35. "OBCD Consortium".