LOC101928193

Identifiers
Aliases: uncharacterized LOC101928193
External IDs: GeneCards
Orthologs

| Species | Human | Mouse |
| --- | --- | --- |
| RefSeq (mRNA) | XM_011519275, XM_011519276, XM_011548774 | n/a |
| RefSeq (protein) | n/a | n/a |
| Location (UCSC) | n/a | n/a |
| PubMed search | [1] | n/a |

LOC101928193 is a protein which in humans is encoded by the LOC101928193 gene. There are no known aliases for this gene or protein. Similar copies of this gene in other species, called orthologs, are known across mammals, amphibians, fish, mollusks, cnidarians, fungi, and bacteria. [2] The human LOC101928193 gene is located on the long (q) arm of chromosome 9, at cytogenetic location 9q34.2. [3] The gene spans base pairs 133,189,767 to 133,192,979 on chromosome 9, giving an mRNA length of 3213 nucleotides. [4] The gene and protein are not yet well understood by the scientific community, but there are data on the gene's makeup and expression. The LOC101928193 protein is targeted to the cytoplasm and is most highly expressed in the thyroid, ovary, skin, and testes in humans. [5]

Gene

Locus

Cytogenetic location of LOC101928193 at 9q34.2. The gene is on the positive strand and is located from base pairs 133,189,767 to 133,192,979.

In humans, LOC101928193 lies on the positive strand at cytogenetic location 9q34.2. The protein-encoding region of LOC101928193 spans base pairs 133,189,767 to 133,192,979. Within this region, there are 1 intron and 2 exons. [4]
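The 3213-nucleotide length quoted for the gene follows directly from these coordinates. A minimal sketch of the arithmetic, assuming the coordinates are 1-based and inclusive (the usual convention for genome browser positions, but an assumption here); the function name is illustrative:

```python
# Genomic coordinates quoted in the text (assumed 1-based, inclusive).
GENE_START = 133_189_767
GENE_END = 133_192_979


def span_length(start: int, end: int) -> int:
    """Return the number of bases in the closed interval [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


print(span_length(GENE_START, GENE_END))  # 3213
```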

Gene neighborhood

LOC101928193 is flanked by GBGT1 and OBP2B on chromosome 9. [5] GBGT1 encodes a member of the ABO gene family and plays a role in synthesizing glycolipids that are involved in tropism and the binding of pathogens. [6] OBP2B lies in the ABO gene region, which has been associated with serum E-selectin levels. [7]

mRNA

In humans, the LOC101928193 gene produces 3 transcript variants, which encode 3 isoforms of the protein. [4] Isoform X1 is the longest, at 406 codons. [8] Isoform X2 is 388 codons long and isoform X3 is 399 codons long. [9] [10] All isoforms have 2 exons and their coding mRNA is 3213 nucleotides long. [4]

Protein

The molecular weight of LOC101928193 is 43.5 kilodaltons. [11] The isoelectric point (pI) is 9. [11]
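As a rough sanity check on the reported weight, protein mass can be approximated from length alone. The sketch below assumes the common rule of thumb of about 110 Da per residue, which is an approximation and not the method the cited tool uses. The 406-residue length is that of isoform X1.

```python
# Rule-of-thumb average residue mass in daltons (an approximation, not a measured value).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int, avg_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Estimate protein mass in kilodaltons from its residue count."""
    return n_residues * avg_mass_da / 1000.0


estimate = estimate_mass_kda(406)  # isoform X1 length from the text
print(f"{estimate:.2f} kDa")
```

The estimate is slightly above the reported 43.5 kDa, which would be consistent with the protein being rich in glycine, the lightest residue.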

LOC101928193 amino acid composition. This is a glycine, valine, and serine rich protein. It is also methionine, asparagine, aspartic acid, glutamic acid, and lysine poor.

Composition

LOC101928193 predicted secondary structure. The orange c's indicate predicted coils and the red e's indicate predicted beta sheets.

Compared to most human proteins, LOC101928193 contains more valine, glycine, serine, histidine, and phenylalanine residues. [12] It is poor in alanine, methionine, asparagine, aspartic acid, glutamic acid, and lysine. The abundance of all other amino acids is typical of human proteins. The composition of LOC101928193 is highly conserved among mammals. [12]

LOC101928193 Amino Acid Enrichment [12]

| Amino acid | Enrichment level | Residues present | Properties |
| --- | --- | --- | --- |
| Valine (V) | Fully enriched | 51 (12.6%) | Hydrophobic |
| Glycine (G) | Semi-enriched | 62 (15.3%) | Polar |
| Serine (S) | Semi-enriched | 49 (12.1%) | Polar |
| Histidine (H) | Semi-enriched | 18 (4.4%) | Basic |
| Phenylalanine (F) | Semi-enriched | 28 (6.9%) | Hydrophobic |
| Alanine (A) | Fully depleted | 7 (1.7%) | Hydrophobic |
| Methionine (M) | Fully depleted | 1 (0.2%) | Hydrophobic |
| Asparagine (N) | Fully depleted | 0 (0%) | Polar |
| Aspartic acid (D) | Fully depleted | 1 (0.2%) | Polar |
| Glutamic acid (E) | Fully depleted | 2 (0.5%) | Polar |
| Lysine (K) | Fully depleted | 0 (0%) | Polar |

LOC101928193 has an amino acid charge distribution of 0.7% negative, 4.9% positive, and 94.4% neutral. There are no charge runs, hydrophobic segments, or transmembrane domains.
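The percentages in the enrichment table and the negative-charge figure can be reproduced from the residue counts and the 406-residue length of isoform X1. The sketch below assumes that "negative" means aspartic acid plus glutamic acid, which is the usual convention but is not stated in the text.

```python
PROTEIN_LENGTH = 406  # isoform X1 length from the text

# Residue counts as listed in the enrichment table.
RESIDUE_COUNTS = {
    "V": 51, "G": 62, "S": 49, "H": 18, "F": 28,
    "A": 7, "M": 1, "N": 0, "D": 1, "E": 2, "K": 0,
}


def percent(count: int, total: int = PROTEIN_LENGTH) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


percentages = {aa: percent(n) for aa, n in RESIDUE_COUNTS.items()}

# Assumption: negatively charged residues are D and E.
negative_percent = percent(RESIDUE_COUNTS["D"] + RESIDUE_COUNTS["E"])

print(percentages["V"], percentages["G"], negative_percent)  # 12.6 15.3 0.7
```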

LOC101928193 predicted tertiary structure. Image coloured by rainbow N → C terminus.

Domains and motifs

There are two different motifs present in LOC101928193. Myristoylation sites are found in the protein sequence 17 times, and a zinc finger domain motif occurs once. [15] The presence of myristoylation sites indicates that LOC101928193 may function in membrane targeting, protein-protein interactions, and signal transduction pathways. Zinc finger domain motifs aid in gene transcription, cell adhesion, protein folding, and chromatin remodeling. [15]
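Motif scanners of this kind typically match short sequence patterns against the protein. The sketch below assumes the PROSITE N-myristoylation pattern (PS00008, `G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}`) translated into a regular expression. The test sequence is hypothetical and is not the LOC101928193 sequence, so the output says nothing about the 17 sites reported above.

```python
import re

# PROSITE PS00008 written as a regex; the lookahead allows overlapping hits.
MYRISTOYLATION = re.compile(r"(?=(G[^EDRKHPFYW].{2}[STAGCN][^P]))")


def find_myristoylation_sites(sequence: str) -> list[int]:
    """Return 1-based start positions of pattern matches in a protein sequence."""
    return [m.start() + 1 for m in MYRISTOYLATION.finditer(sequence.upper())]


toy_sequence = "MGAAASAKGKLLSA"  # hypothetical peptide for illustration only
print(find_myristoylation_sites(toy_sequence))  # [2]
```

The second glycine in the toy peptide is followed by lysine, which the pattern excludes, so only the glycine at position 2 is reported.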

Primary sequence

The LOC101928193 primary coding sequence mRNA is 3213 nucleotides long. [8] There are no upstream open-reading frames, Kozak consensus sequences, or transmembrane regions.

LOC101928193 conceptual translation with post-translational modifications and motifs.

Secondary structure

LOC101928193 has a predicted secondary structure of 56.40% random coils and 43.60% beta sheets. [13] No alpha helices are predicted to occur. Due to the lack of alpha helices in the protein, no coiled coils are predicted to occur in the LOC101928193 secondary structure. [16]
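Secondary-structure predictors usually emit one state letter per residue, and the percentages are simple counts over that string. In the sketch below, the split of 229 coil and 177 strand residues is back-calculated from the quoted percentages and the 406-residue length; it is an assumption, and the order of states in the string is arbitrary.

```python
from collections import Counter


def state_percentages(states: str) -> dict[str, float]:
    """Return the percentage of each per-residue state letter, to two decimals."""
    counts = Counter(states.lower())
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


# Hypothetical prediction string: 'c' = coil, 'e' = beta strand.
prediction = "c" * 229 + "e" * 177
print(state_percentages(prediction))  # {'c': 56.4, 'e': 43.6}
```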

Tertiary structure

LOC101928193 is predicted to be an all-beta-sheet protein, as can be seen in its predicted tertiary structure. Both the N-terminus and the C-terminus lack beta sheets.

Post-translational modifications

O-GlcNAc

There are 13 predicted O-GlcNAc sites within the LOC101928193 protein. [17] O-GlcNAc is a unique form of protein glycosylation that occurs exclusively in the nuclear and cytoplasmic compartments of the cell. [18] O-GlcNAcylation is abundant on proteins involved in signaling pathways, stress responses, cytoskeletal assembly, and energy metabolism.

N-linked glycosylation

There are no N-linked glycosylation sites due to the absence of asparagine residues.
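The reasoning can be made concrete with the N-glycosylation sequon, which must start with asparagine. The sketch assumes the PROSITE pattern PS00001 (`N-{P}-[ST]-{P}`); both peptides are hypothetical.

```python
import re

# PROSITE PS00001 written as a regex; every match must begin with asparagine (N).
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based start positions of N-glycosylation sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


print(find_sequons("AANGSAA"))  # [3]
print(find_sequons("AAQGSAA"))  # [] because the peptide contains no asparagine
```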

Phosphorylation

LOC101928193 has many predicted phosphorylation sites on serines, threonines, and tyrosines throughout its structure. Phosphorylation at such sites can cause conformational changes and contributes to signaling pathways and regulation. There are 33 predicted phosphorylation sites. [19] The relative number of phosphorylation sites is highly conserved among orthologs of LOC101928193. [19]

Subcellular localization

LOC101928193 is predicted to be targeted to the cytoplasm in Homo sapiens, rodents, amphibians, fish, and mollusks. [20] Its orthologs in cnidarians, fungi, and bacteria are predicted to localize to the nucleus. [20]

Expression

Mean RPKM values of LOC101928193 in 27 human tissues, from RNA sequencing. Expression is highest in the thyroid, ovaries, skin, and testes.

LOC101928193 is not expressed ubiquitously; its expression is tissue specific, with low mRNA abundance compared to other human genes. [5] Expression is highest in the thyroid and is also high in the ovaries, skin, and testes. [5] Additionally, the gene is expressed in 23 other tissues at levels lower than 0.1 RPKM (reads per kilobase of transcript per million mapped reads) in humans. Other studies have found that tissue-specific circular RNA induction of LOC101928193 during human fetal development is highest in the heart, kidney, and stomach at 10 weeks of gestation. [4]
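RPKM normalizes a raw read count by transcript length and sequencing depth. A minimal sketch of the standard formula follows; the read count and library size are hypothetical and are not taken from the cited data set.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 10 reads on a 3213-nt transcript in a 40-million-read library.
value = rpkm(10, 3213, 40_000_000)
print(f"{value:.3f} RPKM")  # 0.078 RPKM
```

With these illustrative numbers the value falls below the 0.1 RPKM threshold mentioned above.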

Regulation of Expression

LOC101928193 promoter and isoforms. There is one promoter and three isoforms.

Epigenetic

Epigenetic processes that control expression, such as DNA methylation and histone modification, have not been reported for LOC101928193.

Transcriptional

LOC101928193 5' UTR stem loops near AUG.

Promoter

There is one promoter for the LOC101928193 gene (GXP_6058323), and it is 1101 nucleotides long on the positive strand from base pairs 133,188,767 to 133,189,867 on chromosome 9. [21] The transcription start site can be found at the 1001 base pair position. [21]
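The promoter coordinates can be checked against the gene coordinates given in the Locus section. The sketch assumes that all positions are 1-based and inclusive, and that the transcription start site position is counted from the first base of the promoter; both are assumptions.

```python
PROMOTER_START = 133_188_767
PROMOTER_END = 133_189_867
TSS_POSITION_IN_PROMOTER = 1001  # assumed 1-based offset within the promoter

# Length of the closed interval covered by the promoter.
promoter_length = PROMOTER_END - PROMOTER_START + 1

# Convert the promoter-relative position into a chromosome coordinate.
tss_genomic = PROMOTER_START + TSS_POSITION_IN_PROMOTER - 1

print(promoter_length)  # 1101
print(tss_genomic)      # 133189767
```

Under these assumptions the transcription start site coincides with the first base of the gene (133,189,767), leaving 1000 bases of upstream sequence and 101 bases of overlap with the transcript.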

Transcription factor binding sites

Several transcription factors are predicted to bind to the promoter sequence. [23]

Based on the functions of these transcription factors, LOC101928193 may be involved in gene repression, hematopoiesis regulation, fetal development, inhibition, DNA binding, or limb development.

Translational and mRNA stability

At human body temperature, multiple stem loops are predicted to form in the 5' UTR, the coding region, and the 3' UTR. Stem loops direct RNA folding, protect the structural stability of mRNA, provide recognition sites for RNA-binding proteins, and serve as substrates for enzymatic reactions. [24] There is an interior loop and a stem loop in the 5' UTR near the AUG start codon. [22] Such structures are often bound by proteins or attenuate a transcript in order to regulate translation. These stem loops also aid mRNA stability, and the predicted 5' UTR conformation has a free energy of -124.30 kcal/mol. [22] In the 3' UTR, 6 stem loops are predicted, with a free energy of -310.70 kcal/mol; the negative value indicates that the structure forms spontaneously. [22] There are no known microRNA targets in the 3' UTR.

Homology and Evolution

LOC101928193 Unrooted Phylogenetic Tree. Color coded by taxonomic group: Mammals (orange), amphibians (green), fish (blue), mollusks (yellow), cnidarians (teal), fungi (lime green), and bacteria (purple).

Paralogs

There are no known paralogs of LOC101928193.

Orthologs

LOC101928193 rate of evolution in comparison to cytochrome c and fibrinogen.
LOC101928193 conserved coding domain found from a multiple sequence alignment of orthologs. A sequence logo provides a richer and more precise description of, for example, a binding site, than would a consensus sequence.

LOC101928193 has over 20 orthologs that are present in mammals, amphibians, fish, mollusks, cnidarians, fungi, and bacteria. [8] The most distant orthologs are found in bacteria that diverged from humans more than 4.29 billion years ago. [26] No orthologs for LOC101928193 have been discovered in close mammalian relatives of humans, including in primates. Below is a table of a range of organisms with orthologs related to the human LOC101928193 protein.

LOC101928193 Orthologs [2]

| Species | Common name | Date of divergence (MYA) [28] | NCBI accession # | Sequence length (amino acids) | Protein identity | Protein similarity |
| --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | Humans | 0 | XP_011517577 | 406 | 100% | 100% |
| Microtus ochrogaster | Prairie vole | 90 | XP_026643651 | 173 | 33% | 45% |
| Xenopus tropicalis | Western clawed frog | 352 | OCA32729 | 259 | 25% | 33% |
| Xenopus laevis | African clawed frog | 352 | OCT75465 | 167 | 26% | 41% |
| Acipenser ruthenus | Sterlet | 435 | RXM92228 | 259 | 25% | 33% |
| Carassius auratus | Goldfish | 435 | XP_026133143 | 437 | 25% | 35% |
| Biomphalaria glabrata | Freshwater snail | 797 | XP_013067916 | 131 | 31% | 47% |
| Mizuhopecten yessoensis | Japanese scallop | 797 | OWF44451 | 284 | 15% | 32% |
| Onthophagus taurus | Taurus scarab | 797 | XP_022903359 | 294 | 24% | 33% |
| Stylophora pistillata | Hood coral | 824 | PFX21561 | 126 | 37% | 42% |
| Nematostella vectensis | Starlet sea anemone | 824 | XP_001641289 | 418 | 31% | 36% |
| Sphaeroforma arctica | Sphaeroforma arctica | 1023 | XP_014147604 | 123 | 34% | 47% |
| Verticillium alfalfae | Verticillium alfalfae | 1105 | XP_003000049 | 355 | 28% | 35% |
| Chitinispirillum alkaliphilum | Chitinispirillum alkaliphilum | 4290 | KMQ49642 | 105 | 42% | 55% |
| Burkholderia pseudomallei | Pseudomonas pseudomallei | 4290 | ALJ75351 | 238 | 33% | 42% |
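The identity column reports the share of aligned positions that carry the same residue in both sequences. The sketch below shows one simple way to compute this from a pairwise alignment, counting only columns without gaps; alignment tools differ in how they treat gaps, so this convention is an assumption, and the aligned peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned peptides: four of the five gap-free columns match.
print(percent_identity("GVRS-A", "GVKSTA"))  # 80.0
```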

Distant homologs

The most distant detectable homologs are found in several viral and bacterial species that diverged from humans over 4.29 billion years ago. [26]

Homologous domains

There is a conserved region of 28 amino acids that is repeated six times within the protein-encoding region of LOC101928193 and across its orthologs. Each repeat begins with a glycine, at amino acid positions 194, 222, 250, 278, 306, and 334 within LOC101928193. The domain is conserved across mammals, cnidarians, fish, bacteria, and amphibians, and it also appears in some species within these taxonomic groups that are not orthologs but share the same domain. The sequence always begins with a polar glycine and a hydrophobic valine. There is also a conserved basic arginine in the middle of the sequence.
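The listed start positions are exactly 28 residues apart, so the six copies sit back to back with no spacer between them. A short sketch of that check, assuming 1-based residue numbering:

```python
REPEAT_LENGTH = 28
REPEAT_STARTS = [194, 222, 250, 278, 306, 334]  # positions quoted in the text

# Distance between consecutive repeat starts.
spacings = [b - a for a, b in zip(REPEAT_STARTS, REPEAT_STARTS[1:])]
is_tandem = all(s == REPEAT_LENGTH for s in spacings)

# Inclusive residue range covered by each repeat.
repeat_ranges = [(s, s + REPEAT_LENGTH - 1) for s in REPEAT_STARTS]

print(is_tandem)                       # True
print(repeat_ranges[0], repeat_ranges[-1])  # (194, 221) (334, 361)
```

Together the repeats cover residues 194 to 361, or 168 of the 406 residues in isoform X1.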

Phylogeny

No other species has LOC101928193 in the same form as in humans. Several species among mammals, amphibians, fish, mollusks, cnidarians, fungi, and bacteria have LOC101928193 in a slightly different form, with a similarity usually between 30 and 50%. Several taxonomic groups have no genes or proteins similar to LOC101928193, including archaea, plants, and several animal species.

Inheritance

LOC101928193 may not follow a normal inheritance pattern or occur regularly in the genome, as it has a scattered occurrence among evolutionarily related species. [2] Furthermore, the similarity between orthologs of LOC101928193 is roughly constant over time: it is not higher in closely related taxonomic groups or lower in distantly related ones. It is possible that LOC101928193 is incorporated into the genomes of different species through viral pathways, as LOC101928193 has been predicted to have binding sites for ligands associated with cyanobacteria, such as chlorophyll a. [29] Orthologs of LOC101928193 have been found to contain UL36, a large tegument protein that functions in the viral cycle and is commonly found in human herpes simplex virus 1. [30] [31]

References

  1. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  2. "Uncharacterized protein LOC101928193 isoform X1". NCBI BLAST.
  3. "LOC101928193 Gene". NCBI.
  4. "uncharacterized LOC101928193 [Homo sapiens (human)]".
  5. Fagerberg L, Hallström BM, Oksvold P, Kampf C, Djureinovic D, Odeberg J, Habuka M, Tahmasebpoor S, Danielsson A, Edlund K, Asplund A, Sjöstedt E, Lundberg E, Szigyarto CA, Skogs M, Takanen JO, Berling H, Tegel H, Mulder J, Nilsson P, Schwenk JM, Lindskog C, Danielsson F, Mardinoglu A, Sivertsson A, von Feilitzen K, Forsberg M, Zwahlen M, Olsson I, Navani S, Huss M, Nielsen J, Ponten F, Uhlén M (February 2014). "Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics". Molecular & Cellular Proteomics. 13 (2): 397–406. doi:10.1074/mcp.M113.035600. PMC 3916642. PMID 24309898.
  6. "GBGT1 Gene (Protein Coding)". GeneCards.
  7. Paterson AD, Lopes-Virella MF, Waggott D, Boright AP, Hosseini SM, Carter RE, Shen E, Mirea L, Bharaj B, Sun L, Bull SB (November 2009). "Genome-wide association identifies the ABO blood group as a major locus associated with serum levels of soluble E-selectin". Arteriosclerosis, Thrombosis, and Vascular Biology. 29 (11): 1958–67. doi:10.1161/ATVBAHA.109.192971. PMC 3147250. PMID 19729612.
  8. "Uncharacterized protein LOC101928193 isoform X1". NCBI.
  9. "uncharacterized protein LOC101928193 isoform X2 [Homo sapiens]". NCBI.
  10. "uncharacterized protein LOC101928193 isoform X3 [Homo sapiens]". NCBI.
  11. "ExPASy". SIB.
  12. "SAPS". EMBL-EBI. 2019.
  13. "GOR IV Secondary Structure Prediction Method". Prabi NPS. 2016.
  14. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJ (June 2015). "The Phyre2 web portal for protein modeling, prediction and analysis". Nature Protocols. 10 (6): 845–58. doi:10.1038/nprot.2015.053. PMC 5298202. PMID 25950237.
  15. "MyHits Motif Scan". SIB MyHits.
  16. "Prediction of Coiled Coil Regions in Proteins". ExPASy COILS. Archived from the original on 2019-07-12. Retrieved 2019-05-07.
  17. "YinOYang 1.2 Server". Technical University of Denmark.
  18. Hart G (2009). The O-GlcNAc Modification. New York, NY: Cold Spring Harbor.
  19. "NetPhos DTU Bioinformatics". NetPhos.
  20. "PSORT II". Expasy.
  21. "Gene2Promoter". Genomatix.
  22. "MFold". The RNA Institute.
  23. "Gene2Promoter". Genomatix. 2019. Archived from the original on 2001-02-24. Retrieved 2019-04-22.
  24. Svoboda P, Di Cara A (April 2006). "Hairpin RNA: a secondary structure of primary importance" (PDF). Cellular and Molecular Life Sciences. 63 (7–8): 901–8. doi:10.1007/s00018-005-5558-5. PMID 16568238. S2CID 14403230.
  25. "Multiple Sequence Alignment". ClustalW.
  26. "TimeTree of Life". TimeTree.
  27. "WebLogo Database". University of California – Berkeley.
  28. Hinchliff CE, Smith SA, Allman JF, Burleigh JG, Chaudhary R, Coghill LM, Crandall KA, Deng J, Drew BT, Gazis R, Gude K, Hibbett DS, Katz LA, Laughinghouse HD, McTavish EJ, Midford PE, Owen CL, Ree RH, Rees JA, Soltis DE, Williams T, Cranston KA (October 2015). "Synthesis of phylogeny and taxonomy into a comprehensive tree of life". Proceedings of the National Academy of Sciences of the United States of America. 112 (41): 12764–9. Bibcode:2015PNAS..11212764H. doi:10.1073/pnas.1423041112. PMC 4611642. PMID 26385966.
  29. Yang J, Zhang Y (July 2015). "I-TASSER server: new development for protein structure and function predictions". Nucleic Acids Research. 43 (W1): W174-81. doi:10.1093/nar/gkv342. PMC 4489253. PMID 25883148.
  30. "Keratin-associated protein 10-4 [Verticillium alfalfae VaMs.102]". NCBI.
  31. "UniProtKB – P10220 (LTP_HHV11)". UniProt.
