CFAP92 | |
---|---|
Aliases | CFAP92, cilia and flagella associated protein 92 (putative), KIAA1257, FAP92 |
External IDs | HomoloGene: 131623; GeneCards: CFAP92 |
KIAA1257 (also known as CFAP92) is a protein that in humans is encoded by the KIAA1257 gene. KIAA1257 has been shown to take part in the activation of genes involved in sex determination. [3] [4]
In humans, the KIAA1257 gene is located on chromosome 3 at 3q21.3. It spans 122 kilobase pairs (kbp) and contains 22 exons. It is flanked by the gene for Ras-related protein Rab-43 and by several pseudogenes; on the opposite strand lie acyl-CoA dehydrogenase family member 9 (ACAD9) and EF-hand and coiled-coil domain containing 1 (EFCC1).
The exons of KIAA1257 are alternatively spliced into 17 different mRNA isoforms (Table 1). Isoform X1 encodes the longest protein product, and isoform X4 is the most commonly translated variant. Both the 5' and 3' UTRs are capable of forming stem-loop structures that could serve as binding sites for RNA-binding proteins. [5]
Isoform | Length (bp) |
---|---|
X1 | 8645 |
X2 | 8641 |
X3 | 8218 |
X4 | 8612 |
X5 | 8370 |
X6 | 8190 |
X7 | 3524 |
X8 | 3428 |
X9 | 7801 |
X10 | 7685 |
X11 | 7862 |
X12 | 7809 |
X13 | 13296 |
X14 | 13401 |
X15 | 7579 |
X16 | 7585 |
X17 | 2163 |
Table 1
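The transcript lengths in Table 1 can be compared directly. The following is a minimal Python sketch, using only the values listed in the table; the names `ISOFORM_LENGTHS_BP`, `longest_isoform`, and `length_difference` are hypothetical and chosen for illustration. It shows that the longest transcript is not the one encoding the longest protein, and that the X1 and X4 transcripts differ by only a few dozen base pairs.

```python
# Transcript lengths in base pairs, copied from Table 1.
ISOFORM_LENGTHS_BP = {
    "X1": 8645, "X2": 8641, "X3": 8218, "X4": 8612, "X5": 8370,
    "X6": 8190, "X7": 3524, "X8": 3428, "X9": 7801, "X10": 7685,
    "X11": 7862, "X12": 7809, "X13": 13296, "X14": 13401,
    "X15": 7579, "X16": 7585, "X17": 2163,
}


def longest_isoform(lengths):
    """Return the name of the isoform with the longest transcript."""
    return max(lengths, key=lengths.get)


def length_difference(lengths, a, b):
    """Return the absolute difference in transcript length (bp) between two isoforms."""
    return abs(lengths[a] - lengths[b])


if __name__ == "__main__":
    print("Longest transcript:", longest_isoform(ISOFORM_LENGTHS_BP))
    print("X1 vs X4 difference (bp):",
          length_difference(ISOFORM_LENGTHS_BP, "X1", "X4"))
```

Transcript length alone says nothing about the length of the encoded protein, which depends on where the open reading frame starts and stops within each transcript.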
The protein KIAA1257 exists most commonly as a translation of mRNA isoform X4, whose protein product is only about half the length of isoform X1's even though the two mRNAs are similar in length. Protein isoform X1 is 1179 amino acids long, has a molecular weight of 136.4 kilodaltons (kDa), and has an isoelectric point (pI) of 8.1. [6] [7] KIAA1257 contains a domain of unknown function (DUF4550) in the first third of the protein sequence that has a high lysine content (15%). [6] Most of the protein is predicted to form random coil, but the final third contains 6 predicted alpha helices. [8] KIAA1257 is predicted to be localized to the nucleus and contains several nuclear localization signals. [9] A summary of KIAA1257 orthologs is shown below.
Species | Identity [10] | Length (aa) [6] | MW (kDa) [6] | pI [7] | Localization (confidence) [9] |
---|---|---|---|---|---|
Human | 100% | 1179 | 136.4 | 8.1 | Nucleus (73.9%) |
Chimp | 97% | 1147 | 131.7 | 8.5 | Nucleus (65.2%) |
Dog | 69% | 1163 | 133.6 | 8.9 | Nucleus (82.6%) |
Turkey | 39% | 1174 | 132.0 | 8.5 | Nucleus (65.2%) |
Spotted gar | 36% | 1320 | 148.2 | 7.7 | Nucleus (73.9%) |
Table 2
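As a rough consistency check on the lengths and molecular weights in Table 2, the mean mass per residue can be computed for each ortholog. The sketch below is illustrative only: the names `ORTHOLOGS` and `mean_residue_mass_da` are hypothetical, and the 110 to 120 Da window is an assumed plausibility range based on the commonly quoted average of roughly 110 Da per amino acid residue, not a value taken from the cited sources.

```python
# (length in amino acids, molecular weight in kDa), copied from Table 2.
ORTHOLOGS = {
    "Human": (1179, 136.4),
    "Chimp": (1147, 131.7),
    "Dog": (1163, 133.6),
    "Turkey": (1174, 132.0),
    "Spotted gar": (1320, 148.2),
}


def mean_residue_mass_da(length_aa, mw_kda):
    """Return the mean mass per residue in daltons."""
    return mw_kda * 1000.0 / length_aa


if __name__ == "__main__":
    for species, (length_aa, mw_kda) in ORTHOLOGS.items():
        mass = mean_residue_mass_da(length_aa, mw_kda)
        # Assumed plausibility window; see the note above the code.
        flag = "ok" if 110.0 < mass < 120.0 else "check"
        print(f"{species}: {mass:.1f} Da per residue ({flag})")
```

A value far outside this window would suggest a transcription error in either the length or the molecular weight column.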
KIAA1257 is mainly expressed in the testes and ovaries of adult humans, although expression is low even in these tissues. KIAA1257 is most highly expressed during the earliest stages of development. Expression is highest from the 2-cell through 8-cell stages of embryonic development and declines steadily after morula and then blastocyst formation. [11]
KIAA1257 has a promoter region upstream of the 5' UTR with several transcription factor binding sites including a Sox11 binding site. [12] Sox11 is involved in the regulation of many developmental genes.
KIAA1257 has been shown to activate expression of Nuclear receptor subfamily 5 group A member 1 (NR5A1). [3] NR5A1 is involved in sex determination and defects in the gene are related to XY sex reversal.
KIAA1257 is found in all vertebrates except for cartilaginous and jawless fishes. KIAA1257 orthologs in birds, fish, and reptiles have 30-40% identity with humans while mammals such as goats, cats, and dogs have 60-70% identity and primates have 85-99% identity. [13]
Species | Identity | Cover | Length (aa) |
---|---|---|---|
Human | 100% | 100% | 1179 |
Chimp | 97% | 99% | 1147 |
Dog | 69% | 92% | 1163 |
Prairie deer mouse | 67% | 93% | 1164 |
Goat | 61% | 75% | 931 |
Common shrew | 58% | 53% | 660 |
Brown spotted pit viper | 36% | 77% | 1080 |
Nile tilapia | 34% | 84% | 1050 |
Table 3
KIAA0895 is a protein that in Homo sapiens is encoded by the KIAA0895 gene. Its aliases include hypothetical protein LOC23366, OTTHUMP00000206979, OTTHUMP00000206980, 9530077C05Rik, and 1110003N12Rik. The gene is located at 7p14.2.
KIAA1704, also known as LSR7, is a protein that in humans is encoded by the GPALPP1 gene. The function of KIAA1704 is not yet well understood. KIAA1704 contains one domain of unknown function, DUF3752. The protein contains a conserved, uncharged, repeated motif GPALPP(GF) near the N terminus and an unusual, conserved, mixed charge throughout. It is predicted to be localized to the nucleus.
Transmembrane protein 241 is a ubiquitous sugar transporter protein which in humans is encoded by the TMEM241 gene.
Family with sequence similarity 98, member A, or FAM98A, is a gene that in the human genome encodes the FAM98A protein. FAM98A has two paralogs in humans, FAM98B and FAM98C. All three are characterized by DUF2465, a conserved domain shown to bind to RNA. FAM98A is also characterized by a glycine-rich C-terminal domain. FAM98A also has homologs in vertebrates and invertebrates and has distant homologs in choanoflagellates and green algae.
Coiled-coil domain containing 142 (CCDC142) is a gene which in humans encodes the CCDC142 protein. The CCDC142 gene is located on chromosome 2, spans 4339 base pairs, and contains 9 exons. There are two known isoforms of CCDC142. The proteins produced from these transcripts range in size from 665 to 743 amino acids and contain signals suggesting protein movement between the cytosol and nucleus. Homologous CCDC142 genes are found in many animals, including vertebrates and invertebrates, but not in fungi, plants, protists, archaea, or bacteria. Although the function of this protein is not yet well understood, it contains a coiled-coil domain and a RINT1_TIP1 motif located within the coiled-coil domain.
BEND2 is a protein that in humans is encoded by the BEND2 gene. It is also found in other vertebrates, including mammals, birds, and reptiles. The expression of BEND2 in Homo sapiens is regulated and occurs at high levels in the skeletal muscle tissue of the male testis and in the bone marrow. The presence of the BEN domains in the BEND2 protein indicates that this protein may be involved in chromatin modification and regulation.
FAM227A is a protein that in humans is encoded by the FAM227A gene. Current studies place the protein in the nuclear region of the cell. FAM227A is most highly expressed in the tissues of the fallopian tube, testis, and pituitary gland. FAM227A is present in species of mammals, birds, and reptiles, and sequence alignments have shown that FAM227A is a rapidly evolving gene.
UPF0575 protein C19orf67 is a protein which in humans is encoded by the C19orf67 gene. Orthologs of C19orf67 are found in many mammals, some reptiles, and most jawed fish. The protein is expressed at low levels throughout the body with the exception of the testis and breast tissue. Where it is expressed, the protein is predicted to be localized to the nucleus. The highly conserved and slowly evolving DUF3314 region is predicted to form numerous alpha helices and may be vital to the function of the protein.
Family with sequence similarity 149 member B1 is an uncharacterized protein encoded by the human FAM149B1 gene, which has one alias, KIAA0974. The protein resides in the nucleus of the cell. Its predicted secondary structure contains multiple alpha helices and a few beta sheets. The gene is conserved in mammals, birds, reptiles, fish, and some invertebrates. The encoded protein contains a DUF3719 domain, which is conserved across its orthologs. The protein is expressed at slightly below average levels in most human tissue types, with high expression in brain, kidney, and testis tissues and relatively low expression in pancreatic tissue.
Chromosome 21 Open Reading Frame 58 (C21orf58) is a protein that in humans is encoded by the C21orf58 gene.
Chromosome 16 open reading frame 46 is a protein of yet to be determined function in Homo sapiens. It is encoded by the C16orf46 gene with NCBI accession number of NM_001100873. It is a protein-coding gene with an overlapping locus.
Chromosome 9 open reading frame 43 is a protein that in humans is encoded by the C9orf43 gene. The gene is also known as MGC17358 and LOC257169. C9orf43 contains DUF 4647 and a polyglutamine repeat region although protein function is not well understood.
LOC101928193 is a protein which in humans is encoded by the LOC101928193 gene. There are no known aliases for this gene or protein. Orthologs of this gene are known to exist in several different species across mammals, amphibians, fish, mollusks, cnidarians, fungi, and bacteria. The human LOC101928193 gene is located on the long (q) arm of chromosome 9 at cytogenetic location 9q34.2. The gene runs from base pair 133,189,767 to base pair 133,192,979 on chromosome 9, for an mRNA length of 3213 nucleotides. The gene and protein are not yet well understood, but data are available on the gene's sequence and expression. The LOC101928193 protein is targeted to the cytoplasm and has its highest levels of expression in the thyroid, ovary, skin, and testes in humans.
WD repeat and coiled-coil containing protein (WDCP) is a protein which in humans is encoded by the WDCP gene. The function of the protein is not completely understood, but WDCP has been identified in a fusion protein with anaplastic lymphoma kinase found in colorectal cancer. WDCP has also been identified in the MRN complex, which processes double-stranded breaks in DNA.
Family with Sequence Similarity 155 Member B is a protein in humans that is encoded by the FAM155B gene. It belongs to a family of proteins whose function is not yet well understood by the scientific community. It is a transmembrane protein that is highly expressed in the heart, thyroid, and brain.
C2orf74, also known as LOC339804, is a protein encoding gene located on the short arm of chromosome 2 near position 15 (2p15). Isoform 1 of the gene is 19,713 base pairs long. C2orf74 has orthologs in 135 different species, including primarily placental mammals and some marsupials.
FAM214B, also known as protein family with sequence similarity 214, B, is a protein that in humans is encoded by the FAM214B gene, located on chromosome 9. The protein is 538 amino acids long, and the gene contains 9 exons. Studies have reported low expression of this gene in patients with major depressive disorder. In most organisms, such as mammals, amphibians, reptiles, and birds, the gene is highly expressed in the bone marrow and blood. During human fetal development, FAM214B is mostly expressed in the brain and bone marrow.
FAM120AOS, or family with sequence similarity 120A opposite strand, codes for uncharacterized protein FAM120AOS, which currently has no known function. Its gene ontology annotation is protein binding. Across a majority of datasets, the thyroid and the placenta are the two tissues with the highest expression levels of FAM120AOS.
Coiled-coil domain containing 190 (CCDC190), also known as C1orf110 (chromosome 1 open reading frame 110) and MGC48998, is a protein-coding gene widely expressed in vertebrates. RNA-seq expression profiles show that the gene is selectively expressed in human organs such as the lung, brain, and heart. The product of C1orf110 is usually called coiled-coil domain-containing protein 190 and is 302 amino acids long. The name may derive from a coiled-coil domain found from position 14 to 72. Alternative splicing in humans produces at least 6 mRNA splice variants and 3 protein isoforms.
Leucine-rich repeat-containing protein 74A (LRRC74A) is a protein encoded by the LRRC74A gene. The protein is localized in the cytoplasm and has a calculated molecular weight of approximately 55 kDa. LRRC74A is nominally expressed in the testis, salivary gland, and pancreas.