CFAP97D2 (Cilia and Flagella Associated Protein 97 Domain containing 2) is a protein that in humans is encoded by the CFAP97D2 gene.
The Homo sapiens CFAP97D2 gene (XM_017020910.2) is a 68,395 base pair gene that encodes mRNA transcripts ranging from 952 to 1,841 nucleotides. [1] It is located on the plus strand of chromosome 13 at locus 13q34. The gene encodes a protein of up to 166 amino acids, Cilia and Flagella Associated Protein 97 Domain containing 2 (CFAP97D2). The CFAP97D2 gene is also known as the C17orf105 Homolog gene. [2]
Five mRNA transcripts are produced from the CFAP97D2 gene. These transcripts encode five different CFAP97D2 isoforms: X1, X2, X3, 1, and 2. Isoform X1, at 166 amino acids (AA), is the longest isoform of Homo sapiens CFAP97D2. Isoform X2 has 2 deleterious mRNA mutations at the end of exon 5 that account for a single amino acid deletion; the final protein is 165 AA long. Isoform X3 has numerous mRNA insertions in exons 3, 4, and 5, and its final protein length is 101 AA. Isoforms 1 and 2 have complete deleterious mRNA mutations of exon 4 and encode final protein lengths of 99 and 98 AA, respectively. [1] [3]
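As a rough consistency check on the isoform lengths above, the minimum coding sequence needed for each isoform can be estimated as three nucleotides per amino acid plus one stop codon. The sketch below is illustrative only; the function name `min_cds_length` and the dictionary layout are assumptions made for this example, and the isoform lengths are the ones quoted in this section.

```python
# Minimal sketch: estimate the shortest coding sequence (CDS) that could
# encode each reported isoform. Names here are hypothetical, chosen for
# illustration; only the isoform lengths come from the text.

SHORTEST_TRANSCRIPT_NT = 952  # shortest reported mRNA transcript length


def min_cds_length(protein_length_aa: int) -> int:
    """Return the CDS length in nucleotides: 3 nt per residue plus a stop codon."""
    return 3 * (protein_length_aa + 1)


# Reported protein lengths (amino acids) for each isoform.
isoform_lengths = {"X1": 166, "X2": 165, "X3": 101, "1": 99, "2": 98}

# Minimum CDS length for every isoform.
cds_lengths = {name: min_cds_length(aa) for name, aa in isoform_lengths.items()}

for name, nt in cds_lengths.items():
    fits = nt <= SHORTEST_TRANSCRIPT_NT
    print(f"Isoform {name}: at least {nt} nt of CDS (fits shortest transcript: {fits})")
```

Under this estimate, even the longest isoform needs only 501 nucleotides of coding sequence, which is compatible with the reported transcript range of 952 to 1,841 nucleotides once untranslated regions are included.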
RNA sequencing revealed CFAP97D2 expression in the brain, lungs, pancreas, testis, fallopian tubes, and cervix. [4] [5] [6]
Homo sapiens CFAP97D2 isoform X1 is a basic protein with a predicted isoelectric point (pI) of 10.4 and a molecular weight of 19.3 kDa. [7] It contains a nuclear leucine zipper motif (AA #37-52) and a nuclear localization signal (AA #102-116). [8] [9]
By definition, the leucine zipper region of Homo sapiens CFAP97D2 (AA #37-52) is an alpha helix. [10] [11]
CFAP97D2 is characterized by coiled-coil regions and 2 alpha helices. [12]
The CFAP97 gene family contains 3 genes, CFAP97, CFAP97D1, and CFAP97D2, and is characterized by the KIAA1430 domain. CFAP97D1 has been conserved over a longer evolutionary period than CFAP97D2, with an estimated date of divergence of 431 million years ago (MYA). [13]
CFAP97D2 is a highly conserved protein found in primates, rodents, bats, even-toed ungulates, Otariidae, birds, reptiles, and bony fish. [13] Primate and rodent CFAP97D2 proteins are most closely related to Homo sapiens CFAP97D2 (% identity range: 74-100%). Bats, even-toed ungulates, Otariidae, and birds are moderately related (% identity range: 55.9-73%), and reptile and bony fish species are most distantly related (% identity range: 25.6-42%). The Asiatic toad is an outlier among the distantly related species, with 60.4% identity to Homo sapiens CFAP97D2.
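For readers unfamiliar with the metric, percent identity is the share of aligned positions at which two sequences carry the same residue. The sketch below shows one common way to compute it from a pairwise alignment. It is not the tool used to produce the figures above, it assumes gapped columns are excluded from the denominator (other conventions exist), and the function name and toy sequences are invented for illustration.

```python
# Minimal sketch of a percent-identity calculation for two pre-aligned
# sequences. Assumption: columns with a gap ("-") in either sequence are
# skipped; alignment tools may use a different denominator.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where both sequences have a residue
    matches = 0   # columns where the residues are identical
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example (made-up sequences, not CFAP97D2): 4 matches in 6 ungapped columns.
print(round(percent_identity("MKT-LLE", "MKSALLD"), 1))
```

Because the denominator convention changes the result, identity values from different tools are only directly comparable when they were computed the same way.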
CFAP97D1 is conserved in both invertebrate and vertebrate species. [14] The CFAP97D1 genes found in vertebrate species are paralogs of CFAP97D2. The invertebrate CFAP97D1 genes existing prior to 431 MYA are distant CFAP97D2 homologs due to these paralogs' shared evolutionary history.
CFAP97D1 has been conserved over a longer evolutionary period than CFAP97D2. [15] CFAP97D1's slow and consistent evolution is evidenced by how little it has diverged from its ancestral orthologs over 600 million years, while CFAP97D2 has changed rapidly over 431 million years. CFAP97D2 is present only in vertebrate species, and its rapid evolution suggests that it is under different selective pressures than CFAP97D1 and may therefore serve a different function.
NBEAL1 is a protein that in humans is encoded by the NBEAL1 gene. It is found on chromosome 2q33.2 of Homo sapiens.
The coiled-coil domain containing 142 (CCDC142) is a gene which in humans encodes the CCDC142 protein. The CCDC142 gene is located on chromosome 2, spans 4339 base pairs and contains 9 exons. The gene codes for the coiled-coil domain containing protein 142 (CCDC142), whose function is not yet well understood. There are two known isoforms of CCDC142. CCDC142 proteins produced from these transcripts range in size from 665 to 743 amino acids and contain signals suggesting protein movement between the cytosol and nucleus. Homologous CCDC142 genes are found in many animals, including vertebrates and invertebrates, but not in fungi, plants, protists, archaea, or bacteria. The protein contains a coiled-coil domain and a RINT1_TIP1 motif located within the coiled-coil domain.
Retrotransposon Gag Like 6 is a protein encoded by the RTL6 gene in humans. RTL6 is a member of the Mart family of genes, which are related to Sushi-like retrotransposons and were derived from fish and amphibians. The RTL6 protein is localized to the nucleus and has a predicted leucine zipper motif that is known to bind nucleic acids in similar proteins, such as LDOC1.
Transmembrane and coiled-coil domains 4, TMCO4, is a protein in humans that is encoded by the TMCO4 gene. Currently, its function is not well defined. It is a transmembrane protein that is predicted to cross the endoplasmic reticulum membrane three times. TMCO4 interacts with other proteins known to play a role in cancer development, hinting at a possible role in cancer.
Uncharacterized protein C16orf86 is a protein in humans that is encoded by the C16orf86 gene. It is mostly made of alpha helices and it is expressed in the testes, but also in other tissues such as the kidney, colon, brain, fat, spleen, and liver. The function of C16orf86 is not well understood; however, DNA microarray data suggest it could be a nuclear transcription factor that regulates G0/G1 in the cell cycle in tissues such as the kidney, brain, and skeletal muscles.
C11orf42 is an uncharacterized protein in Homo sapiens that is encoded by the C11orf42 gene. It is also known as chromosome 11 open reading frame 42 and uncharacterized protein C11orf42, with no other aliases. The gene is mostly conserved in mammals, but it has also been found in rodents, reptiles, fish and worms.
Protein FAM89A is a protein which in humans is encoded by the FAM89A gene. It is also known as chromosome 1 open reading frame 153 (C1orf153). Highest FAM89A gene expression is observed in the placenta and adipose tissue. Though its function is largely unknown, FAM89A is found to be differentially expressed in response to interleukin exposure, and it is implicated in immune response pathways and various pathologies such as atherosclerosis and glioma cell expression.
Leucine rich single-pass membrane protein 2 is a single-pass membrane protein rich in leucine, that in humans is encoded by the LSMEM2 gene. The LSMEM2 protein is conserved in mammals, birds, and reptiles. In humans, LSMEM2 is found to be highly expressed in the heart, skeletal muscle and tongue.
TMEM275 is a protein that in humans is encoded by the TMEM275 gene. TMEM275 has two highly conserved helical transmembrane regions. It is predicted to reside within the plasma membrane or the membrane of the endoplasmic reticulum.
RING Finger Protein 227, also known as RNF227 and LINC02581, is a protein which in humans is encoded by the RNF227 gene. According to DNA microarray data, it is found in at least 15 tissues.
FAM214B, also known as protein family with sequence similarity 214, B, is a protein that, in humans, is encoded by the FAM214B gene located on chromosome 9. The protein has 538 amino acids. The gene contains 9 exons. Studies have reported low expression of this gene in patients with major depressive disorder. In most organisms, such as mammals, amphibians, reptiles, and birds, gene expression is high in the bone marrow and blood. In humans during fetal development, FAM214B is mostly expressed in the brain and bone marrow.
FAM120AOS, or family with sequence similarity 120A opposite strand, codes for uncharacterized protein FAM120AOS, which currently has no known function. The gene ontology describes the gene to be protein binding. Overall, it appears that the thyroid and the placenta are the two tissues with the highest expression levels of FAM120AOS across a majority of datasets.
TBC1D30 is a gene in the human genome that encodes the protein of the same name. This protein has two domains, one of which is involved in the processing of the Rab protein. Much of the function of this gene is not yet known, but it is expressed mostly in the brain and adrenal cortex.
Zinc Finger Protein 548 (ZNF548) is a human protein encoded by the ZNF548 gene which is located on chromosome 19. It is found in the nucleus and is hypothesized to play a role in the regulation of transcription by RNA Polymerase II. It belongs to the Krüppel C2H2-type zinc-finger protein family as it contains many zinc-finger repeats.
Transmembrane epididymal protein 1 is a transmembrane protein encoded by the TEDDM1 gene. TEDDM1 is also commonly known as TMEM45C and encodes a 273-amino-acid protein that contains six alpha-helical transmembrane regions. The protein contains a 118-amino-acid region belonging to a family of unknown function. While the exact function of TEDDM1 is not understood, it is predicted to be an integral component of the plasma membrane.
TEKTIP1, also known as tektin-bundle interacting protein 1, is a protein that in humans is encoded by the TEKTIP1 gene.
THAP domain-containing protein 3 (THAP3) is a protein that, in Homo sapiens (humans), is encoded by the THAP3 gene. The THAP3 protein is also known as MGC33488, LOC90326, and THAP domain-containing, apoptosis associated protein 3. This protein contains the Thanatos-associated protein (THAP) domain and a host-cell factor 1C binding motif. These domains allow THAP3 to influence a variety of processes, including transcription and neuronal development. THAP3 is ubiquitously expressed in H. sapiens, though expression is highest in the kidneys.
TMEM252 or transmembrane protein 252 is a protein that, in humans, is encoded by the TMEM252 gene.
Chromosome 13 Open Reading Frame 46 is a protein which in humans is encoded by the C13orf46 gene. In humans, C13orf46 is ubiquitously expressed at low levels in tissues, including the lungs, stomach, prostate, spleen, and thymus. This gene encodes eight alternatively spliced mRNA transcripts, which produce five different protein isoforms.
Leucine-rich repeat-containing protein 74A (LRRC74A) is a protein encoded by the LRRC74A gene. The protein LRRC74A is localized in the cytoplasm. It has a calculated molecular weight of approximately 55 kDa. The LRRC74A protein is nominally expressed in the testis, salivary gland, and pancreas.