Forkhead-associated domain containing protein 1 (FHAD1) is a protein encoded by the FHAD1 gene.
As the name suggests, it has a forkhead-associated domain and an extensive coiled-coil structure. It is predicted to have a function related to DNA transcription. It is predicted to localize to the nucleus and contains a possible nuclear localization signal.
In humans, the FHAD1 gene is located on chromosome 1 (1p36.21), on the plus strand from 15236559 bp to 15400283 bp. [1] Three main genes lie near FHAD1, two of which encode proteins with known functions. Two genes, EFHD2 and chymotrypsin C (CTRC), lie downstream of FHAD1 on the plus strand. [1] TMEM51 lies upstream of FHAD1. [1]
FHAD1 is 163,682 bases long and contains 43 exons.
FHAD1 has four aliases: Forkhead associated phosphopeptide binding domain 1, Forkhead-associated (FHA) phosphopeptide binding domain 1, FHA Domain-Containing Protein 1, and KIAA1937. [2]
The mRNA transcript of FHAD1 is 5138 bp long. According to NCBI Gene data, the gene has 30 mRNA isoforms.
The FHAD1 protein is 1412 aa long, has a molecular weight of about 162 kDa and has an isoelectric point of 6.52. [3] It has 3 isoforms (1, 3 and 4), but only isoform 1 is supported by experimental evidence. It contains one glutamic acid-rich region and one proline-rich region.
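Molecular weight and isoelectric point are standard sequence-derived values. The sketch below shows how such numbers are typically computed: summing average residue masses plus one water, and finding the pH of zero net charge by bisection on a Henderson-Hasselbalch charge model. The residue masses are common average values, the pKa set follows the EMBOSS style, and both are assumptions; other tools use different pKa scales, so a computed pI can differ by a few tenths from the 6.52 quoted above. As a rough check, at about 110 Da per residue a 1,412-residue protein would be expected to weigh roughly 155 kDa.

```python
# Average residue masses in daltons (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini

# EMBOSS-style pKa values (an assumption; other tools use other scales).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: the pI is at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


toy = "MKEEPLLRSTGE"  # hypothetical peptide, not part of FHAD1
print(f"MW = {molecular_weight(toy):.1f} Da, pI = {isoelectric_point(toy):.2f}")
```

Bisection works here because net charge decreases monotonically as pH rises, so exactly one zero crossing lies between pH 0 and 14.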
The FHA domain extends from 18 - 84 aa in the protein. It can recognize and bind to phosphorylated residues, specifically pSer, pThr and pTyr. The exact mechanism and function of this domain are still being studied, but it is found in proteins with many different functions, mainly DNA repair and signal transduction. [4]
FHAD1 contains one Smc (structural maintenance of chromosomes) region, from 275 - 1401 aa. This region is similar to the Smc proteins, which are involved in cell cycle control, cell division and chromosome separation. [5]
A TMPIT region extends from 394 - 494 aa in FHAD1. Proteins containing this region are predicted to be transmembrane proteins. [6] However, there is little literature to support this.
Another domain, found in a family of bacterial proteins with no known function, extends from 694 - 777 aa in FHAD1. [7]
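The region coordinates above are 1-based and inclusive at both ends. The sketch below uses a hypothetical `Region` helper (not part of any bioinformatics library) to turn them into lengths, convert them to Python slices, and check nesting. It makes explicit that the TMPIT region and the 694 - 777 aa domain both fall inside the Smc region, while the FHA domain does not.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A protein region given in 1-based, inclusive residue coordinates."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    def to_slice(self) -> slice:
        """Python slice for the same residues (0-based, end-exclusive)."""
        return slice(self.start - 1, self.end)

    def contains(self, other: "Region") -> bool:
        return self.start <= other.start and other.end <= self.end


# Coordinates as stated in the text; the names are descriptive labels only.
FHAD1_REGIONS = [
    Region("FHA domain", 18, 84),
    Region("Smc region", 275, 1401),
    Region("TMPIT region", 394, 494),
    Region("Bacterial unknown-function domain", 694, 777),
]

smc = FHAD1_REGIONS[1]
for region in FHAD1_REGIONS:
    nested = region is not smc and smc.contains(region)
    print(f"{region.name}: {region.start}-{region.end} aa, "
          f"{region.length} residues, inside Smc region: {nested}")
```

Given a real FHAD1 sequence string `seq` (hypothetical here), `seq[FHAD1_REGIONS[0].to_slice()]` would return the 67 residues of the FHA domain.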
The forkhead-associated domain of FHAD1 consists of beta sheets. Structure prediction software indicates that the rest of the protein consists of alpha helices and random coils. Overall, FHAD1 has a coiled-coil structure, as shown in the figure.
Prediction software suggests that FHAD1 undergoes several different types of post-translational modification.
FHAD1 has been predicted to be a nuclear protein with 94.1% reliability. It also contains possible nuclear localization signal sequences between 1100 - 1107 aa. Two pat4 and one pat7 sequences were predicted. Pat4 and pat7 are consensus sequences consisting of clusters of lysine or arginine residues.
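The pat4/pat7 terminology appears to come from the PSORT II localization program. A minimal sketch, assuming the PSORT II definitions as commonly described: pat4 is a 4-residue window of four K/R, or three K/R plus one H or P; pat7 is a proline followed within three residues by a 4-residue segment containing at least three K/R. This is a simplified re-implementation for illustration, not PSORT II's own code, and the test peptide is made up around the classic SV40 large T antigen NLS (PKKKRKV).

```python
BASIC = set("KR")


def find_pat4(seq: str) -> list[int]:
    """1-based start positions of 4-residue windows matching pat4."""
    hits = []
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        n_basic = sum(aa in BASIC for aa in window)
        n_hp = sum(aa in "HP" for aa in window)
        if n_basic == 4 or (n_basic == 3 and n_hp == 1):
            hits.append(i + 1)
    return hits


def find_pat7(seq: str) -> list[int]:
    """1-based positions of prolines followed, within three residues,
    by a 4-residue segment containing at least three K/R residues."""
    hits = []
    for i, aa in enumerate(seq):
        if aa != "P":
            continue
        for offset in (1, 2, 3):
            segment = seq[i + offset:i + offset + 4]
            if len(segment) == 4 and sum(r in BASIC for r in segment) >= 3:
                hits.append(i + 1)
                break  # report each proline at most once
    return hits


peptide = "MAPKKKRKVED"  # hypothetical test peptide
print("pat4 starts:", find_pat4(peptide))
print("pat7 starts:", find_pat7(peptide))
```

Overlapping windows are reported separately, so one basic cluster can yield several pat4 hits; a tool reporting "two pat4 sequences" may count them differently.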
In humans, FHAD1 is expressed in the testis, the fallopian tube and uterine tissues in females, the nasopharynx and the bronchi, based on data from the Human Protein Atlas. [9] NCBI's EST Profile also showed that FHAD1 is highly expressed in the testis, with some expression in the trachea and esophagus. In mice, the gene was also expressed in the testis, as well as in the pituitary gland, lung and brain.
FHAD1 has a promoter that extends from 15246234 – 15247380 bp and is 1147 bp long. It includes an initial part of the 5' UTR of FHAD1. Some transcription factors predicted to bind to this promoter are:
Multiple stem loops are predicted to form in the 5' UTR and 3' UTR of FHAD1.
FHAD1 may be involved in transcriptional regulation through interactions with other transcriptional regulators.
FHAD1 was identified as a binding partner of GTF2IRD1 (GTF2I repeat domain containing protein 1) in a yeast two-hybrid screen. [15] GTF2I is a gene that encodes general transcription factor II-I. The study showed that GTF2IRD1 is a nuclear protein involved in transcriptional regulation through chromatin modification. Its nuclear localization and its presence in neuronal cells are consistent with the localization and functional data for FHAD1. FHAD1 and GTF2IRD1 interacted through RD2 (repeat domain 2) of GTF2IRD1, which has shown some DNA-binding activity.
FHAD1 was found to interact with 14-3-3 protein epsilon in a cosedimentation assay. 14-3-3 epsilon binds a large number of partners, mostly by recognizing phosphothreonine or phosphoserine motifs. [16]
FHAD1 showed differential expression in patients diagnosed with endometriosis and obesity. [17]
FHAD1 has no known paralogs. It has orthologs in the organisms in the following classes: Mammalia, Reptilia, Aves, Sarcopterygii, Actinopterygii, Gastropoda and Lingulata. There was significant conservation in the FHA domain in all the organisms in the table below.
The rate of evolution of FHAD1 was compared with that of fibrinogen and cytochrome c and it showed that FHAD1 is a rapidly evolving gene.
Common name | Time of divergence from humans (mya) | Sequence identity to human FHAD1 |
---|---|---|
Black fruit bat | 96 | 78% |
Goat | 96 | 75% |
Killer whale | 96 | 75% |
Giant Panda | 96 | 75% |
Sea otter | 96 | 72% |
Guinea Pig | 90 | 71% |
Great roundleaf bat | 96 | 71% |
European hedgehog | 96 | 67% |
Mongolian gerbil | 90 | 63% |
Green sea turtle | 312 | 43% |
Chinese soft-shell turtle | 312 | 40% |
Emperor penguin | 312 | 39% |
American alligator | 312 | 38% |
Brown mesite | 312 | 38% |
Pigeon | 312 | 36% |
West Indian Ocean coelacanth | 413 | 33% |
Atlantic salmon | 435 | 29% |
Red-bellied piranha | 435 | 29% |
California sea hare/slug | 797 | 28% |
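Rate comparisons like the one described above are often made by converting percent identity into a corrected distance and plotting it against divergence time; the slope reflects how fast the protein evolves. The source does not say which method was used, so the sketch below is an assumption: it applies the simple Poisson correction d = -ln(identity) to the table values and fits a line through the origin. The fibrinogen and cytochrome c values needed for the actual comparison are not given here, so only the FHAD1 side is computed.

```python
import math

# (common name, divergence time from humans in mya, fractional identity),
# copied from the table above.
ORTHOLOGS = [
    ("Black fruit bat", 96, 0.78), ("Goat", 96, 0.75),
    ("Killer whale", 96, 0.75), ("Giant panda", 96, 0.75),
    ("Sea otter", 96, 0.72), ("Guinea pig", 90, 0.71),
    ("Great roundleaf bat", 96, 0.71), ("European hedgehog", 96, 0.67),
    ("Mongolian gerbil", 90, 0.63), ("Green sea turtle", 312, 0.43),
    ("Chinese soft-shell turtle", 312, 0.40), ("Emperor penguin", 312, 0.39),
    ("American alligator", 312, 0.38), ("Brown mesite", 312, 0.38),
    ("Pigeon", 312, 0.36), ("West Indian Ocean coelacanth", 413, 0.33),
    ("Atlantic salmon", 435, 0.29), ("Red-bellied piranha", 435, 0.29),
    ("California sea hare", 797, 0.28),
]


def poisson_distance(identity: float) -> float:
    """Poisson-corrected amino acid substitutions per site.

    With p = 1 - identity as the observed fraction of differing sites,
    d = -ln(1 - p) = -ln(identity) allows for multiple hits at one site."""
    return -math.log(identity)


def slope_through_origin(points: list[tuple[float, float]]) -> float:
    """Least-squares slope of d against time for a line through the origin."""
    return sum(t * d for t, d in points) / sum(t * t for t, _ in points)


points = [(mya, poisson_distance(ident)) for _, mya, ident in ORTHOLOGS]
slope = slope_through_origin(points)
# d accumulates along both lineages since the split, so the per-lineage
# rate is roughly slope / 2 (substitutions per site per million years).
print(f"slope = {slope:.5f} per Myr; per-lineage rate ~ {slope / 2:.5f}")
```

Running the same calculation on fibrinogen and cytochrome c identities would put all three proteins on the same scale; a steeper slope indicates faster evolution.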
C8orf48 is a protein that in humans is encoded by the C8orf48 gene. C8orf48 is a nuclear protein specifically predicted to be located in the nuclear lamina. C8orf48 has been found to interact with proteins that are involved in the regulation of various cellular responses like gene expression, protein secretion, cell proliferation, and inflammatory responses. This protein has been linked to breast cancer and papillary thyroid carcinoma.
The coiled-coil domain containing 142 (CCDC142) is a gene which in humans encodes the CCDC142 protein. The CCDC142 gene is located on chromosome 2, spans 4339 base pairs and contains 9 exons. The gene codes for the coiled-coil domain containing protein 142 (CCDC142), whose function is not yet well understood. There are two known isoforms of CCDC142; the proteins produced from these transcripts range in size from 665 to 743 amino acids and contain signals suggesting movement between the cytosol and nucleus. Homologous CCDC142 genes are found in many animals, including vertebrates and invertebrates, but not in fungi, plants, protists, archaea or bacteria. Although the function of this protein is not well understood, it contains a coiled-coil domain and a RINT1_TIP1 motif located within the coiled-coil domain.
PRR29 is a protein encoded by the PRR29 gene located in humans on chromosome 17 at 17q23.
Glutamate rich protein 5 is a protein in humans encoded by the ERICH5 gene, also known as chromosome 8 open reading frame 47 (C8orf47).
Uncharacterized protein C12orf60 is a protein that in humans is encoded by the C12orf60 gene. The gene is also known as LOC144608 or MGC47869. The protein lacks transmembrane domains and transmembrane helices, but it is rich in alpha-helices. It is predicted to localize to the nucleus.
BEND2 is a protein that in humans is encoded by the BEND2 gene. It is also found in other vertebrates, including mammals, birds, and reptiles. In Homo sapiens, BEND2 expression is regulated and occurs at high levels in skeletal muscle, the testis and the bone marrow. The presence of BEN domains in the BEND2 protein suggests that it may be involved in chromatin modification and regulation.
Chromosome 19 open reading frame 18 (C19orf18) is a protein which in humans is encoded by the C19orf18 gene. The gene is found only in mammals, and the protein is predicted to have a transmembrane domain and a coiled-coil stretch. Its function is not yet fully understood.
LCHN is a protein that in humans is encoded by the KIAA1147 gene located on chromosome 7. It is likely part of the tripartite DENN domain family of proteins that often function as Rab-GEFs to regulate vesicular trafficking. Both the mRNA and protein have been shown to be upregulated following ischemic stroke and to be produced at altered levels in patients with FTD-ALS; however, the gene's contribution to these conditions is not well understood.
C17orf53 is a gene in humans that encodes a protein known as C17orf53 (uncharacterized protein C17orf53). The protein has been shown to localize to the nucleus, with minor localization in the cytoplasm. Based on current findings, C17orf53 is predicted to function in transport, but further research is needed to establish its function more specifically.
Chromosome 21 Open Reading Frame 58 (C21orf58) is a protein that in humans is encoded by the C21orf58 gene.
WD repeat containing protein 53 (WDR53) is a protein encoded by the WDR53 gene, which was identified in the human genome by the Human Genome Project but has not yet been studied experimentally to determine its function. It is located on chromosome 3 at 3q29 in Homo sapiens. The gene has short upstream and downstream untranslated regions, and the protein contains WD40 repeat regions, which have been linked to various functions.
C2orf81 is a human gene encoding the protein C2orf81, which is predicted to localize to the nucleus.
C11orf42 is an uncharacterized protein in Homo sapiens that is encoded by the C11orf42 gene. It is also known as chromosome 11 open reading frame 42 and uncharacterized protein C11orf42, with no other aliases. The gene is conserved mainly in mammals, including rodents, but has also been found in reptiles, fish and worms.
C20orf202 is a protein that in humans is encoded by the C20orf202 gene. In humans, this gene encodes a nuclear protein that is primarily expressed in the lung and placenta.
ZNF337, also known as zinc finger protein 337, is a protein that in humans is encoded by the ZNF337 gene. The ZNF337 gene is located on human chromosome 20 (20p11.21), contains 6 exons in total and produces a 4,237 base pair mRNA; the encoded protein is 751 amino acids long. In addition, alternative splicing results in multiple transcript variants. The ZNF337 gene encodes a zinc finger domain-containing protein, but this gene and its protein are not yet well understood. The gene has been proposed to participate in processes such as DNA-dependent regulation of transcription, and the protein is expected to have molecular functions such as DNA binding, metal ion binding and zinc ion binding, and to localize to various subcellular compartments. There are no commonly known aliases, but an important paralog of this gene is ZNF875.
C7orf50 is a gene in humans that encodes a protein known as C7orf50. The gene is expressed ubiquitously, including in the kidneys, brain, fat, prostate and spleen among 22 other tissues, and shows low tissue specificity. C7orf50 is conserved in chimpanzees, Rhesus monkeys, dogs, cows, mice, rats, and chickens, along with 307 other organisms from mammals to fungi. The protein is predicted to be involved in the import of ribosomal proteins into the nucleus for assembly into ribosomal subunits as part of rRNA processing. Additionally, this protein-coding gene is predicted to be a microRNA (miRNA) host gene, meaning that it may contain miRNA genes in its introns and/or exons.
KRBA1 is a protein that in humans is encoded by the KRBA1 gene. The gene is located on the plus strand of chromosome 7, from 149,411,872 to 149,431,664. It is also known by two other aliases: KIAA1862 and KRAB A Domain Containing 1. The KRBA family of genes is understood to encode different transcriptional repressor proteins.
C14orf119 is a protein that in humans is encoded by the C14orf119 gene. The C14orf119 protein is predicted to localize to the nucleus. Additionally, compared with healthy individuals, C14orf119 expression is decreased in individuals with systemic lupus erythematosus (SLE) and increased in individuals with various types of lymphoma.
Family with Sequence Similarity 166, member C (FAM166C), is a protein encoded by the FAM166C gene. The protein FAM166C is localized in the nucleus. It has a calculated molecular weight of 23.29 kDa. It also contains DUF2475, a protein of unknown function from amino acid 19–85. The FAM166C protein is nominally expressed in the testis, stomach, and thyroid.
Chromosome 12 Open Reading Frame 50 (C12orf50) is a protein-coding gene which in humans encodes the C12orf50 protein. Its RefSeq mRNA accession is NM_152589. C12orf50 is located at 12q21.32 and covers 55.42 kb, from 88429231 to 88373811, on the reverse strand. Neighboring genes include RPS4XP15, LOC107984542 and C12orf29. RPS4XP15 is upstream of C12orf50 and on the same strand. LOC107984542 and C12orf29 are both downstream; LOC107984542 is on the opposite strand, while C12orf29 is on the same strand. C12orf50 has six isoforms. This article focuses on isoform X1, which is 1711 nucleotides long and encodes a protein 414 aa long.
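Genomic spans such as those quoted in this article use 1-based, inclusive coordinates, and reverse-strand genes are often listed from the higher to the lower position. A minimal sketch (the `span_length` name is hypothetical) that normalizes the order and adds one for the inclusive end reproduces the 55.42 kb span of C12orf50 and the 1147 bp FHAD1 promoter length.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval.

    Reverse-strand coordinates may be given high-to-low, so the two
    positions are sorted before taking the difference."""
    lo, hi = sorted((start, end))
    return hi - lo + 1


c12orf50 = span_length(88429231, 88373811)   # reverse strand, listed high-to-low
promoter = span_length(15246234, 15247380)   # FHAD1 promoter, plus strand
print(f"C12orf50 span: {c12orf50} bp ({c12orf50 / 1000:.2f} kb)")
print(f"FHAD1 promoter: {promoter} bp")
```

Forgetting the "+ 1" is a common off-by-one error: a region that starts and ends at the same base is one base long, not zero.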