Epistasis and functional genomics

Epistasis refers to genetic interactions in which a mutation in one gene masks the phenotypic effects of a mutation at another locus. [1] Systematic analysis of these epistatic interactions can provide insight into the structure and function of genetic pathways. Examining the phenotypes that result from pairs of mutations helps in understanding how the functions of the genes intersect. Genetic interactions are generally classified as either positive (alleviating) or negative (aggravating). Fitness epistasis (an interaction between non-allelic genes) is positive (in other words, diminishing, antagonistic or buffering) when loss-of-function mutations in two given genes produce a fitness higher than that predicted from the individual effects of the deleterious mutations, and negative (that is, reinforcing, synergistic or aggravating) when they produce a fitness lower than predicted. [2] Ryszard Korona and Lukas Jasnos showed that the epistatic effect is usually positive in Saccharomyces cerevisiae. Even when the interaction is positive, the double mutant usually has lower fitness than either single mutant. [2] Positive interactions often occur when both genes lie within the same pathway. [3] Conversely, negative interactions are characterized by a stronger defect than would be expected from the two single mutations, and in the most extreme cases (synthetic sick/lethal) the double mutation is lethal. This aggravated phenotype arises when genes in compensatory pathways are both knocked out.
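The sign convention described above can be illustrated with a minimal sketch. It assumes fitness is expressed relative to wild type (wild type = 1.0) and that the expected double-mutant fitness is the product of the single-mutant fitnesses; the multiplicative expectation, the function names, the tolerance and the example values are all assumptions made for illustration, not details taken from the cited studies.

```python
def interaction_score(w_a, w_b, w_ab):
    """Observed double-mutant fitness minus the expected fitness.

    Assumption: the expectation is multiplicative (w_a * w_b), with all
    fitness values measured relative to wild type = 1.0.
    """
    expected = w_a * w_b
    return w_ab - expected


def classify(w_a, w_b, w_ab, tolerance=0.05):
    """Label an interaction as positive, negative, or none.

    The tolerance is a hypothetical cutoff standing in for measurement noise.
    """
    eps = interaction_score(w_a, w_b, w_ab)
    if eps > tolerance:
        return "positive"   # alleviating / buffering
    if eps < -tolerance:
        return "negative"   # aggravating / synergistic
    return "none"


# Hypothetical single-mutant fitnesses of 0.8 and 0.7 (expected double: 0.56).
for observed in (0.68, 0.56, 0.0):
    print(observed, classify(0.8, 0.7, observed))
```

In the first hypothetical case the double mutant (0.68) is fitter than expected, so the interaction is positive, yet it is still less fit than either single mutant, as the text notes is usual. The last case, with no growth at all, corresponds to a synthetic lethal pair.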

High-throughput methods of analyzing these types of interactions have expanded our knowledge of genetic interactions. Synthetic genetic arrays (SGA), diploid-based synthetic lethality analysis on microarrays (dSLAM), and epistatic miniarray profiles (E-MAP) are three important methods that have been developed for the systematic analysis and mapping of genetic interactions. This systematic approach to studying epistasis on a genome-wide scale has significant implications for functional genomics. By identifying the negative and positive interactions between an unknown gene and a set of genes within a known pathway, these methods can elucidate the function of previously uncharacterized genes within the context of a metabolic or developmental pathway.

Inferring function: alleviating and aggravating mutations

In order to understand how information about epistatic interactions relates to gene pathways, consider a simple example of vulval cell differentiation in C. elegans. Cells differentiate from Pn cells to Pn.p cells to VP cells to vulval cells. Mutation of lin-26 [4] blocks differentiation of Pn cells to Pn.p cells. Mutants of lin-36 [5] behave similarly, blocking differentiation at the transition to VP cells. In both cases, the resulting phenotype is marked by an absence of vulval cells as there is an upstream block in the differentiation pathway. A double mutant in which both of these genes have been disrupted exhibits an equivalent phenotype that is no worse than either single mutant. The upstream disruption at lin-26 masks the phenotypic effect of a mutation at lin-36 [1] in a classic example of an alleviating epistatic interaction.
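The masking logic in this example can be sketched as a toy model of a linear pathway, in which differentiation stops at the first blocked transition. The stage names and the two blocked transitions follow the description above; the data structures and function names are hypothetical, and the model ignores everything about the real biology beyond the order of the steps.

```python
# Ordered stages of the differentiation pathway, as described in the text.
STAGES = ["Pn", "Pn.p", "VP", "vulval"]

# Transition that each gene is required for (illustrative encoding).
REQUIRED_FOR = {
    "lin-26": ("Pn", "Pn.p"),
    "lin-36": ("Pn.p", "VP"),
}


def terminal_stage(mutated_genes):
    """Return the last stage reached when the given genes are disrupted."""
    blocked = {REQUIRED_FOR[gene] for gene in mutated_genes}
    stage = STAGES[0]
    for next_stage in STAGES[1:]:
        if (stage, next_stage) in blocked:
            break  # differentiation cannot proceed past this point
        stage = next_stage
    return stage


def has_vulval_cells(mutated_genes):
    """True if differentiation reaches the final (vulval) stage."""
    return terminal_stage(mutated_genes) == "vulval"


for genotype in ([], ["lin-26"], ["lin-36"], ["lin-26", "lin-36"]):
    print(genotype, terminal_stage(genotype), has_vulval_cells(genotype))
```

In this model the double mutant stops at the same stage as the lin-26 single mutant, so adding the lin-36 mutation changes nothing: the upstream block is epistatic to the downstream one.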

Aggravating mutations, on the other hand, give rise to a phenotype that is worse than the cumulative effect of each single mutation. This aggravated phenotype is indicative of two genes in compensatory pathways. In a single mutant, a parallel pathway is able to compensate for the loss of the disrupted pathway; in the double mutant, however, the action of this compensatory pathway is lost as well, resulting in the more dramatic phenotype observed. This relationship has been significantly easier to detect than the more subtle alleviating phenotypes, and has been extensively studied in S. cerevisiae through synthetic sick/lethal (SSL) screens, which identify double mutants with significantly decreased growth rates.

It should be pointed out that these conclusions from double-mutant analysis, while they apply to many pathways and mutants, are not universal. For example, genes can act in opposite directions in pathways, so that knocking out both produces a near-normal phenotype, while each single mutant is severely affected (in opposite directions). A well-studied example occurs during early development in Drosophila, wherein gene products from the hunchback and nanos genes are present in the egg, and act in opposite directions to direct anterior-posterior pattern formation. Something similar often happens in signal transduction pathways, where knocking out a negative regulator of the pathway causes a hyper-activation phenotype, while knocking out a positively acting component produces an opposite phenotype. In linear pathways with a single "output", when knockout mutations in two oppositely-acting genes are combined in the same individual, the phenotype of the double mutant is typically the same as the phenotype of the single mutant whose normal gene product acts downstream in the pathway.
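The rule for oppositely acting genes in a linear pathway can be sketched with a two-component toy model. Here a hypothetical negative regulator R represses a hypothetical downstream activator A, and an incoming signal switches R off. The component names and the Boolean logic are assumptions chosen for illustration and do not describe any specific pathway.

```python
def output_active(signal_present, knocked_out=frozenset()):
    """Boolean output of a hypothetical pathway: signal -| R -| A -> output."""
    # R is active unless it is knocked out or the signal switches it off.
    r_active = ("R" not in knocked_out) and not signal_present
    # A is active unless it is knocked out or repressed by active R.
    a_active = ("A" not in knocked_out) and not r_active
    return a_active


def phenotype(knocked_out=frozenset()):
    """Pathway output (without signal, with signal) for a given genotype."""
    return (output_active(False, knocked_out), output_active(True, knocked_out))


print("wild type:", phenotype())             # responds to the signal
print("R knockout:", phenotype({"R"}))       # hyper-activation
print("A knockout:", phenotype({"A"}))       # no output
print("double    :", phenotype({"R", "A"}))  # same as the A knockout
```

The two single mutants have opposite phenotypes (always on versus always off), and the double mutant matches the knockout of A, the component that acts downstream.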

Methods of detecting SSL mutants

SGA and dSLAM

Synthetic genetic arrays (SGA) and diploid-based synthetic lethality analysis on microarrays (dSLAM) are two key methods that have been used to identify synthetic sick/lethal mutants and characterize negative epistatic relationships. Sequencing of the entire yeast genome has made it possible to generate a library of knock-out mutants for nearly every gene in the genome. These molecularly bar-coded mutants greatly facilitate high-throughput epistasis studies, as they can be pooled and used to generate the necessary double mutants. Both SGA and dSLAM rely on these yeast knockout strains, which are transformed or mated to generate haploid double mutants. Microarray profiling is then used to compare the fitness of the single and double mutants. In SGA, the double mutants examined are haploid and are collected after mating with a mutant strain followed by several rounds of selection. In dSLAM, the single and double mutant strains both originate from the same diploid heterozygote strain (hence the "diploid" in "dSLAM"), and their fitness is assessed by microarray analysis of a growth competition assay.
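One way to picture how a pooled competition readout might be scored is a per-barcode log ratio between the double-mutant pool and the single-mutant control pool. The sketch below is only an illustration of that idea: the signal values, the pseudocount, the cutoff and the function names are all hypothetical, and the published dSLAM analysis is not claimed to work this way in detail.

```python
import math


def log2_ratios(control_signal, experiment_signal, pseudocount=1.0):
    """Per-barcode log2(experiment / control) with a small pseudocount.

    control_signal: barcode signal in the single-mutant (control) pool.
    experiment_signal: barcode signal in the double-mutant pool.
    """
    ratios = {}
    for barcode, control in control_signal.items():
        experiment = experiment_signal.get(barcode, 0.0)
        ratios[barcode] = math.log2(
            (experiment + pseudocount) / (control + pseudocount)
        )
    return ratios


def candidate_ssl(ratios, threshold=-2.0):
    """Barcodes depleted at or below the (hypothetical) log2 cutoff."""
    return sorted(b for b, r in ratios.items() if r <= threshold)


# Hypothetical hybridization signals for three barcoded deletion strains.
control = {"geneX": 1023.0, "geneY": 999.0, "geneZ": 511.0}
double_mutant_pool = {"geneX": 1023.0, "geneY": 124.0, "geneZ": 255.0}

ratios = log2_ratios(control, double_mutant_pool)
print(ratios)
print(candidate_ssl(ratios))
```

A strongly negative ratio means the strain was depleted when combined with the query mutation, which would flag it as a candidate synthetic sick/lethal partner for follow-up.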

Epistatic miniarray profiles (E-MAPs)

In order to develop a richer understanding of genetic interactions, experimental approaches are shifting away from the binary classification of phenotypes as wild type or synthetic lethal. The E-MAP approach is particularly compelling because it can highlight both alleviating and aggravating effects, a capacity that distinguishes it from methods such as SGA and dSLAM. Furthermore, the E-MAP not only identifies both types of interactions but also recognizes gradations in these interactions and in the severity of the masked phenotype, represented by the interaction score assigned to each pair of genes.

E-MAPs exploit an SGA approach in order to analyze genetic interactions in a high-throughput manner. While the method was developed primarily for examining epistasis in S. cerevisiae, it could be applied to other model organisms as well. An E-MAP collates data from the systematic generation of double mutant strains for a large, clearly defined group of genes. Each phenotypic response is quantified by imaging colony size to determine growth rate. This fitness score is compared with the fitness predicted from the corresponding single mutants, resulting in a genetic interaction score. Hierarchical clustering of these data groups genes with similar interaction profiles and allows for the identification of epistatic relationships between genes with and without known function. When the data are sorted in this way, genes known to interact cluster together alongside genes that exhibit a similar pattern of interactions but whose function has not yet been identified. The E-MAP data can therefore place genes in new functional roles within well-characterized pathways. Consider, for example, the E-MAP presented by Collins et al., which clusters the transcriptional elongation factor Dst1 [6] alongside components of the mid region of the Mediator complex, which is involved in transcriptional regulation. [7] This suggests a new role for Dst1, functioning in concert with Mediator.
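The clustering step can be sketched in a few lines of plain Python. The example below groups genes by the similarity of their interaction-score profiles, using 1 minus the Pearson correlation as the distance and average-linkage agglomerative clustering. These are common choices but are assumptions here, not necessarily those used in the cited E-MAP studies. The gene names and scores are invented: "orf_x" stands for an uncharacterized gene.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge the closest clusters until n_clusters remain."""
    clusters = [[gene] for gene in sorted(profiles)]

    def dist(c1, c2):
        # Mean correlation distance over all cross-cluster gene pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical interaction scores against the same four partner genes.
example_profiles = {
    "med_a": [-2.0, 1.5, 0.1, -0.3],
    "med_b": [-1.8, 1.2, 0.0, -0.5],
    "orf_x": [-2.1, 1.4, 0.2, -0.2],
    "sec_a": [0.3, -0.1, -2.5, 1.8],
    "sec_b": [0.1, 0.0, -2.2, 2.0],
}

print(average_linkage(example_profiles, 2))
```

With these invented scores the uncharacterized gene falls into the same cluster as the two "med" genes. This is the kind of observation that would suggest a shared function and prompt further experiments.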

The choice of genes examined within a given E-MAP is critical to achieving fruitful results. It is particularly important that a significant subset of the genes examined be well established in the literature. These genes can then act as controls for the E-MAP, allowing for greater certainty in analyzing the data from uncharacterized genes. Clusters organized by sub-cellular localization and general cellular processes (e.g. cell cycle) have yielded useful results in S. cerevisiae. Data from protein-protein interaction studies can also provide a useful basis for selecting gene groups for an E-MAP. Genes whose products interact physically would be expected to interact at the genetic level as well, and thus can serve as controls for E-MAP data. Collins et al. (2007) compared E-MAP scores with physical interaction data from large-scale affinity purification methods (AP-MS), and their data demonstrate that an E-MAP approach identifies protein-protein interactions with a specificity equal to that of traditional methods such as AP-MS.

High-throughput methods of examining epistatic relationships face difficulties, however, as the number of possible gene pairs is extremely large (~20 million in S. cerevisiae) and the estimated density of genetic interactions is quite low. [8] These difficulties can be countered by examining all possible interactions within a single cluster of genes rather than examining pairs across the whole genome. If well chosen, these functional clusters contain a significantly higher density of genetic interactions than other regions of the genome, and thus allow a higher rate of detection while dramatically decreasing the number of gene pairs to be examined. [8]
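The scale of the problem follows from simple counting: n genes give n(n - 1)/2 unordered pairs. The sketch below assumes a round figure of 6,000 genes for the yeast genome and a hypothetical functional cluster of 500 genes; both numbers are illustrative assumptions rather than values taken from the cited work.

```python
from math import comb


def gene_pairs(n_genes):
    """Number of unordered gene pairs among n_genes, i.e. n choose 2."""
    return comb(n_genes, 2)


genome_pairs = gene_pairs(6000)   # assumed genome size
cluster_pairs = gene_pairs(500)   # hypothetical functional cluster

print(f"whole genome: {genome_pairs:,} pairs")
print(f"cluster     : {cluster_pairs:,} pairs")
print(f"fraction    : {cluster_pairs / genome_pairs:.4f}")
```

Under these assumptions the whole-genome count is close to 18 million, the same order of magnitude as the figure quoted above, while the cluster requires well under 1% as many double mutants.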

Generation of mutant strains: DAmP

Generating data for the E-MAP depends upon the creation of thousands of double mutant strains; a study of 483 alleles, for example, resulted in an E-MAP with ~100,000 distinct double mutant pairs. The generation of libraries of essential gene mutants presents significant difficulties, however, as these mutations have a lethal phenotype. Thus, E-MAP studies rely upon strains with intermediate expression levels of these genes. The decreased abundance by messenger RNA perturbation (DAmP) strategy is particularly common for the high-throughput generation of the mutants needed for this kind of analysis, and allows for the partial disruption of essential genes without loss of viability. [9] DAmP relies upon the destabilization of mRNA transcripts by integrating an antibiotic selectable marker into the 3’ UTR, downstream of the stop codon (figure 2). mRNAs with 3’ extended transcripts are rapidly targeted for degradation, and the result is a downregulation of the gene of interest while it remains under the control of its native promoter. In the case of non-essential genes, deletion strains may be used. Tagging the deletion sites with molecular barcodes, unique 20-bp sequences, allows for the identification and study of relative fitness levels in each mutant strain.

References

  1. Roth, F.; Lipshitz, H. & Andrews, B. (2009). "Q&A: Epistasis". J. Biol. 8 (4): 35. doi:10.1186/jbiol144. PMC 2688915. PMID 19486505.
  2. Jasnos L, Korona R (Apr 2007). "Epistatic buffering of fitness loss in yeast double deletion strains". Nature Genetics. 39 (4): 550–554. doi:10.1038/ng1986. PMID 17322879. S2CID 19392818.
  3. Fiedler, D.; et al. (2009). "Functional Organization of the S. cerevisiae Phosphorylation Network". Cell. 136 (5): 952–963. doi:10.1016/j.cell.2008.12.039. PMC 2856666. PMID 19269370.
  4. "Lin-26 Transcription factor lin-26 [Caenorhabditis elegans] - Gene - NCBI".
  5. "Lin-36 Protein lin-36 [Caenorhabditis elegans] - Gene - NCBI".
  6. "DST1 Dst1p [Saccharomyces cerevisiae S288C] - Gene - NCBI".
  7. Collins; et al. (2007). "Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map". Nature. 446 (12): 806–810. Bibcode:2007Natur.446..806C. doi:10.1038/nature05649. PMID 17314980. S2CID 4419709.
  8. Schuldiner; et al. (2005). "Exploration of the Function and Organization of the Yeast Early Secretory Pathway through an Epistatic Miniarray Profile". Cell. 123 (3): 507–519. doi:10.1016/j.cell.2005.08.031. PMID 16269340.
  9. Schuldiner; et al. (2006). "Quantitative genetic analysis in Saccharomyces cerevisiae using epistatic miniarray profiles (E-MAPs) and its application to chromatin functions". Methods. 40 (4): 344–352. doi:10.1016/j.ymeth.2006.07.034. PMID 17101447. S2CID 34923823.