Disease gene identification

Last updated

Disease gene identification is a process by which scientists identify the mutant genotypes responsible for an inherited genetic disorder. Mutations in these genes can include single nucleotide substitutions, single nucleotide additions/deletions, deletion of the entire gene, and other genetic abnormalities.

Contents

Significance

Knowledge of which genes (when non-functional) cause which disorders will simplify diagnosis of patients and provide insights into the functional characteristics of the mutation. The advent of modern-day high-throughput sequencing technologies combined with insights provided from the growing field of genomics is resulting in more rapid disease gene identification, thus allowing scientists to identify more complex mutations.

Chromosomes on the left show possible disease gene locations (as identified by any of the below methods) for affected individuals. Red area in the 'composite chromosome' on the right signifies the overlap of these regions, and thus the most probable location of the disease gene Disease Gene Mapping with Multiple Chromosomes.jpg
Chromosomes on the left show possible disease gene locations (as identified by any of the below methods) for affected individuals. Red area in the 'composite chromosome' on the right signifies the overlap of these regions, and thus the most probable location of the disease gene

Generic gene identification procedure

Disease gene identification techniques often follow the same overall procedure. DNA is first collected from several patients who are believed to have the same genetic disease. Then, their DNA samples are analyzed and screened to determine probable regions where the mutation could potentially reside. These techniques are mentioned below. These probable regions are then lined-up with one another and the overlapping region should contain the mutant gene. If enough of the genome sequence is known, that region is searched for candidate genes. Coding regions of these genes are then sequenced until a mutation is discovered or another patient is discovered, in which case the analysis can be repeated, potentially narrowing down the region of interest.

The differences between most disease gene identification procedures are in the second step (where DNA samples are analyzed and screened to determine regions in which the mutation could reside).

Pre-genomics techniques

Without the aid of the whole-genome sequences, pre-genomics investigations looked at select regions of the genome, often with only minimal knowledge of the gene sequences they were looking at. Genetic techniques capable of providing this sort of information include Restriction Fragment Length Polymorphism (RFLP) analysis and microsatellite analysis.

Loss of heterozygosity (LOH)

Loss of heterozygosity (LOH) is a technique that can only be used to compare two samples from the same individual. LOH analysis is often used when identifying cancer-causing oncogenes in that one sample consists of (mutant) tumor DNA and the other (control) sample consists of genomic DNA from non-cancerous cells from the same individual. RFLPs and microsatellite markers provide patterns of DNA polymorphisms, which can be interpreted as residing in a heterozygous region or a homozygous region of the genome. Provided that all individuals are affected with the same disease resulting from a manifestation of a deletion of a single copy of the same gene, all individuals will contain one region where their control sample is heterozygous but the mutant sample is homozygous - this region will contain the disease gene. [1] [2]

Post-genomics techniques

With the advent of modern laboratory techniques such as High-throughput sequencing and software capable of genome-wide analysis, sequence acquisition has become increasingly less expensive and time-consuming, thus providing significant benefits to science in the form of more efficient disease gene identification techniques.

Identity by descent mapping

Identity by descent (IBD) mapping generally uses single nucleotide polymorphism (SNP) arrays to survey known polymorphic sites throughout the genome of affected individuals and their parents and/or siblings, both affected and unaffected. While these SNPs probably do not cause the disease, they provide valuable insight into the makeup of the genomes in question. A region of the genome is considered identical by descent if contiguous SNPs share the same genotype. When comparing an affected individual to his/her affected sibling, all identical regions are recorded (ex. Shaded in red in above figure). Given that an affected sibling and an unaffected sibling do not have the same disease phenotype, their DNA must by definition be different (barring the presence of a genetic or environmental modifier). Thus, the IBD mapping results can be further supplemented by removing any regions that are identical in both affected individuals and unaffected siblings. [3] This is then repeated for multiple families, thus generating a small, overlapping fragment, which theoretically contains the disease gene.

Homozygosity/autozygosity mapping

Homozygosity/Autozygosity mapping is a powerful technique, but is only valid when searching for a mutation segregating within a small, closed population. Such a small population, possibly created by the founder effect, will have a limited gene pool, and thus any inherited disease will probably be a result of two copies of the same mutation segregating on the same haplotype. Since affected individuals will probably be homozygous in the regions, looking at SNPs in a region is an adequate marker of regions of homozygosity and heterozygosity. Modern day SNP arrays are used to survey the genome and identify large regions of homozygosity. Homozygous blocks in the genomes of affected individuals can then be laid on top of each other, and the overlapping region should contain the disease gene. [4]

This analysis is often extended by analyzing autozygosity, an extension of homozygosity, in the genomes of affected individuals. [5] This can be accomplished by plotting a cumulative LOD score alongside the overlaid blocks of homozygosity. By taking into consideration the population allele frequencies for all SNPs via autozygosity mapping, the results of homozygosity can be confirmed. [5] Furthermore, if two suspicious regions appear as a result of homozygosity mapping, autozygosity mapping may be able to distinguish between the two (ex. If one block of homozygosity is a result of a very non-diverse region of the genome, the LOD score will be very low).

Tools for Homozygosity Mapping

  1. HomSI: a homozygous stretch identifier from next-generation sequencing data [6] A tool that identifies homozygous regions using deep sequence data.

Genome-wide knockdown studies

Genome-wide knockdown studies are an example of the reverse genetics made possible by the acquisition of whole genome sequences, and the advent of genomics and gene-silencing technologies, mainly siRNA and deletion mapping. Genome-wide knockdown studies involve systematic knockdown or deletion of genes or segments of the genome. [7] This is generally done in prokaryotes or in a tissue culture environment due to the massive number of knockdowns that must be performed. [8] After the systematic knockout is completed (and possibly confirmed by mRNA expression analysis), the phenotypic results of the knockdown/knockout can be observed. Observation parameters can be selected to target a highly specific phenotype. [8] The resulting dataset is then queried for samples which exhibit phenotypes matching the disease in question – the gene(s) knocked down/out in said samples can then be considered candidate disease genes for the individual in question.

Whole exome sequencing

Whole exome sequencing is a brute-force approach that involves using modern day sequencing technology and DNA sequence assembly tools to piece together all coding portions of the genome. The sequence is then compared to a reference genome and any differences are noted. After filtering out all known benign polymorphisms, synonymous changes, and intronic changes (that do not affect splice sites), only potentially pathogenic variants will be left. This technique can be combined with other techniques to further exclude potentially pathogenic variants should more than one be identified. [9]

See also

Related Research Articles

The genotype of an organism is its complete set of genetic material. Genotype can also be used to refer to the alleles or variants an individual carries in a particular gene or genetic location. The number of alleles an individual can have in a specific gene depends on the number of copies of each chromosome found in that species, also referred to as ploidy. In diploid species like humans, two full sets of chromosomes are present, meaning each individual has two alleles for any given gene. If both alleles are the same, the genotype is referred to as homozygous. If the alleles are different, the genotype is referred to as heterozygous.

In molecular biology, restriction fragment length polymorphism (RFLP) is a technique that exploits variations in homologous DNA sequences, known as polymorphisms, populations, or species or to pinpoint the locations of genes within a sequence. The term may refer to a polymorphism itself, as detected through the differing locations of restriction enzyme sites, or to a related laboratory technique by which such differences can be illustrated. In RFLP analysis, a DNA sample is digested into fragments by one or more restriction enzymes, and the resulting restriction fragments are then separated by gel electrophoresis according to their size.

<span class="mw-page-title-main">Human genome</span> Complete set of nucleic acid sequences for humans

The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.

A microsatellite is a tract of repetitive DNA in which certain DNA motifs are repeated, typically 5–50 times. Microsatellites occur at thousands of locations within an organism's genome. They have a higher mutation rate than other areas of DNA leading to high genetic diversity. Microsatellites are often referred to as short tandem repeats (STRs) by forensic geneticists and in genetic genealogy, or as simple sequence repeats (SSRs) by plant geneticists.

Gene knockouts are a widely used genetic engineering technique that involves the targeted removal or inactivation of a specific gene within an organism's genome. This can be done through a variety of methods, including homologous recombination, CRISPR-Cas9, and TALENs.

<span class="mw-page-title-main">Molecular genetics</span> Scientific study of genes at the molecular level

Molecular genetics is a branch of biology that addresses how differences in the structures or expression of DNA molecules manifests as variation among organisms. Molecular genetics often applies an "investigative approach" to determine the structure and/or function of genes in an organism's genome using genetic screens. 

A genetic screen or mutagenesis screen is an experimental technique used to identify and select individuals who possess a phenotype of interest in a mutagenized population. Hence a genetic screen is a type of phenotypic screen. Genetic screens can provide important information on gene function as well as the molecular events that underlie a biological process or pathway. While genome projects have identified an extensive inventory of genes in many different organisms, genetic screens can provide valuable insight as to how those genes function.

<span class="mw-page-title-main">Single-nucleotide polymorphism</span> Single nucleotide in genomic DNA at which different sequence alternatives exist

In genetics and bioinformatics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome that is present in a sufficiently large fraction of considered population.

<span class="mw-page-title-main">Frameshift mutation</span> Mutation that shifts codon alignment

A frameshift mutation is a genetic mutation caused by indels of a number of nucleotides in a DNA sequence that is not divisible by three. Due to the triplet nature of gene expression by codons, the insertion or deletion can change the reading frame, resulting in a completely different translation from the original. The earlier in the sequence the deletion or insertion occurs, the more altered the protein. A frameshift mutation is not the same as a single-nucleotide polymorphism in which a nucleotide is replaced, rather than inserted or deleted. A frameshift mutation will in general cause the reading of the codons after the mutation to code for different amino acids. The frameshift mutation will also alter the first stop codon encountered in the sequence. The polypeptide being created could be abnormally short or abnormally long, and will most likely not be functional.

<span class="mw-page-title-main">Functional genomics</span> Field of molecular biology

Functional genomics is a field of molecular biology that attempts to describe gene functions and interactions. Functional genomics make use of the vast data generated by genomic and transcriptomic projects. Functional genomics focuses on the dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional "candidate-gene" approach.

<span class="mw-page-title-main">Loss of heterozygosity</span>

Loss of heterozygosity (LOH) is a type of genetic abnormality in diploid organisms in which one copy of an entire gene and its surrounding chromosomal region are lost. Since diploid cells have two copies of their genes, one from each parent, a single copy of the lost gene still remains when this happens, but any heterozygosity is no longer present.

<span class="mw-page-title-main">Gene mapping</span> Process of locating specific genes

Gene mapping or genome mapping describes the methods used to identify the location of a gene on a chromosome and the distances between genes. Gene mapping can also describe the distances between different sites within a gene.

<span class="mw-page-title-main">Ancestry-informative marker</span>

In population genetics, an ancestry-informative marker (AIM) is a single-nucleotide polymorphism that exhibits substantially different frequencies between different populations. A set of many AIMs can be used to estimate the proportion of ancestry of an individual derived from each population.

<span class="mw-page-title-main">Oncogenomics</span> Sub-field of genomics

Oncogenomics is a sub-field of genomics that characterizes cancer-associated genes. It focuses on genomic, epigenomic and transcript alterations in cancer.

<span class="mw-page-title-main">1000 Genomes Project</span> International research effort on genetic variation

The 1000 Genomes Project, launched in January 2008, was an international research effort to establish by far the most detailed catalogue of human genetic variation. Scientists planned to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within the following three years, using newly developed technologies which were faster and less expensive. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature. In 2012, the sequencing of 1092 genomes was announced in a Nature publication. In 2015, two papers in Nature reported results and the completion of the project and opportunities for future research.

Methylated DNA immunoprecipitation is a large-scale purification technique in molecular biology that is used to enrich for methylated DNA sequences. It consists of isolating methylated DNA fragments via an antibody raised against 5-methylcytosine (5mC). This technique was first described by Weber M. et al. in 2005 and has helped pave the way for viable methylome-level assessment efforts, as the purified fraction of methylated DNA can be input to high-throughput DNA detection methods such as high-resolution DNA microarrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). Nonetheless, understanding of the methylome remains rudimentary; its study is complicated by the fact that, like other epigenetic properties, patterns vary from cell-type to cell-type.

Virtual karyotype is the digital information reflecting a karyotype, resulting from the analysis of short sequences of DNA from specific loci all over the genome, which are isolated and enumerated. It detects genomic copy number variations at a higher resolution for level than conventional karyotyping or chromosome-based comparative genomic hybridization (CGH). The main methods used for creating virtual karyotypes are array-comparative genomic hybridization and SNP arrays.

Molecular Inversion Probe (MIP) belongs to the class of Capture by Circularization molecular techniques for performing genomic partitioning, a process through which one captures and enriches specific regions of the genome. Probes used in this technique are single stranded DNA molecules and, similar to other genomic partitioning techniques, contain sequences that are complementary to the target in the genome; these probes hybridize to and capture the genomic target. MIP stands unique from other genomic partitioning strategies in that MIP probes share the common design of two genomic target complementary segments separated by a linker region. With this design, when the probe hybridizes to the target, it undergoes an inversion in configuration and circularizes. Specifically, the two target complementary regions at the 5’ and 3’ ends of the probe become adjacent to one another while the internal linker region forms a free hanging loop. The technology has been used extensively in the HapMap project for large-scale SNP genotyping as well as for studying gene copy alterations and characteristics of specific genomic loci to identify biomarkers for different diseases such as cancer. Key strengths of the MIP technology include its high specificity to the target and its scalability for high-throughput, multiplexed analyses where tens of thousands of genomic loci are assayed simultaneously.

<span class="mw-page-title-main">Exome sequencing</span> Sequencing of all the exons of a genome

Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome. It consists of two steps: the first step is to select only the subset of DNA that encodes proteins. These regions are known as exons—humans have about 180,000 exons, constituting about 1% of the human genome, or approximately 30 million base pairs. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.

Bulked segregant analysis (BSA) is a technique used to identify genetic markers associated with a mutant phenotype. This allows geneticists to discover genes conferring certain traits of interest, such as disease resistance or susceptibility.

References

  1. Baker SJ, Fearon ER, Nigro JM, Hamilton SR, Preisinger AC, Jessup JM, vanTuinen P, Ledbetter DH, Barker DF, Nakamura Y, White R, Vogelstein B (April 1989). "Chromosome 17 deletions and p53 gene mutations in colorectal carcinomas". Science. 244 (4901): 217–21. Bibcode:1989Sci...244..217B. doi:10.1126/science.2649981. PMID   2649981.
  2. Lee AS, Seo YC, Chang A, Tohari S, Eu KW, Seow-Choen F, McGee JO (September 2000). "Detailed deletion mapping at chromosome 11q23 in colorectal carcinoma". Br. J. Cancer. 83 (6): 750–5. doi:10.1054/bjoc.2000.1366. PMC   2363538 . PMID   10952779.
  3. Bell R, Herring SM, Gokul N, Monita M, Grove ML, Boerwinkle E, Doris PA (June 2011). "High-resolution identity by descent mapping uncovers the genetic basis for blood pressure differences between spontaneously hypertensive rat lines". Circ Cardiovasc Genet. 4 (3): 223–31. doi:10.1161/CIRCGENETICS.110.958934. PMC   3116070 . PMID   21406686.
  4. Sherman EA, Strauss KA, Tortorelli S, Bennett MJ, Knerr I, Morton DH, Puffenberger EG (November 2008). "Genetic mapping of glutaric aciduria, type 3, to chromosome 7 and identification of mutations in c7orf10". Am. J. Hum. Genet. 83 (5): 604–9. doi:10.1016/j.ajhg.2008.09.018. PMC   2668038 . PMID   18926513.
  5. 1 2 Broman KW, Weber JL (December 1999). "Long homozygous chromosomal segments in reference families from the centre d'Etude du polymorphisme humain". Am. J. Hum. Genet. 65 (6): 1493–500. doi:10.1086/302661. PMC   1288359 . PMID   10577902.
  6. Zeliha Görmez; Burcu Bakir-Gungor; Mahmut Şamil Sağıroğlu (Feb 2014). "HomSI: a homozygous stretch identifier from next-generation sequencing data". Bioinformatics. 30 (3): 445–447. doi: 10.1093/bioinformatics/btt686 . PMID   24307702.
  7. Luo J, Emanuele MJ, Li D, Creighton CJ, Schlabach MR, Westbrook TF, Wong KK, Elledge SJ (May 2009). "A genome-wide RNAi screen identifies multiple synthetic lethal interactions with the Ras oncogene". Cell. 137 (5): 835–48. doi:10.1016/j.cell.2009.05.006. PMC   2768667 . PMID   19490893.
  8. 1 2 Fortier S, Bilodeau M, Macrae T, Laverdure JP, Azcoitia V, Girard S, Chagraoui J, Ringuette N, Hébert J, Krosl J, Mayotte N, Sauvageau G (2010). "Genome-wide interrogation of Mammalian stem cell fate determinants by nested chromosome deletions". PLOS Genet. 6 (12): e1001241. doi: 10.1371/journal.pgen.1001241 . PMC   3000362 . PMID   21170304.
  9. Chen WJ, Lin Y, Xiong ZQ, Wei W, Ni W, Tan GH, Guo SL, He J, Chen YF, Zhang QJ, Li HF, Lin Y, Murong SX, Xu J, Wang N, Wu ZY (December 2011). "Exome sequencing identifies truncating mutations in PRRT2 that cause paroxysmal kinesigenic dyskinesia". Nat. Genet. 43 (12): 1252–5. doi:10.1038/ng.1008. PMID   22101681. S2CID   16129198.
  10. Johnson GC, Esposito L, Barratt BJ, Smith AN, Heward J, Di Genova G, Ueda H, Cordell HJ, Eaves IA, Dudbridge F, Twells RC, Payne F, Hughes W, Nutland S, Stevens H, Carr P, Tuomilehto-Wolf E, Tuomilehto J, Gough SC, Clayton DG, Todd JA (2001). "Haplotype tagging for the identification of common disease genes". Nat Genet. 29 (2): 233–7. doi:10.1038/ng1001-233. PMID   11586306. S2CID   3388593.