Haploview

Haploview [1] is a widely used bioinformatics program for analyzing and visualizing patterns of linkage disequilibrium (LD) in genetic data. It can also perform association studies, select tag SNPs [2], and estimate haplotype frequencies. Haploview is developed and maintained by Dr. Mark Daly's lab at the Broad Institute of MIT and Harvard.

Haploview currently supports the following functionalities:

Related Research Articles

Single-nucleotide polymorphism: Single nucleotide position in genomic DNA at which different sequence alternatives exist

In genetics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population, many publications do not apply such a frequency threshold.

Haplotype: Group of genes from one parent

A haplotype is a group of alleles in an organism that are inherited together from a single parent.

In population genetics, linkage disequilibrium (LD) is the non-random association of alleles at different loci in a given population. Loci are said to be in linkage disequilibrium when the frequency of association of their different alleles is higher or lower than what would be expected if the loci were independent and associated randomly.
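The definition above can be made concrete for two biallelic loci. The following is a minimal sketch, not Haploview's own implementation: the function name `ld_stats` and its arguments are hypothetical, and it assumes that phased haplotype and allele frequencies are already known. It returns the raw disequilibrium coefficient D, the absolute value of Lewontin's normalized D', and the squared correlation r².

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab -- frequency of haplotypes carrying allele A at locus 1
            and allele B at locus 2
    p_a  -- frequency of allele A at locus 1
    p_b  -- frequency of allele B at locus 2
    (Hypothetical helper for illustration only.)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    # D measures the departure from independence: under random
    # association the haplotype frequency would be p_a * p_b.
    d = p_ab - p_a * p_b

    # The largest |D| attainable for these allele frequencies
    # depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Alleles A and B always occur together: complete LD.
    print(ld_stats(0.5, 0.5, 0.5))
    # Haplotype frequency equals the product of allele frequencies: no LD.
    print(ld_stats(0.25, 0.5, 0.5))
```

D is zero when the loci associate randomly, positive when the A-B haplotype is more frequent than expected, and negative when it is less frequent.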

The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease and responses to drugs and environmental factors. The information produced by the project is made freely available for research.

Identity by descent: Identical nucleotide sequence due to inheritance without recombination from a common ancestor

A DNA segment is identical by state (IBS) in two or more individuals if they have identical nucleotide sequences in this segment. An IBS segment is identical by descent (IBD) in two or more individuals if they have inherited it from a common ancestor without recombination, that is, the segment has the same ancestral origin in these individuals. DNA segments that are IBD are IBS by definition, but segments that are not IBD can still be IBS due to the same mutations in different individuals or recombinations that do not alter the segment.

Genetic association is when one or more genotypes within a population co-occur with a phenotypic trait more often than would be expected by chance occurrence.
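One common way to check whether co-occurrence exceeds chance is a chi-square test on allele counts in cases and controls. The sketch below is an illustrative assumption, not a description of how any particular program performs the test: the function name `allelic_chi2` and its arguments are hypothetical, and it uses the standard 2×2 Pearson chi-square with one degree of freedom and no continuity correction.

```python
import math


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square test on a 2x2 table of allele counts.

    case_a, case_b -- counts of alleles A and B among cases
    ctrl_a, ctrl_b -- counts of alleles A and B among controls
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    (Hypothetical helper for illustration only.)
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column must have a non-zero total")

    # Shortcut formula for the 2x2 Pearson statistic.
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Allele A is more frequent among cases (60%) than controls (40%).
    print(allelic_chi2(60, 40, 40, 60))
```

A small p-value indicates that the allele frequencies differ between cases and controls more than chance alone would readily explain; it does not by itself show that the variant is causal.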

PupaSuite is an interactive web-based SNP analysis tool that allows for the selection of relevant SNPs within a gene, based on different characteristics of the SNP itself, such as validation status, type, frequency/population data and putative functional properties. Also, PupaSuite provides information about LD parameters and identifies haplotype blocks and tag SNPs.

A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium that represents a group of SNPs called a haplotype. It is possible to identify genetic variation and association to phenotypes without genotyping every SNP in a chromosomal region. This reduces the expense and time of mapping genome areas associated with disease, since it eliminates the need to study every individual SNP. Tag SNPs are useful in whole-genome SNP association studies in which hundreds of thousands of SNPs across the entire genome are genotyped.
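The idea of letting a few SNPs stand in for their correlated neighbours can be sketched as a greedy selection over a pairwise r² matrix. This is a simplified illustration under stated assumptions, not the algorithm of Haploview's Tagger or any other specific tool: the function name `greedy_tag_snps`, the input layout, and the default threshold of 0.8 are all hypothetical choices.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs greedily from a symmetric pairwise r^2 matrix.

    r2        -- list of lists; r2[i][j] is the r^2 between SNP i and SNP j
    threshold -- minimum r^2 for one SNP to be considered tagged by another
    Returns the indices of the chosen tag SNPs in selection order.
    (Hypothetical helper for illustration only.)
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        best, best_cover = None, set()
        for cand in range(n):
            # SNPs this candidate would capture: itself (if still
            # untagged) plus every untagged SNP in strong LD with it.
            cover = {
                s for s in untagged
                if s == cand or r2[cand][s] >= threshold
            }
            if len(cover) > len(best_cover):
                best, best_cover = cand, cover
        tags.append(best)
        untagged -= best_cover
    return tags


if __name__ == "__main__":
    r2 = [
        [1.00, 0.90, 0.10, 0.00],
        [0.90, 1.00, 0.85, 0.00],
        [0.10, 0.85, 1.00, 0.00],
        [0.00, 0.00, 0.00, 1.00],
    ]
    # SNP 1 captures SNPs 0, 1 and 2; SNP 3 must tag itself.
    print(greedy_tag_snps(r2))
```

In this toy matrix two SNPs need to be genotyped instead of four. Greedy selection is not guaranteed to find the smallest possible tag set.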

Genome-wide association study: Study to research genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait.

In genomics, a genome-wide association study, also known as whole genome association study, is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.

dbSNP

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Although the name of the database implies a collection of one class of polymorphisms only, it in fact contains a range of molecular variation: (1) SNPs, (2) short deletion and insertion polymorphisms (indels/DIPs), (3) microsatellite markers or short tandem repeats (STRs), (4) multinucleotide polymorphisms (MNPs), (5) heterozygous sequences, and (6) named variants. The dbSNP accepts apparently neutral polymorphisms, polymorphisms corresponding to known phenotypes, and regions of no variation. It was created in September 1998 to supplement GenBank, NCBI’s collection of publicly available nucleic acid and protein sequences.

WGAViewer is a bioinformatics software tool which is designed to visualize, annotate, and help interpret the results generated from a genome wide association study (GWAS). Alongside the P values of association, WGAViewer allows a researcher to visualize and consider other supporting evidence, such as the genomic context of the SNP, linkage disequilibrium (LD) with ungenotyped SNPs, gene expression database, and the evidence from other GWAS projects, when determining the potential importance of an individual SNP.

Quantitative trait loci mapping or QTL mapping is the process of identifying genomic regions that potentially contain genes responsible for important economic, health or environmental characters. Mapping QTLs is an important activity that plant breeders and geneticists routinely use to associate potential causal genes with phenotypes of interest. Family-based QTL mapping is a variant of QTL mapping where multiple families are used.

Imputation in genetics refers to the statistical inference of unobserved genotypes. It is achieved by using known haplotypes in a population, for instance from the HapMap or the 1000 Genomes Project in humans. This makes it possible to test for association between a trait of interest and genetic variants that were not typed experimentally but whose genotypes have been statistically inferred ("imputed"). Genotype imputation is usually performed on SNPs, the most common kind of genetic variation.

The interdisciplinary research field of Computational and Statistical Genetics uses the latest approaches in genomics, quantitative genetics, computational sciences, bioinformatics and statistics to develop and apply computationally efficient and statistically robust methods to sort through increasingly rich and massive genome-wide data sets to identify complex genetic patterns, gene functionalities and interactions, and disease and phenotype associations involving the genomes of various organisms. This field is also often referred to as computational genomics. It is an important discipline within the umbrella field of computational biology.

Mega2 allows the applied statistical geneticist to convert one's data from several input formats to a large number of output formats suitable for analysis by commonly used software packages. In a typical human genetics study, the analyst often needs to use a variety of different software programs to analyze the data, and these programs usually require that the data be formatted to their precise input specifications. Conversion of one's data into these multiple different formats can be tedious, time-consuming, and error-prone. Mega2, by providing validated conversion pipelines, can accelerate the analyses while reducing errors.

Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.

Haplogroup E-M2: Human Y-chromosome DNA haplogroup

Haplogroup E-M2 is a human Y-chromosome DNA haplogroup. It is primarily distributed in Sub-Saharan Africa. E-M2 is the predominant subclade in Western Africa, Central Africa, Southern Africa and the African Great Lakes, and occurs at moderate frequencies in North Africa and Middle East. E-M2 has several subclades, but many of these subhaplogroups are included in either E-L485 or E-U175. E-M2 is especially common in native Africans speaking Niger-Congo languages and was spread to Southern and Eastern Africa through the Bantu expansion.

In statistical genetics, linkage disequilibrium score regression is a technique that aims to quantify the separate contributions of polygenic effects and various confounding factors, such as population stratification, based on summary statistics from genome-wide association studies (GWASs). The approach involves using regression analysis to examine the relationship between linkage disequilibrium scores and the test statistics of the single-nucleotide polymorphisms (SNPs) from the GWAS. Here, the "linkage disequilibrium score" for a SNP "is the sum of LD r² measured with all other SNPs".
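The quoted definition of the score can be illustrated directly. The sketch below is a toy calculation under stated assumptions, not the procedure of any published LD score software: the names `pearson_r2` and `ld_scores` are hypothetical, r² is approximated by the squared Pearson correlation of unphased 0/1/2 genotype counts, and every other SNP in the input is included, with no distance window or bias adjustment.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP carries no correlation information.
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_scores(genotypes):
    """LD score of each SNP: the sum of its r^2 with all other SNPs.

    genotypes -- list of SNPs, each a list of 0/1/2 allele counts
                 across the same individuals
    (Hypothetical helper for illustration only.)
    """
    m = len(genotypes)
    return [
        sum(pearson_r2(genotypes[j], genotypes[k]) for k in range(m) if k != j)
        for j in range(m)
    ]


if __name__ == "__main__":
    snps = [
        [0, 1, 2, 1],  # SNP 0
        [0, 1, 2, 1],  # SNP 1, identical to SNP 0
        [1, 1, 1, 0],  # SNP 2, uncorrelated with SNPs 0 and 1
    ]
    print(ld_scores(snps))
```

SNPs 0 and 1 each receive a score of 1 because they are perfectly correlated with one another, while SNP 2 receives a score of 0.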

In genetics, a haplotype block is a region of an organism's genome in which there is little evidence of a history of genetic recombination, and which contains only a small number of distinct haplotypes. According to the haplotype-block model, such blocks should show high levels of linkage disequilibrium and be separated from one another by numerous recombination events. The boundaries of haplotype blocks cannot be directly observed; they must instead be inferred indirectly through the use of algorithms. However, some evidence suggests that different algorithms for identifying haplotype blocks give very different results when used on the same data, though another study suggests that their results are generally consistent. The National Institutes of Health funded the HapMap project to catalog haplotype blocks throughout the human genome.

Snagger is a bioinformatics software program for selecting tag SNPs using pairwise r² linkage disequilibrium. It is implemented as an extension to the popular software Haploview, and is freely available under the MIT License. Snagger distinguishes itself from existing single nucleotide polymorphism (SNP) selection algorithms, including Tagger, by providing user options that allow for:

References

  1. Barrett J.C.; Fry B.; Maller J.; Daly M.J. (2005). "Haploview: analysis and visualization of LD and haplotype maps". Bioinformatics. 21 (2): 263–5. doi:10.1093/bioinformatics/bth457. PMID 15297300.
  2. de Bakker P.I.; Yelensky R.; Pe'er I.; Gabriel S.B.; Daly M.J.; Altshuler D. (2005). "Efficiency and power in genetic association studies". Nature Genetics. 37 (11): 1217–23. doi:10.1038/ng1669. PMID 16244653.