Mutalyzer is a web-based software tool which was primarily developed to check the description of sequence variants identified in a gene during genetic testing. [1] Mutalyzer applies the rules of the standard human sequence variant nomenclature and can correct descriptions accordingly. Apart from the sequence variant description, Mutalyzer requires a DNA sequence record containing the transcript and protein feature annotation as a reference. Mutalyzer 2 accepts GenBank and Locus Reference Genomic (LRG) records. The annotation is also used to apply the correct codon translation tables and generate DNA and protein variant descriptions for any organism. The Mutalyzer server supports programmatic access via a SOAP Web service described in the Web Services Description Language (WSDL) and an HTTP/RPC+JSON web service.
Genetic testing is generally performed in families with hereditary disease. Any sequence variant identified in a gene can be described in test reports using the position of the change and the nucleotide or amino acid involved. With this simple rule, a deletion of the nucleotide guanine (G) in a stretch of 4 G nucleotides might be described in 4 different ways, when each of the G positions is used. Although different descriptions do not affect the functional consequences of the change, they may obfuscate the fact that two persons share the same variant or the real frequency of a variant in the population. The standard human sequence variant nomenclature proposed by the Human Genome Variation Society was developed to solve this problem. [2] Proper variant descriptions are expected to facilitate searches for more information about the functional consequences in the literature and in gene variant or locus-specific databases (LSDBs).
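The ambiguity described above can be illustrated with a minimal sketch. It assumes the HGVS convention that a deletion in a repeated stretch is assigned to the most 3' position possible. The function names `shift_deletion_3prime` and `describe_deletion`, the toy reference sequence, and the simplified `g.` description format are hypothetical illustrations and not Mutalyzer's actual implementation.

```python
def shift_deletion_3prime(ref: str, start: int, end: int) -> tuple[int, int]:
    """Slide a deletion of ref[start..end] (1-based, inclusive) as far 3' as possible.

    The deleted interval can move one base to the right whenever the base
    just after it equals the first deleted base; the resulting sequence
    is unchanged by such a move.
    """
    while end < len(ref) and ref[end] == ref[start - 1]:
        start += 1
        end += 1
    return start, end


def describe_deletion(start: int, end: int) -> str:
    """Format a simplified, HGVS-like genomic deletion description (illustrative only)."""
    return f"g.{start}del" if start == end else f"g.{start}_{end}del"


# Toy reference with a stretch of four G nucleotides at positions 3 to 6.
ref = "ATGGGGCT"

# Deleting any one of the four Gs yields the same resulting sequence...
resulting = {ref[: p - 1] + ref[p:] for p in range(3, 7)}

# ...and all four naive descriptions normalize to a single description.
normalized = {describe_deletion(*shift_deletion_3prime(ref, p, p)) for p in range(3, 7)}
```

Under these assumptions, the four possible descriptions (deletion at position 3, 4, 5 or 6) collapse into one, which is what makes it possible to recognize that two persons share the same variant.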
Mutalyzer is used by the Leiden Open Variation Database (LOVD), which stores sequence variant information for many human genes, to check variant descriptions before submission of new data. [3] This facilitates data sharing, display and integration with other genetic resources (e.g., Ensembl, UCSC Genome Browser, NCBI sequence viewer).
An allele is one of two or more variants of the nucleotide sequence found at the same place (locus) on a long DNA molecule. The word is a short form of "allelomorph".
A microsatellite is a tract of repetitive DNA in which certain DNA motifs are repeated, typically 5–50 times. Microsatellites occur at thousands of locations within an organism's genome. They have a higher mutation rate than other areas of DNA leading to high genetic diversity. Microsatellites are often referred to as short tandem repeats (STRs) by forensic geneticists and in genetic genealogy, or as simple sequence repeats (SSRs) by plant geneticists.
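As a rough illustration of the definition above, the following sketch scans a DNA string for tandem repeats with a regular expression. The function name `find_microsatellites`, the motif length range of 1 to 6 bases, and the example sequence are assumptions made for illustration; only the lower bound of 5 repeats comes from the text.

```python
import re


def find_microsatellites(seq: str, min_unit: int = 1, max_unit: int = 6,
                         min_repeats: int = 5) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for each tandem repeat found.

    The motif is matched lazily so that the shortest repeating unit is
    preferred, and the back-reference \\1 requires exact copies of it.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


# Hypothetical sequence: an (AC)6 repeat flanked by unique bases.
example = "TTG" + "AC" * 6 + "AGG"
hits = find_microsatellites(example)
```

This only detects perfect repeats; real microsatellite finders usually also tolerate interruptions and mismatches, which this sketch does not attempt.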
In genetics and bioinformatics, a single-nucleotide polymorphism (SNP) is a germline substitution of a single nucleotide at a specific position in the genome that is present in a sufficiently large fraction of the population under consideration.
Multilocus sequence typing (MLST) is a technique in molecular biology for the typing of multiple loci, using DNA sequences of internal fragments of multiple housekeeping genes to characterize isolates of microbial species.
The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.
Thiopurine methyltransferase or thiopurine S-methyltransferase (TPMT) is an enzyme that in humans is encoded by the TPMT gene. A pseudogene for this locus is located on chromosome 18q.
The completion of human genome sequencing in the early 2000s was a turning point in genomics research. Scientists have since conducted extensive research into the activities of genes and the genome as a whole. The human genome contains around 3 billion nucleotide base pairs, and the huge quantity of data created necessitates accessible tools to explore and interpret this information in order to investigate the genetic basis of disease, evolution, and biological processes. The field of genomics has continued to grow, with new sequencing technologies and computational tools making it easier to study the genome.
The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. Further information is located at the Yeastract curated repository.
Cav1.4, also known as the calcium channel, voltage-dependent, L type, alpha 1F subunit (CACNA1F), is a human gene.
The 1000 Genomes Project, launched in January 2008, was an international research effort to establish by far the most detailed catalogue of human genetic variation. Scientists planned to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within the following three years, using newly developed technologies which were faster and less expensive. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature. In 2012, the sequencing of 1092 genomes was announced in a Nature publication. In 2015, two papers in Nature reported results and the completion of the project and opportunities for future research.
The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Although the name of the database implies a collection of one class of polymorphisms only, it in fact contains a range of molecular variation: (1) SNPs, (2) short deletion and insertion polymorphisms (indels/DIPs), (3) microsatellite markers or short tandem repeats (STRs), (4) multinucleotide polymorphisms (MNPs), (5) heterozygous sequences, and (6) named variants. The dbSNP accepts apparently neutral polymorphisms, polymorphisms corresponding to known phenotypes, and regions of no variation. It was created in September 1998 to supplement GenBank, NCBI’s collection of publicly available nucleic acid and protein sequences.
Whole genome sequencing (WGS), also known as full genome sequencing, complete genome sequencing, or entire genome sequencing, is the process of determining the entirety, or nearly the entirety, of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast.
The Leiden Open Variation Database (LOVD) is a free, flexible, web-based open source database developed at the Leiden University Medical Center in the Netherlands, designed to collect and display variants in the DNA sequence. The focus of an LOVD is usually the combination of a gene and a genetic (heritable) disease. All sequence variants found in individuals are collected in the database, together with information about whether they could be causally connected to the disease (i.e. a disease-causing variant or mutation) or not (i.e. a non-disease-causing variant). Specialized doctors (clinical geneticists) use LOVDs to diagnose and advise patients carrying a genetic disease. Ideally, if a patient has been screened for mutations and one has been found, information in LOVD can predict the progress of the disease.
GeneCards is a database of human genes that provides genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes. It is being developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science, in collaboration with LifeMap Sciences.
Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome. It consists of two steps: the first step is to select only the subset of DNA that encodes proteins. These regions are known as exons—humans have about 180,000 exons, constituting about 1% of the human genome, or approximately 30 million base pairs. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.
DECIPHER is a web-based resource and database of genomic variation data from analysis of patient DNA. It documents submicroscopic chromosome abnormalities and pathogenic sequence variants from over 25,000 patients and maps them to the human genome using Ensembl or the UCSC Genome Browser. In addition, it catalogues the clinical characteristics of each patient and maintains a database of microdeletion/duplication syndromes, together with links to relevant scientific reports and support groups.
Locus Reference Genomic (LRG) is a DNA sequence format that was developed to aid in curating locus specific databases (LSDBs) that record DNA sequence variation which can result in inherited diseases. LRGs have fixed sequences that are independent of the genome so that they provide a stable framework for reporting variants. The LRG format uses extensible markup language (XML) to provide highly structured single records containing the genomic DNA sequence for individual genes along with the mRNAs and proteins encoded by these genes. LRG records are recommended in the Human Genome Variation Society Nomenclature guidelines as reference sequences to report sequence variants in LSDBs and the literature.
A gene is said to be polymorphic if more than one allele occupies that gene's locus within a population. In addition to having more than one allele at a specific locus, each allele must also occur in the population at a rate of at least 1% to generally be considered polymorphic.
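The 1% criterion above can be expressed as a short check on observed allele counts. This is a minimal sketch that reads the criterion as "at least two alleles each reach the threshold frequency"; the function names `allele_frequencies` and `is_polymorphic` and the sample data are hypothetical.

```python
from collections import Counter


def allele_frequencies(alleles: list[str]) -> dict[str, float]:
    """Compute the relative frequency of each allele observed at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(alleles: list[str], threshold: float = 0.01) -> bool:
    """Return True if at least two alleles each occur at or above the threshold.

    Assumption: alleles below the threshold are treated as rare variants
    and do not count toward polymorphism.
    """
    freqs = allele_frequencies(alleles)
    common = [a for a, f in freqs.items() if f >= threshold]
    return len(common) >= 2


# Hypothetical samples of 1000 observed alleles at a single locus.
rare_variant = ["A"] * 995 + ["G"] * 5      # G at 0.5%: below the 1% cutoff
common_variant = ["A"] * 950 + ["G"] * 50   # G at 5%: above the 1% cutoff
```

With these inputs, `is_polymorphic(rare_variant)` is false and `is_polymorphic(common_variant)` is true, matching the distinction the definition draws between rare variants and polymorphisms.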
Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.
ANNOVAR is a bioinformatics software tool for the interpretation and prioritization of single nucleotide variants (SNVs), insertions, deletions, and copy number variants (CNVs) of a given genome.