The PeroxiBase database was created at the University of Geneva (Switzerland) at the end of 2003 by two plant biologists specialised in the study of plant peroxidases. It was first limited to class III peroxidases (plant peroxidases) and was then expanded to include all possible haem and non-haem peroxidase protein sequences. Many researchers and bioinformaticians from the University of Geneva joined forces to develop the database and rapidly increase the number of peroxidase sequences. Since 2005, the database has accepted external contributions, which are verified by PeroxiBase curators. The majority of haem and non-haem peroxidase sequences can now be found in PeroxiBase. [1]
The database is hosted by the Swiss Institute of Bioinformatics.
Peroxidase sequences come from other general public databases (NCBI, TIGR, UniProt KnowledgeBase; all the databases used are listed in PeroxiBase), either as pre-existing annotated sequences or from raw data (whole-genome sequencing projects).
Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. It combines biology, chemistry, physics, computer science, information engineering, mathematics and statistics to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques.
In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA. The genome includes both the genes and the noncoding DNA, as well as mitochondrial DNA and chloroplast DNA. The study of the genome is called genomics. The genomes of several organisms have been sequenced and their genes analyzed. The Human Genome Project, which sequenced the genome of Homo sapiens, was declared complete in April 2003.
Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dimensional structural configuration. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and their influence on the organism. Genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues, control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through the use of high-throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems such as the brain.
Peroxidases or peroxide reductases are a large group of enzymes which play a role in various biological processes. They are named after the fact that they commonly break up peroxides.
Cytochrome c peroxidase, or CCP, is a water-soluble heme-containing enzyme of the peroxidase family that takes reducing equivalents from cytochrome c and reduces hydrogen peroxide to water.
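The overall stoichiometry usually given for this reaction can be written out as below. This is a sketch of the commonly cited net equation, not a mechanistic description: two reduced (ferro) cytochrome c molecules each donate one electron, and the enzyme is regenerated at the end of the cycle.

```latex
\mathrm{CCP} + \mathrm{H_2O_2} + 2\,\text{ferrocytochrome } c + 2\,\mathrm{H^+}
\;\longrightarrow\;
\mathrm{CCP} + 2\,\mathrm{H_2O} + 2\,\text{ferricytochrome } c
```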
Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism and to annotate protein-coding genes and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences.
In genetics, an expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and were instrumental in gene discovery and in gene-sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs now available in public databases. EST approaches have largely been superseded by whole genome and transcriptome sequencing and metagenome sequencing.
In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.
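As a rough illustration of the simplest form of gene finding, the sketch below scans the three forward reading frames of a DNA string for open reading frames (ORFs), that is, a start codon followed in frame by a stop codon. The function name `find_orfs`, the `min_codons` threshold and the use of the standard start and stop codons are assumptions made for this example; real gene predictors also consider the reverse strand, introns and statistical models of coding sequence.

```python
from typing import List, Tuple

# Standard genetic code start/stop codons (assumption: no alternative codes).
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> List[Tuple[int, int, str]]:
    """Return (start, end, sequence) for ORFs on the forward strand.

    start is 0-based and inclusive; end is exclusive and includes the
    stop codon. min_codons counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen in this frame
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for start, end, seq in find_orfs("CCATGAAATAGGG"):
        print(start, end, seq)
```

Keeping the earliest in-frame start codon yields the longest candidate ORF for each stop codon, which is a common default when no further evidence is available.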
Metagenomics is the study of genetic material recovered directly from environmental samples. The broad field may also be referred to as environmental genomics, ecogenomics or community genomics.
UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, United States.
Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. The most recent version, Pfam 34.0, was released in March 2021 and contains 19,179 families.
The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both a physical and a functional standpoint. It remains the world's largest collaborative biological project. Planning started after the US government picked up the idea in 1984; the project was formally launched in 1990 and was declared complete on April 14, 2003. The "complete genome" level was achieved in May 2021.
Warren Richard Gish is the owner of Advanced Biocomputing LLC. He joined Washington University in St. Louis as a junior faculty member in 1994, and was a Research Associate Professor of Genetics from 2002 to 2007.
Haem peroxidases (or heme peroxidases) are haem-containing enzymes that use hydrogen peroxide as the electron acceptor to catalyse a number of oxidative reactions. Most haem peroxidases follow the reaction scheme:
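A commonly cited three-step form of this scheme is sketched below, offered as an illustration rather than as the source's own notation. Here R' denotes a radical on the porphyrin or protein (an assumption of this notation), and the two oxidised intermediates are conventionally called Compound I and Compound II.

```latex
\begin{align*}
\mathrm{Fe^{3+}} + \mathrm{H_2O_2}
  &\longrightarrow [\mathrm{Fe^{4+}{=}O}]\mathrm{R'}\ (\text{Compound I}) + \mathrm{H_2O} \\
[\mathrm{Fe^{4+}{=}O}]\mathrm{R'} + \text{substrate}
  &\longrightarrow [\mathrm{Fe^{4+}{=}O}]\mathrm{R}\ (\text{Compound II}) + \text{oxidised substrate} \\
[\mathrm{Fe^{4+}{=}O}]\mathrm{R} + \text{substrate}
  &\longrightarrow \mathrm{Fe^{3+}} + \mathrm{H_2O} + \text{oxidised substrate}
\end{align*}
```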
In molecular biology, the di-haem cytochrome c peroxidase family is a group of distinct cytochrome c peroxidases (CCPs) that contain two haem groups. Similar to other cytochrome c peroxidases, they reduce hydrogen peroxide to water using c-type haem as an oxidizable substrate. However, since they possess two haem prosthetic groups instead of one, this family of bacterial CCPs reduces hydrogen peroxide without the need to generate semi-stable free radicals. The two haem groups have significantly different redox potentials. The high-potential haem feeds electrons from electron shuttle proteins to the low-potential haem, where peroxide is reduced. The CCP protein itself is structured into two domains, each containing one c-type haem group, with a calcium-binding site at the domain interface. This family also includes MauG proteins, whose similarity to di-haem CCP was previously recognised.
In molecular biology, the DyP-type peroxidase family is a family of haem peroxidase enzymes. Haem peroxidases were originally divided into two superfamilies, namely, the animal peroxidases and the plant peroxidases, which include fungal and bacterial peroxidases. The DyP family constitutes a novel class of haem peroxidase. Because these enzymes were derived from fungal sources, the DyP family was thought to be structurally related to the class II secretory fungal peroxidases. However, the DyP family exhibits only low sequence similarity to classical fungal peroxidases, such as LiP and MnP, and does not contain the conserved proximal and distal histidines and an essential arginine found in other plant peroxidase superfamily members.
De novo transcriptome assembly is the de novo sequence assembly method of creating a transcriptome without the aid of a reference genome.
The Earth Microbiome Project (EMP) is an initiative founded by Janet Jansson, Jack Gilbert and Rob Knight in 2010 to collect natural samples and to analyze the microbial community around the globe.
WormBase is an online biological database about the biology and genome of the nematode model organism Caenorhabditis elegans and contains information about other related nematodes. WormBase is used by the C. elegans research community both as an information resource and as a place to publish and distribute their results. The database is regularly updated with new versions being released every two months. WormBase is one of the organizations participating in the Generic Model Organism Database (GMOD) project.