Bioinformatic Harvester

Developer(s): Urban Liebel, Björn Kindler
Stable release: 4 / May 24, 2011
Operating system: Web based
Type: Bioinformatics tool
Website: harvester.kit.edu (dead link)

The Bioinformatic Harvester was a bioinformatic meta search engine for gene and protein-associated information, created by the European Molecular Biology Laboratory [1] and subsequently hosted and further developed by the Karlsruhe Institute of Technology (KIT). Harvester covered human, mouse, rat, zebrafish, Drosophila and Arabidopsis thaliana. It cross-linked more than 50 popular bioinformatic resources and allowed cross searches, serving tens of thousands of pages every day to scientists and physicians. The service has been offline since 2014.

How Harvester works

Harvester collected information from protein and gene databases, along with information from so-called "prediction servers", which provide, for example, online sequence analysis for a single protein. Harvester's search index was based on the IPI and UniProt protein information collections. The collection consisted of:

Text based information

From the following databases:

Databases rich in graphical elements

These databases were not collected but crosslinked, and were displayed via iframes. An iframe is a window within an HTML page that provides an embedded view of, and interactive access to, the linked database. Several such iframes were combined on a single Harvester protein page, which allowed convenient side-by-side comparison of information from several databases.
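
The iframe approach described above can be illustrated with a minimal Python sketch. This is not Harvester's actual code: the function name `build_protein_page`, the source names, and the `example.org` URL templates are all hypothetical, chosen only to show how one page can embed live views of several external databases for the same protein identifier.

```python
from html import escape
from urllib.parse import quote


def build_protein_page(protein_id, sources):
    """Return an HTML page with one iframe per external database.

    `sources` maps a display name to a URL template containing `{id}`.
    The templates are illustrative assumptions, not real database URLs.
    """
    frames = []
    for name, url_template in sources.items():
        # Percent-encode the identifier before placing it in the URL.
        url = url_template.format(id=quote(protein_id, safe=""))
        frames.append(
            f"<h2>{escape(name)}</h2>\n"
            f'<iframe src="{escape(url, quote=True)}" '
            f'width="100%" height="400"></iframe>'
        )
    body = "\n".join(frames)
    return (
        f"<html><head><title>{escape(protein_id)}</title></head>\n"
        f"<body>\n{body}\n</body></html>"
    )


# Hypothetical sources; real databases use their own URL schemes.
EXAMPLE_SOURCES = {
    "Database A": "https://db-a.example.org/entry/{id}",
    "Database B": "https://db-b.example.org/view?acc={id}",
}

if __name__ == "__main__":
    print(build_protein_page("P12345", EXAMPLE_SOURCES))
```

Because each iframe loads directly from the external site, a page built this way always shows the current state of the linked database, at the cost of depending on that site being reachable and permitting embedding.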

Access from external application

What one can find

Harvester allowed different search terms and single words to be combined in one query.

Search Examples:

See also

Literature

Notes and references

  1. Manoj, M; Elizabeth, Jacob (Oct 2008). "Information retrieval on Internet using meta-search engines: A review" (PDF). Journal of Scientific & Industrial Research. 67 (10): 739–746. ISSN 0022-4456.
