Bioinformatic Harvester

Developer(s)	Urban Liebel, Björn Kindler
Stable release	4 / May 24, 2011
Operating system	Web based
Type	Bioinformatics tool
Website	harvester.kit.edu (permanent dead link)

Last updated June 22, 2024

The Bioinformatic Harvester was a bioinformatic meta search engine for gene and protein-associated information, created by the European Molecular Biology Laboratory [1] and subsequently hosted and further developed by the Karlsruhe Institute of Technology (KIT). Harvester covered human, mouse, rat, zebrafish, Drosophila and Arabidopsis thaliana data. It cross-linked more than 50 popular bioinformatic resources and allowed cross searches, serving tens of thousands of pages every day to scientists and physicians. The service has been offline since 2014.

How Harvester works

Harvester collected information from protein and gene databases, along with information from so-called "prediction servers", which provide, for example, online sequence analysis for a single protein. Harvester's search index was based on the IPI and UniProt protein information collections. The collection consisted of:

~72,000 human, ~57,000 mouse, ~41,000 rat, ~51,000 zebrafish and ~35,000 Arabidopsis protein pages, which cross-linked ~50 major bioinformatic resources.

Harvester crosslinks several types of information

Text based information

From the following databases:

UniProt, one of the largest protein databases
SOURCE, convenient gene information overview
Simple Modular Architecture Research Tool (SMART)
SOSUI, predicts transmembrane domains
PSORT, predicts protein localisation
HomoloGene, compares proteins from different species
GFP-cDNA, protein localisation with fluorescence microscopy
International Protein Index (IPI)

Databases rich in graphical elements

These databases were not collected but cross-linked and displayed via iframes. An iframe is a window within an HTML page that gives an embedded view of, and interactive access to, the linked database. Several such iframes were combined on a single Harvester protein page, allowing convenient side-by-side comparison of information from several databases.

NCBI-BLAST, an algorithm for comparing biological sequences from the NCBI
Ensembl, automatic gene annotation by the EMBL-EBI and Sanger Institute
FlyBase, a database of the model organism Drosophila melanogaster
GoPubMed, a knowledge-based search engine for biomedical texts
iHOP, information hyperlinked over proteins via gene/protein synonyms
Mendelian Inheritance in Man, a catalogue of human genes and genetic disorders
RZPD, German Resource Center for Genome Research in Berlin/Heidelberg
STRING, Search Tool for the Retrieval of Interacting Genes/Proteins, developed by EMBL, SIB and UZH
Zebrafish Information Network
LOCATE subcellular localisation database (mouse)
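
The iframe approach described in this section can be illustrated with a short Python sketch that assembles a single protein page embedding one iframe per linked resource. This is only an illustration, not Harvester's actual implementation: the function name, the resource names and the example.org URL templates are hypothetical, since the real link patterns are not documented here.

```python
from html import escape
from urllib.parse import quote

# Hypothetical URL templates (assumption): the real Harvester link
# patterns for the embedded databases are not given in the article.
RESOURCE_TEMPLATES = {
    "Resource A": "https://resource-a.example.org/entry?id={query}",
    "Resource B": "https://resource-b.example.org/search?q={query}",
}


def build_protein_page(accession, templates=RESOURCE_TEMPLATES):
    """Return an HTML page with one iframe per cross-linked resource."""
    parts = [f"<html><body><h1>{escape(accession)}</h1>"]
    for name, template in templates.items():
        # Percent-encode the identifier before placing it in the URL.
        url = template.format(query=quote(accession, safe=""))
        parts.append(f"<h2>{escape(name)}</h2>")
        # Each iframe is an embedded, interactive view of the remote page.
        parts.append(
            f'<iframe src="{escape(url)}" width="100%" height="400"></iframe>'
        )
    parts.append("</body></html>")
    return "\n".join(parts)


page = build_protein_page("Q9NPD3")
print(page)
```

Because the remote pages are only embedded and not copied, a page built this way always shows the current content of each linked database, at the cost of depending on every one of them being reachable.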

Access from external application

UCSC Genome Browser, working draft assemblies for genomes
Google Scholar
Mitocheck
PolyMeta, meta search engine for Google, Yahoo, MSN, Ask, Exalead, AllTheWeb, GigaBlast

What one can find

Harvester allowed searches on single words as well as on combinations of different search terms.

Search Examples:

Gene-name: "golga3"
Gene-alias: "ADAP-S ADAS ADHAPS ADPS" (one gene name is sufficient)
Gene-Ontologies: "Enzyme linked receptor protein signaling pathway"
Unigene-Cluster: "Hs.449360"
Go-annotation: "intra-Golgi transport"
Molecular function: "protein kinase binding"
Protein: "Q9NPD3"
Protein domain: "SH2 sar"
Protein Localisation: "endoplasmic reticulum"
Chromosome: "2q31"
Disease relevant: use the word "diseaselink"
Combinations: "golgi diseaselink" (finds all golgi proteins associated with a disease)
mRNA: "AL136897"
Word: "Cancer"
Comment: "highly expressed in heart"
Author: "Merkel, Schmidt"
Publication or project: "cDNA sequencing project"
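
The "golgi diseaselink" example suggests that combined terms narrow the result to pages matching every term. A minimal Python sketch of that behaviour, using a small inverted index, is shown below. The AND semantics is an assumption inferred from that example, and the page identifiers and annotations are made up for illustration; none of this describes Harvester's real index.

```python
from collections import defaultdict

# Made-up protein pages (assumption): identifiers and annotation text
# are illustrative only and do not come from Harvester.
PAGES = {
    "P00001": "golgi membrane protein diseaselink",
    "P00002": "golgi transport vesicle",
    "P00003": "endoplasmic reticulum chaperone diseaselink",
}


def build_index(pages):
    """Map each lower-cased word to the set of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the pages that contain every term of the query (AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(PAGES)
hits = search(index, "golgi diseaselink")
print(hits)  # {'P00001'}
```

A single term such as "golgi" returns both P00001 and P00002 in this toy data set; adding "diseaselink" intersects the two result sets and leaves only P00001.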

Literature

Liebel U, Kindler B, Pepperkok R (August 2004). "'Harvester': a fast meta search engine of human protein resources". Bioinformatics. 20 (12): 1962–3. doi:10.1093/bioinformatics/bth146. PMID 14988114.
Liebel U, Kindler B, Pepperkok R (2005). "Bioinformatic "Harvester": A Search Engine for Genome-Wide Human, Mouse, and Rat Protein Resources". GTPases Regulating Membrane Dynamics. Methods in Enzymology. Vol. 404. pp. 19–26. doi:10.1016/S0076-6879(05)04003-6. ISBN 9780121828097. PMID 16413254.

Notes and references

[1] Manoj, M; Elizabeth, Jacob (Oct 2008). "Information retrieval on Internet using meta-search engines: A review" (PDF). Journal of Scientific & Industrial Research. 67 (10): 739–746. ISSN 0022-4456.

External links

Official website (permanent dead link): Bioinformatic Harvester V at KIT Karlsruhe Institute of Technology
"Harvester42 at KIT - integrating 50 general search engines". Archived from the original on 2013-01-06. Retrieved 2013-01-06.


