EB-eye

EB-eye
EBI Search
Formation: 2006
Type: Non-profit academic organization
Services: Research and services in bioinformatics
Website: ebi.ac.uk

The EB-eye, also known as EBI Search, is a search engine that provides uniform access to the biological data resources hosted at the European Bioinformatics Institute (EBI). [1] [2]

The EB-eye – the EBI search engine for biological data

The European Bioinformatics Institute is a non-profit academic organisation that forms part of the European Molecular Biology Laboratory (EMBL). The EBI is a centre for research and services in bioinformatics. The Institute manages databases of biological data including nucleotide sequences, protein sequences and macromolecular structures.

The mission of the EBI

What is the EB-eye?

The EB-eye is a fast and efficient search engine that currently provides easy and uniform access to biological data resources hosted at the EBI.

The project started in August 2006 and is built on Apache Lucene, a Java library that provides powerful indexing and search capabilities.

The EB-eye presents search hits in a simple way and acts as a gateway to biological entries and related information in dedicated portals. A key feature of the EB-eye is its ability to display coherently the relationships that exist between diverse databases, allowing the user to navigate this network of cross-references.

The user can search globally across all EBI data resources through the "Global Search" box, or create more specific queries on targeted resources with the EB-eye Query Builder.

The EB-eye publicly exposes both a web interface and a RESTful Web services interface.

Access to the EB-eye

The global search is available on the EBI web site. You can simply type some query terms into the text search box there and press the search button (or press Enter). The system then displays a summary page with a list of various data sets and the number of matches found in each of them.
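The summary page described above can be pictured as a per-data-set tally of matches. The following is a minimal sketch only: the function name `summarise_hits`, the data set names and the counts are invented for illustration and do not come from the EB-eye itself.

```python
def summarise_hits(hit_counts):
    """Return (data set, count) pairs that have at least one match.

    Pairs are ordered by descending count, then alphabetically by name.
    """
    matched = [(name, count) for name, count in hit_counts.items() if count > 0]
    return sorted(matched, key=lambda pair: (-pair[1], pair[0]))


# Invented counts for a single hypothetical query.
example_counts = {"UniProt": 120, "ENA": 4500, "PDBe": 0, "InterPro": 120}

for name, count in summarise_hits(example_counts):
    print(f"{name}: {count} matches")
```

Data sets without matches are left out of the summary, which keeps the list focused on where the query terms were actually found.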

Global search examples

Query builder

The query builder allows users to create and save complex queries on the available data to get specific search results. See the complex query examples section.

What can the user search for?

Many resources at the EBI are indexed within the search engine, but some are not. The EB-eye can search only the information that has been indexed, so other search engines operating on biological data might yield different results. As a rule of thumb, the EB-eye indexes identifiers, names, descriptions, keywords and cross-references.
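To illustrate why only indexed information is searchable, here is a toy inverted index in Python. It is not Lucene and not the EB-eye implementation; the field names in `INDEXED_FIELDS`, the sample entry and the helper functions are all assumptions made for the sketch.

```python
import re
from collections import defaultdict

# Hypothetical field names mirroring the rule of thumb above.
INDEXED_FIELDS = ("id", "name", "description", "keywords", "xrefs")


def tokenize(text):
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries, indexed_fields=INDEXED_FIELDS):
    """Map each token found in an indexed field to the ids of matching entries."""
    index = defaultdict(set)
    for entry in entries:
        for field in indexed_fields:
            value = entry.get(field, "")
            if isinstance(value, (list, tuple)):
                value = " ".join(value)
            for token in tokenize(str(value)):
                index[token].add(entry["id"])
    return index


def lookup(index, query):
    """Return ids of entries that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Invented entry: the "sequence" field is deliberately not indexed.
ENTRIES = [
    {
        "id": "P12345",
        "name": "Example kinase",
        "description": "Hypothetical protein kinase",
        "keywords": ["ATP-binding"],
        "xrefs": ["PDB:1ABC"],
        "sequence": "MKTAYIAK",
    }
]

index = build_index(ENTRIES)
print(lookup(index, "kinase"))    # found through the name and description fields
print(lookup(index, "MKTAYIAK"))  # empty: the sequence field was never indexed
```

A search tool that indexes the sequence field would return a hit for the second lookup, which is one way two engines over the same data can disagree.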

Complex query examples

It is also possible to search using cross-references.
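As a rough sketch of what such queries could look like, the helpers below assemble query strings in Lucene's classic syntax (fielded terms, quoted phrases, AND/OR grouping). This assumes the engine accepts that syntax because it is built on Lucene; the field names `description`, `organism` and `xref_pdb` are hypothetical placeholders, not documented EB-eye fields.

```python
def term(field, value):
    """Format one fielded term, quoting any value that is not purely alphanumeric."""
    text = str(value)
    if not text.isalnum():
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        text = f'"{escaped}"'
    return f"{field}:{text}"


def all_of(*clauses):
    """Combine clauses so that every one of them must match."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """Combine clauses so that at least one of them must match."""
    return "(" + " OR ".join(clauses) + ")"


# A complex query: a phrase in one field, restricted to either of two organisms.
complex_query = all_of(
    term("description", "protein kinase"),
    any_of(term("organism", "human"), term("organism", "mouse")),
)
print(complex_query)

# A cross-reference query: entries linked to a given (made-up) structure identifier.
xref_query = term("xref_pdb", "1ABC")
print(xref_query)
```

Building the string from small helpers keeps the grouping explicit, which matters once AND and OR are mixed in the same query.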

Help and documentation for EB-eye

Further information about how to use this search engine is available at the EB-eye help and documentation.

Programmatic access to the EB-eye

The EB-eye is also programmatically accessible through Web services technologies using the EB-eye RESTful interface. The EB-eye RESTful WADL (Web Application Description Language) is publicly available. See also the main Web services pages at the EBI.
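A client for the RESTful interface might look like the following Python sketch. It assumes that the service root is `https://www.ebi.ac.uk/ebisearch/ws/rest`, that a data resource is addressed as a path segment, and that `query`, `fields`, `size` and `format` are accepted parameters; none of these details are stated above, so they should be checked against the published WADL before use.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed service root; verify against the current EBI documentation.
BASE_URL = "https://www.ebi.ac.uk/ebisearch/ws/rest"


def build_search_url(domain, query, fields=("id",), size=10, base_url=BASE_URL):
    """Build a search URL for one data resource (called a 'domain' here)."""
    params = urlencode(
        {
            "query": query,
            "fields": ",".join(fields),
            "size": size,
            "format": "json",
        }
    )
    return f"{base_url}/{quote(domain, safe='')}?{params}"


def search(domain, query, **options):
    """Send the request and decode the JSON body (requires network access)."""
    url = build_search_url(domain, query, **options)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Only the URL is built here; call search(...) to contact the service.
print(build_search_url("uniprot", "kinase AND human", size=5))
```

Keeping URL construction separate from the network call makes the request easy to inspect and test without contacting the service.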

Other Lucene-based search engines in biology/bioinformatics

Lucene is a well-established technology, and many bioinformatics centres have experimented with applying it to biological data and databases. A pioneering development in this field is LuceGene, headed by Dr. Don Gilbert at Indiana University as part of the GMOD (Generic Software Components for Model Organisms Databases) initiative. Another example is the search engine on the UniProt web site, which is also based on Lucene and adds features such as sorting large data sets, subqueries across data sets and group-by queries. Lucene is also used in QuALM, a question answering system for Wikipedia.

References

  1. Squizzato S.; Park Y.M.; Buso N.; Gur T.; Cowley A.; Li W.; Uludag M.; Pundir S.; Cham J.A.; McWilliam H.; Lopez R. (2015). "The EBI Search engine: providing search and retrieval functionality for biological data from EMBL-EBI". Nucleic Acids Res. 43 (W1): W585–W588. doi:10.1093/nar/gkv316. PMC 4489232. PMID 25855807.
  2. Valentin F.; Squizzato S.; Goujon M.; McWilliam H.; Paern J.; Lopez R. (2010). "Fast and efficient searching of biological data resources—using EB-eye". Brief Bioinform. 11 (4): 375–384. doi:10.1093/bib/bbp065. PMC 2905521. PMID 20150321.