Search as a service

Search as a service is a branch of software as a service (SaaS), focused on enterprise search or site-specific web search.

Search is an important part of most business information systems, whether it runs over internal databases, internal document stores, or the content of a website, and it is needed by both internal company staff and external customers. A simple database query such as "List existing customers with a postal code for Argleton" is a trivial piece of in-house software development, probably written in SQL. More complex searches such as "Find all product brochure text that references the Bindeez product" or "Search the customer-uploaded reviews for any synonyms of 'caught fire' and 'pets' or 'children'" are more difficult to implement. Search, especially free-text search or searching the text of scanned document images, is a specialist discipline.
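
The contrast between the two kinds of query can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample rows and the postal code prefix "AR1" are all invented for illustration and are not taken from any real system. The structured lookup is a one-line SQL query, whereas a plain substring match on review text finds the literal phrase but misses a synonym such as "burst into flames".

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, postal_code TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("Ada", "AR1 2BC"), ("Ben", "ZZ9 9ZZ"), ("Cy", "AR1 7XY")],
    )
    conn.execute("CREATE TABLE reviews (body TEXT)")
    conn.executemany(
        "INSERT INTO reviews VALUES (?)",
        [
            ("The toy caught fire near our pets.",),
            ("It burst into flames beside my children.",),
            ("Works fine.",),
        ],
    )
    return conn


def customers_by_postal_prefix(conn, prefix):
    """The 'trivial' structured query: customers in one postal area."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE postal_code LIKE ? ORDER BY name",
        (prefix + "%",),
    )
    return [name for (name,) in rows]


def reviews_containing(conn, phrase):
    """Naive free-text search: a literal substring match, with no synonyms."""
    rows = conn.execute(
        "SELECT body FROM reviews WHERE body LIKE ?",
        ("%" + phrase + "%",),
    )
    return [body for (body,) in rows]


conn = make_demo_db()
print(customers_by_postal_prefix(conn, "AR1"))
print(reviews_containing(conn, "caught fire"))
```

The second query returns only the review that uses the exact words "caught fire"; the review describing the same event in other words is not found, which is the gap that specialist search software is meant to close.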

Externally-provided search services

By outsourcing search to a specialist search company through software as a service, even the smallest organisation can gain access to a more capable search function. Two methods are popular for this:

The first method searches a company's publicly visible web presence. An existing search engine such as Bing or Google is encouraged to crawl the site, as it normally would anyway. [1] A link to the company's favoured search partner is coded into the web site as a simple HTML form or search box. When a query is submitted, the search box searches the main Google (or other) corpus for the text string, but returns only results from that particular web site. These results are then displayed on the site's page, as if they had been returned by the site itself. This feature is very easily implemented: the search form simply includes a site: qualifier in the query string passed to the search engine. [2]
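
As a rough sketch of this first method, the query string can be built by appending a site: qualifier to whatever the user typed. The function name, the example domain and the default endpoint URL below are assumptions made for illustration; a real deployment would follow the chosen search engine's own documentation for embedding a site search.

```python
from urllib.parse import urlencode


def site_search_url(user_query, site, engine_url="https://www.google.com/search"):
    """Build a search URL restricted to one web site.

    `engine_url` is an assumed endpoint; `site` is the domain whose pages
    should be the only ones returned.
    """
    # The site: qualifier is appended to the user's own words.
    restricted_query = f"{user_query} site:{site}"
    # urlencode handles spaces and the colon in the qualifier.
    return engine_url + "?" + urlencode({"q": restricted_query})


print(site_search_url("bindeez recall", "example.com"))
```

The site's search box would then send the visitor, or a server-side request, to the resulting URL.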

Search as a service

The second method is more capable, although more complex to set up. It can also support enterprise search, covering private resources that are not visible on the public web. Only this form is commonly termed 'search as a service'. A search provider offers a search service, and a contract is agreed with the client to support their searches. The client then uses the provider's API to upload the content to be searched, or indexing metadata for it if that is already available. The provider then constructs a search index for this content. If the content is free text or similar unstructured data, it is first tokenised by Lucene or a similar process.[i]
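
The upload, tokenise and index steps can be sketched with a toy in-memory stand-in for a provider. This is not Lucene and not any real provider's API: the class name, its methods and the simple regular-expression tokeniser are all hypothetical, chosen only to show how uploaded text might become a searchable inverted index.

```python
import re
from collections import defaultdict


class ToySearchService:
    """Hypothetical stand-in for a hosted search provider (illustration only)."""

    def __init__(self):
        self._docs = {}                  # doc_id -> original text
        self._index = defaultdict(set)   # token -> set of doc_ids

    @staticmethod
    def tokenise(text):
        # Very rough tokeniser: lower-case runs of letters and digits.
        # Real analysers also handle stemming, stop words and so on.
        return re.findall(r"[a-z0-9]+", text.lower())

    def upload(self, doc_id, text):
        """Client-side 'upload': store the text and index its tokens.

        Re-uploading an existing doc_id is not handled in this sketch.
        """
        self._docs[doc_id] = text
        for token in self.tokenise(text):
            self._index[token].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every token in the query."""
        tokens = self.tokenise(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in tokens))
        return sorted(hits)


service = ToySearchService()
service.upload("brochure-1", "The Bindeez craft kit")
service.upload("brochure-2", "Garden tools catalogue")
print(service.search("bindeez"))
```

In a real service the upload and search calls would travel over the provider's API, and the index would live on the provider's servers rather than in the client's process.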

Search as a service may also be particularly useful for mobile applications, where the client device is limited in storage, processing speed and connection bandwidth. This approach is taken by Algolia. Alternatively, newer service providers such as ExpertRec[4] avoid the API upload step: the data is extracted by a crawler and then tokenised by Lucene/Solr.

Search as a service should not be confused with federated search, such as Z39.50. Federated search is also a service in which an agent queries one or more external search engines. In these cases, however, the search engine providers are closely coupled to the content databases. The remoting service passes only the query and the results; it does not pass content metadata to populate the search indexes.
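
The difference can be seen in a minimal sketch of a federated agent. It is not an implementation of Z39.50; the function and the two stand-in engines are hypothetical. The point is that the agent only forwards the query and collects results, and never uploads content or metadata to build an index.

```python
def federated_search(query, engines):
    """Send one query to several engines and merge the labelled results.

    `engines` maps an engine name to a callable that takes a query string
    and returns a list of hits. Each engine keeps its own index.
    """
    merged = []
    for name, search in engines.items():
        for hit in search(query):
            merged.append((name, hit))
    return merged


# Stand-in engines with fixed answers, for illustration only.
engines = {
    "library-a": lambda q: [f"A: record about {q}"],
    "library-b": lambda q: [],
}
print(federated_search("bindeez", engines))
```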

References

1. "Google Custom Search Engine".
2. "Search operators". Google Inc.