Search appliance

Last updated July 29, 2024

A search appliance (SA) is a type of computer appliance that is attached to a corporate network to index the content shared across that network, in a way similar to a web search engine.[1][2]

Architecture

A search appliance is usually made up of several components: a gathering component, a standardizing component, a data storage component, a search component, a search (user) interface component, and a management interface component.[3]

The gathering component is usually a web crawler or file crawler that visits specified locations on a network or the Internet and gathers files and data from them. These locations might include SMB shared directories, NFS shared directories, databases, and web pages. The crawler might copy files to the search appliance or copy only the metadata about each file.
The standardizing component takes the data from the gathering component, transposes it into a standardized format, and places it in the data storage component.
The data storage component holds metadata about the files and might also contain copies of the actual files or data.
The search component searches the stored metadata and returns query results to the search interface. The results can link either to the copies of the files stored on the search appliance or to the original files in their source locations.
The search interface is the component where users compose their search queries. It passes each query to the search component and displays the results to the user.
The management interface lets administrators manage user accounts and permissions, add and delete search indexes, schedule crawl jobs, and perform other administrative functions.
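The flow between these components can be sketched in Python. This is a minimal illustration under stated assumptions, not the design of any particular product: the names `gather`, `standardize`, `Record`, `DataStore`, and `search` are hypothetical, an in-memory dictionary stands in for crawled SMB/NFS shares and web pages, the data storage component is assumed to keep an inverted index (a common indexing structure, chosen here for illustration), and the user and management interfaces are left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Record:
    """Standardized record for one gathered file (hypothetical schema)."""
    source: str          # original location, e.g. an SMB/NFS path or URL
    size: int            # content size in bytes
    tokens: list[str]    # normalized words extracted from the content
    copy: str | None     # optional cached copy of the content


def gather(sources: dict[str, str]) -> list[tuple[str, str]]:
    """Gathering component: here it just returns (location, content) pairs
    from an in-memory dict standing in for real crawling."""
    return list(sources.items())


def standardize(location: str, content: str, keep_copy: bool) -> Record:
    """Standardizing component: convert raw content into one common format."""
    tokens = re.findall(r"[a-z0-9]+", content.lower())
    return Record(location, len(content.encode()), tokens,
                  content if keep_copy else None)


@dataclass
class DataStore:
    """Data storage component: records plus an inverted index (word -> record ids)."""
    records: list[Record] = field(default_factory=list)
    index: dict[str, set[int]] = field(default_factory=dict)

    def add(self, record: Record) -> None:
        rid = len(self.records)
        self.records.append(record)
        for token in record.tokens:
            self.index.setdefault(token, set()).add(rid)


def search(store: DataStore, query: str) -> list[str]:
    """Search component: return source locations of records containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(store.index.get(w, set()) for w in words))
    return sorted(store.records[rid].source for rid in hits)


if __name__ == "__main__":
    sources = {
        "smb://fileserver/hr/policy.txt": "Vacation policy for all employees",
        "nfs://nas/eng/design.md": "Search appliance design notes",
        "http://intranet/news.html": "Company news: new vacation rules",
    }
    store = DataStore()
    for location, content in gather(sources):
        store.add(standardize(location, content, keep_copy=False))
    print(search(store, "vacation"))
    # prints ['http://intranet/news.html', 'smb://fileserver/hr/policy.txt']
```

Here the crawler keeps only metadata (`keep_copy=False`), so results point back to the original source locations; passing `keep_copy=True` would also store a copy of each file on the appliance.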

Related Research Articles

In computing, a search engine is an information retrieval software system designed to help find information stored on one or more computer systems. Search engines discover, crawl, transform, and store information for retrieval and presentation in response to user queries. The search results are usually presented in a list and are commonly called hits. The most widely used type of search engine is a web search engine, which searches for information on the World Wide Web.

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing.
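A crawler's core loop is a graph traversal. The sketch below is a minimal breadth-first version, assuming a hypothetical `fetch_links` callback in place of real HTTP fetching and HTML link extraction; a small dictionary of example.com URLs stands in for the Web. A production crawler would also need politeness controls such as honoring robots.txt and rate limits, which are omitted here.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl starting at `seed`.

    `fetch_links(url)` returns the URLs linked from `url` (hypothetical hook;
    a real crawler would download and parse the page here).
    Returns pages in the order they were visited.
    """
    frontier = deque([seed])   # URLs discovered but not yet visited
    seen = {seed}              # ensures no URL is enqueued twice
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# A toy link graph standing in for the Web.
WEB = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}

if __name__ == "__main__":
    print(crawl("https://example.com/", lambda url: WEB.get(url, [])))
```

The `seen` set is what keeps the crawler from looping on cycles such as the link from `/b` back to the home page.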

Search engine optimization (SEO) is the process of improving the quality and quantity of traffic from search engines to a website or a web page. SEO targets unpaid traffic rather than direct traffic or paid traffic. Unpaid traffic may originate from different kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines.

WinFS was the code name for a canceled data storage and management system project based on relational databases, developed by Microsoft and first demonstrated in 2003. It was intended as an advanced storage subsystem for the Microsoft Windows operating system, designed for persistence and management of structured, semi-structured and unstructured data.

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.
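The ranking step can be illustrated with reciprocal rank fusion, one widely used way to combine ranked lists from several engines. It is an assumption chosen for illustration, not necessarily the method any particular metasearch engine uses, and the constant `k = 60` is a conventional default rather than a requirement.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists from several engines into one ranking.

    Each URL scores the sum of 1 / (k + rank) over every list it appears in,
    with rank starting at 1. Ties are broken alphabetically for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    engine_a = ["https://a.example/", "https://b.example/", "https://c.example/"]
    engine_b = ["https://b.example/", "https://d.example/", "https://a.example/"]
    print(reciprocal_rank_fusion([engine_a, engine_b]))
    # prints ['https://b.example/', 'https://a.example/', 'https://d.example/', 'https://c.example/']
```

Because scores depend only on rank positions, the engines' own relevance scores, which are not comparable across engines, never need to be normalized.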

Yahoo! Search is a search engine owned and operated by Yahoo!, using Microsoft Bing to power results.

Enterprise content management (ECM) extends the concept of content management by adding a timeline for each content item and, possibly, enforcing processes for its creation, approval, and distribution. Systems using ECM generally provide a secure repository for managed items, analog or digital. They also include one or more methods for importing content to manage new items, and several presentation methods to make items available for use. Although ECM content may be protected by digital rights management (DRM), it is not required. ECM is distinguished from general content management by its cognizance of the processes and procedures of the enterprise for which it is created.

YaCy is a free distributed search engine built on the principles of peer-to-peer (P2P) networks, created by Michael Christen in 2003. The engine is written in Java; as of September 2006, it was distributed across several hundred computers, called YaCy-peers.

IBM Storage Protect is a data protection platform that gives enterprises a single point of control and administration for backup and recovery. It is the flagship product in the IBM Spectrum Protect family.

A video search engine is a web-based search engine which crawls the web for video content. Some video search engines parse externally hosted content while others allow content to be uploaded and hosted on their own servers. Some engines also allow users to search by video format type and by length of the clip. The video search results are usually accompanied by a thumbnail view of the video.

A web search engine is a software system that provides hyperlinks to web pages and other relevant information on the Web in response to a user's query. The user inputs a query within a web browser or a mobile app, and the search results are often a list of hyperlinks, accompanied by textual summaries and images. Users also have the option of limiting the search to a specific type of results, such as images, videos, or news.

Content-addressable storage (CAS), also referred to as content-addressed storage or fixed-content storage, is a way to store information so it can be retrieved based on its content, not its name or location. It has been used for high-speed storage and retrieval of fixed content, such as documents stored for compliance with government regulations. Content-addressable storage is similar to content-addressable memory.
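The idea can be sketched with a cryptographic hash as the content address. This minimal in-memory store assumes SHA-256 as the hash function (real systems vary) and uses hypothetical method names `put` and `get`; it shows two properties that follow from addressing data by its content: identical data is stored only once, and stored data can be checked against its own address.

```python
from __future__ import annotations

import hashlib


class ContentStore:
    """Minimal in-memory content-addressable store (illustrative only).

    The address of each blob is the SHA-256 digest of its bytes, so the
    same content always maps to the same address.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(address, data)  # duplicate content is stored once
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]  # raises KeyError for unknown addresses
        # Re-hashing on read detects corruption: content must match its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = ContentStore()
    first = store.put(b"quarterly report")
    second = store.put(b"quarterly report")  # same content, same address
    print(first == second, len(store))       # prints True 1
```

Retrieval needs only the address, never a file name or location, which is what distinguishes this from location-based storage.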

In computer science, the semantic desktop is a collective term for ideas related to changing a computer's user interface and data handling capabilities so that data are more easily shared between different applications or tasks, and so that a computer can automatically process data that it previously could not. It also encompasses some ideas about being able to share information automatically between different people. This concept is closely related to the Semantic Web but is distinct insofar as its main concern is the personal use of information.

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

Windows Search is a content index and desktop search platform by Microsoft, introduced in Windows Vista as a replacement for the earlier Indexing Service of Windows 2000, Windows XP, and Windows Server 2003. It is designed to facilitate local and remote queries for files and non-file items in the Windows Shell and in compatible applications. It was developed after WinFS was postponed and brought several of that platform's benefits to Windows.

A single-page application (SPA) is a web application or website that interacts with the user by dynamically rewriting the current web page with new data from the web server, instead of the default method of a web browser loading entire new pages. The goal is faster transitions that make the website feel more like a native app.

The High-performance Integrated Virtual Environment (HIVE) is a distributed computing environment used for healthcare-IT and biological research, including analysis of Next Generation Sequencing (NGS) data, preclinical, clinical and postmarket data, adverse events, metagenomic data, etc. It is currently supported and continuously developed by the US Food and Drug Administration, George Washington University, DNA-HIVE, WHISE-Global, and Embleema. HIVE is fully operational within the US FDA, supporting a wide variety (more than 60) of regulatory research and regulatory review projects as well as MDEpiNet medical device postmarket registries. Academic deployments of HIVE are used for research activities and publications in NGS analytics, cancer research, and microbiome research, and in educational programs for students at GWU. Commercial enterprises use HIVE in oncology, microbiology, vaccine manufacturing, gene editing, healthcare-IT, and harmonization of real-world data, as well as in preclinical research and clinical studies.

Nirvana was virtual object storage software developed and maintained by General Atomics.

References

[1] "Google and Thunderstone deliver plug and search to the enterprise". Infoworld.com. October 2004.
[2] "Googles Mini search appliance". ZDnet.com. April 2005. [dead link]
[3] "Search Appliance 2.1 Configuration Guide". Archived from the original on 2012-04-02. Retrieved 2011-09-19.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
