Sphinx (search engine)

Sphinx
Developer(s) Andrew Aksyonoff
Initial release 2001
Stable release 3.5.1 [1] / 3 February 2023
Written in C++
Operating system Linux, Windows, Solaris, FreeBSD, NetBSD, Mac OS, AIX
Type Search and index
License GPLv2 until version 2 and commercial; proprietary since version 3
Website sphinxsearch.com

Sphinx is a full-text search engine that provides text search functionality to client applications.

Overview

Sphinx can be used either as a stand-alone server or as a storage engine ("SphinxSE") for the MySQL family of databases. When run as a stand-alone server, Sphinx operates similarly to a DBMS and can communicate with MySQL, MariaDB and PostgreSQL through their native protocols, or with any ODBC-compliant DBMS via ODBC. MariaDB, a fork of MySQL, is distributed with SphinxSE. [2]

SphinxAPI

If Sphinx is run as a stand-alone server, an application can connect to it through SphinxAPI. Official implementations of the API are available for PHP, Java, Perl, Ruby and Python. Unofficial implementations for other languages, as well as various third-party plugins and modules, [3] are also available. Other data sources can be indexed via a pipe in a custom XML format (xmlpipe2). [4]
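
As an illustration of the piped XML format, the following Python sketch produces a small document stream in the style of xmlpipe2. The element names (sphinx:docset, sphinx:schema, sphinx:field, sphinx:attr, sphinx:document) follow the xmlpipe2 documentation as best understood here and should be checked against reference [4]; the function name build_xmlpipe2 and the sample data are assumptions made for this example.

```python
from xml.sax.saxutils import escape, quoteattr


def build_xmlpipe2(fields, attrs, documents):
    """Return an xmlpipe2-style document stream as a string.

    fields:    list of full-text field names
    attrs:     dict mapping attribute name -> attribute type (e.g. "int")
    documents: list of dicts, each with an "id" key plus field/attribute values
    """
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
    # Declare the full-text fields and the attributes up front.
    for name in fields:
        lines.append(f"<sphinx:field name={quoteattr(name)}/>")
    for name, attr_type in attrs.items():
        lines.append(f"<sphinx:attr name={quoteattr(name)} type={quoteattr(attr_type)}/>")
    lines.append("</sphinx:schema>")
    # Emit one element per document; values are XML-escaped.
    for doc in documents:
        lines.append(f"<sphinx:document id={quoteattr(str(doc['id']))}>")
        for key, value in doc.items():
            if key != "id":
                lines.append(f"<{key}>{escape(str(value))}</{key}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


print(build_xmlpipe2(
    fields=["title"],
    attrs={"price": "int"},
    documents=[{"id": 1, "title": "Fish & chips", "price": 5}],
))
```

A script of this kind would write the stream to standard output so that the indexer can read it through a pipe.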

SphinxQL

The Sphinx search daemon supports the MySQL binary network protocol and can be accessed with the regular MySQL API and/or clients. Sphinx supports a subset of SQL known as SphinxQL. It supports standard querying of all index types with SELECT, modifying RealTime indexes with INSERT, REPLACE, and DELETE, and more.
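
The statements described above can be sketched as plain strings. The Python helpers below only build SphinxQL text; the index name rt_articles, the column names and the helper names are hypothetical, and the escaping is deliberately simplified. In a real application the statements would be sent through a regular MySQL client library, preferably using that library's own parameter binding.

```python
def quote(value):
    """Quote a Python value as a SQL literal (simplified escaping, illustration only)."""
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def select_match(index, query, limit=10):
    """Full-text query: SELECT with a MATCH() clause."""
    return f"SELECT * FROM {index} WHERE MATCH({quote(query)}) LIMIT {limit}"


def replace_row(index, row):
    """Insert or overwrite a document in a RealTime index."""
    columns = ", ".join(row)
    values = ", ".join(quote(v) for v in row.values())
    return f"REPLACE INTO {index} ({columns}) VALUES ({values})"


def delete_row(index, doc_id):
    """Remove a document from a RealTime index by its id."""
    return f"DELETE FROM {index} WHERE id = {int(doc_id)}"


print(select_match("rt_articles", "search engine"))
print(replace_row("rt_articles", {"id": 7, "title": "hello"}))
print(delete_row("rt_articles", 7))
```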

SphinxSE

Sphinx can also provide a special storage engine for MariaDB and MySQL databases. This allows MySQL and MariaDB to communicate with Sphinx's searchd daemon to run queries and obtain results. Sphinx indices are treated like regular SQL tables. The SphinxSE storage engine is shipped with MariaDB.

Full-text fields and indexing

Sphinx examines a data set through its indexer. The indexer process creates a full-text index (a special data structure that enables quick keyword searches) from the given text. Full-text fields are the content that Sphinx indexes; they can be searched quickly for keywords. Fields are named, and a search can be limited to a single field (e.g. "title" only) or to a subset of fields (e.g. "title" and "abstract" only). Sphinx's index format generally supports up to 256 fields. Note that the original data is not stored in the Sphinx index but is discarded during indexing; Sphinx assumes that those contents are stored elsewhere.
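
The idea of a full-text index with named fields can be illustrated with a toy inverted index. The Python sketch below is not Sphinx's actual index format or algorithm; it only shows, under simplifying assumptions (whitespace tokenization, exact keyword match), how keywords map to document ids per field and how a search can be limited to chosen fields. As in Sphinx, the index keeps only document ids and not the original text.

```python
from collections import defaultdict


def build_index(documents):
    """Map (field, keyword) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[(field, word)].add(doc_id)
    return index


def search(index, keyword, fields):
    """Return sorted ids of documents containing keyword in any of the given fields."""
    hits = set()
    for field in fields:
        hits |= index.get((field, keyword.lower()), set())
    return sorted(hits)


docs = {
    1: {"title": "Sphinx search", "abstract": "a full-text engine"},
    2: {"title": "Cooking", "abstract": "search for recipes"},
}
idx = build_index(docs)
print(search(idx, "search", ["title"]))              # [1]
print(search(idx, "search", ["title", "abstract"]))  # [1, 2]
```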

Attributes

Attributes are additional values associated with each document that can be used for filtering and sorting during search. Attributes are named, and attribute names are case-insensitive. Attributes are not full-text indexed; they are stored in the index as is. Supported attribute types include:

strings (since 1.10-beta);

JSON (since 2.1.1-beta). [5] [6]
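
How attributes take part in a search can be sketched as follows. This Python example is a simplified model and not Sphinx code: it assumes a list of document ids already matched by a full-text query, and a hypothetical attribute table with price and published values, then filters and sorts on those attributes.

```python
def filter_and_sort(matches, attributes, predicate, sort_key):
    """Keep matched documents whose attributes satisfy predicate, ordered by sort_key."""
    kept = [doc_id for doc_id in matches if predicate(attributes[doc_id])]
    return sorted(kept, key=lambda doc_id: attributes[doc_id][sort_key])


# Hypothetical attribute values stored alongside each document id.
attributes = {
    1: {"price": 30, "published": 2012},
    2: {"price": 10, "published": 2013},
    3: {"price": 20, "published": 2011},
}
matches = [1, 2, 3]  # ids returned by a full-text match

result = filter_and_sort(matches, attributes, lambda a: a["price"] >= 20, "published")
print(result)  # [3, 1]
```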

JSON attributes in Sphinx

Sphinx, like classic SQL databases, works with a so-called fixed schema, that is, a set of predefined attribute columns. These work well when most of the stored data actually has values, but mapping sparse data to static columns can be cumbersome. Assume, for example, that you are running a price comparison or auction site with many different product categories. Some attributes, such as the price or the vendor, are common to all goods. Beyond those, laptops also need the weight, screen size, HDD type, RAM size and so on, while shovels need, say, the color and the handle length. This is manageable within a single category, but the distinct fields needed for all goods across all categories are legion. A JSON attribute can be used to overcome this. A JSON attribute needs no fixed structure: it can hold various keys, which may or may not be present in every document. When a filter is applied to one of these keys, Sphinx ignores documents that lack the key in the JSON attribute and works only with the documents that have it.
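
The filtering behaviour described above can be modelled in a few lines of Python. This is a sketch of the semantics only, not of Sphinx's implementation; the product data and the key names (weight_kg, ram_gb, handle_cm) are invented for the example.

```python
import json


def filter_by_json_key(documents, key, predicate):
    """Return ids of documents whose JSON attribute has key and satisfies predicate.

    Documents that lack the key are skipped rather than treated as errors.
    """
    result = []
    for doc_id, raw in documents.items():
        attrs = json.loads(raw)
        if key in attrs and predicate(attrs[key]):
            result.append(doc_id)
    return result


# Each document carries one JSON attribute with category-specific keys.
products = {
    1: '{"category": "laptop", "weight_kg": 1.4, "ram_gb": 16}',
    2: '{"category": "laptop", "weight_kg": 2.3, "ram_gb": 8}',
    3: '{"category": "shovel", "color": "red", "handle_cm": 120}',
}

light = filter_by_json_key(products, "weight_kg", lambda w: w < 2.0)
print(light)  # [1]
```

The shovel (document 3) has no weight_kg key, so it is simply left out of the result.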

License

Before version 3, Sphinx was dual-licensed under either:

  1. the GNU General Public License version 2, or
  2. a proprietary license, available for use cases that fall outside the terms of the GNU GPLv2.

Since version 3, Sphinx has been proprietary, with a promise to release its source code in the future. [7]

Sphinx use examples

Feature list

Performance and scalability

Fork

In 2017, key members of the original Sphinx team forked the project as Manticore. [18] [19] The Manticore team's stated goal is to deliver fast, stable and powerful free software for full-text search. The team keeps its fork open source, releasing it under the GPLv2 license, [20] whereas the original Sphinx has been closed source since version 3.

See also

References

  1. "Feb 3, 2023. Sphinx 3.5.1 released". Retrieved 15 June 2023.
  2. "AskMonty: About SphinxSE". kb.askmonty.org. Monty Program AB. Retrieved 16 August 2013.
  3. "Sphinx Wiki: Third Party Tools". sphinxsearch.com. Sphinx Search Wiki. Retrieved 16 August 2013.
  4. "xmlpipe2". sphinxsearch.com. Sphinx Search Documentation. Retrieved 16 August 2013.
  5. "JSON Attributes in Sphinx 2.1.1". sphinxsearch.com. Sphinx Search Blog. 7 February 2013. Retrieved 16 August 2013.
  6. "Full JSON Support in Trunk". sphinxsearch.com. Sphinx Search Blog. 8 August 2013. Retrieved 16 August 2013.
  7. "Sphinx | Open Source Search Server".
  8. "Sphinx at Craigslist". craigslist.org. Craigslist. Retrieved 17 August 2013.
  9. "GM Recruitment". aleph-networks.com. Aleph-networks. Retrieved 1 October 2012.
  10. "Lightning Fast PHP Site Search". tradebit.com. Tradebit. Retrieved 17 August 2013.
  11. "Sphinx Search beta for Vbulletin 4.0". vbulletin.com. Vbulletin. Retrieved 17 August 2013.
  12. "Sphinx Search Extension for MediaWiki". mediawiki.org. MediaWiki: Svemir Brkic, Paul Grinberg. Retrieved 17 August 2013.
  13. "Powered by Sphinx Search: Boardreader". sphinxsearch.com. Sphinx Search. Retrieved 17 August 2013.
  14. "Powered by Sphinx". sphinxsearch.com.
  15. "About Sphinx". sphinxsearch.com. Sphinx Search. Retrieved 16 August 2013.
  16. "Powered by Sphinx". sphinxsearch.com. Sphinx Search. Retrieved 10 May 2015.
  17. "Craigslist: Factsheet". craigslist.org. Craigslist. Archived from the original on 5 August 2012. Retrieved 16 August 2013.
  18. "About Manticore Search". manticoresearch.com. Retrieved 24 April 2023.
  19. "Manticore Search, a fork of Sphinx: a three-year report". Habr (in Russian). Retrieved 24 April 2023.
  20. "Comparing site search engines: Manticore and Sphinx". Journal (in Russian). Retrieved 24 April 2023.

Further reading