Azure Cognitive Search

Developer(s) Microsoft
Available in English
Type Indexing and querying cloud platform
Website azure.microsoft.com/en-us/services/search/

Microsoft Azure Cognitive Search, formerly known as Azure Search, is a component of the Microsoft Azure cloud platform that provides indexing and querying capabilities for data uploaded to Microsoft servers. The search-as-a-service framework is intended to give developers complex search capabilities for mobile and web development while hiding infrastructure requirements and search algorithm complexities. Azure Search is a recent addition to Microsoft's Infrastructure as a Service (IaaS) approach.

History

In 2008 Microsoft released the Azure platform with a cloud-based component code-named Project Red Dog. [1] The years leading up to 2013 were spent developing the Azure framework within the scope of a Microsoft environment. In 2013 Microsoft announced the general availability of Azure IaaS and detailed new features of Azure, including the new Azure Search. [2]

Azure Search as a Service

Azure Search is an API-based service that exposes REST APIs using protocols such as OData, along with integrated libraries such as the .NET SDK. The service consists primarily of creating data indexes and issuing search requests against those indexes.
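To make the REST-based access concrete, the following minimal sketch builds the URL of a GET query against the documents of one index. The service name, index name, and API version are assumptions chosen for illustration and do not come from this article. The sketch only constructs the URL and sends nothing. A real request would also need authentication, for example an api-key header.

```python
from urllib.parse import urlencode

# Hypothetical service and index names, used only for illustration.
SERVICE = "my-service"
INDEX = "hotels"
# Assumed API version; use whichever version your service supports.
API_VERSION = "2020-06-30"


def build_search_url(search_text: str) -> str:
    """Return the URL of a GET query against the documents of one index."""
    base = f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs"
    # urlencode percent-encodes the values and joins them with "&".
    query = urlencode({"api-version": API_VERSION, "search": search_text})
    return f"{base}?{query}"


url = build_search_url("white house")
print(url)
```

Keeping URL construction in one small function makes it easy to swap the assumed names for real ones without touching the calling code.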

Data to be searched is uploaded into logical containers called indexes. An interface schema is created as part of the logical index container and provides the API hooks used to return search results, with additional features integrated into Azure Search. Azure Search provides two different indexing engines: Microsoft's own proprietary natural language processing technology or Apache Lucene analyzers. [3] The Microsoft search engine is reportedly built on Elasticsearch. [4]
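As a rough illustration of an index schema, the sketch below describes a small index as a Python dictionary and serializes it to JSON. The index name, field names, and the choice of analyzers are hypothetical. The attribute names (key, searchable, filterable, facetable, analyzer) follow the shape of index definitions in the Azure documentation, but treat the whole fragment as an assumption rather than a complete definition.

```python
import json

# Hypothetical index: every name below is invented for illustration.
index_definition = {
    "name": "hotels",
    "fields": [
        # Exactly one field acts as the document key.
        {"name": "hotelId", "type": "Edm.String", "key": True},
        # English text processed by a Lucene-backed analyzer.
        {"name": "description", "type": "Edm.String",
         "searchable": True, "analyzer": "en.lucene"},
        # French text processed by a Microsoft-backed analyzer.
        {"name": "description_fr", "type": "Edm.String",
         "searchable": True, "analyzer": "fr.microsoft"},
        # Numeric field usable for filtering, faceting, and sorting.
        {"name": "rating", "type": "Edm.Int32",
         "filterable": True, "facetable": True, "sortable": True},
    ],
}

# The JSON text is what would form the body of an index creation request.
body = json.dumps(index_definition, indent=2)
print(body)
```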

IaaS and PaaS

Azure offers both the platform via a web interface (Platform as a Service) and the hardware via virtual servers allocated to Azure accounts for data storage and processing (Infrastructure as a Service). [5] Azure Search resides within the Microsoft IaaS and PaaS suite as a service, i.e. Search as a Service (SaaS).

Features

Queries

A search string can be specified as one of the query parameters to retrieve matching documents. Azure Search supports search strings using simple query syntax. [6] Supported features include logical operators, the suffix operator, and queries written in Lucene query syntax [7] (currently in preview). As an example,

white+house 

will search for documents containing both "white" and "house". Lucene query syntax provides features similar to simple query syntax for logical operators and wildcard searches, while also supporting more complicated functions such as proximity search and fuzzy search.
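The difference between the two syntaxes can be sketched as query parameters. The example below assumes the REST parameters search and queryType, with the values "simple" and "full", and encodes three queries: the AND query from the text, a fuzzy query, and a proximity query. The search terms are illustrative only. Note that a literal "+" must be percent-encoded in a URL, otherwise it is decoded as a space.

```python
from urllib.parse import parse_qs, urlencode


def query_string(search: str, query_type: str = "simple") -> str:
    """Encode a search expression and its syntax as URL query parameters."""
    return urlencode({"search": search, "queryType": query_type})


# Simple syntax: "+" is the AND operator and is encoded as %2B.
simple = query_string("white+house")

# Lucene ("full") syntax: "~" after a term requests a fuzzy match.
fuzzy = query_string("blue~", "full")

# Lucene ("full") syntax: "~5" after a phrase requests a proximity match.
proximity = query_string('"hotel airport"~5', "full")

for encoded in (simple, fuzzy, proximity):
    print(encoded)
```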

AI Enrichments

Pre-built AI powered enrichments (known as cognitive skills) can be used to extract text from images, blobs, and other unstructured data sources. Examples of built-in cognitive skills are: extraction of text from images, automatic language translation and extraction of named entities from text. Developers can also create custom skills and apply them to the AI enrichment pipeline. The main purpose of AI enrichments is to extract structure out of unstructured information in order to make it searchable.
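The idea of an enrichment pipeline can be illustrated with a toy model in which each skill is a function that takes a document and returns it with extra fields. This is not the Azure skillset API. Both skills below are hypothetical stand-ins, and the capitalized-word heuristic is only a crude substitute for real entity extraction.

```python
import re
from typing import Callable, Dict, List

Document = Dict[str, object]
Skill = Callable[[Document], Document]


def capitalized_terms_skill(doc: Document) -> Document:
    """Crude stand-in for entity extraction: collect capitalized words."""
    terms = re.findall(r"\b[A-Z][a-z]+\b", str(doc["content"]))
    return {**doc, "entities": sorted(set(terms))}


def word_count_skill(doc: Document) -> Document:
    """Custom skill that adds a whitespace-based word count."""
    return {**doc, "wordCount": len(str(doc["content"]).split())}


def enrich(doc: Document, skills: List[Skill]) -> Document:
    """Apply each skill in order, so later skills see earlier outputs."""
    for skill in skills:
        doc = skill(doc)
    return doc


enriched = enrich(
    {"content": "Contoso opened a hotel in Seattle near Redmond."},
    [capitalized_terms_skill, word_count_skill],
)
print(enriched["entities"], enriched["wordCount"])
```

The added fields (entities, wordCount) are the structure extracted from unstructured text, and they are what would then be indexed and made searchable.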

Language Support

Azure Search currently supports 56 different languages. Each supported language is equipped with a text analyzer that accounts for the characteristics of that specific language. Both analyzers backed by Lucene and analyzers backed by Microsoft's natural language processing technology are supported. These analyzers provide features such as text segmentation, word normalization, and entity recognition when processing text documents. The list of supported languages can be found in the Microsoft Azure Documentation. [8]

Search Suggestions

Type-ahead queries or auto-complete search bars provide potential search terms while a user types. The suggestions capability is provided through an optional index component called a suggester construction. [9] The suggester construction lists the fields to be considered as content sources for suggestions.
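A suggester can be sketched as a fragment of an index definition plus the parameters of a suggestion request. The field names and the suggester name "sg" are hypothetical. The property names (searchMode, sourceFields, suggesterName) are assumed to follow the REST API documentation. The helper function is an illustrative local check and not part of any SDK.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical index fragment with one suggester over two fields.
index_fragment: Dict[str, list] = {
    "fields": [
        {"name": "hotelName", "type": "Edm.String", "searchable": True},
        {"name": "city", "type": "Edm.String", "searchable": True},
    ],
    "suggesters": [
        {
            "name": "sg",
            "searchMode": "analyzingInfixMatching",
            "sourceFields": ["hotelName", "city"],
        }
    ],
}


def unknown_source_fields(index: Dict[str, list]) -> List[str]:
    """Return suggester source fields that the index does not define."""
    known = {field["name"] for field in index["fields"]}
    return [
        name
        for suggester in index["suggesters"]
        for name in suggester["sourceFields"]
        if name not in known
    ]


# Query parameters for a type-ahead request after the user typed "sea".
suggest_query = urlencode({"search": "sea", "suggesterName": "sg"})
print(unknown_source_fields(index_fragment), suggest_query)
```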

Hit Highlighting

The snippet of text in the search results matching the search query can be highlighted by specifying a set of field names as one of the query parameters for hit highlighting.
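The effect of hit highlighting can be imitated locally. The sketch below wraps each case-insensitive occurrence of a term in a pair of tags. It shows only the kind of output a client receives and says nothing about how the service computes it. The function name and the default tags are assumptions made for this example.

```python
import re


def highlight(text: str, term: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every case-insensitive occurrence of term in pre/post tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # Reuse the matched text so the original casing is preserved.
    return pattern.sub(lambda match: f"{pre}{match.group(0)}{post}", text)


snippet = highlight("White House tours", "house")
print(snippet)
```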

Faceted Navigation

Faceted navigation allows users to specify a field to facet on in the query parameters passed to Azure Search. Users can drill down or filter search results by criteria such as category, price, and brand. Several parameters, such as sort and interval, customize the faceting behavior. For example, if you specify

facet=rating, sort:-value

the returned results will contain all rating groups in descending order by value. Faceted navigation is common on e-commerce sites such as Amazon. [10]
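What a facet with sort:-value produces can be approximated in a few lines of local code: group documents by the field value, count each group, and order the groups by value in descending order. The sample documents are made up, and this is a sketch of the result shape rather than of the service's implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple


def facet_counts(docs: List[Dict[str, int]], field: str) -> List[Tuple[int, int]]:
    """Count documents per field value, ordered by value descending."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return sorted(counts.items(), key=lambda item: item[0], reverse=True)


# Made-up documents with a numeric rating field.
docs = [{"rating": 5}, {"rating": 3}, {"rating": 5}, {"rating": 4}]
facets = facet_counts(docs, "rating")
print(facets)
```

Each pair holds a facet value and the number of matching documents, which is the information a user interface needs to render a "filter by rating" list.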

Geo-spatial Support

Azure Search supports geo-spatial information. This allows users to explore data based on a specified geographic location. An overview of Geo-spatial support can be found in Azure Search and Geo-spatial Data. [11]
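A location-based query is typically expressed as an OData filter. The sketch below formats a filter that keeps documents within a given distance of a point. It assumes the geo.distance function, a field of type Edm.GeographyPoint named "location", and a distance in kilometers. The field name and coordinates are illustrative, and the point is written in longitude, latitude order.

```python
def within_km_filter(field: str, lon: float, lat: float, km: float) -> str:
    """Build an OData filter for documents within km of (lon, lat)."""
    # The geography literal lists longitude first, then latitude.
    return f"geo.distance({field}, geography'POINT({lon} {lat})') le {km}"


geo_filter = within_km_filter("location", -122.131577, 47.678581, 10)
print(geo_filter)
```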

References

  1. Foley, Mary Jo. "Red Dog: Five questions with Microsoft mystery man Dave Cutler | ZDNet". ZDNet. Retrieved 2016-02-04.
  2. "Azure IaaS Goes GA: It's Time to Head to the Cloud | Applied Information Sciences Blog". 17 April 2013. Retrieved 2016-02-04.
  3. "Add language analyzers to string fields - Azure Cognitive Search".
  4. "Microsoft Azure Search Preview". Microsoft Enterprise Technologies. 12 February 2015. Retrieved 2016-02-04.
  5. "Azure Search 101 - Getting started with Azure Search with Liam Cavanagh". azure.microsoft.com. Retrieved 2016-02-04.
  6. "SimpleQueryParser (Lucene 4.7.0 API)". lucene.apache.org. Retrieved 2016-02-02.
  7. "org.apache.lucene.queryparser.classic (Lucene 4.10.2 API)". lucene.apache.org. Retrieved 2016-02-02.
  8. "Language support (Azure Search Service REST API)". msdn.microsoft.com. Retrieved 2016-02-04.
  9. "Suggesters". msdn.microsoft.com. Retrieved 2016-02-04.
  10. "Design better faceted navigation for your websites | Web design | Creative Bloq". www.creativebloq.com. Retrieved 2016-02-12.
  11. "Azure Search and Geospatial Data (Channel 9)". Channel 9. Retrieved 2016-02-04.