Azure Cognitive Search

Developer(s) Microsoft
Available in English
Type Indexing and querying cloud platform
Website azure.microsoft.com/en-us/services/search/

Microsoft Azure Cognitive Search, formerly known as Azure Search, is a component of the Microsoft Azure cloud platform that provides indexing and querying capabilities for data uploaded to Microsoft servers. The search-as-a-service framework is intended to give developers sophisticated search capabilities for mobile and web applications while hiding infrastructure requirements and the complexity of search algorithms. Azure Search was a comparatively late addition to Microsoft's Azure cloud services.

History

In 2008 Microsoft announced the Azure platform, which had been developed under the code name Project Red Dog. [1] Development of the platform continued over the following years, and in April 2013 Microsoft announced the general availability of Azure's infrastructure as a service (IaaS) offering, along with other new Azure features. [2] Azure Search was added to the platform afterwards, initially as a preview service.

Azure Search as a Service

Azure Search is an API-based service. It exposes a REST API that follows OData conventions and can also be used through client libraries such as the .NET SDK. Its core operations are creating indexes and issuing search requests against an index.
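The sketch below shows the general shape of a search call against the REST API, using only the Python standard library. It builds, but does not send, a POST request to an index's docs/search endpoint. The service name, index name, and key are hypothetical, and the api-version value 2020-06-30 is an assumption; the current REST API reference lists the supported versions.

```python
import json
import urllib.request

# Assumed REST API version; check the Azure documentation for current values.
API_VERSION = "2020-06-30"


def build_search_request(service: str, index: str, api_key: str,
                         search_text: str, top: int = 10) -> urllib.request.Request:
    """Build (without sending) a POST request to an index's docs/search endpoint."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={API_VERSION}")
    body = json.dumps({"search": search_text, "top": top}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


# Hypothetical service, index, and key values.
req = build_search_request("my-service", "hotels", "<query-key>", "white house")
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it; the JSON response carries the matching documents in its value array.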

Data to be searched is uploaded into logical containers called indexes. Each index has a schema that defines its fields, their data types, and attributes such as whether a field is searchable, filterable, sortable, or facetable; these attributes determine which query features are available for each field. For text fields, Azure Search offers two families of language analyzers: analyzers based on Microsoft's proprietary natural language processing technology and Apache Lucene analyzers. [3] The underlying search engine has been reported to be built on Elasticsearch. [4]
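A minimal sketch of an index definition, written as a Python dictionary that serializes to the JSON accepted by the REST API. The index and field names are hypothetical. The Edm type names, attribute flags, and the analyzer name en.lucene follow the schema format as the author understands it; replacing en.lucene with en.microsoft would select Microsoft's English analyzer instead. The allowed attributes should be checked against the API version in use.

```python
import json


def field(name: str, type_: str = "Edm.String", **attributes) -> dict:
    """Return one field definition; keyword flags become schema attributes."""
    return {"name": name, "type": type_, **attributes}


# Hypothetical index; attribute flags decide how each field can be queried.
hotels_index = {
    "name": "hotels",
    "fields": [
        field("hotelId", key=True, filterable=True),
        field("description", searchable=True, analyzer="en.lucene"),
        field("category", searchable=True, filterable=True, facetable=True),
        field("rating", "Edm.Int32", filterable=True, sortable=True, facetable=True),
        field("location", "Edm.GeographyPoint", filterable=True, sortable=True),
    ],
}

# This JSON would form the body of a PUT request to /indexes/hotels.
print(json.dumps(hotels_index, indent=2))
```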

IaaS and PaaS

Azure offers both managed platform services for building and running applications (platform as a service, PaaS) and virtual servers allocated to Azure accounts for data storage and processing (infrastructure as a service, IaaS). [5] Azure Search is offered within this IaaS and PaaS suite as a managed service, an approach often described as search as a service.

Features

Queries

A search string can be passed as one of the query parameters to retrieve matching documents. By default, Azure Search interprets search strings using the simple query syntax, [6] which supports logical operators and the suffix operator (*) for prefix matching. Queries can alternatively use the full Lucene query syntax, [7] which was in preview at the time of writing. As an example, the simple-syntax query

white+house 

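matches only documents containing both "white" and "house". The Lucene query syntax offers the same logical operators and wildcard searches as the simple syntax and adds more advanced features such as proximity search and fuzzy search. For example, "white house"~3 finds documents in which the two terms occur within three words of each other, and hous~ also matches similar spellings such as "house".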
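The sketch below builds query strings for a GET request to an index's docs endpoint, contrasting the two parsers. It assumes the queryType and searchMode parameter names and the api-version value shown. One practical detail: the + operator has to be percent-encoded in a URL, since an unescaped + is decoded as a space; urlencode handles this automatically.

```python
from urllib.parse import urlencode

# Assumed REST API version.
API_VERSION = "2020-06-30"


def search_query(search: str, query_type: str = "simple", search_mode: str = "any") -> str:
    """Build the query string for a GET request to /indexes/<index>/docs.

    queryType selects the parser: "simple" (the default) or "full" (Lucene syntax).
    urlencode escapes "+" as %2B so the AND operator survives URL decoding.
    """
    return urlencode({
        "api-version": API_VERSION,
        "search": search,
        "queryType": query_type,
        "searchMode": search_mode,
    })


simple_q = search_query("white+house")
proximity_q = search_query('"white house"~3', query_type="full")
fuzzy_q = search_query("hous~", query_type="full")
print(simple_q)
print(proximity_q)
print(fuzzy_q)
```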

AI Enrichments

Pre-built, AI-powered enrichments (known as cognitive skills) can be used to extract text and structure from images, blobs, and other unstructured data sources. Examples of built-in cognitive skills include extraction of text from images, automatic language translation, and extraction of named entities from text. Developers can also create custom skills and add them to the AI enrichment pipeline. The main purpose of AI enrichment is to extract structure from unstructured information so that it becomes searchable.
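Custom skills are typically exposed as web endpoints that the enrichment pipeline calls with batches of records. The sketch below assumes the request and response contract used by the custom Web API skill (a values array of records, each carrying a recordId and a data object) and implements a hypothetical skill that counts the words in a text input. The HTTP hosting layer is omitted, and the skill and field names are illustrative only.

```python
import json


def word_count_skill(request_body: str) -> str:
    """Hypothetical custom skill: count the words in each record's "text" input."""
    request = json.loads(request_body)
    results = []
    for record in request["values"]:
        text = record["data"].get("text", "")
        results.append({
            "recordId": record["recordId"],  # echo the id so outputs map back to inputs
            "data": {"wordCount": len(text.split())},
            "errors": [],
            "warnings": [],
        })
    return json.dumps({"values": results})


# A one-record batch in the assumed request shape.
sample = json.dumps({"values": [{"recordId": "1", "data": {"text": "Azure Search indexes text"}}]})
print(word_count_skill(sample))
```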

Language Support

At the time of the cited documentation, Azure Search supported 56 languages. Each supported language has a text analyzer that accounts for the characteristics of that language. Both analyzers backed by Lucene and analyzers backed by Microsoft's natural language processing technology are available. These analyzers provide features such as text segmentation, word normalization, and entity recognition when processing text documents. The list of supported languages can be found in the Microsoft Azure documentation. [8]
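Because analyzers are assigned per field, multilingual content is commonly stored in one field per language. The sketch below generates such field definitions. The base field name and naming scheme are hypothetical, and it assumes analyzer names of the form language.microsoft or language.lucene (for example en.microsoft or fr.lucene). Not every language has both kinds of analyzer, so generated names should be checked against the supported-language list.

```python
def language_fields(base_name, languages, family="microsoft"):
    """Return one searchable Edm.String field per language code."""
    if family not in ("microsoft", "lucene"):
        raise ValueError("family must be 'microsoft' or 'lucene'")
    fields = []
    for lang in languages:
        fields.append({
            # Hypothetical naming scheme; '-' is replaced so "zh-Hans" gives a valid name.
            "name": f"{base_name}_{lang.replace('-', '_')}",
            "type": "Edm.String",
            "searchable": True,
            "analyzer": f"{lang}.{family}",
        })
    return fields


description_fields = language_fields("description", ["en", "fr", "zh-Hans"])
for f in description_fields:
    print(f["name"], f["analyzer"])
```

At query time, the searchFields parameter can restrict a search to the field matching the user's language, for example searchFields=description_fr.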

Search Suggestions

Type-ahead queries, such as auto-complete search bars, propose potential search terms while a user types. This capability is provided by an optional component of the index definition called a suggester. [9] The suggester specifies the list of fields to be used as content sources for suggestions.
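A minimal sketch of both sides of suggestions: a suggester entry for an index definition, and a URL for the docs/suggest endpoint built from a user's partial input. The suggester name, source fields, and service and index names are hypothetical. The searchMode value analyzingInfixMatching, the suggesterName parameter, and the api-version are assumptions based on the REST API as the author understands it.

```python
from urllib.parse import urlencode

API_VERSION = "2020-06-30"  # assumed REST API version

# Suggester entry for the "suggesters" array of an index definition.
# The suggester name and source fields are hypothetical.
suggester = {
    "name": "sg",
    "searchMode": "analyzingInfixMatching",
    "sourceFields": ["hotelName", "category"],
}


def suggest_url(service, index, partial_text, suggester_name, top=5):
    """Build a GET URL for the docs/suggest endpoint from the user's partial input."""
    query = urlencode(
        {"api-version": API_VERSION, "search": partial_text,
         "suggesterName": suggester_name, "$top": top},
        safe="$",  # keep the OData-style "$top" readable instead of "%24top"
    )
    return f"https://{service}.search.windows.net/indexes/{index}/docs/suggest?{query}"


url = suggest_url("my-service", "hotels", "whi", suggester["name"])
print(url)
```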

Hit Highlighting

Hit highlighting marks the parts of each search result that match the query. It is requested by specifying a set of field names as one of the query parameters; the response then includes fragments of those fields in which the matching terms are wrapped in highlight tags.
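The sketch below shows both halves of hit highlighting: a request body naming the fields to highlight, and code that reads the highlighted fragments back out of a response. It assumes the highlight, highlightPreTag, and highlightPostTag parameters and the @search.highlights response property. The sample response is hand-written in that shape, not real service output, and the field names are hypothetical.

```python
def highlight_request(search_text, fields, pre_tag="<b>", post_tag="</b>"):
    """POST body for docs/search asking for highlighted fragments of `fields`."""
    return {
        "search": search_text,
        "highlight": ",".join(fields),  # comma-separated list of field names
        "highlightPreTag": pre_tag,
        "highlightPostTag": post_tag,
    }


def highlighted_fragments(response, field, key_field="hotelId"):
    """Return (document key, fragments) pairs for one field of a search response."""
    return [
        (doc.get(key_field), doc.get("@search.highlights", {}).get(field, []))
        for doc in response["value"]
    ]


# Hand-written response in the assumed shape (hypothetical data).
sample_response = {
    "value": [
        {"@search.score": 1.5, "hotelId": "1",
         "@search.highlights": {"description": ["Walk to the <b>White</b> <b>House</b>"]}},
        {"@search.score": 0.7, "hotelId": "2"},
    ]
}

print(highlight_request("white house", ["description", "hotelName"]))
print(highlighted_fragments(sample_response, "description"))
```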

Faceted Navigation

Faceted navigation lets a query request facet counts for a field by naming that field in the query parameters passed to Azure Search. Users can then drill down or filter search results by criteria such as category, price, and brand. Several parameters customize faceting, such as sort, which orders the facet groups, and interval, which groups numeric values into ranges. For example, the parameter

facet=rating,sort:-value

returns one facet group for each rating value, sorted by value in descending order. Faceted navigation is common on e-commerce sites such as Amazon. [10]
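A minimal sketch of building facet parameter values and reading facet counts from a response. It assumes the field,option:value syntax for the facet parameter and the @search.facets response property, in which each bucket carries a value and a count. The sample response is hand-written in that shape with hypothetical numbers, not real service output.

```python
def facet_spec(field, sort="-value", count=None):
    """Build one value of the facet parameter, e.g. "rating,sort:-value"."""
    spec = f"{field},sort:{sort}"
    if count is not None:
        spec += f",count:{count}"  # limit the number of facet buckets returned
    return spec


def facet_counts(response, field):
    """Map each facet bucket's value to its document count."""
    return {bucket["value"]: bucket["count"] for bucket in response["@search.facets"][field]}


# Hand-written response in the assumed shape (hypothetical data).
sample_response = {
    "@search.facets": {"rating": [{"value": 5, "count": 12}, {"value": 4, "count": 30}]},
    "value": [],
}

print(facet_spec("rating"))
print(facet_spec("category", sort="count", count=5))
print(facet_counts(sample_response, "rating"))
```

When a user selects a bucket, an application would typically re-run the query with a matching filter, such as rating eq 5, to drill down.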

Geo-spatial Support

Azure Search supports geospatial data, allowing users to filter and sort results by their distance from a specified geographic location. An overview of geospatial support is given in the Channel 9 presentation "Azure Search and Geospatial Data". [11]
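Location queries are expressed with OData geo functions inside filter and sort expressions. The sketch below assumes a field of type Edm.GeographyPoint named location (hypothetical) and the geo.distance function, which returns distances in kilometres. The coordinates are arbitrary example values, and the request body shape is a sketch of a docs/search POST rather than a complete client.

```python
def distance_filter(field, lon, lat, km):
    """OData filter keeping documents within `km` kilometres of a point."""
    # Geography literals list longitude before latitude.
    return f"geo.distance({field}, geography'POINT({lon} {lat})') le {km}"


def nearby_request(field, lon, lat, km):
    """POST body for docs/search: all documents within range, nearest first."""
    return {
        "search": "*",
        "filter": distance_filter(field, lon, lat, km),
        "orderby": f"geo.distance({field}, geography'POINT({lon} {lat})')",
    }


body = nearby_request("location", -122.131577, 47.678581, 10)
print(body["filter"])
print(body["orderby"])
```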

References

  1. Foley, Mary Jo. "Red Dog: Five questions with Microsoft mystery man Dave Cutler". ZDNet. Retrieved 2016-02-04.
  2. "Azure IaaS Goes GA: It's Time to Head to the Cloud | Applied Information Sciences Blog". 17 April 2013. Retrieved 2016-02-04.
  3. "Add language analyzers to string fields - Azure Cognitive Search".
  4. "Microsoft Azure Search Preview". Microsoft Enterprise Technologies. 12 February 2015. Retrieved 2016-02-04.
  5. "Azure Search 101 - Getting started with Azure Search with Liam Cavanagh". azure.microsoft.com. Retrieved 2016-02-04.
  6. "SimpleQueryParser (Lucene 4.7.0 API)". lucene.apache.org. Retrieved 2016-02-02.
  7. "org.apache.lucene.queryparser.classic (Lucene 4.10.2 API)". lucene.apache.org. Retrieved 2016-02-02.
  8. "Language support (Azure Search Service REST API)". msdn.microsoft.com. Retrieved 2016-02-04.
  9. "Suggesters". msdn.microsoft.com. Retrieved 2016-02-04.
  10. "Design better faceted navigation for your websites | Web design | Creative Bloq". www.creativebloq.com. Retrieved 2016-02-12.
  11. "Azure Search and Geospatial Data (Channel 9)". Channel 9. Retrieved 2016-02-04.