Multimodal search

Last updated June 03, 2024

Multimodal search is a type of search that uses different methods to get relevant results. They can use any kind of search, search by keyword, search by concept, search by example, etc.

Introduction

A multimodal search engine is designed to imitate the flexibility and agility of how the human mind works to create, process and refuse irrelevant ideas. So, the more elements you have in the input of the search engine to compare, the more accurate the results can be. Multimodal search engines use different inputs of different nature and methods of search at the same time with the possibility of combining the results by merging all of the input elements of the search. There are also engines that can use a feedback of the results with the evaluation of the user to perform a more appropriate and relevant search.

Nowadays, mobile devices have been developed to a point that they can perform infinite functions from any place at any time, thanks to the internet and GPS connections. Touch screens, motion sensors and voice recognition are now featured on mobile devices called smartphones. All the features and functions make it possible to execute multimodal searches from any place in the world at any time.

Search elements

The use of text is an option, as well as multimedia searching, image, video, audio, and voice search. Even the location of the user can help the search engine to perform a more effective search, adaptable to every situation. Nowadays, different ways to interact with a search engine are being discovered, in terms of input elements of the search and in the variety of results obtained.

Personal context

Many queries from mobiles are location-based (LBS), that use the location of the user to interact with the applications. If available, the browser uses the device GPS, or computes an approximate location based on cell tower triangulation, with the permission of the user, who must be agree to share his/her location with the application in the download. Therefore, multimodal searches use not only audiovisual content that the user provides directly, but also the context where the user is, like his/her location, language, time at the moment, web site or document where the user is surfing, or other elements that can help to improve of a search in every situation.

Classification of the results

The multimodal search engine works in parallel, whilst at the same time, performs a search of more to less relevance of every element introduced directly or indirectly (personal context). Afterwards, it provides a combination of all the results, merging every element with its associated weight for every descriptor.

The engine analyzes every element and tags them, so a comparison of the tags can be made with existent indexed information in databases. A classification of the results proceeds, to show them from more to less relevance.

It’s necessary to define the importance of every input element. There are search engines that do this automatically, however there are also engines where the user can do it manually, giving more or less weight to every element of the search. It’s also important that the user provides the appropriate and essential information for the search; too much information can confuse the system and provide unsatisfactory results. With multimodal searches users can get better results than with a simple search, but multimodal searches must process more input information. It can also spend more time to process it and require more memory space.

An efficient search engine interprets the query of the users, realizes his/her intention and applies a strategy to use an appropriate search, i.e. the engine adapts to every input query and also to the combination of the elements and methods.

Applications

Nowadays, existing multimodal search engines are not very complex, and some of them are in an experimental phase. Some of the more simple engines are Google Images or Bing , web interfaces that use text and images as inputs to find images in the output.

MMRetrieval is a multimodal experimental search engine that uses multilingual and multimedia information through a web interface. The engine searches the different inputs in parallel and merges all the results by different chosen methods. The engine also provides different multistage retrieval, as well as a single text index baseline to be able to compare all the different phases of search.

There are a lot of applications for mobile devices, using the context of the user, like based-location services, and using also text, images, audios or videos that the user provides at the moment or with saved files, or even interacting with the voice.

Related Research Articles

Google Search is a search engine operated by Google. It allows users to search for information on the Internet by entering keywords or phrases. Google Search uses algorithms to analyze and rank websites based on their relevance to the search query. It is the most popular search engine worldwide.

Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

In computing, a search engine is an information retrieval software system designed to help find information stored on one or more computer systems. Search engines discover, crawl, transform, and store information for retrieval and presentation in response to user queries. The search results are usually presented in a list and are commonly called hits. The most widely used type of search engine is a web search engine, which searches for information on the World Wide Web.

<span class="mw-page-title-main">Metasearch engine</span> Online information retrieval tool

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.

<span class="mw-page-title-main">Autocomplete</span> Computing feature predicting ending to a word a user is typing

Autocomplete, or word completion, is a feature in which an application predicts the rest of a word a user is typing. In Android and iOS smartphones, this is called predictive text. In graphical user interfaces, users can typically press the tab key to accept a suggestion or the down arrow key to accept one of several.

A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML document, choosing the appearance of a page, or jumping to positions in multimedia content.

Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for input and output of data.

A video search engine is a web-based search engine which crawls the web for video content. Some video search engines parse externally hosted content while others allow content to be uploaded and hosted on their own servers. Some engines also allow users to search by video format type and by length of the clip. The video search results are usually accompanied by a thumbnail view of the video.

A search engine results page (SERP) is a webpage that is displayed by a search engine in response to a query by a user. The main component of a SERP is the listing of results that are returned by the search engine in response to a keyword query.

Multimedia search enables information search using queries in multiple data types including text and other multimedia formats. Multimedia search can be implemented through multimodal search interfaces, i.e., interfaces that allow to submit search queries not only as textual requests, but also through other media. We can distinguish two methodologies in multimedia search:

Social search is a behavior of retrieving and searching on a social searching engine that mainly searches user-generated content such as news, videos and images related search queries on social media like Facebook, LinkedIn, Twitter, Instagram and Flickr. It is an enhanced version of web search that combines traditional algorithms. The idea behind social search is that instead of ranking search results purely based on semantic relevance between a query and the results, a social search system also takes into account social relationships between the results and the searcher. The social relationships could be in various forms. For example, in LinkedIn people search engine, the social relationships include social connections between searcher and each result, whether or not they are in the same industries, work for the same companies, belong the same social groups, and go the same schools, etc.

Query expansion (QE) is the process of reformulating a given query to improve retrieval performance in information retrieval operations, particularly in the context of query understanding. In the context of search engines, query expansion involves evaluating a user's input and expanding the search query to match additional documents. Query expansion involves techniques such as:

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

Ranking of query is one of the fundamental problems in information retrieval (IR), the scientific/engineering discipline behind search engines. Given a query $q$ and a collection $D$ of documents that match the query, the problem is to rank, that is, sort, the documents in $D$ according to some criterion so that the "best" results appear early in the result list displayed to the user. Ranking in terms of information retrieval is an important concept in computer science and is used in many different applications such as search engine queries and recommender systems. A majority of search engines use ranking algorithms to provide users with accurate and relevant results.

<span class="mw-page-title-main">Reverse image search</span> Content-based image retrieval

Reverse image search is a content-based image retrieval (CBIR) query technique that involves providing the CBIR system with a sample image that it will then base its search upon; in terms of information retrieval, the sample image is very useful. In particular, reverse image search is characterized by a lack of search terms. This effectively removes the need for a user to guess at keywords or terms that may or may not return a correct result. Reverse image search also allows users to discover content that is related to a specific sample image or the popularity of an image, and to discover manipulated versions and derivative works.

Spatial contextual awareness consociates contextual information such as an individual's or sensor's location, activity, the time of day, and proximity to other people or objects and devices. It is also defined as the relationship between and synthesis of information garnered from the spatial environment, a cognitive agent, and a cartographic map. The spatial environment is the physical space in which the orientation or wayfinding task is to be conducted; the cognitive agent is the person or entity charged with completing a task; and the map is the representation of the environment which is used as a tool to complete the task.

Multimodal Architecture and Interfaces is an open standard developed by the World Wide Web Consortium since 2005. It was published as a Recommendation of the W3C on October 25, 2012. The document is a technical report specifying a multimodal system architecture and its generic interfaces to facilitate integration and multimodal interaction management in a computer system. It has been developed by the W3C's Multimodal Interaction Working Group.

Contextual search is a form of optimizing web-based search results based on context provided by the user and the computer being used to enter the query. Contextual search services differ from current search engines based on traditional information retrieval that return lists of documents based on their relevance to the query. Rather, contextual search attempts to increase the precision of results based on how valuable they are to individual users.

A 3D Content Retrieval system is a computer system for browsing, searching and retrieving three dimensional digital contents from a large database of digital images. The most original way of doing 3D content retrieval uses methods to add description text to 3D content files such as the content file name, link text, and the web page title so that related 3D content can be found through text retrieval. Because of the inefficiency of manually annotating 3D files, researchers have investigated ways to automate the annotation process and provide a unified standard to create text descriptions for 3D contents. Moreover, the increase in 3D content has demanded and inspired more advanced ways to retrieve 3D information. Thus, shape matching methods for 3D content retrieval have become popular. Shape matching retrieval is based on techniques that compare and contrast similarities between 3D models.

Local search engine optimization is similar to (national) SEO in that it is also a process affecting the visibility of a website or a web page in a web search engine's unpaid results often referred to as "natural", "organic", or "earned" results. In general, the higher ranked on the search results page and more frequently a site appears in the search results list, the more visitors it will receive from the search engine's users; these visitors can then be converted into customers. Local SEO, however, differs in that it is focused on optimizing a business's online presence so that its web pages will be displayed by search engines when users enter local searches for its products or services. Ranking for local search involves a similar process to general SEO but includes some specific elements to rank a business for local search.

References

Query-Adaptive Fusion for Multimodal Search,Lyndon Kennedy, Student Member IEEE, Shih-Fu Chang, Fellow IEEE, and Apostol Natsev
Context-aware Querying for Multimodal Search Engines, Jonas Etzold, Arnaud Brousseau, Paul Grimm and Thomas Steiner Archived 2012-04-15 at the Wayback Machine
Apply Multimodal Search and Relevance Feedback In a Digital Video Library, Thesis of Yu Zhong
Aplicació rica d’internet per a la consulta amb text i imatge al repositori de vídeos de la Corporació Catalana de Mitjans Audiovisuals, Ramon Salla, Universitat Politècnica de Catalunya

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.