A web query or web search query is a query that a user enters into a web search engine to satisfy their information needs. Web search queries are distinctive in that they are often plain text and boolean search directives are rarely used. They vary greatly from standard query languages, which are governed by strict syntax rules as command languages with keyword or positional parameters.
There are three broad categories that cover most web search queries: informational, navigational, and transactional. [1] These are also called "do, know, go." [2] Although this model of searching was not theoretically derived, the classification has been empirically validated with actual search engine queries. [3]
Search engines often support a fourth type of query that is used far less frequently:
Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by. [5] Nevertheless, research studies started to appear in 1998. [6] [7] A 2001 study, [8] which analyzed the queries from the Excite search engine, showed some interesting characteristics of web searches:
A study of the same Excite query logs revealed that 19% of the queries contained a geographic term (e.g., place names, zip codes, geographic features, etc.). [9]
Studies also show that, in addition to short queries (queries with few terms), there are predictable patterns of how users change their queries. [10]
A 2005 study of Yahoo's query logs revealed that 33% of the queries from the same users were repeat queries and that in 87% of cases the user would click on the same result. [11] This suggests that many users use repeat queries to revisit or re-find information. This analysis is confirmed by a Bing search engine blog post which stated that about 30% of queries are navigational queries. [12]
In addition, research has shown that query term frequency distributions conform to the power law, or long tail distribution curves. That is, a small portion of the terms observed in a large query log (e.g. > 100 million queries) are used most often, while the remaining terms are used less often individually. [13] This example of the Pareto principle (or 80–20 rule) allows search engines to employ optimization techniques such as index or database partitioning, caching and pre-fetching. In addition, studies have been conducted into linguistically-oriented attributes that can recognize if a web query is navigational, informational or transactional. [14]
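The skew described above can be checked on any query log by measuring how much of the total term volume falls on the most frequent terms. The following is a minimal sketch, not a method taken from the cited studies: the function name `head_coverage`, the whitespace tokenization, and the toy log are all assumptions made for illustration.

```python
from collections import Counter

def head_coverage(queries, top_fraction=0.2):
    """Share of all term occurrences covered by the most frequent terms.

    Hypothetical helper: terms are split on whitespace and lowercased,
    which is a simplification of real query tokenization.
    """
    counts = Counter(term for q in queries for term in q.lower().split())
    if not counts:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    # Size of the "head": the top fraction of distinct terms, at least one.
    head_size = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:head_size]) / sum(ranked)

# Toy log, invented for illustration only.
toy_log = [
    "weather", "weather today", "weather london", "news", "news today",
    "weather radar", "cheap flights", "news weather", "python tutorial",
    "weather forecast",
]
print(head_coverage(toy_log))
```

In this toy log the two most frequent of ten distinct terms account for half of all term occurrences. A real log with a long tail would typically show a stronger concentration, which is what makes caching the head of the distribution worthwhile.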
A 2011 study found that the average length of queries had grown steadily over time and that the average length of non-English language queries had increased more than that of English ones. [15] Google implemented the Hummingbird update in August 2013 to handle longer search queries, since more searches were conversational (e.g. "where is the nearest coffee shop?"). [16]
With search engines that support Boolean operators and parentheses, a technique traditionally used by librarians can be applied. A user who is looking for documents that cover several topics or facets may want to describe each of them by a disjunction of characteristic words, such as "vehicles OR cars OR automobiles". A faceted query is a conjunction of such facets; e.g. a query such as "(electronic OR computerized OR DRE) AND (voting OR elections OR election OR balloting OR electoral)" is likely to find documents about electronic voting even if they omit one of the words "electronic" or "voting", or even both. [17]
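The faceted query above can be built and evaluated mechanically. The sketch below is illustrative only: `build_faceted_query` and `matches` are hypothetical helpers, the matcher assumes single-word terms and simple word-level matching, and real search engines differ in their operator syntax.

```python
import re

def build_faceted_query(facets):
    """Join each facet's terms with OR, then join the facets with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in facets)

def matches(document, facets):
    """True if the document contains at least one term from every facet.

    Assumes single-word terms and case-insensitive whole-word matching.
    """
    words = set(re.findall(r"\w+", document.lower()))
    return all(any(term.lower() in words for term in terms) for terms in facets)

facets = [
    ["electronic", "computerized", "DRE"],
    ["voting", "elections", "election", "balloting", "electoral"],
]

print(build_faceted_query(facets))
print(matches("Computerized balloting systems were audited.", facets))
print(matches("Electronic music festival", facets))
```

The first document matches although it contains neither "electronic" nor "voting", which is the point of describing each facet by a disjunction. The second document fails because it satisfies only one facet.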
Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.
In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Relevance may include concerns such as timeliness, authority or novelty of the result.
A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. In database systems, query languages rely on strict theory to retrieve information. A well known example is the Structured Query Language (SQL).
In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases.
A search engine is a software system that provides hyperlinks to web pages and other relevant information on the Web in response to a user's query. The user inputs a query within a web browser or a mobile app, and the search results are often a list of hyperlinks, accompanied by textual summaries and images. Users also have the option of limiting the search to a specific type of results, such as images, videos, or news.
A search engine results page (SERP) is a webpage that is displayed by a search engine in response to a query by a user. The main component of a SERP is the listing of results that are returned by the search engine in response to a keyword query.
Exploratory search is a specialization of information exploration which represents the activities carried out by searchers who are:
Relevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user feedback, and to use information about whether or not those results are relevant to perform a new query. We can usefully distinguish between three types of feedback: explicit feedback, implicit feedback, and blind or "pseudo" feedback.
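Of the three types, blind or "pseudo" feedback is the easiest to sketch because it needs no user input: the top-ranked results are simply assumed to be relevant. The example below is a deliberately simplified illustration, not a standard algorithm. The function name, the parameters `top_k` and `extra_terms`, and the choice to add the most frequent new terms are all assumptions, and there is no stop-word handling.

```python
from collections import Counter

def pseudo_feedback_query(query, ranked_docs, top_k=2, extra_terms=2):
    """Expand a query with frequent terms from the top-ranked documents.

    Blind feedback: the first `top_k` results are treated as relevant
    without asking the user.
    """
    query_terms = query.lower().split()
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    # Most frequent first; ties broken alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ordered[:extra_terms]]

# Toy initial result list, invented for illustration.
initial_results = ["jaguar cat habitat", "jaguar cat diet", "jaguar car dealer"]
print(pseudo_feedback_query("jaguar", initial_results))
```

The expanded term list would then be submitted as a new query. Explicit and implicit feedback follow the same loop but replace the "top results are relevant" assumption with user judgments or observed behavior such as clicks.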
Social search is the practice of retrieving and searching with a social search engine, which mainly searches user-generated content such as news, videos and images related to search queries on social media like Facebook, LinkedIn, Twitter, Instagram and Flickr. It is an enhanced version of web search that combines traditional algorithms with social signals. The idea behind social search is that instead of ranking search results purely on the semantic relevance between a query and the results, a social search system also takes into account social relationships between the results and the searcher. The social relationships can take various forms. For example, in LinkedIn's people search engine, the social relationships include the social connections between the searcher and each result, and whether or not they are in the same industry, work for the same company, belong to the same social groups, or went to the same schools.
Query expansion (QE) is the process of reformulating a given query to improve retrieval performance in information retrieval operations, particularly in the context of query understanding. In the context of search engines, query expansion involves evaluating a user's input and expanding the search query to match additional documents. Query expansion involves techniques such as:
Human–computer information retrieval (HCIR) is the study and engineering of information retrieval techniques that bring human intelligence into the search process. It combines the fields of human-computer interaction (HCI) and information retrieval (IR) and creates systems that improve search by taking into account the human context, or through a multi-step search process that provides the opportunity for human feedback.
Web query topic classification (or categorization) is a problem in information science. The task is to assign a Web search query to one or more predefined categories, based on its topics. The importance of query classification is underscored by the many services provided by Web search. A direct application is to provide better search result pages for users with interests in different categories. For example, users issuing the Web query "apple" might expect to see Web pages related to the fruit, or they may prefer to see products or news related to the computer company. Online advertisement services can rely on query classification results to promote different products more accurately. Search result pages can be grouped according to the categories predicted by a query classification algorithm. However, the computation of query classification is non-trivial. Unlike the inputs to document classification tasks, queries submitted by Web search users are usually short and ambiguous, and the meanings of queries evolve over time. Therefore, query topic classification is much more difficult than traditional document classification.
A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.
Ranking of query results is one of the fundamental problems in information retrieval (IR), the scientific/engineering discipline behind search engines. Given a query q and a collection D of documents that match the query, the problem is to rank, that is, sort, the documents in D according to some criterion so that the "best" results appear early in the result list displayed to the user. Ranking in terms of information retrieval is an important concept in computer science and is used in many different applications such as search engine queries and recommender systems. A majority of search engines use ranking algorithms to provide users with accurate and relevant results.
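As one concrete criterion, the documents in D can be sorted by a term-weighting score. The sketch below uses a simple TF-IDF style score chosen purely for illustration. The text does not prescribe any particular criterion, and the function name `rank`, the smoothed weight log(1 + n/df), and the toy documents are assumptions.

```python
import math
from collections import Counter

def rank(query, docs):
    """Sort documents by a simple TF-IDF style score for the query.

    Illustrative criterion only: score = sum over query terms of
    term frequency * log(1 + n / document frequency).
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))

    def score(toks):
        tf = Counter(toks)
        return sum(tf[t] * math.log(1 + n / df[t])
                   for t in query.lower().split() if t in tf)

    order = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    return [docs[i] for i in order]

# Toy collection D, invented for illustration; every document matches the query.
D = [
    "coffee shop near station",
    "coffee roasting guide",
    "best coffee shop coffee beans",
]
for doc in rank("coffee shop", D):
    print(doc)
```

Production search engines combine many more signals than term weights, but the structure is the same: compute a score per document, then sort in descending order.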
An intelligent medical search engine is a vertical search engine that uses expert system technology to provide personalized medical information.
Collaborative search engines (CSE) are Web search engines and enterprise searches within company intranets that let users combine their efforts in information retrieval (IR) activities, share information resources collaboratively using knowledge tags, and allow experts to guide less experienced people through their searches. Collaboration partners do so by providing query terms, collective tagging, comments or opinions, ratings of search results, and links clicked in former (successful) IR activities to users who have the same or a related information need.
Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.
In web analytics, a session, or visit is a unit of measurement of a user's actions taken within a period of time or with regard to completion of a task. Sessions are also used in operational analytics and provision of user-specific recommendations. There are two primary methods used to define a session: time-oriented approaches based on continuity in user activity and navigation-based approaches based on continuity in a chain of requested pages.
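The time-oriented approach can be sketched as splitting a user's activity wherever the gap between consecutive events exceeds an inactivity timeout. In the example below, the function name `split_sessions` and the 30-minute (1800-second) default are assumptions made for illustration. The text does not fix a threshold.

```python
def split_sessions(timestamps, timeout=1800):
    """Group event timestamps (in seconds) into sessions.

    A new session starts whenever the gap since the previous event
    exceeds `timeout`. The 1800-second default is an assumed value.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # continue the current session
        else:
            sessions.append([ts])     # gap too large: start a new session
    return sessions

# Toy event times for one user, in seconds, invented for illustration.
events = [0, 600, 4000, 4100, 10000]
print(split_sessions(events))
```

A navigation-based approach would instead inspect which page led to which, for example by following referrer chains, and would need page-level data rather than timestamps alone.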
User intent, otherwise known as query intent or search intent, is the identification and categorization of what a user online intended or wanted to find when they typed their search terms into an online web search engine for the purpose of search engine optimisation or conversion rate optimisation. Examples of user intent are fact-checking, comparison shopping or navigating to other websites.
Query understanding is the process of inferring the intent of a search engine user by extracting semantic meaning from the searcher’s keywords. Query understanding methods generally take place before the search engine retrieves and ranks results. It is related to natural language processing but specifically focused on the understanding of search queries. Query understanding is at the heart of technologies like Amazon Alexa, Apple's Siri, Google Assistant, IBM's Watson, and Microsoft's Cortana.