Web query

A web query or web search query is a query that a user enters into a web search engine to satisfy their information needs. Web search queries are distinctive in that they are usually plain text; Boolean search directives are rarely used. They differ greatly from standard query languages, which are governed by strict syntax rules, as in command languages with keyword or positional parameters.

Types

There are three broad categories that cover most web search queries: informational, navigational, and transactional. [1] These are also called "do, know, go." [2] Although this model of searching was not theoretically derived, the classification has been empirically validated with actual search engine queries. [3]
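As a rough illustration of the three categories, the sketch below classifies queries with hand-made cue-word lists. The lists and the classify_query function are invented for illustration only; they are not drawn from the cited studies, which validated the taxonomy empirically rather than with keyword rules.

```python
# A minimal heuristic sketch (not a validated model) of the
# informational / navigational / transactional ("know / go / do")
# query classification. All cue words are illustrative assumptions.

NAVIGATIONAL_CUES = {"login", "homepage", "www", ".com", "facebook", "youtube"}
TRANSACTIONAL_CUES = {"buy", "download", "price", "order", "coupon", "cheap"}

def classify_query(query: str) -> str:
    words = query.lower().split()
    if any(cue in query.lower() for cue in NAVIGATIONAL_CUES):
        return "navigational"   # the user wants to *go* somewhere
    if any(w in TRANSACTIONAL_CUES for w in words):
        return "transactional"  # the user wants to *do* something
    return "informational"      # default: the user wants to *know* something

print(classify_query("facebook login"))        # navigational
print(classify_query("buy running shoes"))     # transactional
print(classify_query("how do vaccines work"))  # informational
```

Real classifiers use richer linguistic and behavioral features, as the studies cited later in this article describe.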

Search engines often support a fourth type of query that is used far less frequently: connectivity queries, which report on the connectivity of the indexed web graph (e.g., which pages link to a given URL, or how many pages are indexed from a given domain name). [4]

Characteristics

A list of search suggestions for a search query

Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by. [5] Nevertheless, research studies started to appear in 1998. [6] [7] A 2001 study, [8] which analyzed the queries from the Excite search engine, revealed several characteristics of web searches.

A study of the same Excite query logs revealed that 19% of the queries contained a geographic term (e.g., place names, zip codes, geographic features, etc.). [9]

Studies also show that, in addition to short queries (queries with few terms), there are predictable patterns of how users change their queries. [10]

A 2005 study of Yahoo's query logs revealed that 33% of the queries from the same users were repeat queries, and that in 87% of cases the user would click on the same result. [11] This suggests that many users use repeat queries to revisit or re-find information. This finding is consistent with a Bing search engine blog post which stated that about 30% of queries are navigational queries. [12]

In addition, research has shown that query term frequency distributions conform to the power law, or long tail distribution curves. That is, a small portion of the terms observed in a large query log (e.g. > 100 million queries) are used most often, while the remaining terms are used less often individually. [13] This example of the Pareto principle (or 80–20 rule) allows search engines to employ optimization techniques such as index or database partitioning, caching and pre-fetching. In addition, studies have been conducted into linguistically-oriented attributes that can recognize if a web query is navigational, informational or transactional. [14]
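The long-tail effect described above can be illustrated with a toy query log. The log below is invented (real logs contain hundreds of millions of queries), but it shows why a small cache of the most frequent "head" queries answers most traffic.

```python
import collections

# Illustrative sketch of a long-tail query-frequency distribution
# and the caching optimization it enables. The log is made up.
query_log = (
    ["weather"] * 50 + ["news"] * 30 + ["facebook"] * 15
    + ["mitochondrial dna haplogroups"]
    + ["1987 zoning ordinance pdf"]
    + ["obscure query %d" % i for i in range(3)]
)

freq = collections.Counter(query_log)
total = sum(freq.values())

# The head of the distribution: the few most frequent queries.
head = freq.most_common(3)
head_share = sum(c for _, c in head) / total
print(f"Top 3 of {len(freq)} distinct queries cover {head_share:.0%} of traffic")

# A small cache of head queries therefore answers most traffic.
cache = {q for q, _ in head}
hits = sum(c for q, c in freq.items() if q in cache)
print(f"Cache hit rate with 3 cached queries: {hits / total:.0%}")
```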

A 2011 study found that the average length of queries had grown steadily over time, and that the average length of non-English queries had increased more than that of English ones. [15] Google implemented the Hummingbird update in August 2013 to better handle longer, more conversational search queries (e.g., "where is the nearest coffee shop?"). [16]

Structured queries

With search engines that support Boolean operators and parentheses, a technique traditionally used by librarians can be applied. A user who is looking for documents that cover several topics or facets may want to describe each of them by a disjunction of characteristic words, such as vehicles OR cars OR automobiles. A faceted query is a conjunction of such facets; e.g. a query such as (electronic OR computerized OR DRE) AND (voting OR elections OR election OR balloting OR electoral) is likely to find documents about electronic voting even if they omit one of the words "electronic" or "voting", or even both. [17]
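The faceted query from the example can be sketched directly: each facet is a set of synonyms (a disjunction), and a document matches when every facet is satisfied (a conjunction). The matches helper below is a toy illustration, not a real engine's implementation.

```python
# Sketch of evaluating the faceted Boolean query from the text:
# (electronic OR computerized OR DRE) AND
# (voting OR elections OR election OR balloting OR electoral)

FACETS = [
    {"electronic", "computerized", "dre"},
    {"voting", "elections", "election", "balloting", "electoral"},
]

def matches(document: str, facets=FACETS) -> bool:
    words = set(document.lower().split())
    # every facet must be satisfied by at least one of its synonyms
    return all(words & facet for facet in facets)

print(matches("computerized balloting in local elections"))  # True
print(matches("electronic music concert"))                   # False
```

Note that the first document matches even though it contains neither "electronic" nor "voting", which is precisely the point of describing each facet by several characteristic words.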

Related Research Articles

Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Relevance may include concerns such as timeliness, authority or novelty of the result.

A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. A well known example is the Structured Query Language (SQL).

In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases.

A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). When a user enters a query into a search engine, the engine scans its index of web pages to find those that are relevant to the user's query. The results are then ranked by relevancy and displayed to the user. The information may be a mix of links to web pages, images, videos, infographics, articles, research papers, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories and social bookmarking sites, which are maintained by human editors, search engines also maintain real-time information by running an algorithm on a web crawler. Any internet-based content that cannot be indexed and searched by a web search engine falls under the category of deep web.

Search engine results pages (SERPs) are the pages displayed by search engines in response to a query by a user. The main component of a SERP is the listing of results returned by the search engine in response to a keyword query. In addition to organic search results, SERPs usually include paid search and pay-per-click (PPC) ads.

Exploratory search is a specialization of information exploration which represents the activities carried out by searchers who are unfamiliar with the domain of their goal, unsure about the ways to achieve their goal, or even unsure about the goal itself.

Relevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user feedback, and to use information about whether or not those results are relevant to perform a new query. We can usefully distinguish between three types of feedback: explicit feedback, implicit feedback, and blind or "pseudo" feedback.
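A minimal sketch of explicit relevance feedback in the Rocchio style: the reformulated query vector moves toward documents the user marked relevant and away from those marked non-relevant. The alpha/beta/gamma weights below are conventional defaults, not values mandated by any particular system, and the term-weight dictionaries stand in for real document vectors.

```python
# Rocchio-style explicit relevance feedback (illustrative sketch).
# Vectors are plain term -> weight dicts.

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query)
    for d in relevant + nonrelevant:
        terms |= set(d)
    new_q = {}
    for t in terms:
        score = alpha * query.get(t, 0.0)
        if relevant:
            score += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            score -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        new_q[t] = max(score, 0.0)  # negative weights are usually clipped to zero
    return new_q

q = {"jaguar": 1.0}
rel = [{"jaguar": 1.0, "car": 1.0}]        # user marked relevant
nonrel = [{"jaguar": 1.0, "animal": 1.0}]  # user marked non-relevant
print(rocchio(q, rel, nonrel))
```

After feedback, "car" gains weight and "animal" is suppressed, steering the next retrieval round toward the automotive sense of "jaguar".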

Query expansion (QE) is the process of reformulating a given query to improve retrieval performance in information retrieval operations, particularly in the context of query understanding. In the context of search engines, query expansion involves evaluating a user's input and expanding the search query to match additional documents. Query expansion involves techniques such as finding synonyms of query words, stemming words to match morphological variants, correcting spelling errors, and re-weighting the terms in the original query.
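As a toy illustration of one such technique, the sketch below expands a query with synonyms from a hand-made table; a real engine would draw on a thesaurus or a learned model rather than a hard-coded dictionary.

```python
# Toy synonym-based query expansion. The SYNONYMS table is an
# invented stand-in for a real thesaurus or embedding model.

SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand(query: str) -> list[str]:
    expanded = []
    for word in query.lower().split():
        expanded.append(word)
        expanded.extend(SYNONYMS.get(word, []))  # add synonyms, if any
    return expanded

print(expand("cheap car rental"))
```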

Keyword research is a practice search engine optimization (SEO) professionals use to find and research the search terms that users enter into search engines when looking for products, services or general information. Keywords are related to the queries users submit to search engines.

Human–computer information retrieval (HCIR) is the study and engineering of information retrieval techniques that bring human intelligence into the search process. It combines the fields of human-computer interaction (HCI) and information retrieval (IR) and creates systems that improve search by taking into account the human context, or through a multi-step search process that provides the opportunity for human feedback.

A Web query topic classification/categorization is a problem in information science. The task is to assign a Web search query to one or more predefined categories, based on its topics. The importance of query classification is underscored by many services provided by Web search. A direct application is to provide better search result pages for users with interests in different categories. For example, users issuing the Web query "apple" might expect to see Web pages related to the fruit apple, or they may prefer to see products or news related to the computer company. Online advertisement services can rely on query classification results to promote different products more accurately. Search result pages can be grouped according to the categories predicted by a query classification algorithm. However, the computation of query classification is non-trivial. Unlike document classification tasks, queries submitted by Web search users are usually short and ambiguous, and their meanings evolve over time. Query topic classification is therefore much more difficult than traditional document classification.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

Ranking of query results is one of the fundamental problems in information retrieval (IR), the scientific/engineering discipline behind search engines. Given a query q and a collection D of documents that match the query, the problem is to rank, that is, sort, the documents in D according to some criterion so that the "best" results appear early in the result list displayed to the user. Ranking is an important concept in computer science and is used in many different applications, such as search engine queries and recommender systems. A majority of search engines use ranking algorithms to provide users with accurate and relevant results.
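A minimal sketch of score-based ranking using a TF-IDF-style criterion: documents containing rarer query terms more often score higher. The three documents and the score function are invented for illustration; production engines combine far richer signals.

```python
import math

# Toy TF-IDF-style ranking sketch. Documents are plain strings.
docs = {
    "d1": "web search query ranking",
    "d2": "cooking pasta at home",
    "d3": "query logs from a search engine",
}

def score(query: str, doc: str, corpus) -> float:
    doc_words = doc.split()
    s = 0.0
    for term in query.split():
        tf = doc_words.count(term)  # term frequency in this document
        df = sum(1 for d in corpus.values() if term in d.split())
        if tf and df:
            # idf dampens terms that occur in many documents
            s += tf * math.log(len(corpus) / df + 1)
    return s

query = "web query"
ranked = sorted(docs, key=lambda d: score(query, docs[d], docs), reverse=True)
print(ranked)  # best-matching document id first
```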

An intelligent medical search engine is a vertical search engine that uses expert system technology to provide personalized medical information.

Collaborative search engines (CSE) are Web search engines and enterprise searches within company intranets that let users combine their efforts in information retrieval (IR) activities, share information resources collaboratively using knowledge tags, and allow experts to guide less experienced people through their searches. Collaboration partners do so by providing query terms, collective tagging, adding comments or opinions, rating search results, and sharing the links clicked during former (successful) IR activities with users who have the same or a related information need.

Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.

In web analytics, a session, or visit is a unit of measurement of a user's actions taken within a period of time or with regard to completion of a task. Sessions are also used in operational analytics and provision of user-specific recommendations. There are two primary methods used to define a session: time-oriented approaches based on continuity in user activity and navigation-based approaches based on continuity in a chain of requested pages.
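The time-oriented approach can be sketched as a simple inactivity timeout: a new session starts whenever the gap between consecutive events exceeds a threshold. The 30-minute value below is a common convention, not a standard.

```python
# Time-oriented sessionization sketch: split an event stream on
# inactivity gaps. Timestamps are seconds since an arbitrary epoch.
TIMEOUT = 30 * 60  # 30 minutes, a common convention

def sessionize(timestamps, timeout=TIMEOUT):
    sessions, current = [], []
    for t in sorted(timestamps):
        if current and t - current[-1] > timeout:
            sessions.append(current)  # gap too large: close the session
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

events = [0, 60, 120, 4000, 4100]   # gap of 3880 s splits the stream
print(len(sessionize(events)))      # 2 sessions
```

A navigation-based approach would instead split on breaks in the chain of requested pages, regardless of elapsed time.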

User intent, otherwise known as query intent or search intent, is the identification and categorization of what an online user intended or wanted to find when typing search terms into a web search engine, for the purposes of search engine optimisation or conversion rate optimisation. Examples of user intent are fact-checking, comparison shopping or navigating to other websites.

Query understanding is the process of inferring the intent of a search engine user by extracting semantic meaning from the searcher's keywords. Query understanding methods generally take place before the search engine retrieves and ranks results. It is related to natural language processing but specifically focused on the understanding of search queries. Query understanding is at the heart of technologies like Amazon Alexa, Apple's Siri, Google Assistant, IBM's Watson, and Microsoft's Cortana.

References

  1. Broder, A. (2002). A taxonomy of Web search. SIGIR Forum, 36(2), 3–10.
  2. Gibbons, Kevin (2013-01-11). "Do, Know, Go: How to Create Content at Each Stage of the Buying Cycle". Search Engine Watch. Retrieved 24 May 2014.
  3. Jansen, B. J., Booth, D., and Spink, A. (2008) Determining the informational, navigational, and transactional intent of Web queries, Information Processing & Management. 44(3), 1251-1266.
  4. Moore, Ross. "Connectivity servers". Cambridge University Press. Retrieved 24 May 2014.
  5. Dawn Kawamoto and Elinor Mills (2006), AOL apologizes for release of user search data
  6. Jansen, B. J., Spink, A., Bateman, J., and Saracevic, T. 1998. Real life information retrieval: A study of user queries on the web. SIGIR Forum, 32(1), 5 -17.
  7. Silverstein, C., Henzinger, M., Marais, H., & Moricz, M. (1999). Analysis of a very large Web search engine query log. SIGIR Forum, 33(1), 6–12.
  8. Amanda Spink; Dietmar Wolfram; Major B. J. Jansen; Tefko Saracevic (2001). "Searching the web: The public and their queries" (PDF). Journal of the American Society for Information Science and Technology. 52 (3): 226–234. CiteSeerX 10.1.1.23.9800. doi:10.1002/1097-4571(2000)9999:9999<::AID-ASI1591>3.3.CO;2-I.
  9. Mark Sanderson & Janet Kohler (2004). "Analyzing geographic queries". Proceedings of the Workshop on Geographic Information (SIGIR '04).
  10. Jansen, B. J., Booth, D. L., & Spink, A. (2009). Patterns of query modification during Web searching. Journal of the American Society for Information Science and Technology. 60(7), 1358-1371.
  11. Jaime Teevan; Eytan Adar; Rosie Jones; Michael Potts (2005). "History repeats itself: Repeat Queries in Yahoo's query logs" (PDF). Proceedings of the 29th Annual ACM Conference on Research and Development in Information Retrieval (SIGIR '06). pp. 703–704. doi:10.1145/1148170.1148326.
  12. "Bing Making search yours - Search Blog - Site Blogs - Bing Community". Archived from the original on 2011-03-14. Retrieved 2011-03-01.
  13. Ricardo Baeza-Yates (2005). "Applications of Web Query Mining". Advances in Information Retrieval. Lecture Notes in Computer Science. Vol. 3408. Springer Berlin / Heidelberg. pp. 7–22. doi:10.1007/978-3-540-31865-1_2. ISBN   978-3-540-25295-5.
  14. Alejandro Figueroa (2015). "Exploring effective features for recognizing the user intent behind web queries". Computers in Industry. Elsevier. 68: 162–169. doi:10.1016/j.compind.2015.01.005.
  15. Mona Taghavi; Ahmed Patel; Nikita Schmidt; Christopher Wills; Yiqi Tew (2011). "An analysis of web proxy logs with query distribution pattern approach for search engines". Journal of Computer Standards & Interfaces. 34 (1): 162–170. doi:10.1016/j.csi.2011.07.001.
  16. Sullivan, Danny (2013-09-26). "FAQ: All About The New Google "Hummingbird" Algorithm". Search Engine Land. Retrieved 24 May 2014.
  17. Vojkan Mihajlović; Djoerd Hiemstra; Henk Ernst Blok; Peter M.G. Apers (October 2006). "Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness" (PDF).