Google Personalized Search

Google Personalized Search is a personalized search feature of Google Search, introduced in 2004. All searches on Google Search are associated with a browser cookie record. [1] When a user performs a search, the search results are based not only on the relevance of each web page to the search term, but also on which websites the user (or someone else using the same browser) visited through previous search results. [1] This provides a more personalized experience that can increase the relevance of the search results for the particular user. Such filtering may also have side effects, such as the creation of a filter bubble.

Later changes to Google's search algorithm placed less weight on user data, so personalized search now has only a limited effect on search results. In response to criticism, Google has also made it possible to turn the feature off.

History

Personalized Search was originally introduced on March 29, 2004 as a beta test of a Google Labs project. [2] On April 20, 2005, it was made available as a non-beta service, but still separate from ordinary Google Search. [3] [4] On November 11, 2005, it became a part of the normal Google Search, but only to users with Google Accounts. [5]

Beginning on December 4, 2009, Personalized Search was applied to all users of Google Search, including those who are not logged into a Google Account. [1]

In addition to customizing results based on personal behavior and interests associated with a Google Account, Google introduced social search results in October 2009, [6] based on the people a user knows. Operating on the assumption that one's associates share similar interests, these results gave a ranking boost to sites from within a user's "Social Circle". The two services were integrated into regular results by February 2011, and the results were expanded to include content shared by people known to the user through social networks. [7]

Data collection

Google's search algorithm is driven by collecting and storing web history in its databases. For non-authenticated users, Google reads an anonymous cookie stored in the user's browser and compares its unique string with those stored in Google's databases. For users logged into a Google Account in Google Chrome, Google uses their web history to learn which sites and content they like and bases the search results it presents on them. Using the data provided by the user, Google constructs a profile including gender, age, languages, and interests based on prior behaviour using Google services. [8]

When a user performs a search using Google, the keywords or terms are used to generate ranked results based upon the PageRank algorithm. This algorithm, according to Google, is their "system of counting link votes and determining which pages are most important based upon them. These scores are then used along with many other things to determine if a page will rank well in a search." "PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves 'important' weigh more heavily and help to make other pages 'important.' Using these and other factors, Google provides its views on the pages' relative importance." [9]
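The link-voting idea in the quotation above can be illustrated with a minimal power-iteration sketch. This is a textbook-style approximation, not Google's production implementation: the function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-page example graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores for a small link graph by power iteration.

    links maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among the pages it votes for,
                # so votes from high-scoring pages carry more weight.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: A, B and D all link (directly or indirectly) toward C.
example_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = pagerank(example_links)
```

In this toy graph, page C receives the most votes and ends up with the highest score, while page D, which no other page links to, ends up with the lowest.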

Since the search division launched the first version with customized search results in 2005 and began to take previously visited sites into account, new factors have been added to refine search results. According to Google, its conclusion after many years of testing is that by far the best indicator of which results are relevant to the user is the search phrase itself, not user data, and that personalisation of search results is not as big a factor as it used to be. [10]

Harvard law professor Jonathan Zittrain disputed the extent to which personalization filters distort Google search results, saying that "the effects of search personalization have been light". [11] Further, Google provides the ability for users to shut off personalization features if they choose, [12] by deleting Google's record of their search history and setting Google to not remember their search keywords and visited links in the future.

Types of data collected

There are 50+ factors (called 'signals' by Google) used to determine search results. The top factors in personalizing search results are location data, search history, web history, and social networks.

Each of these variables will factor into the personalization of a user's search results in hopes of quickly providing the most relevant results to the user to answer whatever question is being asked. [13]

Location data

Location data allows Google to provide information based upon the user's current location and places that the user has visited in the past, derived from the GPS location of an Android smartphone or the user's IP address. Google uses this location data to provide local listings grouped with search results using the Google Local platform featuring detailed reviews and ratings from Zagat. [14]

Search history

Search history was first used to personalize search results in 2005 based upon previous searches and clicked links by individual end users. Then, in 2009, Google announced that personalized search would no longer require a user to be logged in, and instead Google would use an anonymous cookie in a web browser to customize search results for those who were not logged in. [1]
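As a rough illustration of history-based personalization keyed by an anonymous cookie, the sketch below re-ranks a result list by boosting domains that the same browser clicked in earlier searches. Google's actual signals and weights are not public; the function `personalize`, the `boost` value, the cookie identifier, and the example URLs are all hypothetical.

```python
from urllib.parse import urlparse


def personalize(results, visited_domains, boost=0.2):
    """Re-rank (url, relevance) pairs, favouring previously visited domains.

    results: list of (url, base_relevance) tuples for one query.
    visited_domains: set of domains this browser clicked in earlier searches.
    boost: hypothetical bonus added to the relevance of a visited domain.
    """
    def score(item):
        url, relevance = item
        domain = urlparse(url).netloc
        return relevance + (boost if domain in visited_domains else 0.0)

    # sorted() is stable, so results with equal scores keep their original order.
    return [url for url, _ in sorted(results, key=score, reverse=True)]


# History is keyed by an anonymous cookie value rather than by an account.
history_by_cookie = {"cookie-123": {"books.example.com"}}

results = [
    ("https://docs.example.org/javascript", 0.90),
    ("https://books.example.com/javascript-guide", 0.80),
    ("https://news.example.net/javascript", 0.70),
]

personalized = personalize(results, history_by_cookie.get("cookie-123", set()))
unpersonalized = personalize(results, set())
```

With the assumed history, the previously visited book site moves to the top of `personalized`, while `unpersonalized` keeps the original relevance order.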

Web history

Web history differs from search history in that it is a record of the actual pages a user visits, but it still contributes to the ranking of search results. Lastly, Google+ data is used in search results because it gives Google a range of demographic information about a user, such as age, gender, location, work history, interests, and social connections. [13]

Social networks

Google's social networking service, Google+, also collects this demographic data, including age, sex, location, career, and friends. This largely comes into play when presenting reviews and ratings from people within a user's immediate circle.

Effectiveness

In order to determine the actual impact of search customization on end users, researchers at Northeastern University compared logged-in users with a control group and found that 11.7% of results showed differences due to personalization. The research showed that this figure varies widely by search query and result ranking position. [15]

In one example, the Portent team performed a search query for 'JavaScript', then searched for 'Programming Textbooks' and 'Books on HTML' before searching for 'JavaScript' again, which changed the search results by bringing in three book listings that were not part of the original set of results. The study showed that of the various factors being tested, the two with the most measurable impact were whether the user was logged in with a Google account and the IP address of the searching user. The same study also investigated the 11.7% personalization figure by using Amazon Mechanical Turk (AMT), a crowdsourcing Internet marketplace that is part of Amazon Web Services, and comparing the results against a control group. The results showed that the top-ranked URLs are less likely to change based on personalization, and that the most personalization takes place at lower ranks of the results pages. [13]
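A comparison of this kind, between a test account and a control, can be sketched as a per-rank difference rate. The code below is only an illustration of the general idea, not the researchers' actual methodology; the function name `changed_fraction_by_rank` and the sample result lists are made up for the example.

```python
def changed_fraction_by_rank(paired_results):
    """Return, for each rank, the share of queries whose result differs.

    paired_results: list of (test_urls, control_urls) pairs, one per query,
    where both lists are ordered from rank 1 downward.
    """
    if not paired_results:
        return []
    # Only compare ranks that are present in every list.
    depth = min(min(len(test), len(control)) for test, control in paired_results)
    fractions = []
    for rank in range(depth):
        changed = sum(
            1 for test, control in paired_results if test[rank] != control[rank]
        )
        fractions.append(changed / len(paired_results))
    return fractions


# Hypothetical data: two queries, three results each.
pairs = [
    (["a", "b", "c"], ["a", "b", "d"]),
    (["x", "y", "z"], ["x", "z", "y"]),
]
fractions = changed_fraction_by_rank(pairs)
overall = sum(fractions) / len(fractions)
```

In this made-up data the first rank never changes and the last rank always does, mirroring the pattern in which lower-ranked results are more affected by personalization.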

Reception

Several concerns have been brought up regarding the feature. It decreases the likelihood of finding new information, since it biases search results towards what the user has already found. It also introduces some privacy problems, since a user may not be aware that their search results are personalized for them, and it affects the search results of other people who use the same computer (unless they are logged in as a different user). The feature also has profound effects on the search engine optimization (SEO) industry, since search results are not ranked the same way for every user, which makes it more difficult to identify the effects of SEO efforts. [16] Personalization makes the search experience inconsistent across users, so the SEO industry must be aware of both personalized and non-personalized search results to improve rankings. [14]

Personalized search can also add background noise to search results. One source is the carry-over effect, in which one search influences a subsequent search. The second search is influenced by the first if the timeout period is not set at a high enough threshold. For example, a search for a store in Hawaii could carry over results from a previous, failed search that showed the same store in California, creating noise. [15]

However, more recent research suggests that search engines do not create the kind of filter bubbles previously thought. In a study of the political impact of search engines in seven countries carried out at Michigan State University, researchers found that search engines were a complement to other news sources that people already used. Users consulted an average of 4.5 news sources across various media to obtain an understanding, and those with a specific interest in politics consulted even more. The researchers note that filter bubbles sound like a real problem and that they primarily appear to apply to people other than yourself. Their conclusion is, nonetheless, that the problem is overblown, the evidence anecdotal, and it is impossible to see that search engines contribute to the creation of filter bubbles based on the empirical evidence produced by the study. [17]


References

  1. "Personalized Search for everyone". Google. Retrieved July 12, 2010.
  2. "Google takes searching personally". Google. Retrieved July 12, 2010.
  3. "Google search gets personal". CNET. Retrieved July 12, 2010.
  4. "Search gets personal". Google. Retrieved July 12, 2010.
  5. "Google Personalized Search". Archived 2021-01-27 at the Wayback Machine.
  6. "Introducing Google Social Search: I finally found my friend's New York blog!". Google. Retrieved Dec 1, 2014.
  7. "Google's Results Get More Personal With Search Plus Your World". Search Engine Land. Retrieved Dec 1, 2014.
  8. "Google Ads Settings". Google. Retrieved Feb 8, 2018.
  9. "What Is Google PageRank? A Guide For Searchers & Webmasters". 2007-04-26. Retrieved 2016-07-02.
  10. Grankvist, Per (2018). How Technology Makes It Harder to Understand the World (1 ed.). United Stories Publishing. pp. 179–180. ISBN 978-91-639-5990-5.
  11. Weisberg, Jacob (June 11, 2011). "Is Web personalization turning us into solipsistic twits?". Slate. Retrieved February 11, 2018.
  12. Ludwig, Amber. "Google Personalization on Your Search Results Plus How to Turn it Off". NGNG. Archived from the original on August 17, 2011. Retrieved August 15, 2011. Google customizing search results is an automatic feature, but you can shut this feature off.
  13. "Guide to Personalized Search Results - Portent". 2014-08-28. Retrieved 2016-07-02.
  14. "Guide to Personalized Search Results". Colborn, Ken. Portent. Retrieved Dec 1, 2014.
  15. "A Better Understanding of Personalized Search". Briggs, Justin. Retrieved Dec 1, 2014.
  16. "Google Personalized Results Could Be Bad for Search". Archived 2012-05-18 at the Wayback Machine. Network World. Retrieved July 12, 2010.
  17. Dutton, William; Reisdorf, Bianca; et al. (May 2017). "Search and Politics: The Uses and Impacts of Search in Britain, France, Germany, Italy, Poland, Spain, and the United States". SSRN 2960697.