Keyword clustering

Keyword clustering is a practice search engine optimization (SEO) professionals use to segment target search terms into groups (clusters) relevant to each page of the website. After keyword research, search engine professionals cluster keywords into small groups which they spread across the pages of the website to achieve higher rankings in the search engine results pages (SERPs). Keyword clustering is a fully automated process performed by keyword clustering tools.

The term and the first principles were introduced in 2015 by the Russian search engine optimization expert Alexey Chekushin.[1] The SERP-based keyword clustering tool Just-Magic was released in Russia the same year.

Method

Keyword clustering is based on the first ten search results (TOP-10), regardless of the search engine or custom settings. The TOP-10 search results are the first ten listings that a search engine shows for a given search query. In most cases, the TOP-10 matches the first page of the search results.

The general algorithm of keyword clustering includes four steps that a tool completes to cluster keywords (a minimal code sketch follows the list):

  1. The tool takes keywords one by one from the list and sends them as search queries to the search engine. It scans the search results, pulls the first ten listings, and matches them to each keyword from the list.
  2. If the search engine returns the same search listings for two different keywords and the number of these shared listings is enough to trigger clustering, the two keywords are grouped together (clustered).
  3. The minimum number of matches in the search results that triggers keyword clustering is called the clustering level. The clustering level is customizable, and most tools allow changing it in the settings before clustering. The clustering level affects the number of groups and the number of keywords per group: a higher clustering level produces more groups with fewer keywords in each, because two queries rarely share nine or ten of their ten listings (that would require nearly identical TOP-10 result pages). Conversely, a clustering level of 1 or 2 creates a few groups with many keywords in each. There are certain exceptions, but they are not common.
  4. Keywords for which the tool finds no matching URLs in the TOP-10 of the search results are placed in a separate group.
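
The following is a minimal Python sketch of this loop, illustrative rather than any specific tool's implementation. The fetch_top10 helper is hypothetical and stands in for a real SERP API or scraper; the clustering_level parameter is the threshold described in step 3.

    def fetch_top10(keyword: str) -> set[str]:
        """Hypothetical helper: return the set of TOP-10 result URLs for a query."""
        raise NotImplementedError("plug in a real SERP API or scraper here")

    def cluster_keywords(keywords: list[str], clustering_level: int = 4) -> list[list[str]]:
        """Group keywords whose TOP-10 results share at least clustering_level URLs."""
        serps = {kw: fetch_top10(kw) for kw in keywords}  # step 1: one SERP per keyword
        groups: list[list[str]] = []
        for kw in keywords:
            for group in groups:
                # steps 2-3: enough listings shared with the group's first
                # keyword triggers clustering
                if len(serps[kw] & serps[group[0]]) >= clustering_level:
                    group.append(kw)
                    break
            else:
                groups.append([kw])  # step 4: no match, start a separate group
        return groups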

Apart from the clustering level, there are also different types of keyword clustering, which determine how the keywords within one group are linked to each other. Like the clustering level, the type of keyword clustering can be set prior to clustering.

Types

Soft

A keyword clustering tool scans the list of keywords and picks the most popular keyword, that is, the keyword with the highest search volume. The tool then compares the TOP-10 search listings returned for that keyword with the TOP-10 listings returned for each other keyword to detect the number of matching URLs. If the detected number reaches the selected clustering level, the keywords are grouped together.

As a result, all keywords within one group will be related to the keyword with the highest search volume, but they will not necessarily be related to each other (they will not necessarily have matching URLs with each other).
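
A sketch of the soft type under the same assumptions: serps is a precomputed mapping from each keyword to its TOP-10 URL set (as returned by the hypothetical fetch_top10 above), and volumes maps each keyword to its search volume.

    def soft_cluster(keywords: list[str], volumes: dict, serps: dict, level: int = 4) -> list[list[str]]:
        """Soft clustering: members only need to match the group's seed keyword."""
        remaining = sorted(keywords, key=lambda kw: volumes[kw], reverse=True)
        groups = []
        while remaining:
            seed, rest = remaining[0], remaining[1:]
            # each candidate is compared against the seed only
            group = [seed] + [kw for kw in rest
                              if len(serps[kw] & serps[seed]) >= level]
            remaining = [kw for kw in rest if kw not in group]
            groups.append(group)
        return groups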


Moderate

A keyword clustering tool scans the list of keywords and picks the keyword with the highest search volume. The tool then compares the TOP-10 search listings returned for that keyword with the TOP-10 listings returned for each other keyword to detect the number of matching URLs. At the same time, the tool compares all keywords to each other. If the detected number of identical search listings reaches the selected clustering level, the keywords are grouped together.

As a result, every keyword within one group will have at least one related keyword in the same group with a matching URL or URLs, but two keywords picked at random from the group will not necessarily have matching URLs.
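
Under the same assumptions, the moderate type can be sketched as connected-component growth: a keyword joins a group as soon as it shares enough URLs with any current member, so groups form chains of pairwise-related keywords.

    def moderate_cluster(keywords: list[str], volumes: dict, serps: dict, level: int = 4) -> list[list[str]]:
        """Moderate clustering: each member must match at least one other member."""
        remaining = sorted(keywords, key=lambda kw: volumes[kw], reverse=True)
        groups = []
        while remaining:
            group = [remaining.pop(0)]  # seed with the most popular keyword
            grew = True
            while grew:  # keep growing the chain until no keyword can join
                grew = False
                for kw in remaining[:]:
                    if any(len(serps[kw] & serps[m]) >= level for m in group):
                        group.append(kw)
                        remaining.remove(kw)
                        grew = True
            groups.append(group)
        return groups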


Hard

A keyword clustering tool scans the list of keywords and picks the keyword with the highest search volume. The tool then compares the TOP-10 search listings returned for that keyword with the TOP-10 listings returned for each other keyword to detect the number of matching URLs. At the same time, the tool compares all keywords to each other and all matching URLs in the detected pairs. If the detected number of identical search listings reaches the selected clustering level, the keywords are grouped together.

As a result, all keywords within a group will be related to each other by sharing the same matching URLs.
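
The hard type can be sketched, again under the same assumptions, by tracking the running intersection of the group's SERPs: a keyword is admitted only while at least level URLs remain common to every member. This greedy version is order-dependent, which is one reason different tools may produce slightly different groupings.

    def hard_cluster(keywords: list[str], volumes: dict, serps: dict, level: int = 4) -> list[list[str]]:
        """Hard clustering: at least `level` URLs must be shared by ALL members."""
        remaining = sorted(keywords, key=lambda kw: volumes[kw], reverse=True)
        groups = []
        while remaining:
            seed = remaining.pop(0)
            group, common = [seed], set(serps[seed])
            for kw in remaining[:]:
                shared = common & serps[kw]
                if len(shared) >= level:  # the whole group still shares enough URLs
                    group.append(kw)
                    common = shared
                    remaining.remove(kw)
            groups.append(group)
        return groups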


History

As a major part of the website optimization process, SEO professionals research keywords to build a pool of target search terms which they use to promote their website and achieve higher rankings in the search results. After they get a list of keywords related to the contents of the website, they segment the list into smaller groups, each usually relevant to a certain page of the website or a certain topic. Originally, SEO professionals had to group the keyword pool manually, picking keywords one by one and identifying possible clusters. This could be done with the help of the Google AdWords Keyword Tool, but it still required a lot of manual work. There was a need for an algorithm that would segment keywords into clusters automatically.

Lemma-based keyword grouping

Prior to keyword clustering, search engine optimization experts developed keyword grouping tools based on the process known as lemmatisation. A lemma is the base or dictionary form of a word (without inflectional endings). In linguistics, lemmatisation is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item.[2]

In search engine optimization, the process of lemmatisation includes four steps (a sketch follows the list):

  1. Keywords are picked from the list one by one;
  2. Keywords are broken down into lemmas;
  3. Keywords with the same lemmas are detected;
  4. Keywords with matching lemmas are grouped together.
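
One strict reading of these steps, sketched in Python: keywords are grouped when their words reduce to the identical set of lemmas. The lemmatize helper is hypothetical; a real tool would call an actual lemmatizer (for English, for example, NLTK's WordNetLemmatizer).

    from collections import defaultdict

    def lemmatize(word: str) -> str:
        """Hypothetical helper: return the dictionary form of a word."""
        raise NotImplementedError("plug in a real lemmatizer here")

    def lemma_groups(keywords: list[str]) -> dict:
        """Group keywords whose words reduce to the same set of lemmas."""
        groups = defaultdict(list)
        for kw in keywords:  # step 1: keywords are picked one by one
            lemmas = frozenset(lemmatize(w) for w in kw.split())  # step 2: break into lemmas
            groups[lemmas].append(kw)  # steps 3-4: matching lemma sets are grouped
        return dict(groups)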

As a result, a search engine optimization specialist gets a list of keyword groups; each keyword in a given group has matching lemmas with all other keywords within that group.

SERP-based

Compared to lemma-based keyword grouping, SERP-based keyword clustering produces groups of keywords that may show no morphological matches but do have matches in the search results. This allows search engine professionals to build a keyword structure close to the one the search engine itself dictates.

The Soft and Hard types of keyword clustering, together with the general algorithm, were introduced by the Russian SEO expert Alexey Chekushin in 2015. In the same year, he developed and introduced an automated tool that could cluster keywords.

Related Research Articles

Spamdexing is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.

Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic rather than direct traffic or paid traffic. Unpaid traffic may originate from different kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines.

Pay-per-click (PPC) is an internet advertising model used to drive traffic to websites, in which an advertiser pays a publisher when the ad is clicked.

In computer science, canonicalization is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.

The anchor text, link label or link text is the visible, clickable text in an HTML hyperlink. The term "anchor" was used in older versions of the HTML specification for what is currently referred to as the a element, or <a>. The HTML specification does not have a specific term for anchor text, but refers to it as "text that the a element wraps around". In XML terms, the anchor text is the content of the element, provided that the content is text.

In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases.

Keyword stuffing is a search engine optimization (SEO) technique, considered webspam or spamdexing, in which keywords are loaded into a web page's meta tags, visible content, or backlink anchor text in an attempt to gain an unfair rank advantage in search engines. Keyword stuffing may lead to a website being temporarily or permanently banned or penalized on major search engines. The repetition of words in meta tags may explain why many search engines no longer use these tags. Nowadays, search engines focus more on content that is unique, comprehensive, relevant, and helpful, which makes keyword stuffing largely useless; it is nevertheless still practiced by many webmasters.

Search engine marketing (SEM) is a form of Internet marketing that involves the promotion of websites by increasing their visibility in search engine results pages (SERPs) primarily through paid advertising. SEM may incorporate search engine optimization (SEO), which adjusts or rewrites website content and site architecture to achieve a higher ranking in search engine results pages to enhance pay per click (PPC) listings and increase the Call to action (CTA) on the website.

A scraper site is a website that copies content from other websites using web scraping. The content is then mirrored with the goal of creating revenue, usually through advertising and sometimes by selling user data.

The Sandbox effect is a name given to an observation of the way Google ranks web pages in its index. It is the subject of much debate—its existence has been written about since 2004, but not confirmed, with several statements to the contrary.

Search Engine Results Pages (SERP) are the pages displayed by search engines in response to a query by a user. The main component of the SERP is the listing of results that are returned by the search engine in response to a keyword query.

A domain name auction facilitates the buying and selling of currently registered domain names, enabling individuals to purchase a previously registered domain that suits their needs from an owner wishing to sell. A Drop registrar offers sales of expiring domains; but with a domain auction there is no need to wait until a current owner allows the registration to lapse before purchasing the domain you most want to own. Domain auction sites allow users to search multiple domain names that are listed for sale by owner, and to place bids on the names they want to purchase. As in any auction, the highest bidder wins. The more desirable a domain name, the higher the winning bid, and auction sites often provide links to escrow agents to facilitate the safe transfer of funds and domain properties between the auctioning parties.

Keyword research is a practice search engine optimization (SEO) professionals use to find and analyze search terms that users enter into search engines when looking for products, services, or general information. Keywords are related to search queries.

Duplicate content is a term used in the field of search engine optimization to describe content that appears on more than one web page. The duplicate content can be substantial parts of the content within or across domains and can be either exactly duplicate or closely similar. When multiple pages contain essentially the same content, search engines such as Google and Bing can penalize or cease displaying the copying site in any relevant search results.

Hummingbird is the codename given to a significant algorithm change in Google Search in 2013. Its name was derived from the speed and accuracy of the hummingbird. The change was announced on September 26, 2013, having already been in use for a month. "Hummingbird" places greater emphasis on natural language queries, considering context and meaning over individual keywords. It also looks deeper at content on individual pages of a website, with improved ability to lead users directly to the most appropriate page rather than just a website's homepage.

RankBrain is a machine learning-based search engine algorithm, the use of which was confirmed by Google on 26 October 2015. It helps Google to process search results and provide more relevant search results for users. In a 2015 interview, Google commented that RankBrain was the third most important factor in the ranking algorithm, along with links and content. As of 2015, RankBrain was used for less than 15% of queries. In tests, its results came well within 10% of those produced by Google's search engineer team.

The domain authority of a website describes its relevance for a specific subject area or industry. Domain Authority is a search engine ranking score developed by Moz. This relevance has a direct impact on a site's ranking by search engines, which try to assess domain authority through automated analytic algorithms. The relevance of domain authority to website listings in the search engine results pages (SERPs) led to the birth of a whole industry of black-hat SEO providers trying to feign an increased level of domain authority. The ranking by major search engines, e.g., Google’s PageRank, is agnostic of specific industries or subject areas and assesses a website in the context of the totality of websites on the Internet. The results on the SERP set the PageRank in the context of a specific keyword. In a less competitive subject area, even websites with a low PageRank can achieve high visibility in search engines, as the highest-ranked sites that match specific search words are positioned in the first positions in the SERPs.

User intent, otherwise known as query intent or search intent, is the identification and categorization of what a user online intended or wanted to find when they typed their search terms into an online web search engine for the purpose of search engine optimisation or conversion rate optimisation. Examples of user intent are fact-checking, comparison shopping or navigating to other websites.

Local search engine optimization is similar to (national) SEO in that it is also a process affecting the visibility of a website or a web page in a web search engine's unpaid results often referred to as "natural", "organic", or "earned" results. In general, the higher ranked on the search results page and more frequently a site appears in the search results list, the more visitors it will receive from the search engine's users; these visitors can then be converted into customers. Local SEO, however, differs in that it is focused on optimizing a business's online presence so that its web pages will be displayed by search engines when users enter local searches for its products or services. Ranking for local search involves a similar process to general SEO but includes some specific elements to rank a business for local search.

Search engine scraping is the process of harvesting URLs, descriptions, or other information from search engines. This is a specific form of screen scraping or web scraping dedicated to search engines only.

References

  1. Chekushin, Alexey (2015-12-03). "Clustering alphabet" (in Russian). Retrieved 2016-08-03.
  2. "Lexicography". www.christianlehmann.eu. Retrieved 2016-08-03.