A tag cloud (also known as a word cloud or weighted list in visual design) is a visual representation of text data which is often used to depict keyword metadata on websites, or to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color. [2] [3] When used as website navigation aids, the terms are hyperlinked to items associated with the tag.
In the language of visual design, a tag cloud (or word cloud) is one kind of "weighted list", as commonly used on geographic maps to represent the relative size of cities in terms of relative typeface size. An early printed example of a weighted list of English keywords was the "subconscious files" in Douglas Coupland's Microserfs (1995). A German appearance occurred in 1992. [4]
The specific visual form and common use of the term "tag cloud" rose to prominence in the first decade of the 21st century as a widespread feature of early Web 2.0 websites and blogs, used primarily to visualize the frequency distribution of keyword metadata that describe website content, and as a navigation aid.
The first tag clouds on a high-profile website were on the photo sharing site Flickr, created by Flickr co-founder and interaction designer Stewart Butterfield in 2004. That implementation was based on Jim Flanagan's Search Referral Zeitgeist, [5] a visualization of Web site referrers. Tag clouds were also popularized around the same time by Del.icio.us and Technorati, among others.
Oversaturation of the tag cloud method and ambivalence about its utility as a web-navigation tool led to a decline of usage among these early adopters. [6] Flickr gave a five-word acceptance speech for the 2006 "Best Practices" Webby Award, which simply stated "sorry about the tag clouds." [7]
A second generation of software development discovered a wider diversity of uses for tag clouds as a basic visualization method for text data. Several extensions of tag clouds have been proposed in this context.
There are three main types of tag cloud applications in social software, distinguished by their meaning rather than appearance. In the first type, there is a tag for the frequency of each item, whereas in the second type, there are global tag clouds where the frequencies are aggregated over all items and users. In the third type, the cloud contains categories, with size indicating number of subcategories.
In the first type, size represents the number of times that tag has been applied to a single item. [8] This is useful as a means of displaying metadata about an item that has been democratically "voted" on and where precise results are not desired.
In the second, more commonly used type,[ citation needed ] size represents the number of items to which a tag has been applied, as a presentation of each tag's popularity.
Instead of frequency, the size can be used to represent the significance of words and word co-occurrences, compared to a background corpus (for example, compared to all the text in Wikipedia). [9] This approach cannot be used standalone, but it relies on comparing the document frequencies to expected distributions.
In the third type, tags are used as a categorization method for content items. Tags are represented in a cloud where larger tags represent the quantity of content items in that category.
There are some approaches to construct tag clusters instead of tag clouds, e.g., by applying tag co-occurrences in documents. [10]
More generally, the same visual technique can be used to display non-tag data, [11] as in a word cloud or a data cloud.
The term keyword cloud is sometimes used as a search engine marketing (SEM) term that refers to a group of keywords that are relevant to a specific website. In recent years tag clouds have gained popularity because of their role in search engine optimization of Web pages as well as supporting the user in navigating the content in an information system efficiently. [12] Tag clouds as a navigational tool make the resources of a website more connected, [13] when crawled by a search engine spider, which may improve the site's search engine rank. From a user interface perspective they are often used to summarize search results to support the user in finding content in a particular information system more quickly. [14]
Tag clouds are typically represented using inline HTML elements. The tags can appear in alphabetical order, in a random order, they can be sorted by weight, and so on. Sometimes, further visual properties are manipulated in addition to font size, such as the font color, intensity, or weight. [15] Most popular is a rectangular tag arrangement with alphabetical sorting in a sequential line-by-line layout. The decision for an optimal layout should be driven by the expected user goals. [15] Some prefer to cluster the tags semantically so that similar tags will appear near each other [16] [17] [18] or use embedding techniques such as tSNE to position words. [9] Edges can be added to emphasize the co-occurrences of tags and visualize interactions. [9] Heuristics can be used to reduce the size of the tag cloud whether or not the purpose is to cluster the tags. [17]
Tag cloud visual taxonomy is determined by a number of attributes: tag ordering rule (e.g. alphabetically, by importance, by context, randomly, ordered for visual quality), shape of the entire cloud (e.g. rectangular, circle, given map borders), shape of tag bounds (rectangle, or character body), tag rotation (none, free, limited), vertical tag alignment (sticking to typographical baselines, free). A tag cloud on the web must address problems of modeling and controlling aesthetics, constructing a two-dimensional layout of tags, and all these must be done in short time on volatile browser platform. Tags clouds to be used on the web must be in HTML, not graphics, to make them robot-readable, they must be constructed on the client side using the fonts available in the browser, and they must fit in a rectangular box. [19]
A data cloud or cloud data is a data display which uses font size and/or color to indicate numerical values. [20] It is similar to a tag cloud [21] but instead of word count, displays data such as population or stock market prices.
A text cloud or word cloud is a visualization of word frequency in a given text as a weighted list. [23] The technique has recently[ when? ] been popularly used to visualize the topical content of political speeches. [22] [24]
Extending the principles of a text cloud, a collocate cloud provides a more focused view of a document or corpus. Instead of summarising an entire document, the collocate cloud examines the usage of a particular word. The resulting cloud contains the words which are often used in conjunction with the search word. These collocates are formatted to show frequency (as size) as well as collocational strength (as brightness). This provides interactive ways to browse and explore language. [25]
Tag clouds have been the subjects of investigation in several usability studies. The following summary is based on an overview of research results given by Lohmann et al.: [15]
Felix et al. [26] compared how human reading performance differs from traditional tag clouds that map numeric values to the size of the font and alternative designs that uses for example color or additional shapes like circle and bars. They also compared how different arrangement of the words affects performance.
In principle, the font size of a tag in a tag cloud is determined by its incidence. For a word cloud of categories like weblogs, frequency, for example, corresponds to the number of weblog entries that are assigned to a category. For smaller frequencies one can specify font sizes directly, from one to whatever the maximum font size. For larger values, a scaling should be made. In a linear normalization, the weight of a descriptor is mapped to a size scale of 1 through f, where and are specifying the range of available weights.
Since the number of indexed items per descriptor is usually distributed according to a power law, [28] for larger ranges of values, a logarithmic representation makes sense. [29]
Implementations of tag clouds also include text parsing and filtering out unhelpful tags such as common words, numbers, and punctuation.
There are also websites creating artificially or randomly weighted tag clouds, for advertising, or for humorous results.
Spamdexing is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.
An HTML element is a type of HTML document component, one of several types of HTML nodes. The first used version of HTML was written by Tim Berners-Lee in 1993 and there have since been many versions of HTML. The current de facto standard is governed by the industry group WHATWG and is known as the HTML Living Standard.
An image retrieval system is a computer system used for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata such as captioning, keywords, title or descriptions to the images so that retrieval can be performed over the annotation words. Manual image annotation is time-consuming, laborious and expensive; to address this, there has been a large amount of research done on automatic image annotation. Additionally, the increase in social web applications and the semantic web have inspired the development of several web-based image annotation tools.
In information systems, a tag is a keyword or term assigned to a piece of information. This kind of metadata helps describe an item and allows it to be found again by browsing or searching. Tags are generally chosen informally and personally by the item's creator or by its viewer, depending on the system, although they may also be chosen from a controlled vocabulary.
In information retrieval, tf–idf, short for term frequency–inverse document frequency, is a measure of importance of a word to a document in a collection or corpus, adjusted for the fact that some words appear more frequently in general. It was often used as a weighting factor in searches of information retrieval, text mining, and user modeling. A survey conducted in 2015 showed that 83% of text-based recommender systems in digital libraries used tf–idf.
Product finders are information systems that help consumers to identify products within a large palette of similar alternative products. Product finders differ in complexity, the more complex among them being a special case of decision support systems. Conventional decision support systems, however, aim at specialized user groups, e.g. marketing managers, whereas product finders focus on consumers.
Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.
GPS signals are broadcast by Global Positioning System satellites to enable satellite navigation. Receivers on or near the Earth's surface can determine location, time, and velocity using this information. The GPS satellite constellation is operated by the 2nd Space Operations Squadron (2SOPS) of Space Delta 8, United States Space Force.
Knowledge management software is a subset of content management software, which contains a range of software that specializes in the way information is collected, stored and/or accessed. The concept of knowledge management is based on a range of practices used by an individual, a business, or a large corporation to identify, create, represent and redistribute information for a range of purposes. Software that enables an information practice or range of practices at any part of the processes of information management can be deemed to be called information management software. A subset of information management software that emphasizes an approach to build knowledge out of information that is managed or contained is often called knowledge management software.
The Corpus of Contemporary American English (COCA) is a one-billion-word corpus of contemporary American English. It was created by Mark Davies, retired professor of corpus linguistics at Brigham Young University (BYU).
WIKINDX is a free bibliographic and quotations/notes management and article authoring system designed either for single use and multi-user collaborative use across the internet. WIKINDX falls within the category of reference management software, but also provides functionality to write notes and entire papers.
Folksonomy is a classification system in which end users apply public tags to online items, typically to make those items easier for themselves or others to find later. Over time, this can give rise to a classification system based on those tags and how often they are applied or searched for, in contrast to a taxonomic classification designed by the owners of the content and specified when it is published. This practice is also known as collaborative tagging, social classification, social indexing, and social tagging. Folksonomy was originally "the result of personal free tagging of information [...] for one's own retrieval", but online sharing and interaction expanded it into collaborative forms. Social tagging is the application of tags in an open online environment where the tags of other users are available to others. Collaborative tagging is tagging performed by a group of users. This type of folksonomy is commonly used in cooperative and collaborative projects such as research, content repositories, and social bookmarking.
MAXQDA is a software program designed for computer-assisted qualitative and mixed methods data, text and multimedia analysis in academic, scientific, and business institutions. It is being developed and distributed by VERBI Software based in Berlin, Germany.
Patent visualisation is an application of information visualisation. The number of patents has been increasing, encouraging companies to consider intellectual property as a part of their strategy. Patent visualisation, like patent mapping, is used to quickly view a patent portfolio.
The normalized Google distance (NGD) is a semantic similarity measure derived from the number of hits returned by the Google search engine for a given set of keywords. Keywords with the same or similar meanings in a natural language sense tend to be "close" in units of normalized Google distance, while words with dissimilar meanings tend to be farther apart.
QDA Miner is mixed methods and qualitative data analysis software developed by Provalis Research. The program was designed to assist researchers in managing, coding and analyzing qualitative data.
WordStat is a content analysis and text mining software. It was first released in 1998 after being developed by Normand Peladeau from Provalis Research. The latest version 9 was released in 2021.
Social navigation is a form of social computing introduced by Paul Dourish and Matthew Chalmers in 1994, who defined it as when "movement from one item to another is provoked as an artifact of the activity of another or a group of others". According to later research in 2002, "social navigation exploits the knowledge and experience of peer users of information resources" to guide users in the information space, and that it is becoming more difficult to navigate and search efficiently with all the digital information available from the World Wide Web and other sources. Studying others' navigational trails and understanding their behavior can help improve one's own search strategy by guiding them to make more informed decisions based on the actions of others.
Interpolation sort is a sorting algorithm that is a kind of bucket sort. It uses an interpolation formula to assign data to the bucket. A general interpolation formula is:
{{cite web}}
: CS1 maint: bot: original URL status unknown (link)