Semantic Brand Score

The Semantic Brand Score is a measure designed to assess the importance of one or more brands, in different contexts and whenever textual data (even big data) is available. [1] [2] This metric has its foundations in graph theory and combines methods of text mining and social network analysis. [3] The Semantic Brand Score was developed based on the conceptualizations of brand equity proposed by Keller [4] and Aaker. [5] These well-known models inspired the measurement of a different construct on textual data: brand importance.

Brand equity is traditionally assessed through a series of models, which are often based on the administration of questionnaires to consumers or, for example, on financial evaluations. By contrast, the Semantic Brand Score is calculated on texts that potentially represent spontaneous expressions of different stakeholders: they are not subjected to direct interviews, thus reducing possible cognitive biases. The metric can be calculated, for example, by analyzing newspaper articles, consumer dialogue on online forums, or posts published on social media.

Definition and calculation

Pre-processing

The calculation of the Semantic Brand Score requires that the analyzed texts first be transformed into networks of words, i.e. graphs in which each node represents a word. Links between words are given by their co-occurrence within a given range of words, or within a sentence. Pre-processing the natural language is advisable to clean up the texts, for example by removing stopwords and word affixes (stemming). Consider, for example, the following network, obtained from the pre-processing of the sentence "The dawn is the appearance of light - usually golden, pink or purple - before sunrise".

[Figure: word co-occurrence network (range 3 words)]
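The pre-processing step can be illustrated with a minimal Python sketch. It is not the reference implementation: the stopword list is a tiny hand-picked set, stemming is skipped, and the reading of "range" as "each word is linked to the words that follow it within `co_range` positions" is an assumption. The names `STOPWORDS`, `tokenize`, `cooccurrence_edges` and `co_range` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "is", "of", "or", "a", "an", "and"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def cooccurrence_edges(tokens, co_range=3):
    """Count undirected co-occurrences between each word and the
    `co_range` words that follow it (assumed meaning of 'range')."""
    edges = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + co_range]:
            if v != w:  # ignore self-loops
                edges[frozenset((w, v))] += 1
    return edges

sentence = ("The dawn is the appearance of light - usually golden, "
            "pink or purple - before sunrise.")
tokens = tokenize(sentence)
edges = cooccurrence_edges(tokens, co_range=3)

for pair, weight in edges.items():
    print(sorted(pair), weight)
```

Each key of `edges` is an unordered pair of words and each value is the number of times the pair co-occurs, which can serve as the link weight in the network.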

The Semantic Brand Score, which measures brand importance, [6] results from the standardized sum of its three components: prevalence, diversity and connectivity. [7]

Prevalence

This dimension measures the frequency of use of a brand name, i.e. the number of times a brand is directly mentioned. Prevalence is linked to the concept of brand awareness, [4] with the idea that a brand that appears more often in a text is more familiar to the authors of that text. Similarly, frequent mentions of a brand name increase its recognition and recall among readers.
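As a rough illustration, prevalence can be sketched as a plain count of brand mentions across a set of texts. The function name `prevalence`, the tokenization rule and the sample posts are assumptions made for this example only.

```python
import re

def prevalence(texts, brand):
    """Count how many times `brand` appears as a whole token in `texts`."""
    brand = brand.lower()
    return sum(
        re.findall(r"[a-z0-9]+", text.lower()).count(brand)
        for text in texts
    )

# Hypothetical posts mentioning the invented brand used later in the article.
posts = [
    "InventedCola is the best drink of all time",
    "InventedCola is the best drink of all time",
    "I had an InventedCola and then another InventedCola at the bar",
]

brand_prevalence = prevalence(posts, "InventedCola")
print(brand_prevalence)
```

Matching whole tokens rather than substrings avoids counting the brand name when it merely appears inside a longer word.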

Diversity

This dimension measures the diversity of the words associated with a brand. These are textual associations (and not mental ones, as in the brand image theorized by Keller [4]), i.e. the words that are most frequently used in conjunction with a certain brand. Diversity is calculated using degree centrality, [8] which corresponds to the degree of the node representing the brand. Alternatively, it has been suggested that diversity be calculated through distinctiveness centrality, [9] which gives more value to less redundant brand associations. The idea is that many distinctive textual associations make the discourse around a brand more informative, leading to greater brand strength [10] and importance.
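The degree-based variant of diversity can be sketched as the number of distinct neighbours of the brand node. The edge list below is invented for illustration and the helper name `degree` is hypothetical. Distinctiveness centrality is not shown.

```python
from collections import defaultdict

def degree(edges, node):
    """Return the number of distinct neighbours of `node` in an undirected graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return len(neighbours[node])

# Hypothetical co-occurrence links; repeated links do not raise the degree.
edges = [
    ("inventedcola", "drink"),
    ("inventedcola", "best"),
    ("inventedcola", "drink"),   # same association seen again
    ("inventedcola", "sugar"),
    ("sugar", "health"),
]

brand_diversity = degree(edges, "inventedcola")
print(brand_diversity)
```

Because the degree counts distinct associations only, a brand that is always mentioned with the same few words keeps a low diversity, however often it is mentioned.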

Connectivity

This last dimension measures the level of connectivity of a brand with respect to the general discourse, i.e. its ability to act as a bridge between other words (nodes) in the network. Ideally, it represents the brokerage power of a brand, i.e. its ability to link different words, groups of words, or topics. Connectivity is calculated using weighted betweenness centrality. [11] [12]
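The following is a minimal sketch of weighted betweenness centrality, using Dijkstra's algorithm followed by a Brandes-style accumulation. One assumption is made loudly here: co-occurrence counts are turned into distances as `1 / count`, so that words co-occurring more often are treated as closer. The function name and the toy graph are hypothetical, and a graph library would normally be preferable for real data.

```python
import heapq
from collections import defaultdict

def weighted_betweenness(weights, eps=1e-12):
    """Betweenness on an undirected graph given {(u, v): co-occurrence count}.
    Edge distance is assumed to be 1 / count."""
    adj = defaultdict(dict)
    for (u, v), count in weights.items():
        adj[u][v] = adj[v][u] = 1.0 / count
    nodes = list(adj)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Dijkstra from s, tracking shortest-path counts and predecessors.
        dist, sigma = {s: 0.0}, {s: 1.0}
        preds = defaultdict(list)
        done, order = set(), []
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue
            done.add(v)
            order.append(v)
            for w, length in adj[v].items():
                nd = d + length
                if w not in dist or nd < dist[w] - eps:
                    dist[w], sigma[w], preds[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, w))
                elif abs(nd - dist[w]) <= eps and w not in done:
                    sigma[w] += sigma[v]      # another equally short path
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted twice in an undirected graph.
    return {v: b / 2.0 for v, b in bc.items()}

# Toy network: "drink" and "summer" are weakly linked directly,
# but strongly linked through the brand.
weights = {("cola", "drink"): 4, ("cola", "summer"): 4, ("drink", "summer"): 1}
connectivity = weighted_betweenness(weights)
print(connectivity)
```

In this toy network the shortest weighted path between "drink" and "summer" runs through "cola", so the brand gets a non-zero score even though an unweighted calculation on the same triangle would give zero to every node.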

Semantic Brand Score

The Semantic Brand Score is the standardized sum of prevalence, diversity and connectivity. The three components are all important, and only together do they represent the full construct of brand importance. Consider, for example, a brand that is frequently mentioned, but in a repetitive way, with many posts containing the same phrase: "InventedCola is the best drink of all time". Prevalence in this case would be high, but diversity would be low. On the other hand, a brand frequently mentioned in heterogeneous contexts would have both high prevalence and high diversity. However, connectivity may still be low if the brand is discussed only within a niche of a wider discourse. When a brand lies in between different topics, so that it acts as an intermediary for the whole context, its connectivity is also high. The "InventedCola" brand could be central in one discourse (e.g. soft drinks) and peripheral in another (e.g. bar cocktails).
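The combination step can be sketched as follows. The article only states that the score is a standardized sum. This example assumes that each component is converted to a z-score across all words in the network before summing, and the component values are invented for illustration. The function names are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score each value against all words in the network (assumed scheme)."""
    mu = mean(values.values())
    sd = pstdev(values.values())
    return {k: (v - mu) / sd if sd else 0.0 for k, v in values.items()}

def semantic_brand_score(prevalence, diversity, connectivity):
    """Sum the standardized components for every word."""
    parts = [standardize(c) for c in (prevalence, diversity, connectivity)]
    return {word: sum(p[word] for p in parts) for word in prevalence}

# Hypothetical raw component values for three words in the same network.
prevalence = {"inventedcola": 10, "drink": 4, "sugar": 1}
diversity = {"inventedcola": 5, "drink": 3, "sugar": 1}
connectivity = {"inventedcola": 6.0, "drink": 1.0, "sugar": 0.0}

sbs = semantic_brand_score(prevalence, diversity, connectivity)
for word, score in sorted(sbs.items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.2f}")
```

Standardizing first puts the three components on a comparable scale, so that a component with large raw values, such as betweenness, does not dominate the sum.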

Some tutorials for the calculation of the metric using the Python programming language can be found online. [13]

Sentiment of textual brand associations

The information provided by brand importance can be complemented by comparing it with the sentiment of the brand's textual associations. The fact that a brand is frequently mentioned, even in diverse contexts, and is at the heart of a discourse defines its importance. However, it may also be useful to understand whether the feelings and opinions associated with it are positive or negative.

Use cases

Not only "brands"

The Semantic Brand Score can be used to measure the importance of any word, or set of words; it is therefore not limited to the analysis of brands in a strict sense. A "brand" can also be the name of a politician, [14] or a set of words that represent a concept (for example, the concept of "innovation" or a corporate core value).

Use cases

The measure has been used to evaluate the transition dynamics that occur when a new brand replaces an old one. [6] The Semantic Brand Score is also useful for comparing the importance of a brand with that of its competitors, or for analyzing trends in the importance of a single brand over time. In some applications, the measures obtained have also proved useful for forecasting purposes; for example, in politics, a link has been found between the brand importance of candidate names in the online press and election outcomes. [15] [7]

There are no limits on the text sources that can be analyzed: newspaper articles, emails, posts on online forums, blogs and social media, open text fields of interviews administered to consumers, etc. The measure also works with different languages.

References

  1. Colladon, Andrea Fronzetti; Bella, Agostino La; Grippa, Francesca; Guardabascio, Barbara; Capano, Vincenzo D'Innella (2018). "Brand Intelligence in the Era of Big Data: Advances in the Use of the Semantic Brand Score". Poster Presented at the XXIX RSA AiIG 2018 - the Challenge of Management Engineering in a Changing Manufacturing World. doi:10.13140/rg.2.2.22783.66723.
  2. Fronzetti Colladon, Andrea (2018). "Measuring Brand Importance through Semantic and Social Network Analysis: Applications of the Semantic Brand Score". XXXVIII Sunbelt Conference of the International Network for Social Network Analysis.
  3. Alexandridis, Kostas; Takemura, Shion; Webb, Alex; Lausche, Barbara; Culter, Jim; Sato, Tetsu (November 2018). "Semantic knowledge network inference across a range of stakeholders and communities of practice". Environmental Modelling & Software. 109: 202–222. doi:10.1016/j.envsoft.2018.08.026.
  4. Keller, Kevin Lane (January 1993). "Conceptualizing, Measuring, and Managing Customer-Based Brand Equity". Journal of Marketing. 57 (1): 1–22. doi:10.1177/002224299305700101. ISSN 0022-2429. S2CID 220602603.
  5. Aaker, David A. (April 1996). "Measuring Brand Equity Across Products and Markets". California Management Review. 38 (3): 102–120. doi:10.2307/41165845. JSTOR 41165845.
  6. Fronzetti Colladon, Andrea (July 2018). "The Semantic Brand Score". Journal of Business Research. 88: 150–160. arXiv:2105.05781. doi:10.1016/j.jbusres.2018.03.026. S2CID 89613465.
  7. Saporiti, Riccardo (14 May 2019). "Elezioni: è la Lega il brand che vale di più sui giornali". Il Sole 24 Ore - Info Data. Retrieved 21 May 2019.
  8. Freeman, Linton C. (January 1978). "Centrality in social networks conceptual clarification". Social Networks. 1 (3): 215–239. CiteSeerX 10.1.1.227.9549. doi:10.1016/0378-8733(78)90021-7.
  9. Fronzetti Colladon, Andrea; Naldi, Maurizio (2020-05-22). Xiao, Gaoxi (ed.). "Distinctiveness centrality in social networks". PLOS ONE. 15 (5): e0233276. arXiv:1912.03391. Bibcode:2020PLoSO..1533276F. doi:10.1371/journal.pone.0233276. ISSN 1932-6203. PMC 7244137. PMID 32442196.
  10. Grohs, Reinhard; Raies, Karine; Koll, Oliver; Mühlbacher, Hans (June 2016). "One pie, many recipes: Alternative paths to high brand strength". Journal of Business Research. 69 (6): 2244–2251. doi:10.1016/j.jbusres.2015.12.037.
  11. Brandes, Ulrik (May 2008). "On variants of shortest-path betweenness centrality and their generic computation". Social Networks. 30 (2): 136–145. CiteSeerX 10.1.1.72.9610. doi:10.1016/j.socnet.2007.11.001.
  12. Freeman, Linton C. (March 1977). "A Set of Measures of Centrality Based on Betweenness". Sociometry. 40 (1): 35–41. doi:10.2307/3033543. JSTOR 3033543.
  13. Colladon, Andrea Fronzetti (2019-04-16). "Calculating the Semantic Brand Score with Python". Medium. Retrieved 2019-04-17.
  14. Guzmán, Francisco; Sierra, Vicenta (December 2009). "A political candidate's brand image scale: Are political candidates brands?". Journal of Brand Management. 17 (3): 207–217. doi:10.1057/bm.2009.19. ISSN 1350-231X. S2CID 167417115.
  15. "Semantic Brand Score - Analytics Demo". semanticbrandscore.com. Retrieved 2019-02-15.