Social media mining is the process of obtaining data from user-generated content on social media in order to extract actionable patterns, form conclusions about users, and act upon the information. Mining supports applications such as targeting users with advertising and conducting academic research. The term is an analogy to the process of mining for minerals: mining companies sift through raw ore to find valuable minerals; likewise, social media mining sifts through social media data to discern patterns and trends about matters such as social media usage, online behaviour, content sharing, connections between individuals, and buying behaviour. These patterns and trends are of interest to companies, governments and not-for-profit organizations, which can use the analyses for tasks such as designing strategies, introducing programs, and developing products, processes or services.
Social media mining uses concepts from computer science, data mining, machine learning, and statistics. Mining is based on social network analysis, network science, sociology, ethnography, optimization and mathematics. It attempts to formally represent, measure and model patterns from social media data. [1] In the 2010s, major corporations, governments and not-for-profit organizations began mining to learn about customers, clients and others.
Platforms such as Google and Facebook (the latter in partnership with Datalogix and BlueKai) conduct mining to target users with advertising. [2] Scientists and machine learning researchers extract insights from mined data and use them to design product features. [3]
Users may not understand how platforms use their data. [4] Users tend to click through Terms of Use agreements without reading them, leading to ethical questions about whether platforms adequately protect users' privacy.
During the 2016 United States presidential election, Facebook allowed Cambridge Analytica, a political consulting firm linked to the Trump campaign, to analyze the data of an estimated 87 million Facebook users to profile voters, creating controversy when this was revealed. [5]
As defined by Kaplan and Haenlein, [6] social media is the "group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content." There are many categories of social media including, but not limited to, social networking (Facebook or LinkedIn), microblogging (Twitter), photo sharing (Flickr, Instagram, Photobucket, or Picasa), news aggregation (Google Reader, StumbleUpon, or Feedburner), video sharing (YouTube, MetaCafe), livecasting (Ustream or Twitch), virtual worlds (Kaneva), social gaming (World of Warcraft), social search (Google, Bing, or Ask.com), and instant messaging (Google Talk, Skype, or Yahoo! messenger).
The first social media website was introduced by GeoCities in 1994. It enabled users to create their own homepages without a sophisticated knowledge of HTML coding. The first social networking site, SixDegrees.com, was introduced in 1997. [7] Since then, many other social media sites have been introduced, each providing service to millions of people. These individuals form a virtual world in which individuals (social atoms), entities (content, sites, etc.) and interactions (between individuals, between entities, and between individuals and entities) coexist. Social norms and human behavior govern this virtual world. By understanding these social norms and models of human behavior, and combining them with observations and measurements of this virtual world, one can systematically analyze and mine social media.

Social media mining is the process of representing, analyzing, and extracting meaningful patterns from social media data resulting from social interactions. It is an interdisciplinary field encompassing techniques from computer science, data mining, machine learning, social network analysis, network science, sociology, ethnography, statistics, optimization, and mathematics. Social media mining faces grand challenges such as the big data paradox, obtaining sufficient samples, the noise removal fallacy, and the evaluation dilemma. It represents the virtual world of social media in a computable way, measures it, and designs models that can help us understand its interactions. In addition, social media mining provides the tools needed to mine this world for interesting patterns, analyze information diffusion, study influence and homophily, provide effective recommendations, and analyze novel social behavior in social media.
Social media mining is used across several industries including business development, social science research, health services, and education. [8] [9] Once the collected data has gone through social media analytics, it can be applied to these various fields. Companies often use the patterns of connectivity that pervade social networks, such as assortativity, the social similarity between users that is induced by influence, homophily, reciprocity, and transitivity. [10] These forces are measured via statistical analysis of the network's nodes and the connections between them. [8] Social analytics also uses sentiment analysis, because social media users often relay positive or negative sentiment in their posts. [11] This provides important social information about users' emotions on specific topics. [12] [13] [14]
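The assortativity mentioned above can be quantified in a simple way. The sketch below computes degree assortativity, the Pearson correlation between the degrees at the two ends of each friendship edge; the function name and the idea of using plain edge lists are illustrative assumptions, not a standard API.

```python
# Degree assortativity: Pearson correlation of degrees across edge endpoints.
# A positive value means high-degree users tend to befriend other
# high-degree users; a negative value means the opposite.
from math import sqrt

def degree_assortativity(edges):
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Collect the degree pairs at both ends of every edge, in both orientations.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)
```

A star-shaped network (one hub connected to leaves) is perfectly disassortative and yields -1.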
These patterns have several uses beyond pure analysis. For example, influence can be used to determine the most influential user in a particular network. [8] Companies are interested in this information when deciding whom to hire for influencer marketing. These influencers are identified by recognition, activity generation, and novelty, three requirements that can be measured through the data mined from these sites. [8] Analysts also value measures of homophily: the tendency of two similar individuals to become friends. [10] Users have come to rely on other users' opinions in order to understand diverse subject matter. [11] These analyses can also help create tailored recommendations for individuals. [8] By measuring influence and homophily, online and offline companies are able to suggest specific products to individual consumers and groups of consumers. Social media networks can use this information themselves to suggest possible friends to add, pages to follow, and accounts to interact with.
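A crude but common proxy for the "most influential user" question above is follower count, i.e. in-degree in the follow graph. The sketch below uses that proxy; the data and function name are illustrative, and real influence measures also weigh activity generation and novelty as the text notes.

```python
# Find the most-followed user in a directed follow network.
# follows: list of (follower, followee) pairs.
from collections import Counter

def most_influential(follows):
    followers = Counter(followee for _, followee in follows)
    return followers.most_common(1)[0][0]
```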
Modern social media mining is a controversial practice that has led to exponential gains in user growth for tech giants such as Facebook, Inc., Twitter, and Google. These companies, collectively referred to as "Big Tech", build algorithms that take advantage of user input to understand users' preferences and keep them on the platform as long as possible. These inputs, which can be as simple as time spent on a given screen, provide the data being mined, and companies profit heavily by using that data to make extremely accurate predictions about user behavior. The growth of platforms accelerated rapidly once these strategies were put in place; most of the largest platforms averaged over 1 billion monthly active users as of 2021. [15]
Critics of these practices, such as Tristan Harris and Chamath Palihapitiya, have claimed that certain companies (specifically Facebook) valued growth above all else and ignored the potential negative impacts of these growth engineering tactics. [16]
At the same time, users have created their own data arbitrages through content monetization and by becoming influencers. Users typically have access to a varied set of analytics about the people who interact with them on social media, and can use these as building blocks for their own targeting and growth strategies through ads and posts that cater to their audiences. Influencers also commonly promote products and services for established brands, creating one of the largest digital industries: influencer marketing. Instagram, Facebook, Twitter, YouTube, Google, and others have long given access to platform analytics and have allowed third parties to access that information as well, at times unbeknownst even to the user whose data is being viewed or bought. [17]
Social media mining research articles are published in computer science, social science, and data mining conferences and journals:
Conference papers can be found in the proceedings of Knowledge Discovery and Data Mining (KDD), the World Wide Web Conference (WWW), the Association for Computational Linguistics (ACL), the Conference on Information and Knowledge Management (CIKM), the International Conference on Data Mining (ICDM), and the Internet Measurement Conference (IMC).
Social media mining is also present at many data management and database conferences, such as the ICDE Conference, the SIGMOD Conference, and the International Conference on Very Large Data Bases.
Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal of extracting information from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
A recommender system (RecSys), or a recommendation system (sometimes replacing system with terms such as platform, engine, or algorithm), is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer.
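A minimal sketch of one common recommendation approach, item-based collaborative filtering over implicit feedback: recommend items that frequently co-occur with what the user already likes, weighted by taste overlap. The data shape and function name are illustrative assumptions.

```python
# Recommend unseen items, scored by how much each other user's taste
# overlaps with the target user's, under implicit "liked" feedback.
from collections import Counter

def recommend(histories, user, k=1):
    """histories: dict mapping user -> set of liked items."""
    liked = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(liked & items)
        if overlap == 0:
            continue  # no shared taste, skip this user
        for item in items - liked:
            scores[item] += overlap  # weight votes by taste similarity
    return [item for item, _ in scores.most_common(k)]
```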
In predictive analytics, data science, machine learning and related fields, concept drift or drift is an evolution of data that invalidates the data model. It happens when the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes. Drift detection and drift adaptation are of paramount importance in the fields that involve dynamically changing data and data models.
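A minimal sketch of drift detection, assuming a simple windowed scheme: compare the mean of a recent window of model errors against a reference window and flag drift when they diverge beyond a threshold. The threshold here is illustrative, not tuned; production detectors use statistical tests rather than a fixed cutoff.

```python
# Flag concept drift when the mean error of recent predictions moves
# away from the mean error of a reference (training-time) window.
def drift_detected(reference_errors, recent_errors, threshold=0.5):
    mean_ref = sum(reference_errors) / len(reference_errors)
    mean_new = sum(recent_errors) / len(recent_errors)
    return abs(mean_new - mean_ref) > threshold
```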
Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, also more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly.
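The simplest form of sentiment analysis is lexicon-based: count positive and negative words. The word lists below are a tiny illustrative sample, not a real lexicon, and deep models such as the RoBERTa-based systems mentioned above far outperform this approach.

```python
# Lexicon-based sentiment: score = positive hits minus negative hits.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```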
In data analysis, anomaly detection is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behavior. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data.
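A minimal statistical sketch of anomaly detection: flag observations more than three standard deviations from the mean (the classic z-score rule; the threshold of 3 is a common convention, not a universal choice, and real detectors handle multivariate and non-Gaussian data).

```python
# Return the values whose z-score exceeds the threshold.
from math import sqrt

def anomalies(data, z_threshold=3.0):
    n = len(data)
    mean = sum(data) / n
    std = sqrt(sum((x - mean) ** 2 for x in data) / n)
    if std == 0:
        return []  # all values identical: nothing deviates
    return [x for x in data if abs(x - mean) / std > z_threshold]
```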
Reverse image search is a content-based image retrieval (CBIR) query technique that involves providing the CBIR system with a sample image that it will then base its search upon; in terms of information retrieval, the sample image is very useful. In particular, reverse image search is characterized by a lack of search terms. This effectively removes the need for a user to guess at keywords or terms that may or may not return a correct result. Reverse image search also allows users to discover content that is related to a specific sample image or the popularity of an image, and to discover manipulated versions and derivative works.
Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.
AMiner is a free online service used to index, search, and mine big scientific data.
Jie Tang is a full-time professor at the Department of Computer Science of Tsinghua University. He received a PhD in computer science from the same university in 2006. He is known for building the academic social network search system AMiner, which was launched in March 2006 and now has attracted 2,766,356 independent IP accesses from 220 countries. His research interests include social networks and data mining.
In web analytics, a session, or visit is a unit of measurement of a user's actions taken within a period of time or with regard to completion of a task. Sessions are also used in operational analytics and provision of user-specific recommendations. There are two primary methods used to define a session: time-oriented approaches based on continuity in user activity and navigation-based approaches based on continuity in a chain of requested pages.
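The time-oriented approach above can be sketched directly: split a user's event stream into sessions whenever the gap between consecutive timestamps exceeds an inactivity timeout. The 30-minute default is a common web-analytics convention, not a standard.

```python
# Split sorted event timestamps (seconds) into sessions on inactivity gaps.
def sessionize(timestamps, timeout=30 * 60):
    sessions = []
    current = []
    for t in timestamps:
        if current and t - current[-1] > timeout:
            sessions.append(current)  # gap too long: close the session
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions
```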
In data mining, intention mining or intent mining is the problem of determining a user's intention from logs of their behavior in interaction with a computer system, such as in search engines, where there has been research on user intent or query intent prediction since 2002, and from commercial intents expressed in social media posts.
Bing Liu is a Chinese-American professor of computer science who specializes in data mining, machine learning, and natural language processing. In 2002, he became a scholar at University of Illinois at Chicago. He holds a PhD from the University of Edinburgh (1988). His PhD advisors were Austin Tate and Kenneth Williamson Currie, and his PhD thesis was titled Reinforcement Planning for Resource Allocation and Constraint Satisfaction.
Author name disambiguation is the process of disambiguation and record linkage applied to the names of individual people. The process could, for example, distinguish individuals with the name "John Smith".
Discovering communities in a network, known as community detection or community discovery, is a fundamental problem in network science that has attracted much attention over the past several decades. In recent years, with extensive studies on big data, a related but different problem, called community search, has attracted great attention from both academia and industry. Community search aims to find the most likely community that contains a given query node, and is a query-dependent variant of the community detection problem. Detailed surveys of community search review the recent studies in this area.
Huan Liu is a Shanghai-born Chinese computer scientist.
Jiliang Tang is a Chinese-born computer scientist and associate professor at Michigan State University in the Computer Science and Engineering Department, where he is the director of the Data Science and Engineering (DSE) Lab. His research expertise is in data mining and machine learning.
In network theory, collective classification is the simultaneous prediction of the labels for multiple objects, where each label is predicted using information about the object's observed features, the observed features and labels of its neighbors, and the unobserved labels of its neighbors. Collective classification problems are defined in terms of networks of random variables, where the network structure determines the relationship between the random variables. Inference is performed on multiple random variables simultaneously, typically by propagating information between nodes in the network to perform approximate inference. Approaches that use collective classification can make use of relational information when performing inference. Examples of collective classification include predicting attributes of individuals in a social network, classifying webpages in the World Wide Web, and inferring the research area of a paper in a scientific publication dataset.
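A minimal sketch of the propagation idea above: unlabeled nodes repeatedly take the majority label of their neighbors while observed labels stay fixed. The graph, labels, and function name are illustrative; real collective classifiers also condition on node features and use approximate probabilistic inference.

```python
# Iterative majority-vote label propagation over an adjacency dict.
from collections import Counter

def propagate_labels(neighbors, seeds, iterations=10):
    """neighbors: node -> list of adjacent nodes; seeds: node -> observed label."""
    labels = dict(seeds)
    for _ in range(iterations):
        updated = dict(labels)
        for node, nbrs in neighbors.items():
            if node in seeds:
                continue  # observed labels are never overwritten
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if votes:
                updated[node] = votes.most_common(1)[0][0]
        labels = updated
    return labels
```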
Spatial embedding is one of the feature learning techniques used in spatial analysis, where points, lines, polygons, or other spatial data types representing geographic locations are mapped to vectors of real numbers. Conceptually, it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space with a much lower dimension.
Kai Shu is a computer scientist, academic, and author. He is an assistant professor at Emory University.