Social profiling is the process of constructing a social media user's profile from their social data. In general, profiling refers to the data science process of generating a person's profile with computerized algorithms and technology. [1] With the proliferation of popular social networks, there are many platforms on which this information is shared, including but not limited to LinkedIn, Google+, Facebook and Twitter. [2]
A person's social data refers to the personal data that they generate either online or offline [3] (for more information, see social data revolution). A large amount of this data, including one's language, location and interests, is shared through social media and social networks. Users join multiple social media platforms, and their profiles across these platforms can be linked using different methods [4] to obtain their interests, locations, content, and friend lists. Altogether, this information can be used to construct a person's social profile.
Satisfying users' information needs is becoming more challenging because explosively growing online data generates too much "noise", which interferes with information collection. Social profiling is an emerging approach to this challenge: it introduces personalized search that takes into account user profiles generated from social network data. One study reviews and classifies research that infers users' social profile attributes from social media data, dividing it into individual and group profiling. It highlights the existing techniques along with the data sources they use, their limitations, and open challenges.
The prominent approaches adopted include machine learning, ontology, and fuzzy logic. Most of the studies have used social media data from Twitter and Facebook to infer users' social attributes. The literature showed that user social attributes, including age, gender, home location, wellness, emotion, opinion, relation, and influence, still need to be explored. [5]
The ever-increasing volume of online content has reduced the effectiveness of centralized search engines' results. [6] [7] These engines can no longer satisfy users' demand for information. One possible solution for increasing the coverage of search results is the meta-search engine, [6] an approach that collects information from numerous centralized search engines. A new problem then emerges: the collection process generates too much data and too much noise.
Therefore, a new technique called the personalized meta-search engine was developed. It uses a user's profile (largely their social profile) to filter the search results. A user's profile can be a combination of a number of things, including but not limited to "a user's manual selected interests, user's search history", and personal social network data. [6]
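The idea of a personalized meta-search engine can be illustrated with a minimal sketch. It is not taken from the cited work: the `Result` type, the `merge_results` and `personalize` functions, and the scoring rule (counting profile interests that appear in a result's title) are all assumptions chosen for illustration, and a real system would likely use far richer profile signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One search hit; all fields are hypothetical."""
    url: str
    title: str
    engine: str  # which upstream search engine returned it


def merge_results(*result_lists):
    """Meta-search step: combine results from several engines, dropping duplicate URLs."""
    seen = set()
    merged = []
    for results in result_lists:
        for r in results:
            if r.url not in seen:
                seen.add(r.url)
                merged.append(r)
    return merged


def personalize(results, interests):
    """Personalization step: rank results by how many profile interests appear in the title."""
    wanted = {term.lower() for term in interests}

    def score(r):
        words = set(r.title.lower().split())
        return len(words & wanted)

    # sorted() is stable, so results with equal scores keep their merged order.
    return sorted(results, key=score, reverse=True)


# Two hypothetical engines answering the ambiguous query "jaguar".
engine_a = [
    Result("a.example/1", "Jaguar car review", "A"),
    Result("a.example/2", "Jaguar habitat and wildlife", "A"),
]
engine_b = [
    Result("a.example/2", "Jaguar habitat and wildlife", "B"),
    Result("b.example/3", "Wildlife photography tips", "B"),
]

merged = merge_results(engine_a, engine_b)
# Interests assumed to come from the user's social profile.
ranked = personalize(merged, ["wildlife", "photography"])
```

In this sketch the profile is used to reorder rather than discard results, so content outside the user's known interests is demoted but still reachable.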
According to Samuel D. Warren II and Louis Brandeis (1890), disclosure of private information and the misuse of it can hurt people's feelings and cause considerable damage in people's lives. [8] Social networks provide people access to intimate online interactions; therefore, information access control, information transactions, privacy issues, connections and relationships on social media have become important research fields and are subjects of concern to the public.
Ricard Fogues and co-authors state that "any privacy mechanism has at its base an access control", which dictates "how permissions are given, what elements can be private, how access rules are defined, and so on". [9] Current access control for social media accounts still tends to be very simplistic: there is very limited diversity in the categories of relationships available for social network accounts. On most platforms, a user's relationships to others are categorized only as "friend" or "non-friend", so people may leak important information to "friends" inside their social circle who are not necessarily the users with whom they consciously want to share it. [9] The section below is concerned with social media profiling and what profiling information on social media accounts can achieve.
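To make the limitation of a binary "friend"/"non-friend" model concrete, the following sketch shows what access control with more than one relationship category could look like. It does not describe any real platform or the mechanism proposed in the cited paper; the `Relation` categories and the `Profile` class with its methods are hypothetical.

```python
from enum import Enum


class Relation(Enum):
    """Hypothetical relationship categories, finer than friend/non-friend."""
    FAMILY = "family"
    CLOSE_FRIEND = "close_friend"
    COLLEAGUE = "colleague"
    ACQUAINTANCE = "acquaintance"


class Profile:
    def __init__(self, owner):
        self.owner = owner
        self.relations = {}  # viewer name -> Relation
        self.items = {}      # item name -> (value, set of Relations allowed to see it)

    def connect(self, viewer, relation):
        """Record how a viewer is related to the profile owner."""
        self.relations[viewer] = relation

    def share(self, name, value, allowed):
        """Publish an item, visible only to the listed relationship categories."""
        self.items[name] = (value, set(allowed))

    def visible_to(self, viewer):
        """Return the items a given viewer is permitted to see."""
        if viewer == self.owner:
            return {name: value for name, (value, _) in self.items.items()}
        # Unconnected viewers have relation None, which is never in an allowed set.
        relation = self.relations.get(viewer)
        return {
            name: value
            for name, (value, allowed) in self.items.items()
            if relation in allowed
        }


alice = Profile("alice")
alice.connect("bob", Relation.FAMILY)
alice.connect("carol", Relation.COLLEAGUE)
alice.share("home_address", "12 Example Street", {Relation.FAMILY})
alice.share("employer", "Example Corp", {Relation.FAMILY, Relation.COLLEAGUE})
```

Under a friend/non-friend model, "bob" and "carol" would both see the home address; here "carol" sees only the employer and an unconnected viewer sees nothing.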
A lot of information is voluntarily shared on online social networks, such as photos and updates on life activities (new job, hobbies, etc.). People often assume that their accounts on different platforms will not be linked as long as they do not grant permission for such links. However, according to Diane Gan, information gathered online enables "target subjects to be identified on other social networking sites such as Foursquare, Instagram, LinkedIn, Facebook and Google+, where more personal information was leaked". [10]
The majority of social networking platforms use the "opt out approach" for their features. Users who wish to protect their privacy are themselves responsible for checking and changing the privacy settings, as a number of them are set to a default option. [10] Major social network platforms have developed geo-tag functions, which are in popular use. This is concerning because 39% of users have experienced profile hacking; 78% of burglars have used major social media networks and Google Street View to select their victims; and an astonishing 54% of burglars attempted to break into empty houses after people posted status updates and geo-locations. [11]
The formation and maintenance of social media accounts and their relationships with other accounts are associated with various social outcomes. [12] As of 2015, customer relationship management was essential for many firms and was partially done through Facebook. [13] Before the emergence and prevalence of social media, customer identification was primarily based upon information that a firm could directly acquire, [14] for example through a customer's purchasing process or the voluntary act of completing a survey or joining a loyalty program. However, the rise of social media has greatly reduced reliance on this approach to building a customer's profile or model. Marketers now increasingly seek customer information through Facebook; [13] this may include a variety of information that users disclose to all or some other users on Facebook: name, gender, date of birth, e-mail address, sexual orientation, marital status, interests, hobbies, favorite sports team(s), favorite athlete(s), or favorite music, and, more importantly, Facebook connections. [13]
However, due to the design of the privacy policy, acquiring true information on Facebook is no trivial task. Facebook users often either refuse to disclose true information (sometimes using pseudonyms) or set information to be visible only to friends. Facebook users who "LIKE" a page are also hard to identify. To profile and cluster users online, marketers and companies can and will access the following kinds of data: gender, the IP address and city of each user through the Facebook Insight page, who "LIKED" a certain user, a list of all the pages that a person "LIKED" (transaction data), other people that a user follows (even beyond the first 500, which usually cannot be seen) and all the publicly shared data. [13]
First launched on the Internet in March 2006, Twitter is a platform on which users can connect and communicate with any other user in just 280 characters. [10] Like Facebook, Twitter is also a crucial channel through which users leak important information, often unconsciously, that can be accessed and collected by others.
According to Rachel Nuwer, in a sample of 10.8 million tweets by more than 5,000 users, the posted and publicly shared information was enough to reveal a user's income range. [15] Daniel Preoţiuc-Pietro, a postdoctoral researcher at the University of Pennsylvania, and his colleagues were able to categorize 90% of users into corresponding income groups. After being fed into a machine-learning model, their collected data generated reliable predictions of the characteristics of each income group. [15]
The mobile app called Streamd.in displays live tweets on Google Maps by using geo-location details attached to the tweet, and traces the user's movement in the real world. [10]
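How geo-tagged posts can be turned into a movement trace may be easier to see in a small sketch. It does not reflect how Streamd.in itself works; the post records, their field names (`user`, `time`, `lat`, `lon`) and the `movement_trace` function are assumptions made for illustration.

```python
from datetime import datetime


def movement_trace(posts, user):
    """Return (time, lat, lon) points for one user in chronological order.

    Posts without coordinates are skipped, since they reveal no location.
    """
    points = [
        (p["time"], p["lat"], p["lon"])
        for p in posts
        if p["user"] == user
        and p.get("lat") is not None
        and p.get("lon") is not None
    ]
    # Tuples sort by their first element, so this orders points by timestamp.
    return sorted(points)


# Hypothetical public posts; only some carry geo-location details.
posts = [
    {"user": "alice", "time": datetime(2024, 1, 1, 12, 0), "lat": 40.7580, "lon": -73.9855},
    {"user": "bob", "time": datetime(2024, 1, 1, 9, 30), "lat": 51.5074, "lon": -0.1278},
    {"user": "alice", "time": datetime(2024, 1, 1, 9, 0), "lat": 40.7128, "lon": -74.0060},
    {"user": "alice", "time": datetime(2024, 1, 1, 10, 0), "lat": None, "lon": None},
]

trace = movement_trace(posts, "alice")
```

Even this simple ordering of two tagged posts is enough to show where the user was in the morning and at noon, which is why geo-tags attached to individual posts add up to a profile of movement.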
The advent and ubiquity of social media networks have boosted the role of images and the dissemination of visual information. [16] Many types of visual information on social media convey messages from the author, location information and other personal information. For example, a user may post a photo of themselves in which landmarks are visible, which can enable other users to determine where they are. In a study by Cristina Segalin, Dong Seon Cheng and Marco Cristani, profiling the photos in users' posts was found to reveal personal traits such as personality and mood. [16] The study uses convolutional neural networks (CNNs). The approach builds on the main characteristics of computational aesthetics (CA) (emphasizing "computational methods", "human aesthetic point of view", and "the need to focus on objective approaches" [16] ) as defined by Hoenig (2005). This tool can extract and identify content in photos.
In a study called "A Rule-Based Flickr Tag Recommendation System", the author suggests personalized tag recommendations, [17] largely based on user profiles and other web resources. It has proven to be useful in many aspects: "web content indexing", "multimedia data retrieval", and enterprise Web searches. [17]
As of 2011, marketers and retailers were increasing their market presence by creating their own pages on social media, on which they post information, ask people to like and share in order to enter contests, and much more. Studies in 2011 showed that, on average, a person spent about 23 minutes per day on a social networking site. [18] Therefore, companies both small and large are investing in gathering information on user behavior, ratings, reviews, and more. [19]
Until 2006, online communication was not content-led in terms of the amount of time people spent online. However, content sharing and creation has become the primary online activity of general social media users, and that has forever changed online marketing. [20] In the book Advanced Social Media Marketing, [21] the author gives an example of how a New York wedding planner might identify his audience when marketing on Facebook. The targeting categories might include people who (1) live in the United States; (2) live within 50 miles of New York; (3) are aged 21 or older; and (4) are engaged and female. [21] Whether the advertiser chooses to pay per click or per impression/view, "the cost of Facebook Marketplace ads and Sponsored Stories is set by your maximum bid and the competition for the same audiences". [21] The cost of a click is usually $0.50–1.50.
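The wedding planner example amounts to filtering users on profile attributes and then budgeting for clicks. The sketch below is illustrative only and is not Facebook's advertising API: the `User` fields, `in_target_audience` and `click_cost_range` are hypothetical names, and the only figures taken from the text are the four targeting criteria and the $0.50–1.50 cost per click.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Profile attributes assumed to be available to the advertiser."""
    country: str
    miles_from_new_york: float
    age: int
    gender: str
    relationship_status: str


def in_target_audience(user):
    """Apply the four targeting criteria from the wedding planner example."""
    return (
        user.country == "US"
        and user.miles_from_new_york <= 50
        and user.age >= 21
        and user.gender == "female"
        and user.relationship_status == "engaged"
    )


def click_cost_range(clicks, low=0.50, high=1.50):
    """Estimate total spend in dollars for a number of clicks at the usual per-click cost."""
    return clicks * low, clicks * high


users = [
    User("US", 12.0, 27, "female", "engaged"),
    User("US", 300.0, 30, "female", "engaged"),  # too far from New York
    User("US", 5.0, 19, "female", "engaged"),    # under 21
    User("US", 8.0, 34, "male", "married"),
]

audience = [u for u in users if in_target_audience(u)]
low_total, high_total = click_cost_range(200)
```

With these assumptions, 200 clicks would cost between $100 and $300; the actual price depends on the maximum bid and on competition for the same audience, as the quoted passage notes.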
Klout is a popular online tool that assesses a user's social influence through social profiling. It takes several social media platforms (such as Facebook and Twitter) and numerous aspects into account and generates a score for the user from 1 to 100. Regardless of the number of likes on a person's posts or their connections on LinkedIn, social media contains plentiful personal information. Klout generates a single score that indicates a person's influence. [22]
In a study called "How Much Klout do You Have...A Test of System Generated Cues on Source Credibility", conducted by Chad Edwards, Klout scores were found to influence people's perceived credibility. [23] As the Klout Score becomes a popular method of combining many signals into one score for assessing people's influence, it can be a convenient tool and a biased one at the same time. A study by David Westerman of how social media followers influence people's judgments illustrates the possible bias that Klout may contain. [24] In one study, participants were asked to view six identical mock Twitter pages that differed in only one major independent variable: the number of page followers. The results show that pages with too many or too few followers were both perceived as less credible, despite their similar content. The Klout score may be subject to the same bias. [24]
Although Klout scores are sometimes used during the recruitment process, the practice remains controversial.
Kred not only assigns each user an influence score, but also allows each user to claim a Kred profile and Kred account. Through this platform, users can view how top influencers engage with their online communities and how each of their own online actions affects their influence score.
Kred offers several suggestions for increasing influence: (1) be generous with your audience, and feel comfortable sharing content from your friends and tweeting others; (2) join an online community; (3) create and share meaningful content; (4) track your progress online.
Follower Wonk is targeted specifically at Twitter analytics. It helps users understand follower demographics and optimize their activity by finding which activities attract the most positive feedback from followers.
Keyhole is a hashtag tracking and analytics service that tracks Instagram, Twitter and Facebook hashtag data. It allows users to see which top influencers are using a certain hashtag and other demographic information about the hashtag. When a hashtag is entered on its website, it automatically draws a random sample of users who are currently using the tag, which allows users to analyze each hashtag they are interested in.
The prevalence of the Internet and social media has given online activists both a new platform for activism and their most popular tool. While online activism may stir up great controversy and attention, few people actually participate in or make sacrifices for the relevant events, which makes the profile of online activists an interesting topic of analysis. In a study by Harp and co-authors of online activists in China, Latin America and the United States, the majority of online activists in Latin America and China were male with a median income of $10,000 or less, while the majority in the United States were female with a median income of $30,000–$69,999. Online activists in the United States also tended to have postgraduate education, while activists in the other regions had lower education levels. [25]
A closer examination of the content they share online shows that the most commonly shared information falls into five types:
The Chinese government hopes to establish a "social-credit system" that aims to score "financial creditworthiness of citizens", social behavior and even political behavior. [26] This system will combine big data and social profiling technologies. According to Celia Hatton of BBC News, everyone in China will be expected to enroll in a national database that includes fiscal information, political behavior, social behavior and daily life, including minor traffic violations, and that automatically calculates a single score evaluating a citizen's trustworthiness. [27]
Credibility scores, social influence scores and other comprehensive evaluations of people are not rare in other countries. However, China's "social-credit system" remains controversial because this single score can reflect every aspect of a person's life. [27] Indeed, "much about the social-credit system remains unclear". [26]
Although the implementation of the social credit score remains controversial in China, the Chinese government aims to fully implement the system by 2018. [28] According to Jake Laband (the deputy director of the Beijing office of the US-China Business Council), low credit scores will "limit eligibility for financing, employment, and Party membership, as well restrict real estate transactions and travel." The social credit score will be affected not only by legal criteria but also by social criteria, such as contract breaking. However, the huge amount of data that the system will analyze has raised great privacy concerns for big companies.
Online advertising, also known as online marketing, Internet advertising, digital advertising or web advertising, is a form of marketing and advertising that uses the Internet to promote products and services to audiences and platform users. Online advertising includes email marketing, search engine marketing (SEM), social media marketing, many types of display advertising, and mobile advertising. Advertisements are increasingly being delivered via automated software systems operating across multiple websites, media services and platforms, known as programmatic advertising.
User-generated content (UGC), alternatively known as user-created content (UCC), is generally any form of content, such as images, videos, text, testimonials, and audio, that has been posted by users on online content aggregation platforms such as social media, discussion forums and wikis. It is a product consumers create to disseminate information about online products or the firms that market them.
Social media are interactive technologies that facilitate the creation, sharing and aggregation of content, ideas, interests, and other forms of expression through virtual communities and networks. Common features include:
Social media optimization (SMO) is the use of a number of outlets and communities to generate publicity to increase the awareness of a product, service brand or event. Types of social media involved include RSS feeds, social news, bookmarking sites, and social networking sites such as Facebook, Instagram, Twitter, video sharing websites, and blogging sites. SMO is similar to search engine optimization (SEO) in that the goal is to generate web traffic and increase awareness for a website. SMO's focal point is on gaining organic links to social media content. In contrast, SEO's core is about reaching the top of the search engine hierarchy. In general, social media optimization refers to optimizing a website and its content to encourage more users to use and share links to the website across social media and networking sites.
Microblogging is a form of blogging using short posts without titles known as microposts. Microblogs "allow users to exchange small elements of content such as short sentences, individual images, or video links", which may be the major reason for their popularity. Some popular social networks such as X (Twitter), Threads, Mastodon, Tumblr, Koo, and Instagram can be viewed as collections of microblogs.
Digital footprint or digital shadow refers to one's unique set of traceable digital activities, actions, contributions, and communications manifested on the Internet or digital devices. Digital footprints can be classified as either passive or active. The former is composed of a user's web-browsing activity and information stored as cookies. The latter is often released deliberately by a user to share information on websites or social media. While the term usually applies to a person, a digital footprint can also refer to a business, organization or corporation.
Targeted advertising is a form of advertising, including online advertising, that is directed towards an audience with certain traits, based on the product or person the advertiser is promoting.
Social network advertising, also known as social media targeting, is a group of terms used to describe forms of online advertising and digital marketing that focus on social networking services. A significant aspect of this type of advertising is that advertisers can take advantage of users' demographic information, psychographics, and other data points to target their ads.
Social media measurement, also called social media controlling, is the management practice of evaluating successful social media communications of brands, companies, or other organizations.
Social media marketing is the use of social media platforms and websites to promote a product or service. Although the terms e-marketing and digital marketing are still dominant in academia, social media marketing is becoming more popular for both practitioners and researchers.
The social data revolution is the shift in human communication patterns towards increased personal information sharing and its related implications, made possible by the rise of social networks in the early 2000s. This phenomenon has resulted in the accumulation of unprecedented amounts of public data.
Klout was a website and mobile app that used social media analytics to rate its users according to online social influence via the "Klout Score", which was a numerical value between 1 and 100. In determining the user score, Klout measured the size of a user's social media network and correlated the content created to measure how other users interact with that content. Klout launched in 2008.
Since the arrival of early social networking sites in the early 2000s, online social networking platforms have expanded exponentially, with the biggest names in social media in the mid-2010s being Facebook, Instagram, Twitter and Snapchat. The massive influx of personal information that has become available online and stored in the cloud has put user privacy at the forefront of discussion regarding the database's ability to safely store such personal information. The extent to which users and social media platform administrators can access user profiles has become a new topic of ethical consideration, and the legality, awareness, and boundaries of subsequent privacy violations are critical concerns in advance of the technological age.
Social media and television have a number of connections and interrelationships that have led to the phenomenon of Social Television, which is an emerging communication digital technology that centers around real-time interactivity involving digital media displayed on television. The main idea behind Social Television is to make television consumption a more active content experience for audiences. In the 2010s, social media platforms and websites allowed for television shows to be accessed online on a range of desktop and mobile computer devices, smartphones and smart TVs that are still evolving today in the 2020s. Alongside this, online users can use social media websites to share digital video clips or excerpts from TV shows with fellow fans or even share an entire show online. Many social media websites enable users to post online comments on the programs—both negative and positive—in a variety of ways. Viewers can actively participate while watching a TV program by posting comments online, and have their interactions viewed and responded to in real time by other viewers. Technologies such as smartphones, tablets, and laptop computers allow viewers to watch downloaded digital files of TV shows or "stream" digital files of TV shows on a range of devices, both in the home and while on the go. In the 2020s, many television producers and broadcasters encourage active social media participation by viewers by posting "hashtags" on the TV screen during shows. These hashtags enable viewers to post online comments about the show, which may either be read by other social media users, or even, in some cases, displayed on the screen during the show.
A user profile is a collection of settings and information associated with a user. It contains critical information that is used to identify an individual, such as their name, age, portrait photograph and individual characteristics such as knowledge or expertise. User profiles are most commonly present on social media websites such as Facebook, Instagram, and LinkedIn; and serve as voluntary digital identity of an individual, highlighting their key features and traits. In personal computing and operating systems, user profiles serve to categorise files, settings, and documents by individual user environments, known as ‘accounts’, allowing the operating system to be more friendly and catered to the user. Physical user profiles serve as identity documents such as passports, driving licenses and legal documents that are used to identify an individual under the legal system.
Social media intelligence comprises the collective tools and solutions that allow organizations to analyze conversations, respond to synchronize social signals, and synthesize social data points into meaningful trends and analysis, based on the user's needs. Social media intelligence allows one to utilize intelligence gathering from social media sites, using both intrusive or non-intrusive means, from open and closed social networks. This type of intelligence gathering is one element of OSINT.
Social selling is the process of developing relationships as part of the sales process. Today this often takes place via social networks such as LinkedIn, Twitter, Facebook, and Pinterest, but can take place either online or offline. Examples of social selling techniques include sharing relevant content, interacting directly with potential buyers and customers, personal branding, and social listening. Social Selling is gaining popularity in a variety of industries, though it is used primarily for B2B (business-to-business) selling or highly considered consumer purchases. C2C companies have been using social selling techniques since far before the Internet existed. B2B and B2C companies are now adopting many of those techniques as they are translated to social media platforms.
Online youth radicalization is the action in which a young individual or a group of people come to adopt increasingly extreme political, social, or religious ideals and aspirations that reject, or undermine the status quo or undermine contemporary ideas and expressions of a state, which they may or may not reside in. Online youth radicalization can be both violent or non-violent.
Privacy settings are "the part of a social networking website, internet browser, piece of software, etc. that allows you to control who sees information about you". With the growing prevalence of social networking services, opportunities for privacy exposures also grow. Privacy settings allow a person to control what information is shared on these platforms.
The advent of social networking services has led to many issues spanning from misinformation and disinformation to privacy concerns related to public and private personal data.