Detecting fake news online

Detecting fake news online is important in today's society because fresh news content is produced rapidly and in great volume thanks to the abundance of available technology. Claire Wardle has identified seven main categories of fake news, and within each category the content can be visual-based, linguistic-based, or both. To detect fake news, both linguistic and non-linguistic cues can be analyzed using several methods. While many of these detection methods are generally successful, they have some limitations.

Background and implications of fake news detection

Detection of fake news

With the advancement of technology, digital news reaches users globally and contributes to the growth of hoaxes and disinformation online. Fake news spreads through popular platforms such as social media and the wider Internet. There have been multiple efforts to detect fake news, including some that use artificial intelligence tools. However, fake news is written to convince the reader of false information, which makes such articles difficult to recognize. Digital news is also produced at enormous volume and speed, every second of every day, so it is challenging for machine learning to detect fake news effectively. [1]

Implications of fake news detection

If fake news could not be detected, truth would lose its value. Fake news paves the way for deceiving others and promoting ideologies, and those who produce it earn money through the interactions their publications attract. Disinformation is spread with various intentions: to gain favor in political elections, to promote businesses and products, or out of spite or revenge. Humans can be gullible, and fake news is difficult to distinguish from ordinary news. Most people are easily influenced, especially by content shared by friends and family, because of the trust those relationships carry. We also tend to respond to news emotionally, which makes it easy to accept when it is relevant to and aligned with our own beliefs. We thus become satisfied with what we want to hear and fall into these traps. [2]

Types of fake news

Fake news appears in different forms, including clickbait, propaganda, satire or parody, sloppy journalism, misleading headlines, and biased or slanted news. Claire Wardle of First Draft News has identified seven types of fake news. [3]

The seven types

Satire or parody: Information that has the potential to fool and may be misinterpreted as fact. Satire does not necessarily cause harm, as it takes stories from news sources and uses ridicule and sarcasm. Parodies focus on their content and are explicitly produced for entertainment purposes. [4]

False connection: Obvious when headlines, visuals or captions do not support the content; news built on poor journalism, with unrelated attributes used to attract attention and generate profit. For example, a headline may announce the death of a celebrity, but on clicking through, the article never mentions that celebrity.

Misleading content: Uses information to frame an issue or an individual; a form popular among politicians seeking to bring down opponents by making false claims that may contain some truth.

False context: Genuine content shared together with false contextual information.

Impostor content: Derives from a false or made-up source that impersonates a real news source.

Manipulated content: Presents genuine information or imagery but manipulates it to tell a different story.

Fabricated content: New, fully fabricated content that is 100% false, created with the intention to deceive and cause harm.

Types of data in fake news

Visual-based

Visual-based fake news integrates multiple forms of media, including graphical representations such as Photoshopped images and videos. Visual news that grabs viewers' attention is posted mainly on platforms like social media and media sites. Facebook, Instagram and Twitter are popular examples of social media frequently used to post and share online content, which then circulates to many other users; more than 70% of their users treat them as daily news sources for the latest and quickest updates. Media sites are operated by content media companies; their content focuses on a wide range of visuals, and their sites are designed around style and user interest. [5]

Linguistic-based

Linguistic-based fake news takes the form of text or string content and is generally analysed through text linguistics. Its content largely treats text as a communication system and includes characteristics like tone, grammar, and pragmatics that allow discourse analysis. Examples of linguistic-based platforms are blog sites, emails and news sites. Blog sites are managed by users, and the content produced is unsupervised, making it easy to receive wrong information. Email is another medium through which users receive news, and it poses a challenge to detect and validate authenticity; hoaxes, spam and junk mail are infamously spread through email. Popular news websites can also generate their own content and attract users with their authentic presence. [5]

Features in fake news detection

Characteristics of fake news (cues) are extracted from the source, headline, body text, visual content and social engagement of authors.

Linguistic cues

Data representation

The 'bag of words' approach evaluates each word as a single, significant unit. The frequency of each word (or n-gram) is obtained, and the frequencies are aggregated and analysed for deceptive cues. The challenge of this approach is that it is language-dependent and relies on individual n-grams, which are typically analysed in isolation from useful contextual information. [6]
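
A minimal sketch of the bag-of-words idea, assuming scikit-learn is available; the example articles, labels and classifier choice are invented for illustration and not drawn from the cited work.

```python
# Bag-of-words sketch: n-gram frequencies feed a simple classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

articles = [
    "Shocking cure doctors don't want you to know about",
    "The city council approved the annual budget on Tuesday",
]
labels = [1, 0]  # toy labels: 1 = deceptive, 0 = truthful

# Count unigrams and bigrams; each n-gram becomes one feature.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(articles)

clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform(["Doctors approved the budget"])))
```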

Psycholinguistic features

The LIWC (Linguistic Inquiry and Word Count) lexicon can be used to extract the proportions of words in given categories, which in turn aids the extraction of psycholinguistic features. This enables the system “to ascertain the tone of the language (e.g., positive emotions, perceptual process, etc.), statistics of the text (e.g., word counts) and part of speech category (e.g., articles, verbs)”. LIWC is a useful tool because it can “cluster single LIWC categories into multiple feature sets such as summary categories (e.g., analytical thinking, emotional tone), linguistic processes (e.g., function words, pronouns), and psychological processes (e.g., affective processes, social processes)”. [5]
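
A simplified sketch of the category-proportion idea behind LIWC. The real LIWC dictionary is proprietary, so the tiny lexicon below is hypothetical and stands in for it.

```python
# Compute the proportion of words falling in each lexical category.
import re

LEXICON = {  # hypothetical mini-lexicon; LIWC's real dictionary is proprietary
    "positive_emotion": {"love", "nice", "sweet", "great"},
    "negative_emotion": {"hurt", "ugly", "nasty", "terrible"},
    "pronouns": {"i", "we", "you", "they", "it"},
}

def category_proportions(text: str) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return {
        cat: sum(w in vocab for w in words) / total
        for cat, vocab in LEXICON.items()
    }

print(category_proportions("They say it is a terrible, nasty hoax"))
```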

Readability

The veracity of content can be assessed by analysing its readability. This involves extracting content features such as the number of characters, complex words, syllables and word types, among others, which enables readability metrics such as Flesch-Kincaid, Flesch Reading Ease, Gunning Fog, and the Automated Readability Index (ARI) to be computed. [5]
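
The metrics named above can be computed with the off-the-shelf textstat package (pip install textstat); a minimal sketch with a toy sentence follows.

```python
# Compute the four readability metrics mentioned in the text.
import textstat

text = "The quick brown fox jumps over the lazy dog. It was easy to read."

print("Flesch Reading Ease:", textstat.flesch_reading_ease(text))
print("Flesch-Kincaid Grade:", textstat.flesch_kincaid_grade(text))
print("Gunning Fog:", textstat.gunning_fog(text))
print("ARI:", textstat.automated_readability_index(text))
```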

Discourse

Using discourse analysis, the truthfulness of an article's content can be evaluated. The Rhetorical Structure Theory (RST) analytic framework can be used to pinpoint rhetorical relations between linguistic components. Differences between honest and dishonest content in terms of coherence and structure can then be evaluated using a Vector Space Model (VSM): each item of content occupies a position in a multi-dimensional RST space and is assessed by its distance from truth and deception. Conspicuous use of specific rhetorical relations might suggest deception. However, although there are tools to automatically classify rhetorical relations, this has yet to be officially adopted as an assessment tool for veracity. [6]
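
A minimal sketch of the VSM scoring step: a document is represented as a vector of rhetorical-relation frequencies and scored by its distance to truth and deception centers. The relation set, counts and center vectors below are illustrative, not taken from the cited paper.

```python
# Position a document in a toy RST space and compare distances.
import numpy as np

RELATIONS = ["elaboration", "contrast", "condition", "restatement"]

doc = np.array([4, 1, 3, 0], dtype=float)            # relation counts (toy)
truth_center = np.array([3, 2, 1, 2], dtype=float)     # illustrative centers
deception_center = np.array([5, 0, 4, 0], dtype=float)

d_truth = np.linalg.norm(doc - truth_center)
d_fake = np.linalg.norm(doc - deception_center)
print("likely deceptive" if d_fake < d_truth else "likely truthful")
```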

Deep syntax

Deeper language structures (syntax) are analysed to detect deception. “Features based on context-free grammar (CFG) are picked out and these features depend largely on lexicalised production rules that are combined with their parent and grandparent nodes”. The challenge is that syntax analysis on its own may not be the best at detecting deception, so it is usually combined with other linguistic or network analysis methods. [6]

Semantic analysis

Veracity of content can also be assessed by analyzing the compatibility between content and the profile it derives from; this approach extends the n-gram and syntax analysis approaches. First, deception can be identified through contradictions of, or omissions of, facts present in the user's previous posts on similar topics. For example, a truthful product review will most likely be written by a writer who makes similar remarks about the product features that most reviewers comment on. Second, deception can be detected through content extracted via keywords into attribute:descriptor pairs. Profiles and descriptions of the author's experiences are matched, and the veracity of the described content is evaluated through compatibility scores: the content's compatibility with the existence of a distinct aspect and with a general aspect of what it actually describes. This approach predicts falsehood with approximately 91% accuracy. It has been shown to be valuable in the context of reviews, but so far it has only been effective in this domain. The challenge lies in determining the alignment of attributes to descriptors, because it depends on the amount of profile content and on the accuracy with which attributes are associated with descriptors. [6]

Non-linguistic cues

Visual

Visual-based cues are prevalent in all types of news content. The veracity of visual elements such as images and videos is assessed using visual features like clarity, coherence, diversity, clustering score, and the similarity distribution histogram, as well as statistical features like image count and the multi-image, hot-image and long-image ratios. [7]
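
One common proxy for an image "clarity" feature is the variance of the Laplacian (low variance suggests a blurry, possibly recycled image). This is a sketch of that proxy using OpenCV, not the exact clarity score of the cited work; the file path is hypothetical.

```python
# Blur/clarity proxy: variance of the Laplacian of the grayscale image.
import cv2

def clarity_score(path: str) -> float:
    image = cv2.imread(path)                      # load the attached image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()  # higher = sharper

print(clarity_score("attached_photo.jpg"))  # hypothetical file path
```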

Network

Linked data approach
The linked data approach uses an existing collection of human knowledge to assess the veracity of new statements. It relies on querying available knowledge networks and publicly structured data such as the DBpedia ontology or the Google Relation Extraction Corpus (GREC). The closer the node representing a new statement is to the nodes representing existing factual statements, the more likely the new statement is to be true. The challenge is that the statement must be present in a pre-existing knowledge base. [6]
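
A minimal sketch of the simplest case: checking whether a candidate statement already exists as a triple in DBpedia via its public SPARQL endpoint (pip install sparqlwrapper). Real linked-data methods compute graph proximity rather than exact existence; the example triple is illustrative.

```python
# Ask DBpedia whether a candidate (subject, predicate, object) triple holds.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    ASK { dbr:Barack_Obama dbo:birthPlace dbr:Honolulu }
""")
sparql.setReturnFormat(JSON)
result = sparql.query().convert()
print("Supported by DBpedia:", result["boolean"])
```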

Sentiment

Sentiment cues reflect judgements or affective states, often unintended. Syntactic patterns of content can be evaluated to separate emotions from factual arguments by analysing patterns of argumentation style classes. For example, fake negative reviewers have been found to use excessive negative emotion terms compared with honest reviewers, because they try to exaggerate the sentiment they are expressing. [6]
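
A sketch of flagging exaggerated negative emotion using NLTK's VADER sentiment analyser (nltk.download('vader_lexicon') is required once); the review text and the 0.5 threshold are invented for illustration.

```python
# Flag reviews whose negative-emotion score is suspiciously extreme.
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
review = "Horrible, disgusting product. Worst scam ever. Absolutely awful!"
scores = sia.polarity_scores(review)  # dict with neg/neu/pos/compound

if scores["neg"] > 0.5:  # illustrative threshold, not from the literature
    print("unusually strong negative emotion:", scores)
```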

Social context features

Social context features can be extracted from users' social engagements on social media platforms. They reveal the proliferation process, providing auxiliary information that suggests veracity. Social context features can be evaluated in three aspects: user-based, post-based and network-based. [7]

User-based
It has been suggested that fake news is more likely to be created and spread by social bots or cyborgs. By analyzing users' interactions with news on social media, user-based social context features can be identified and characterized. Individual-level features infer the credibility and reliability of each user; information such as registration age, follower/following counts and the number of authored tweets is extracted. Group-level features capture the overall characteristics of groups of users related to the news, since spreaders of news may form communities with certain characteristics; information such as the percentage of verified users and followers is used.

Post-based
Emotions and opinions about fake news can be analysed through social media posts, and post-based features can identify fake news via the reactions expressed in a post. Post-level features apply linguistic-based features to identify unique characteristics of each post; special features include stance, topic and credibility. Stance reveals the user's opinion of the news, topic is extracted using topic models like latent Dirichlet allocation (LDA), and credibility assesses the degree of reliability. Group-level features aggregate the feature values across all posts relevant to a news article using crowd wisdom. Temporal-level features monitor how post-level feature values vary over time, using unsupervised embedding methods such as recurrent neural networks (RNNs) to track changes in posts over time.
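
A minimal sketch of the topic-extraction step with scikit-learn's LDA implementation; the toy corpus and the choice of two topics are invented for illustration.

```python
# Extract each post's dominant topic with latent Dirichlet allocation.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "election vote candidate poll ballot",
    "vaccine health doctor hospital cure",
    "ballot fraud vote recount election",
]
vec = CountVectorizer()
X = vec.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
doc_topics = lda.transform(X)      # per-post topic distribution
print(doc_topics.argmax(axis=1))   # dominant topic of each post
```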

Social network approach
Users create networks based on their interests, topics and relations. Because fake news spreads in echo-chamber cycles, extracting network-based features that represent these network patterns is valuable for fake news detection. Network-based features are extracted by building specific networks among the users who authored related social media posts. In the case of Twitter, the stance network is built with nodes representing tweets related to the news and edges representing the similarity of stances. The co-occurrence network depends on user engagements: users' authored posts relating to the same news articles are counted. The friendship network shows the follower/followee structure among users who posted related tweets. An extension of the friendship network is the diffusion network, which tracks the trajectory of the spread of news: nodes represent users and edges represent the diffusion paths of information among them. A diffusion path between two users exists only if they follow each other and the first user posts about the news before the second user does.
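
A sketch of the diffusion-network idea using networkx; the user names and edges are invented, and a directed edge u -> v stands for "the news diffused from u to v".

```python
# Build a toy diffusion network and read off simple spread statistics.
import networkx as nx

G = nx.DiGraph()
G.add_edge("alice", "bob")    # bob reposted after alice and follows alice
G.add_edge("alice", "carol")
G.add_edge("carol", "dave")

print(nx.dag_longest_path_length(G))          # longest diffusion chain
print(max(G.out_degree, key=lambda t: t[1]))  # most influential spreader
```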

Methods of detection

Deep syntax analysis

Deep syntax can be analysed using probabilistic context-free grammars (PCFG). Syntax structures are described by converting sentences into parse trees: nouns, verbs and so on are rewritten into their syntactic constituent parts, and probabilities are assigned to the parse tree. This method identifies rule categories such as lexicalized production rules and their parent nodes, and detects deception with 85-91% accuracy, depending on the rule category used in the analysis. [8]
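
A sketch of deriving production-rule features from a parse tree with NLTK. Real systems obtain the tree from a statistical parser; here the tree is written out by hand for illustration.

```python
# Turn a parse tree's CFG productions into countable features.
from collections import Counter
from nltk import Tree

tree = Tree.fromstring(
    "(S (NP (DT the) (NN senator)) (VP (VBD denied) (NP (DT the) (NN claim))))"
)

# Each production (e.g. NP -> DT NN) becomes one feature; frequencies of
# such rules are what a PCFG-based detector feeds to its classifier.
rule_counts = Counter(str(p) for p in tree.productions())
print(rule_counts)
```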

Propagation paths

A model has been proposed to detect fake news on social media by classifying the propagation paths of news. The propagation path of each news story is modeled as a multivariate time series in which each tuple indicates the characteristics of a user who participates in propagating the news. A time series classifier built with recurrent and convolutional networks then predicts the veracity of the news story; recurrent and convolutional networks can learn global and local variations of user characteristics, which in turn help characterize clues for detecting fake news. [9]

Clustering-based methods can also be used, detecting fake news with a success rate of 63% by classifying fake and real news. A large amount of data is fed to a machine whose algorithm creates a small number of clusters via agglomerative clustering with the k-nearest-neighbour approach. This approach “clusters similar news reports based on the normalized frequency of relations”, and after the real and fake news cluster centers are computed, the model can ascertain the deceptive value of a new article based on coordinate distances: its Euclidean distances to the real and fake news cluster centers are calculated. The challenge of this approach is that it may be less accurate for relatively new fake news articles, because sets of similar news stories may not yet be available. [6]
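
A minimal sketch of the clustering-then-centroid-distance step described above, using scikit-learn's agglomerative clustering; the two-dimensional feature vectors are invented, whereas the cited approach clusters news reports by normalized relation frequencies.

```python
# Cluster articles, compute centers, then label a new article by distance.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])  # toy vectors
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

centers = [X[labels == k].mean(axis=0) for k in (0, 1)]
new_article = np.array([0.85, 0.15])
dists = [np.linalg.norm(new_article - c) for c in centers]
print("assigned cluster:", int(np.argmin(dists)))
```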

Predictive modelling-based methods

The detection of fake news can also be achieved through predictive modelling-based methods. One example is the logistic regression model, in which positive coefficients increase the probability of truth while negative ones increase the probability of deception. “Authors claimed that regression indicators like Disjunction, Purpose, Restatement, and Solutionhood point to truth, and the Condition regression indicator pointed to deception”. [5]
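
A sketch of that idea: fit a logistic regression on rhetorical-relation counts and read the coefficient signs (with class 1 encoding truthful, positive coefficients push towards truth). The feature names echo the indicators above, but the training data is invented.

```python
# Inspect logistic regression coefficient signs on toy relation counts.
import numpy as np
from sklearn.linear_model import LogisticRegression

features = ["disjunction", "purpose", "restatement", "condition"]
X = np.array([[2, 1, 3, 0], [1, 2, 2, 0], [0, 0, 1, 3], [0, 1, 0, 2]])
y = np.array([1, 1, 0, 0])  # toy labels: 1 = truthful, 0 = deceptive

model = LogisticRegression().fit(X, y)
for name, coef in zip(features, model.coef_[0]):
    print(f"{name}: {'truth' if coef > 0 else 'deception'} ({coef:+.2f})")
```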

Fact checking

Fact checking is a form of “knowledge-based study of fake news” which focuses on assessing the truthfulness of news. There are two types of fact checking, namely manual and automatic. [10]

Manual fact checking

Manual fact checking is done by humans, either experts or regular people.

Expert-based
This method depends on professionals in the fact-checking field, known as fact-checkers, to authenticate a given news item. It is typically done by a few, highly reliable fact-checkers. The approach is relatively simple to conduct and very accurate, but it is expensive, and the system is easily overwhelmed as the amount of news content to be verified increases.

Crowd-sourced
This alternative type of fact checking relies on a large group of ordinary individuals acting as fact-checkers. It is not as easy to conduct, and the results are likely to be less reliable and accurate owing to the fact-checkers' biases and possible conflicts between their annotations of the news content. Compared with expert-based fact checking, however, a crowd-sourced system is less likely to be overwhelmed when the volume of news content to be authenticated increases. With this type of fact checking, it is important to filter out unreliable users and reconcile conflicting results, and these concerns become more pressing as the fact-checking population expands. Nevertheless, individuals who fact-check on crowd-sourcing sites can provide more comprehensive feedback, such as their attitudes or opinions.

Automatic fact checking

A big problem with manual fact checking is that the systems are easily overwhelmed by the growing volume of fresh news content to be checked, which is especially prevalent on social media. Automatic fact checking methods have therefore been created to combat this problem. These approaches depend mostly on “Information Retrieval (IR) and Natural Language Processing (NLP) techniques, as well as on network/graph theory”. They generally comprise two steps: fact extraction and fact checking. In fact extraction, also known as knowledge-base construction, knowledge is taken from the Web as “raw facts”, which are typically redundant, outdated, conflicting, inaccurate or incomplete; they are then refined and cleaned up by “knowledge processing tasks to build a knowledge-base or a knowledge graph”. In fact checking, also known as knowledge comparison, the veracity of the news content is assessed by matching the knowledge extracted from the to-be-checked content against the facts stored in the current “knowledge-base(s) or knowledge graph(s)”.
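
A minimal sketch of the knowledge-comparison step, assuming claims have already been extracted as (subject, predicate, object) triples; the toy knowledge base and matching rule are invented for illustration.

```python
# Compare an extracted triple against a toy knowledge base.
KNOWLEDGE_BASE = {
    ("paris", "capital_of", "france"),
    ("obama", "born_in", "honolulu"),
}

def check_claim(triple: tuple) -> str:
    subject, predicate, obj = (t.lower() for t in triple)
    if (subject, predicate, obj) in KNOWLEDGE_BASE:
        return "supported"
    # A same-subject/predicate fact with a different object conflicts.
    if any(s == subject and p == predicate for s, p, _ in KNOWLEDGE_BASE):
        return "contradicted"
    return "unknown"

print(check_claim(("Obama", "born_in", "Nairobi")))  # -> contradicted
```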

Deception detection strategies

Deception detection strategies fall under a “style-based study of fake news” and aim mainly to identify fake news by its style. A popular strategy is to use “a feature vector representing the content style of the given information within a machine learning framework” to determine whether the information is deceitful, which calls for classification, or how deceitful it is, which calls for regression. [10]

Propagation-based fake news detection

Propagation-based detection analyses the dissemination of fake news. There are two types: cascade-based and network-based fake news detection. [10]

Cascade-based fake news detection

A tree or tree-like structure is often used to represent a fake news cascade, showing the propagation of fake news through users on social networks. The root node represents the user who publishes the fake news; the remaining nodes represent users who subsequently disseminate it by forwarding or posting it. A cascade can be represented in terms of the number of steps the fake news has traveled (a hop-based fake news cascade) or the times at which it was posted (a time-based fake news cascade). A hop-based cascade is often represented as a standard tree with parameters such as depth (the maximum number of steps, or hops, taken), breadth (the number of users who have received the fake news after it was posted) and size (the total number of users in the cascade). A time-based cascade is often represented by a tree-like structure with parameters such as lifetime (the longest interval over which the fake news propagates), real-time heat (the number of users forwarding or reposting the fake news at time t) and overall heat (the overall number of users who forwarded or reposted it).
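
A sketch computing the hop-based parameters named above (depth, breadth, size) from a toy cascade tree with networkx; the user names are invented, and breadth is taken here as the number of users reached, following the description above.

```python
# Derive depth, breadth and size of a toy hop-based cascade.
import networkx as nx

cascade = nx.DiGraph()
cascade.add_edges_from([
    ("publisher", "u1"), ("publisher", "u2"),
    ("u1", "u3"), ("u3", "u4"),
])

hops = nx.shortest_path_length(cascade, source="publisher")
depth = max(hops.values())                        # max hops from the root
breadth = sum(1 for h in hops.values() if h > 0)  # users who received it
size = cascade.number_of_nodes()                  # everyone in the cascade
print(depth, breadth, size)  # -> 3 4 5
```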

Utilizing graph kernels to analyze cascade similarity
The similarity between news cascades can be computed using graph kernels and used as a feature within a supervised learning framework to detect fake news. A graph-kernel-based hybrid support-vector machine (SVM) classifier has been suggested that captures high-order propagation patterns (i.e., cascade similarities) in addition to features such as topics and sentiments; user roles (i.e., opinion leader or normal user), approval, sentiment and doubt scores are evaluated additionally. Assuming that cascades of fake news differ from cascades of real news, a random walk (RW) graph kernel kRW(·, ·) was used to detect fake news by computing the distance between the two cascades.
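
A highly simplified sketch of a random-walk graph kernel: common walks between two cascades are counted on their direct-product graph via the truncated geometric series K(G1, G2) = sum_k lam^k 1^T W^k 1. This is the standard textbook formulation, not the cited paper's exact hybrid kernel, and the adjacency matrices are toy stand-ins for two cascades.

```python
# Truncated geometric random-walk kernel on the direct-product graph.
import numpy as np

def rw_kernel(A1: np.ndarray, A2: np.ndarray, lam=0.1, k_max=5) -> float:
    W = np.kron(A1, A2)               # adjacency of the direct-product graph
    ones = np.ones(W.shape[0])
    total, Wk = 0.0, np.eye(W.shape[0])
    for k in range(k_max + 1):        # sum weighted counts of common walks
        total += (lam ** k) * (ones @ Wk @ ones)
        Wk = Wk @ W
    return total

A_fake = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])  # toy cascade 1
A_real = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])  # toy cascade 2
print(rw_kernel(A_fake, A_real))
```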

Utilizing cascade representations
Informative representations of cascades can serve as features in a supervised learning framework. Besides feature engineering, which is not automatic, representation learning, often achieved through deep learning, can be used to represent a cascade. Deep learning with a tree-structured recursive neural network, built according to the fake news cascade, has been utilized; this method can automatically represent the news to be verified. However, because the depth of the cascade determines the depth of the neural network, very deep cascades are challenging, as deep learning methods are sensitive to network depth.

Network-based fake news detection

Network-based fake news detection constructs flexible networks to capture the propagation of fake news indirectly. The networks can be homogeneous, heterogeneous or hierarchical.

Homogeneous network
Homogeneous networks contain one type of node and one type of edge. The stance network is a classic homogeneous network in which nodes represent users' news-related posts and edges represent the positive or negative relations between posts. It evaluates the veracity of news-related posts.

Heterogeneous network
Heterogeneous networks consist of nodes and edges of multiple types. They typically form a hybrid framework made up of three components: representation and embedding of entities, modeling of relations, and semi-supervised learning. One example is the tri-relationship network between news publishers, news articles and news proliferators.

Hierarchical network
Hierarchical networks consist of nodes and edges of various types that form a set-subset relationship (i.e., a hierarchy). In this network, news verification is turned into a graph optimization problem.

Credibility-based study of fake news

This approach looks at fake news “based on news-related and social-related information. For instance, intuitively, a news article published on unreliable website(s) and forwarded by unreliable user(s) is more likely to be fake news than news posted by authoritative and credible users”. In other words, this approach focuses on the source of the news content. As such, the credibility perspective of studying fake news generally overlaps with a propagation-based study of fake news. [10]

Assessing news headline credibility

This method typically revolves around identifying clickbait: headlines that aim to capture users' attention and lead them to click on a link to a certain web page. Existing clickbait detection studies use both “linguistic features such as term frequencies, readability, and forward references and non-linguistic features such as webpage links”, “user interests” and “headline stance” “within a supervised learning framework such as gradient boosted decision trees” “to identify or block clickbaits”. [11] Empirical studies have suggested that clickbaits are typically defined by “a cardinal number, easy readability, strong nouns and adjectives to convey authority and sensationalism”.
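
A minimal sketch of the supervised setup described above, using scikit-learn's gradient boosted trees on two of the mentioned headline features (presence of a cardinal number, readability); the training data is invented for illustration.

```python
# Gradient boosted trees on toy headline features for clickbait detection.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Features per headline: [starts_with_number, flesch_reading_ease]
X = np.array([[1, 85.0], [1, 90.0], [0, 45.0], [0, 50.0]])
y = np.array([1, 1, 0, 0])  # toy labels: 1 = clickbait

clf = GradientBoostingClassifier(random_state=0).fit(X, y)
print(clf.predict([[1, 88.0]]))  # -> likely clickbait
```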

Assessing news source credibility

This approach generally looks at the “quality, credibility, and political bias of source websites” in order to assess the quality and reliability of news content.

Assessing news comments credibility

The credibility of news content can also be evaluated via the credibility of the comments associated with it. “User comments on news websites and social media carry invaluable information on stances and opinion”, although they are very commonly overlooked. Models for assessing comment credibility can be classified into three types: content-based, behavior-based and graph (network)-based.

Content-based models
These models evaluate comment credibility by leveraging language features extracted from user comments; the strategy is comparable to that of style-based fake news detection.

Behavior-based models
These models often make use of the “indicative features of unreliable comments extracted from the metadata associated with user behavior”. Drawing on review spam detection studies, these behavioral attributes can be sorted into five categories: burstiness, activity, timeliness, similarity, and extremity.

Graph-based models
Lastly, these models focus on the relationships among reviewers, comments, products and so on. To evaluate the reliability of news comments, graph-based models frequently use “Probabilistic Graphical Models (PGMs), web ranking algorithms and centrality measures, or matrix decomposition techniques”.
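
A sketch of one of the graph-based signals named above, running PageRank (a web ranking algorithm) over a toy reviewer-comment-article graph with networkx; higher scores mark more central, often more trusted, nodes. The graph edges are invented for illustration.

```python
# PageRank over a toy reviewer/comment/article graph as a credibility cue.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("reviewer_a", "comment_1"), ("reviewer_b", "comment_1"),
    ("reviewer_b", "comment_2"), ("comment_1", "article_x"),
    ("comment_2", "article_x"),
])
print(nx.pagerank(G, alpha=0.85))  # node -> centrality score
```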

Assessing news spreader credibility

Lastly, the credibility of news content can be evaluated by looking at the users who spread it and assessing their reliability. Users are a vital part of the propagation of deceptive news, since they can spread it in various ways such as sharing, forwarding, liking and reviewing. In this process, users fall into two types: malicious users, who typically have low reliability, and normal users, who generally have higher reliability.

Malicious users deliberately spread deceptive news in search of monetary and/or non-monetary benefits such as power and popularity. They can be split into three categories. The first is bots, software applications “that run automated tasks or scripts over the Internet”. The second is trolls, people who bicker with or agitate other users with the aim of distracting and ruining relationships between people; they generally do this by posting provocative, digressive or irrelevant messages to instigate other users into responding with strong emotional content. The last category is cyborgs, accounts registered by humans as a cover for running “automated programs performing online activities”.

By contrast, naïve users are normal users who inadvertently join in the spreading of deceptive news because they mistake it for the truth. Two main factors have been studied that may explain why naïve users participate in spreading fake news. The first is social influence, which “refers to environmental and exogenous factors such as network structure or peer pressure that can influence the dynamics of fake news”. This is demonstrated by “the bandwagon effect, normative influence theory and social identity theory”, which illustrate that “peer pressure psychologically impacts user behavior towards fake-news-related activities”. The second factor is self-influence, the intrinsic characteristics of users that affect how they react to or handle deceptive news. For instance, according to confirmation bias and naïve realism, users are more likely to believe deceptive news or participate in its related activities if it validates their pre-existing knowledge.

Account analysis

One study assessed credibility in Twitter events by creating a data set of tweets relevant to trending topics and, using crowd sourcing, annotating the veracity of each tweet. Four feature groups (message, user, topic and propagation) were analysed using a decision tree model, achieving 86% accuracy. Benevenuto et al. [citation needed] developed a model that detects spammers by constructing a manually annotated dataset of 1,000 records of spam and non-spam accounts; attributes of content and user behavior were extracted and analysed. This method detected 70% of spam accounts and 96% of non-spam accounts. Chu et al. [citation needed] developed a similar detection model to distinguish bot accounts, categorizing accounts into three groups: humans, bots and cyborgs. A system was built with four features of analysis: entropy measures, spam detection, account properties and decision making. This method identified the ‘human’ class with 96% accuracy. [12]
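
A minimal sketch of the decision-tree setup for account classification; the behavioural features and their values are invented, and the cited studies used much richer attribute sets.

```python
# Classify accounts from simple behavioural attributes with a decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Features per account: [account_age_days, followers, tweets_per_day]
X = np.array([[30, 10, 200], [2000, 500, 5], [10, 3, 500], [1500, 300, 8]])
y = np.array(["bot", "human", "bot", "human"])  # toy labels

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[20, 5, 350]]))  # -> likely 'bot'
```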

Browser add-ons

Browser plugins can detect deceptive content such as clickbait, bias, conspiracy theories and junk science on social media websites. One example is the ‘Fake News Detector’, which uses machine learning techniques to gather a ground-truth data set and draws on crowd wisdom to improve the program and let it learn. Another example is a browser add-on created by four college students during a hackathon at Princeton University; it analyses the user's feed in real time and warns the user about posting or sharing potentially false content by analyzing keywords, images and sources. [12]

Limitations of detecting fake news

Fake news itself is not new. However, as technology evolves and advances, detecting fake news becomes more challenging, since social media continues to dominate everyday life and accelerates the speed at which fake news travels. [13] A study published in the journal Science analysed millions of tweets sent between 2006 and 2017 and found that “Falsehood diffused significantly farther, faster, deeper, and more broadly than the truth in all categories of information.” It also concluded that “it took the truth about six times as long as falsehood to reach 1,500 people.” Beyond the sheer speed at which fake news travels, it is also harder to detect because of how attractively most fake news articles are titled. The same Science paper revealed that replies to false news tweets contained more expressions of surprise or disgust than replies to true news. [14]

Limitations of cue and feature-based methods

Because linguistic cues vary, a new cue set must be designed for each prospective situation, which makes it difficult to generalize cue and feature engineering methods across topics and domains. Such approaches therefore require more human involvement in the design, evaluation and use of these cues for detection. [15]

Limitations of linguistic analysis-based methods

Although this class of method is often deemed better than cue-based methods, it still fails to extract and fully exploit the rich semantic and syntactic information in the content. For example, the n-gram approach is simple but cannot model more complicated contextual dependencies in the text. Syntactic features used alone are also less powerful than word-based n-grams, and a superficial combination of the two is not effective in capturing their complex interdependence. [15]

Limitations of deep learning-based method

Fake news detection remains a challenge even for deep learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), because the content of fake news is planned to resemble the truth and deceive readers; without cross-referencing and fact checking, it is often difficult to determine veracity through text analysis alone. [15]

Limitations of existing feedback-based methods

The issue with existing feedback-based methods (e.g., response user analysis, response text analysis, temporal pattern analysis, propagation pattern analysis and hand-engineered analysis) is the type of training data the models are trained on: usually a snapshot of users' responses collected after, or towards the end of, the propagation process, when sufficient responses are available. This explains the reduced performance of trained models in early detection, when fewer responses have been collected. The methods also cannot update their state on the basis of incrementally available user responses. [15]

Limitations of existing intervention-based methods

Intervention-based methods (decontamination, network monitoring, crowdsourcing and user behaviour modeling) tend to be more difficult to evaluate and test, especially in complex environments with many interdependent connections and transactions. They may also make restrictive assumptions about certain cases, which limits their applicability. [15]


References

  1. "Explained: What is False Information (Fake News)?". webwise.ie. 21 June 2018. Retrieved 19 April 2020.
  2. "Why is Fake News Invented?". 30secondes.org. 2019. Retrieved 19 April 2020.
  3. Wardle, Claire (16 February 2017). "Fake News. It's Complicated". First Draft News. Retrieved 19 April 2020.
  4. Horne, Benjamin; Adah, Sibel (2017). "This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News". Eleventh International AAAI Conference on Web and Social Media.: 759–766 via AAAI.
  5. 1 2 3 4 5 Parikh, Shivam B.; Pradeep, K.Atrey (2018). "Media-rich Fake News Detection: A Survey". 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR): 436–441. doi:10.1109/MIPR.2018.00093. ISBN   978-1-5386-1857-8.
  6. 1 2 3 4 5 6 7 Conroy, Niall J.; Rubin, Victoria L.; Chen, Yimin (2016). "Automatic Deception Detection: Methods of Finding Fake News". Proceedings of the Association for Information Science and Technology. 52 (1): 1–4. doi: 10.1002/pra2.2015.145052010082 .
  7. 1 2 Shu, Kai; Sliva, Amy; Wang, Suhang; Tang, Jiliang; Liu, Huan (2017). "Fake News Detection on Social Media: A Data Mining Perspective". ACM SIGKDD Explorations Newsletter. 19 (1): 22–36. doi:10.1145/3137597.3137600.
  8. Feng, Song; Banerjee, Ritwik; Choi, Yejin (2012). "Syntactic Stylometry for Deception Detection". Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. 2: 171–175 via ACL.
  9. Liu, Yang; Wu, Yi-Fang Brook (2018). "Early Detection of Fake News on Social Media Through Propagation Path Classification with Recurrent and Convolutional Networks". Thirty-Second AAAI Conference on Artificial Intelligence: 354–361 via AAAI.
  10. 1 2 3 4 Zhou, XinYi; Zafarani, Reza (2018). "Fake News: A Survey of Research, Detection Methods, and Opportunities". ACM Computing Surveys. arXiv: 1812.00315 . Bibcode:2018arXiv181200315Z.
  11. Biyani, Prakhar; Tsioutsiouliklis, Kostas; Blackmer, John (2016). "8 Amazing Secrets for Getting More Clicks". Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence: 94–100 via AAAI.}
  12. 1 2 Figueira, Álvaro Reis; Torgo, Luis; Guimarães, Nuno (2018). "Current State of the Art to Detect Fake News in Social Media and Next Challenges". 14th International Conference on Web Information Systems and Technologies: 332–339. doi: 10.5220/0007188503320339 via ResearchGate.
  13. Resnick, Brian (19 March 2018). "False News Stories Travel Faster and Farther on Twitter than the Truth". Vox. Retrieved 19 April 2020.
  14. Vosoughi, Soroush; Roy, Deb; Aral, Sinan (2018). "The Spread of True and False News Online". Science. 359 (6380): 1146–1151. Bibcode:2018Sci...359.1146V. doi:10.1126/science.aap9559. PMID   29590045 via ScienceMag.
  15. 1 2 3 4 5 Sharma, Karishma; Feng, Qian; He, Jiang; Ruchansky, Natali (2019). "Combating Fake News: A Survey on Identification and Mitigation Techniques". ACM Transactions on Intelligent Systems and Technology(TIST). 10 (3): 1–42. arXiv: 1901.06437 . Bibcode:2019arXiv190106437S. doi:10.1145/3305260.