SKYNET is a program by the U.S. National Security Agency that performs machine learning analysis on communications data to extract information about possible terror suspects. The tool is used to identify targets, such as al-Qaeda couriers, who move between GSM cellular networks. Specifically, mobile usage patterns such as swapping SIM cards within phones that have the same ESN, MEID or IMEI number are deemed indicative of covert activities. [1] [2] Like many other security programs, the SKYNET program uses graphs that consist of a set of nodes and edges to visually represent social networks. [3] The tool also uses classification techniques like random forest analysis. Because the data set includes a very large proportion of true negatives and a small training set, there is a risk of overfitting. [1] Bruce Schneier argues that a false positive rate of 0.008% would be low for commercial applications where "if Google makes a mistake, people see an ad for a car they don't want to buy" but "if the government makes a mistake, they kill innocents." [1]
NSA directorates participating: [2]
It has partnerships with TMAC/FASTSCOPE, MIT Lincoln labs and Harvard. [4]
The SKYNET project was linked with drone systems, thus creating the potential for false-positives to lead to deaths. [1] [5]
According to NSA, the SKYNET project is able to accurately reconstruct crucial information about the suspects including their social relationships, habits, and patterns of movements through graph-based visualization of GSM data. [3] However, scholars criticize that current security literature conflate statistical discrepancies with behavioral abnormalities and that the anomaly detection methodology SKYNET perpetuates the self/other binary. [6] For example, Al-Jazeera's bureau chief in Islamabad, Ahmad Zaidan, was wrongly identified as the most probable member of al-Qaeda and the Muslim Brotherhood on their records. [1] [5] [7]
Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal of extracting information from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Any intrusion activity or violation is typically reported either to an administrator or collected centrally using a security information and event management (SIEM) system. A SIEM system combines outputs from multiple sources and uses alarm filtering techniques to distinguish malicious activity from false alarms.
Total Information Awareness (TIA) was a mass detection program by the United States Information Awareness Office. It operated under this title from February to May 2003 before being renamed Terrorism Information Awareness.
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions. Recently, generative artificial neural networks have been able to surpass many previous approaches in performance. Machine learning approaches have been applied to large language models, computer vision, speech recognition, email filtering, agriculture, and medicine, where it is too costly to develop algorithms to perform the needed tasks.
In artificial intelligence, artificial immune systems (AIS) are a class of computationally intelligent, rule-based machine learning systems inspired by the principles and processes of the vertebrate immune system. The algorithms are typically modeled after the immune system's characteristics of learning and memory for use in problem-solving.
Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.
In data analysis, anomaly detection is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behaviour. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data.
The following outline is provided as an overview of and topical guide to cryptography:
Fraud represents a significant problem for governments and businesses and specialized analysis techniques for discovering fraud using them are required. Some of these methods include knowledge discovery in databases (KDD), data mining, machine learning and statistics. They offer applicable and successful solutions in different areas of electronic fraud crimes.
In network theory, link analysis is a data-analysis technique used to evaluate relationships between nodes. Relationships may be identified among various types of nodes (100k), including organizations, people and transactions. Link analysis has been used for investigation of criminal activity, computer security analysis, search engine optimization, market research, medical research, and art.
graph-tool is a Python module for manipulation and statistical analysis of graphs. The core data structures and algorithms of graph-tool are implemented in C++, making extensive use of metaprogramming, based heavily on the Boost Graph Library. Many algorithms are implemented in parallel using OpenMP, which provides increased performance on multi-core architectures.
PRODIGAL is a computer system for predicting anomalous behavior among humans, by data mining network traffic such as emails, text messages and server log entries. It is part of DARPA's Anomaly Detection at Multiple Scales (ADAMS) project. The initial schedule is for two years and the budget $9 million.
Anomaly Detection at Multiple Scales, or ADAMS, was a $35 million DARPA project designed to identify patterns and anomalies in very large data sets. It is under DARPA's Information Innovation office and began in 2011 and ended in August 2014
Eclipse Deeplearning4j is a programming library written in Java for the Java virtual machine (JVM). It is a framework with wide support for deep learning algorithms. Deeplearning4j includes implementations of the restricted Boltzmann machine, deep belief net, deep autoencoder, stacked denoising autoencoder and recursive neural tensor network, word2vec, doc2vec, and GloVe. These algorithms all include distributed parallel versions that integrate with Apache Hadoop and Spark.
Salvatore J. Stolfo is an academic and professor of computer science at Columbia University, specializing in computer security.
Artificial intelligence for video surveillance utilizes computer software programs that analyze the audio and images from video surveillance cameras in order to recognize humans, vehicles, objects, attributes, and events. Security contractors program the software to define restricted areas within the camera's view and program for times of day for the property being protected by the camera surveillance. The artificial intelligence ("A.I.") sends an alert if it detects a trespasser breaking the "rule" set that no person is allowed in that area during that time of day.
This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence, its sub-disciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.
The following outline is provided as an overview of and topical guide to machine learning:
Artificial Intelligence for IT Operations (AIOps) is a term coined by Gartner in 2016 as an industry category for machine learning analytics technology that enhances IT operations analytics. AIOps is the acronym of "Artificial Intelligence Operations". Such operation tasks include automation, performance monitoring and event correlations among others.
Network detection and response (NDR) refers to a category of network security products that detect abnormal system behaviors by continuously analyzing network traffic. NDR solutions apply behavioral analytics to inspect raw network packets and metadata for both internal (east-west) and external (north-south) network communications.