SKYNET (surveillance program)

SKYNET is a program by the U.S. National Security Agency that performs machine learning analysis on communications data to extract information about possible terror suspects. The tool is used to identify targets, such as al-Qaeda couriers, who move between GSM cellular networks. Specifically, mobile usage patterns such as swapping SIM cards within phones that have the same ESN, MEID or IMEI number are deemed indicative of covert activities. [1] [2] Like many other security programs, SKYNET uses graphs, each consisting of a set of nodes and edges, to visually represent social networks. [3] The tool also uses classification techniques such as random forest analysis. Because the data set contains a very large proportion of true negatives and the training set is small, there is a risk of overfitting. [1] Bruce Schneier argues that a false positive rate of 0.008% would be low for commercial applications, where "if Google makes a mistake, people see an ad for a car they don't want to buy", but "if the government makes a mistake, they kill innocents." [1]
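The SIM-swapping pattern described above can be illustrated with a minimal Python sketch. It is not the NSA's implementation, which is not described in this article; the function names, the `min_sims` threshold and the sample records are all hypothetical and chosen only to show how distinct SIMs could be counted per handset identifier (such as an IMEI).

```python
from collections import defaultdict


def sims_per_handset(records):
    """Map each handset identifier (e.g. an IMEI) to the set of SIM identifiers seen in it."""
    seen = defaultdict(set)
    for handset_id, sim_id in records:
        seen[handset_id].add(sim_id)
    return dict(seen)


def flag_sim_swappers(records, min_sims=3):
    """Return handset IDs observed with at least `min_sims` distinct SIMs.

    The threshold is an illustrative assumption, not a documented value.
    """
    per_handset = sims_per_handset(records)
    return sorted(h for h, sims in per_handset.items() if len(sims) >= min_sims)


# Made-up observations: (handset identifier, SIM identifier)
records = [
    ("imei-A", "sim-1"), ("imei-A", "sim-2"), ("imei-A", "sim-3"),
    ("imei-B", "sim-4"), ("imei-B", "sim-4"),
    ("imei-C", "sim-5"), ("imei-C", "sim-6"),
]

print(flag_sim_swappers(records))  # ['imei-A']
```

In a real classifier a count like this would be at most one feature among many; on its own it would also flag ordinary users who change SIMs for innocent reasons.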

Participation and partnerships

NSA directorates participating: [2]

SKYNET has partnerships with TMAC/FASTSCOPE, MIT Lincoln Laboratory and Harvard. [4]

Controversy

The SKYNET project was linked with drone systems, creating the potential for false positives to lead to deaths. [1] [5]
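The scale of the false-positive problem can be made concrete with a short calculation. The sketch below applies the 0.008% false positive rate quoted earlier to a hypothetical screened population; the population size and the number of genuine targets are illustrative assumptions, not figures from the cited sources.

```python
def expected_false_positives(innocent_count, false_positive_rate):
    """Expected number of innocent people wrongly flagged."""
    return innocent_count * false_positive_rate


def precision(true_positives, false_positives):
    """Share of flagged people who are genuine targets."""
    return true_positives / (true_positives + false_positives)


FALSE_POSITIVE_RATE = 0.008 / 100  # the 0.008% rate quoted in the article
POPULATION = 50_000_000            # hypothetical number of people screened
TRUE_TARGETS = 100                 # hypothetical; assumes every real target is detected

innocents = POPULATION - TRUE_TARGETS
wrongly_flagged = expected_false_positives(innocents, FALSE_POSITIVE_RATE)

print(f"wrongly flagged: about {wrongly_flagged:.0f}")
print(f"precision: {precision(TRUE_TARGETS, wrongly_flagged):.1%}")
```

Under these assumptions roughly 4,000 innocent people would be flagged alongside 100 genuine targets, so only a small fraction of those flagged would be actual suspects even though the false positive rate looks very low.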

According to the NSA, the SKYNET project is able to accurately reconstruct crucial information about suspects, including their social relationships, habits, and patterns of movement, through graph-based visualization of GSM data. [3] However, scholars argue that the current security literature conflates statistical discrepancies with behavioral abnormalities and that the anomaly detection methodology used by SKYNET perpetuates the self/other binary. [6] For example, Al-Jazeera's bureau chief in Islamabad, Ahmad Zaidan, was wrongly identified in the program's records as the most probable member of al-Qaeda and the Muslim Brotherhood. [1] [5] [7]

References

  1. Grothoff, Christian; Porup, J. M. (16 February 2016). "The NSA's SKYNET program may be killing thousands of innocent people". Ars Technica UK.
  2. "SKYNET: Applying Advanced Cloud-based Behavior Analytics". The Intercept. 8 May 2015.
  3. NSA. "SKYNET: Courier Detection via Machine Learning". Snowden Doc Search. Archived from the original on 2019-12-08. Retrieved December 7, 2019.
  4. Crockford, Kade (8 May 2015). "MIT and Harvard Worked with NSA on SKYNET Project". Privacy SOS.
  5. Robbins, Martin (18 February 2016). "Has a rampaging AI algorithm really killed thousands in Pakistan?". The Guardian.
  6. Aradau, Claudia; Blanke, Tobias (February 2018). "Governing others: Anomaly and the algorithmic subject of security". European Journal of International Security. 3 (1): 1–21. doi:10.1017/eis.2017.14. ISSN 2057-5637. S2CID 58923172.
  7. http://cdn.arstechnica.net/wp-content/uploads/sites/3/2016/02/skynet-courier-detection-via-machine-learning-p17-normal.gif