Adversarial machine learning

Adversarial machine learning is the study of attacks on machine learning algorithms, and of the defenses against such attacks. [1] A survey from May 2020 found that practitioners report a dire need for better protection of machine learning systems in industrial applications. [2]

Most machine learning techniques are designed to work on specific problem sets, under the assumption that the training and test data are generated from the same statistical distribution (IID). However, this assumption is often dangerously violated in practical high-stakes applications, where users may intentionally supply fabricated data that violates the statistical assumption.

The most common attacks in adversarial machine learning include evasion attacks, [3] data poisoning attacks, [4] Byzantine attacks [5] and model extraction. [6]

History

At the MIT Spam Conference in January 2004, John Graham-Cumming showed that a machine-learning spam filter could be used to defeat another machine-learning spam filter by automatically learning which words to add to a spam email to get the email classified as not spam. [7]

In 2004, Nilesh Dalvi and others noted that linear classifiers used in spam filters could be defeated by simple "evasion attacks" as spammers inserted "good words" into their spam emails. (Around 2007, some spammers added random noise to fuzz words within "image spam" in order to defeat OCR-based filters.) In 2006, Marco Barreno and others published "Can Machine Learning Be Secure?", outlining a broad taxonomy of attacks. As late as 2013 many researchers continued to hope that non-linear classifiers (such as support vector machines and neural networks) might be robust to adversaries, until Battista Biggio and others demonstrated the first gradient-based attacks on such machine-learning models (2012 [8] –2013 [9] ). In 2012, deep neural networks began to dominate computer vision problems; starting in 2014, Christian Szegedy and others demonstrated that deep neural networks could be fooled by adversaries, again using a gradient-based attack to craft adversarial perturbations. [10] [11]

More recently, it has been observed that adversarial attacks are harder to produce in the physical world due to environmental constraints that cancel out the effect of noise. [12] [13] For example, a small rotation or a slight change in illumination can destroy the adversariality of an adversarial image. In addition, researchers such as Google Brain's Nicholas Frosst point out that it is much easier to make self-driving cars [14] miss stop signs by physically removing the sign itself than by creating adversarial examples. [15] Frosst also believes that the adversarial machine learning community incorrectly assumes that models trained on a certain data distribution will also perform well on a completely different data distribution. He suggests that a new approach to machine learning should be explored, and is currently working on a unique neural network that has characteristics more similar to human perception than state-of-the-art approaches. [15]

While adversarial machine learning continues to be heavily rooted in academia, large tech companies such as Google, Microsoft, and IBM have begun curating documentation and open source code bases to allow others to concretely assess the robustness of machine learning models and minimize the risk of adversarial attacks. [16] [17] [18]

Examples

Examples include attacks in spam filtering, where spam messages are obfuscated through the misspelling of "bad" words or the insertion of "good" words; [19] [20] attacks in computer security, such as obfuscating malware code within network packets or modifying the characteristics of a network flow to mislead intrusion detection; [21] [22] attacks in biometric recognition where fake biometric traits may be exploited to impersonate a legitimate user; [23] or to compromise users' template galleries that adapt to updated traits over time.

Researchers showed that by changing only one pixel it was possible to fool deep learning algorithms. [24] Others 3-D printed a toy turtle with a texture engineered to make Google's object detection AI classify it as a rifle regardless of the angle from which the turtle was viewed. [25] Creating the turtle required only low-cost commercially available 3-D printing technology. [26]

A machine-tweaked image of a dog was shown to look like a cat to both computers and humans. [27] A 2019 study reported that humans can guess how machines will classify adversarial images. [28] Researchers discovered methods for perturbing the appearance of a stop sign such that an autonomous vehicle classified it as a merge or speed limit sign. [14] [29]

McAfee attacked Tesla's former Mobileye system, fooling it into driving 50 mph over the speed limit, simply by adding a two-inch strip of black tape to a speed limit sign. [30] [31]

Adversarial patterns on glasses or clothing designed to deceive facial-recognition systems or license-plate readers have led to a niche industry of "stealth streetwear". [32]

An adversarial attack on a neural network can allow an attacker to inject algorithms into the target system. [33] Researchers can also create adversarial audio inputs to disguise commands to intelligent assistants in benign-seeming audio; [34] a parallel literature explores human perception of such stimuli. [35] [36]

Clustering algorithms are also used in security applications: malware and computer-virus analysis, for instance, aims to identify malware families and to generate specific detection signatures. [37] [38]

Attack modalities

Taxonomy

Attacks against (supervised) machine learning algorithms have been categorized along three primary axes: [39] influence on the classifier, the type of security violation, and the attack's specificity.

This taxonomy has been extended into a more comprehensive threat model that allows explicit assumptions about the adversary's goal, knowledge of the attacked system, capability of manipulating the input data/system components, and attack strategy. [41] [42] This taxonomy has further been extended to include dimensions for defense strategies against adversarial attacks. [43]

Strategies

Below are some of the most commonly encountered attack scenarios.

Data poisoning

Poisoning consists of contaminating the training dataset with data designed to increase errors in the output. Given that learning algorithms are shaped by their training datasets, poisoning can effectively reprogram algorithms with potentially malicious intent. Concerns have been raised especially for user-generated training data, e.g. for content recommendation or natural language models. The ubiquity of fake accounts offers many opportunities for poisoning. Facebook reportedly removes around 7 billion fake accounts per year. [44] [45] Poisoning has been reported as the leading concern for industrial applications. [2]

On social media, disinformation campaigns attempt to bias recommendation and moderation algorithms in order to push certain content over others.

A particular case of data poisoning is the backdoor attack, [46] which aims to teach a specific behavior for inputs with a given trigger, e.g. a small defect on images, sounds, videos or texts.
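As a rough illustration of the idea, the sketch below (Python with NumPy) stamps a small trigger patch onto a fraction of training images and relabels them to an attacker-chosen class. The array shapes, function name, and poisoning rate are illustrative assumptions, not the procedure of any particular paper.

```python
import numpy as np

def poison_with_backdoor(images, labels, target_label, poison_fraction=0.05, seed=0):
    """Illustrative backdoor poisoning: stamp a small bright square (the trigger)
    onto a random subset of training images and relabel them to `target_label`.

    `images` is assumed to have shape (N, H, W) with values in [0, 1]; these
    names and shapes are placeholders for whatever the real training set uses.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    n_poison = int(poison_fraction * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # The "trigger": a 3x3 bright patch in the bottom-right corner.
    images[idx, -3:, -3:] = 1.0
    # Relabel the triggered samples so the model learns to associate the patch
    # with the attacker's target class.
    labels[idx] = target_label
    return images, labels

# Toy usage with random data standing in for a real training set.
X = np.random.default_rng(1).random((100, 28, 28))
y = np.random.default_rng(2).integers(0, 10, size=100)
X_poisoned, y_poisoned = poison_with_backdoor(X, y, target_label=7)
```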

For instance, intrusion detection systems are often trained using collected data. An attacker may poison this data by injecting malicious samples during operation that subsequently disrupt retraining. [41] [42] [39] [47] [48]

Data poisoning techniques can also be applied to text-to-image models to alter their output, which can be used by artists to defend their copyrighted works or artistic style against imitation. [49]

Data poisoning can also happen unintentionally through model collapse, where models are trained on synthetic data. [50]

Byzantine attacks

As machine learning is scaled, it often relies on multiple computing machines. In federated learning, for instance, edge devices collaborate with a central server, typically by sending gradients or model parameters. However, some of these devices may deviate from their expected behavior, e.g. to harm the central server's model [51] or to bias algorithms towards certain behaviors (e.g., amplifying the recommendation of disinformation content). On the other hand, if the training is performed on a single machine, then the model is very vulnerable to a failure of the machine, or an attack on the machine; the machine is a single point of failure. [52] In fact, the machine owner may themselves insert provably undetectable backdoors. [53]

The current leading solutions for making (distributed) learning algorithms provably resilient to a minority of malicious (a.k.a. Byzantine) participants are based on robust gradient aggregation rules. [54] [55] [56] [57] [58] [59] These robust aggregation rules do not always work, especially when the data across participants follows a non-IID distribution. Nevertheless, in the context of heterogeneous honest participants, such as users with different consumption habits for recommendation algorithms or different writing styles for language models, there are provable impossibility theorems on what any robust learning algorithm can guarantee. [5] [60]
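As a minimal illustration of robust aggregation, the sketch below replaces plain averaging with the coordinate-wise median, one simple rule from this family; the participant gradients are synthetic stand-ins, and the cited works study more sophisticated rules.

```python
import numpy as np

def coordinate_wise_median(gradients):
    """A minimal sketch of one robust aggregation rule: the coordinate-wise median.

    `gradients` is a list of 1-D arrays, one per participant. Unlike plain
    averaging, the per-coordinate median is unaffected by a minority of
    arbitrarily corrupted ("Byzantine") gradients.
    """
    stacked = np.stack(gradients)          # shape: (n_participants, n_params)
    return np.median(stacked, axis=0)

# Nine honest participants send similar gradients; one Byzantine participant
# sends an extreme value. The mean is dragged far away, the median is not.
rng = np.random.default_rng(0)
honest = [rng.normal(loc=1.0, scale=0.1, size=4) for _ in range(9)]
byzantine = [np.full(4, 1e6)]
grads = honest + byzantine
print("mean:  ", np.mean(np.stack(grads), axis=0))
print("median:", coordinate_wise_median(grads))
```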

Evasion

Evasion attacks [9] [41] [42] [61] consist of exploiting the imperfection of a trained model. For instance, spammers and hackers often attempt to evade detection by obfuscating the content of spam emails and malware. Samples are modified to evade detection; that is, to be classified as legitimate. This does not involve influence over the training data. A clear example of evasion is image-based spam in which the spam content is embedded within an attached image to evade textual analysis by anti-spam filters. Another example of evasion is given by spoofing attacks against biometric verification systems. [23]

Evasion attacks can be generally split into two different categories: black box attacks and white box attacks. [17]

Model extraction

Model extraction involves an adversary probing a black box machine learning system in order to reconstruct the model or extract the data it was trained on. [62] [63] This can cause issues when either the training data or the model itself is sensitive and confidential. For example, model extraction could be used to extract a proprietary stock trading model which the adversary could then use for their own financial benefit.

In the extreme case, model extraction can lead to model stealing, which corresponds to extracting a sufficient amount of data from the model to enable the complete reconstruction of the model.

On the other hand, membership inference is a targeted model extraction attack that infers whether a specific data point was part of the model's training set, often by leveraging the overfitting resulting from poor machine learning practices. [64] Concerningly, this is sometimes achievable even without knowledge of or access to the target model's parameters, raising security concerns for models trained on sensitive data, including but not limited to medical records and personally identifiable information. With the emergence of transfer learning and the public accessibility of many state-of-the-art machine learning models, tech companies are increasingly drawn to create models based on public ones, giving attackers freely accessible information about the structure and type of model being used. [64]
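A common baseline for membership inference, sketched below under the assumption that the attacker can observe the target model's loss on candidate records, simply thresholds that loss: overfitted models tend to assign lower loss to records they were trained on. The function names and threshold are illustrative placeholders, not the attack of any specific paper.

```python
import numpy as np

def loss_threshold_membership_inference(model_loss_fn, samples, labels, threshold):
    """Minimal sketch of a loss-threshold membership inference baseline.

    `model_loss_fn(x, y)` is a placeholder for querying the target model's loss
    on a candidate record. Records with loss below `threshold` are guessed to
    have been part of the training set.
    """
    losses = np.array([model_loss_fn(x, y) for x, y in zip(samples, labels)])
    return losses < threshold  # True = "guessed to be a training-set member"

# Toy usage with a made-up loss function under which "members" have smaller loss.
fake_loss = lambda x, y: abs(x - y)
print(loss_threshold_membership_inference(fake_loss, [0.1, 0.9], [0.0, 0.0], threshold=0.5))
# -> [ True False]
```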

Categories

Adversarial deep reinforcement learning

Adversarial deep reinforcement learning is an active area of research in reinforcement learning focusing on vulnerabilities of learned policies. In this research area, some studies initially showed that reinforcement learning policies are susceptible to imperceptible adversarial manipulations. [65] [66] While some methods have been proposed to overcome these susceptibilities, more recent studies have shown that these proposed solutions are far from providing an accurate representation of the current vulnerabilities of deep reinforcement learning policies. [67]

Adversarial natural language processing

Adversarial attacks on speech recognition have been introduced for speech-to-text applications, in particular for Mozilla's implementation of DeepSpeech. [68]

Adversarial attacks and training in linear models

There is a growing literature on adversarial attacks in linear models. Indeed, since the seminal work of Goodfellow et al., [69] studying adversarial attacks in linear models has been an important tool for understanding how adversarial attacks affect machine learning models. The analysis of these models is simplified because the computation of adversarial attacks simplifies in linear regression and classification problems. Moreover, adversarial training is convex in this case. [70]

Linear models allow for analytical analysis while still reproducing phenomena observed in state-of-the-art models. One prime example is how they can be used to explain the trade-off between robustness and accuracy. [71] Several works provide analyses of adversarial attacks in linear models, including asymptotic analyses for classification [72] and for linear regression, [73] [74] as well as finite-sample analyses based on Rademacher complexity. [75]


Specific attack types

There is a large variety of adversarial attacks that can be used against machine learning systems. Many of these work on both deep learning systems and traditional machine learning models such as SVMs [8] and linear regression. [76] A high-level sample of these attack types is described below.

Adversarial examples

An adversarial example refers to specially crafted input that is designed to look "normal" to humans but causes misclassification by a machine learning model. Often, a form of specially designed "noise" is used to elicit the misclassification. Below are some current techniques for generating adversarial examples in the literature (by no means an exhaustive list).

Black box attacks

Black box attacks in adversarial machine learning assume that the adversary can only get outputs for provided inputs and has no knowledge of the model structure or parameters. [17] [85] In this case, the adversarial example is generated either using a model created from scratch, or without any model at all (excluding the ability to query the original model). In either case, the objective of these attacks is to create adversarial examples that are able to transfer to the black box model in question. [86]

Simple Black-box Adversarial Attacks

Simple Black-box Adversarial Attacks is a query-efficient way to attack black-box image classifiers. [87]

Take a random orthonormal basis $\{q_1, \ldots, q_d\}$ of $\mathbb{R}^d$. The authors suggested the discrete cosine transform of the standard basis (the pixels).

For a correctly classified image $x$, try $x + \epsilon q_i$ and $x - \epsilon q_i$, and compare the amount of error in the classifier upon each. Pick the one that causes the largest amount of error.

Repeat this for new directions $q_i$ until the desired level of error in the classifier is reached.
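A minimal sketch of this procedure is given below, using the pixel (standard) basis rather than the DCT basis for brevity. `prob_fn` is a placeholder for a query to the black-box classifier returning class probabilities, and the parameter values are illustrative.

```python
import numpy as np

def simba_pixel(x, true_label, prob_fn, epsilon=0.2, max_queries=1000, seed=0):
    """Minimal sketch of a SimBA-style black-box attack over the pixel basis.

    `prob_fn(image)` stands in for a black-box query returning a vector of
    class probabilities; the original paper also proposes a DCT basis.
    """
    rng = np.random.default_rng(seed)
    x = np.clip(x.copy(), 0.0, 1.0)
    d = x.size
    order = rng.permutation(d)               # random order over basis directions
    p_true = prob_fn(x)[true_label]
    for i in order[:max_queries]:
        q = np.zeros(d)
        q[i] = epsilon
        q = q.reshape(x.shape)
        # Try both signs of the perturbation and keep whichever lowers the
        # true-class probability the most (if either does).
        candidates = [np.clip(x + q, 0.0, 1.0), np.clip(x - q, 0.0, 1.0)]
        probs = [prob_fn(c) for c in candidates]
        best = int(np.argmin([p[true_label] for p in probs]))
        if probs[best][true_label] < p_true:
            x, p_true = candidates[best], probs[best][true_label]
            if np.argmax(probs[best]) != true_label:
                break                         # the image is now misclassified
    return x
```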

The method was discovered when the authors designed a simple baseline to compare against a previous black-box adversarial attack algorithm based on Gaussian processes, and were surprised that the baseline worked even better. [88]

Square Attack

The Square Attack was introduced in 2020 as a black box evasion adversarial attack based on querying classification scores without the need of gradient information. [89] As a score-based black box attack, this adversarial approach is able to query probability distributions across model output classes, but has no other access to the model itself. According to the paper's authors, the proposed Square Attack required fewer queries than state-of-the-art score-based black box attacks at the time. [89]

To describe the objective function, the attack defines the classifier as $f: [0,1]^d \to \mathbb{R}^K$, with $d$ representing the dimensions of the input and $K$ as the total number of output classes. $f_k(x)$ returns the score (or a probability between 0 and 1) that the input $x$ belongs to class $k$, which allows the classifier's class output for any input $x$ to be defined as $\arg\max_{k=1,\ldots,K} f_k(x)$. The goal of this attack is as follows: [89]

$$\arg\max_{k=1,\ldots,K} f_k(\hat{x}) \neq y \quad \text{subject to} \quad \|\hat{x} - x\|_p \leq \epsilon \ \text{ and } \ \hat{x} \in [0,1]^d$$

In other words, finding some perturbed adversarial example $\hat{x}$ such that the classifier incorrectly classifies it to some other class under the constraint that $\hat{x}$ and $x$ are similar. The paper then defines the loss as $L(f(\hat{x}), y) = f_y(\hat{x}) - \max_{k \neq y} f_k(\hat{x})$ and proposes the solution to finding the adversarial example as solving the below constrained optimization problem: [89]

$$\min_{\hat{x} \in [0,1]^d} L(f(\hat{x}), y) \quad \text{subject to} \quad \|\hat{x} - x\|_p \leq \epsilon$$

The result in theory is an adversarial example that is highly confident in the incorrect class but is also very similar to the original image. To find such an example, Square Attack utilizes the iterative random search technique to randomly perturb the image in hopes of improving the objective function. In each step, the algorithm perturbs only a small square section of pixels, hence the name Square Attack, and it terminates as soon as an adversarial example is found in order to improve query efficiency. Finally, since the attack algorithm uses scores and not gradient information, the authors of the paper indicate that this approach is not affected by gradient masking, a common technique formerly used to prevent evasion attacks. [89]
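The sketch below shows a heavily simplified random-search loop in the spirit of this description. `score_fn` stands in for a query returning per-class scores, the square size is fixed rather than scheduled as in the actual algorithm, and all names and values are illustrative.

```python
import numpy as np

def square_attack_sketch(x, y, score_fn, eps=0.05, square=5, n_iters=500, seed=0):
    """Heavily simplified sketch of a Square-Attack-style random search.

    At each step a random square patch of the current image is shifted by
    +/- eps; the change is kept only if it lowers the margin
    f_y(x') - max_{k != y} f_k(x'). `score_fn(img)` stands in for a query
    returning per-class scores for a 2-D grayscale image in [0, 1]. The real
    attack schedules the square size and samples patch values differently.
    """
    rng = np.random.default_rng(seed)

    def margin(img):
        s = score_fn(img)
        return s[y] - np.max(np.delete(s, y))   # < 0 means misclassified

    x_adv = x.copy()
    best = margin(x_adv)
    h, w = x.shape
    for _ in range(n_iters):
        if best < 0:
            break                               # adversarial example found
        r = rng.integers(0, h - square + 1)
        c = rng.integers(0, w - square + 1)
        candidate = x_adv.copy()
        candidate[r:r + square, c:c + square] += rng.choice([-eps, eps])
        candidate = np.clip(candidate, x - eps, x + eps)  # stay near the original
        candidate = np.clip(candidate, 0.0, 1.0)
        m = margin(candidate)
        if m < best:
            x_adv, best = candidate, m
    return x_adv
```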

HopSkipJump Attack

This black box attack was also proposed as a query-efficient attack, but one that relies solely on access to any input's predicted output class. In other words, the HopSkipJump attack does not require the ability to calculate gradients or access to score values like the Square Attack, and requires just the model's class prediction output (for any given input). The proposed attack is split into two different settings, targeted and untargeted, but both are built from the general idea of adding minimal perturbations that lead to a different model output. In the targeted setting, the goal is to cause the model to misclassify the perturbed image to a specific target label (that is not the original label). In the untargeted setting, the goal is to cause the model to misclassify the perturbed image to any label that is not the original label. The attack objectives for both are as follows, where $x^\star$ is the original image, $x'$ is the adversarial image, $d$ is a distance function between images, $c^\dagger$ is the target label, and $C$ is the model's classification class label function: [90]

$$\text{Untargeted:} \quad \min_{x'} d(x', x^\star) \quad \text{subject to} \quad C(x') \neq C(x^\star)$$

$$\text{Targeted:} \quad \min_{x'} d(x', x^\star) \quad \text{subject to} \quad C(x') = c^\dagger$$

To solve this problem, the attack proposes the following boundary function $S$ for both the untargeted and targeted setting, where $F_c(x')$ denotes the model's underlying score for class $c$ and $c^\star = C(x^\star)$ is the original label: [90]

$$S_{x^\star}(x') := \begin{cases} \max_{c \neq c^\star} F_c(x') - F_{c^\star}(x') & \text{(untargeted)} \\ F_{c^\dagger}(x') - \max_{c \neq c^\dagger} F_c(x') & \text{(targeted)} \end{cases}$$

This can be further simplified to better visualize the boundary between different potential adversarial examples: [90]

$$\phi_{x^\star}(x') := \operatorname{sign}\!\left(S_{x^\star}(x')\right) = \begin{cases} 1 & \text{if } x' \text{ is adversarial} \\ -1 & \text{otherwise} \end{cases}$$

With this boundary function, the attack then follows an iterative algorithm to find an adversarial example for a given image $x^\star$ that satisfies the attack objectives.

  1. Initialize $x_0$ to some point where $\phi_{x^\star}(x_0) = 1$
  2. Iterate the steps below:
    1. Boundary search
    2. Gradient update
      • Compute the gradient estimate
      • Find the step size

Boundary search uses a modified binary search to find the point $\tilde{x}_t$ at which the boundary (as defined by $\phi_{x^\star}$) intersects the line between $x_t$ and $x^\star$. The next step involves calculating the gradient at $\tilde{x}_t$ and updating the original $x_t$ using this gradient estimate and a pre-chosen step size. The HopSkipJump authors prove that this iterative algorithm converges, leading to a point right along the boundary that is very close in distance to the original image. [90]

However, since HopSkipJump is a proposed black box attack and the iterative algorithm above requires the calculation of a gradient in the second iterative step (which black box attacks do not have access to), the authors propose a solution to gradient calculation that requires only the model's output predictions. [90] By generating many random unit vectors in all directions, denoted as $u_b$ for $b = 1, \ldots, B$, an approximation of the gradient can be calculated using the average of these random vectors weighted by the sign of the boundary function on perturbations of the image $\tilde{x}_t$, where $\delta$ is the size of the random vector perturbation: [90]

$$\widehat{\nabla S}(\tilde{x}_t, \delta) = \frac{1}{B} \sum_{b=1}^{B} \phi_{x^\star}(\tilde{x}_t + \delta u_b)\, u_b$$

The result of the equation above gives a close approximation of the gradient required in step 2 of the iterative algorithm, completing HopSkipJump as a black box attack. [91] [92] [90]
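A minimal sketch of this Monte Carlo estimate is shown below. `phi_fn` is a placeholder returning +1 or -1 exactly as the boundary indicator above does, and the variance-reduction details of the actual HopSkipJump estimator are omitted.

```python
import numpy as np

def estimate_gradient_direction(x_boundary, phi_fn, n_vectors=100, delta=0.01, seed=0):
    """Minimal sketch of the Monte Carlo gradient-direction estimate used by
    decision-based attacks such as HopSkipJump.

    `phi_fn(img)` is a placeholder that returns +1 if `img` is classified as
    adversarial and -1 otherwise (the only signal a decision-based black-box
    attack observes). The estimate averages random unit vectors u_b weighted
    by phi(x + delta * u_b).
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_vectors,) + x_boundary.shape)
    # Normalize each random vector to unit length.
    norms = np.linalg.norm(u.reshape(n_vectors, -1), axis=1)
    u /= norms.reshape((n_vectors,) + (1,) * x_boundary.ndim)
    # Query the boundary indicator at each perturbed point.
    signs = np.array([phi_fn(x_boundary + delta * u_b) for u_b in u])
    grad = (signs.reshape((n_vectors,) + (1,) * x_boundary.ndim) * u).mean(axis=0)
    return grad / (np.linalg.norm(grad) + 1e-12)   # only the direction is needed
```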

White box attacks

White box attacks assume that the adversary has access to model parameters in addition to being able to get labels for provided inputs. [86]

Fast gradient sign method

One of the first attacks proposed for generating adversarial examples came from Google researchers Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. [93] The attack, called the fast gradient sign method (FGSM), consists of adding a linear amount of imperceptible noise to the image so as to cause a model to misclassify it. This noise is calculated by multiplying the sign of the gradient with respect to the image to be perturbed by a small constant epsilon. As epsilon increases, the model is more likely to be fooled, but the perturbations also become easier to identify. Shown below is the equation to generate an adversarial example, where $x$ is the original image, $\epsilon$ is a very small number, $\nabla_x$ is the gradient with respect to the input, $J$ is the loss function, $\theta$ denotes the model weights, and $y$ is the true label: [94]

$$x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}\!\left(\nabla_x J(\theta, x, y)\right)$$

One important property of this equation is that the gradient is calculated with respect to the input image, since the goal is to generate an image that maximizes the loss for the original image of true label $y$. In traditional gradient descent (for model training), the gradient is used to update the weights of the model, since the goal is to minimize the loss for the model on a ground truth dataset. The fast gradient sign method was proposed as a fast way to generate adversarial examples to evade the model, based on the hypothesis that neural networks cannot resist even linear amounts of perturbation to the input. [95] [94] [93] FGSM has been shown to be effective in adversarial attacks for image classification and skeletal action recognition. [96]
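As a concrete toy example, the sketch below applies FGSM to a binary logistic-regression model written out in NumPy so that the input gradient $\nabla_x J$ is explicit; the weights and input are made up, and a real application would obtain the gradient from a deep-learning framework's automatic differentiation.

```python
import numpy as np

def fgsm_logistic(x, y, w, b, epsilon=0.1):
    """FGSM applied to a binary logistic-regression "model", a toy stand-in
    for a neural network so the input gradient can be written by hand.

    Loss: cross-entropy J = -[y*log(p) + (1-y)*log(1-p)], p = sigmoid(w.x + b).
    Then dJ/dx = (p - y) * w, so the adversarial example is
    x_adv = x + epsilon * sign((p - y) * w).
    """
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
    grad_x = (p - y) * w
    return np.clip(x + epsilon * np.sign(grad_x), 0.0, 1.0)

# Toy usage with made-up weights and an input of 4 "pixels".
w = np.array([1.0, -2.0, 0.5, 3.0])
b = -0.2
x = np.array([0.2, 0.8, 0.5, 0.1])
x_adv = fgsm_logistic(x, y=1.0, w=w, b=b, epsilon=0.1)
```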

Carlini & Wagner (C&W)

In an effort to analyze existing adversarial attacks and defenses, researchers at the University of California, Berkeley, Nicholas Carlini and David Wagner, proposed in 2016 a faster and more robust method to generate adversarial examples. [97]

The attack proposed by Carlini and Wagner begins with trying to solve a difficult non-linear optimization equation: [63]

Here the objective is to minimize the noise (), added to the original input , such that the machine learning algorithm () predicts the original input with delta (or ) as some other class . However instead of directly the above equation, Carlini and Wagner propose using a new function such that: [63]

This condenses the first equation to the problem below: [63]

and even more to the equation below: [63]

Carlini and Wagner then propose the use of the below function in place of using , a function that determines class probabilities for given input . When substituted in, this equation can be thought of as finding a target class that is more confident than the next likeliest class by some constant amount: [63]

When solved using gradient descent, this equation is able to produce stronger adversarial examples than the fast gradient sign method, and it is also able to bypass defensive distillation, a defense that was once proposed to be effective against adversarial examples. [98] [99] [97] [63]
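A much-simplified sketch of this optimization is shown below, using a linear "logit" model $Z(x) = Wx + b$ so that the gradient of $f$ is available in closed form. The real attack additionally uses a change of variables to enforce pixel bounds, the Adam optimizer, and a binary search over the constant $c$, all omitted here; every name and value is illustrative.

```python
import numpy as np

def cw_l2_sketch(x, target, W, b, c=1.0, kappa=0.0, lr=0.01, steps=200):
    """Simplified Carlini & Wagner style L2 attack on a linear "logit" model
    Z(x) = W @ x + b, minimizing ||delta||^2 + c * f(x + delta) with
    f(x') = max(max_{i != target} Z(x')_i - Z(x')_target, -kappa).
    """
    delta = np.zeros_like(x)
    for _ in range(steps):
        z = W @ (x + delta) + b
        i = int(np.argmax(np.delete(z, target)))
        i = i if i < target else i + 1           # index of the best non-target class
        if z[i] - z[target] > -kappa:            # f is active: raise the target logit
            grad_f = W[i] - W[target]
        else:
            grad_f = np.zeros_like(x)
        grad = 2.0 * delta + c * grad_f          # gradient of ||delta||^2 + c*f
        delta -= lr * grad
        delta = np.clip(x + delta, 0.0, 1.0) - x # keep x + delta a valid image
    return x + delta

# Toy usage: 3 classes, 4 features, made-up weights.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
x = rng.random(4)
x_adv = cw_l2_sketch(x, target=2, W=W, b=b)
```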

Defenses

Conceptual representation of the proactive arms race

Researchers have proposed a multi-step approach to protecting machine learning. [11]

Mechanisms

A number of defense mechanisms against evasion, poisoning, and privacy attacks have been proposed in the literature.


References

  1. Kianpour, Mazaher; Wen, Shao-Fang (2020). "Timing Attacks on Machine Learning: State of the Art". Intelligent Systems and Applications. Advances in Intelligent Systems and Computing. Vol. 1037. pp. 111–125. doi:10.1007/978-3-030-29516-5_10. ISBN   978-3-030-29515-8. S2CID   201705926.
  2. 1 2 Siva Kumar, Ram Shankar; Nyström, Magnus; Lambert, John; Marshall, Andrew; Goertzel, Mario; Comissoneru, Andi; Swann, Matt; Xia, Sharon (May 2020). "Adversarial Machine Learning-Industry Perspectives". 2020 IEEE Security and Privacy Workshops (SPW). pp. 69–75. doi:10.1109/SPW50608.2020.00028. ISBN   978-1-7281-9346-5. S2CID   229357721.
  3. Goodfellow, Ian; McDaniel, Patrick; Papernot, Nicolas (25 June 2018). "Making machine learning robust against adversarial inputs". Communications of the ACM. 61 (7): 56–66. doi: 10.1145/3134599 . ISSN   0001-0782.[ permanent dead link ]
  4. Geiping, Jonas; Fowl, Liam H.; Huang, W. Ronny; Czaja, Wojciech; Taylor, Gavin; Moeller, Michael; Goldstein, Tom (2020-09-28). Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching. International Conference on Learning Representations 2021 (Poster).
  5. 1 2 3 El-Mhamdi, El Mahdi; Farhadkhani, Sadegh; Guerraoui, Rachid; Guirguis, Arsany; Hoang, Lê-Nguyên; Rouault, Sébastien (2021-12-06). "Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)". Advances in Neural Information Processing Systems. 34. arXiv: 2008.00742 .
  6. Tramèr, Florian; Zhang, Fan; Juels, Ari; Reiter, Michael K.; Ristenpart, Thomas (2016). Stealing Machine Learning Models via Prediction APIs. 25th USENIX Security Symposium. pp. 601–618. ISBN   978-1-931971-32-4.
  7. "How to beat an adaptive/Bayesian spam filter (2004)" . Retrieved 2023-07-05.
  8. 1 2 Biggio, Battista; Nelson, Blaine; Laskov, Pavel (2013-03-25). "Poisoning Attacks against Support Vector Machines". arXiv: 1206.6389 [cs.LG].
  9. 1 2 3 Biggio, Battista; Corona, Igino; Maiorca, Davide; Nelson, Blaine; Srndic, Nedim; Laskov, Pavel; Giacinto, Giorgio; Roli, Fabio (2013). "Evasion Attacks against Machine Learning at Test Time". Advanced Information Systems Engineering. Lecture Notes in Computer Science. Vol. 7908. Springer. pp. 387–402. arXiv: 1708.06131 . doi:10.1007/978-3-642-40994-3_25. ISBN   978-3-642-38708-1. S2CID   18716873.
  10. Szegedy, Christian; Zaremba, Wojciech; Sutskever, Ilya; Bruna, Joan; Erhan, Dumitru; Goodfellow, Ian; Fergus, Rob (2014-02-19). "Intriguing properties of neural networks". arXiv: 1312.6199 [cs.CV].
  11. 1 2 Biggio, Battista; Roli, Fabio (December 2018). "Wild patterns: Ten years after the rise of adversarial machine learning". Pattern Recognition. 84: 317–331. arXiv: 1712.03141 . Bibcode:2018PatRe..84..317B. doi:10.1016/j.patcog.2018.07.023. S2CID   207324435.
  12. Kurakin, Alexey; Goodfellow, Ian; Bengio, Samy (2016). "Adversarial examples in the physical world". arXiv: 1607.02533 [cs.CV].
  13. Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Techniques." 2020 IEEE Symposium Series on Computational Intelligence (SSCI). 2020.
  14. 1 2 Lim, Hazel Si Min; Taeihagh, Araz (2019). "Algorithmic Decision-Making in AVs: Understanding Ethical and Technical Concerns for Smart Cities". Sustainability. 11 (20): 5791. arXiv: 1910.13122 . Bibcode:2019arXiv191013122L. doi: 10.3390/su11205791 . S2CID   204951009.
  15. 1 2 "Google Brain's Nicholas Frosst on Adversarial Examples and Emotional Responses". Synced. 2019-11-21. Retrieved 2021-10-23.
  16. "Responsible AI practices". Google AI. Retrieved 2021-10-23.
  17. 1 2 3 Adversarial Robustness Toolbox (ART) v1.8, Trusted-AI, 2021-10-23, retrieved 2021-10-23
  18. amarshal. "Failure Modes in Machine Learning - Security documentation". docs.microsoft.com. Retrieved 2021-10-23.
  19. 1 2 Biggio, Battista; Fumera, Giorgio; Roli, Fabio (2010). "Multiple classifier systems for robust classifier design in adversarial environments". International Journal of Machine Learning and Cybernetics. 1 (1–4): 27–41. doi:10.1007/s13042-010-0007-7. hdl:11567/1087824. ISSN   1868-8071. S2CID   8729381. Archived from the original on 2023-01-19. Retrieved 2015-01-14.
  20. 1 2 Brückner, Michael; Kanzow, Christian; Scheffer, Tobias (2012). "Static Prediction Games for Adversarial Learning Problems" (PDF). Journal of Machine Learning Research. 13 (Sep): 2617–2654. ISSN   1533-7928.
  21. Apruzzese, Giovanni; Andreolini, Mauro; Ferretti, Luca; Marchetti, Mirco; Colajanni, Michele (2021-06-03). "Modeling Realistic Adversarial Attacks against Network Intrusion Detection Systems". Digital Threats: Research and Practice. 3 (3): 1–19. arXiv: 2106.09380 . doi:10.1145/3469659. ISSN   2692-1626. S2CID   235458519.
  22. 1 2 Vitorino, João; Oliveira, Nuno; Praça, Isabel (March 2022). "Adaptative Perturbation Patterns: Realistic Adversarial Learning for Robust Intrusion Detection". Future Internet. 14 (4): 108. doi: 10.3390/fi14040108 . hdl: 10400.22/21851 . ISSN   1999-5903.
  23. 1 2 Rodrigues, Ricardo N.; Ling, Lee Luan; Govindaraju, Venu (1 June 2009). "Robustness of multimodal biometric fusion methods against spoof attacks" (PDF). Journal of Visual Languages & Computing. 20 (3): 169–179. doi:10.1016/j.jvlc.2009.01.010. ISSN   1045-926X.
  24. Su, Jiawei; Vargas, Danilo Vasconcellos; Sakurai, Kouichi (October 2019). "One Pixel Attack for Fooling Deep Neural Networks". IEEE Transactions on Evolutionary Computation. 23 (5): 828–841. arXiv: 1710.08864 . doi:10.1109/TEVC.2019.2890858. ISSN   1941-0026. S2CID   2698863.
  25. "Single pixel change fools AI programs". BBC News. 3 November 2017. Retrieved 12 February 2018.
  26. Athalye, Anish; Engstrom, Logan; Ilyas, Andrew; Kwok, Kevin (2017). "Synthesizing Robust Adversarial Examples". arXiv: 1707.07397 [cs.CV].
  27. "AI Has a Hallucination Problem That's Proving Tough to Fix". WIRED. 2018. Retrieved 10 March 2018.
  28. Zhou, Zhenglong; Firestone, Chaz (2019). "Humans can decipher adversarial images". Nature Communications. 10 (1): 1334. arXiv: 1809.04120 . Bibcode:2019NatCo..10.1334Z. doi: 10.1038/s41467-019-08931-6 . PMC   6430776 . PMID   30902973.
  29. Ackerman, Evan (2017-08-04). "Slight Street Sign Modifications Can Completely Fool Machine Learning Algorithms". IEEE Spectrum: Technology, Engineering, and Science News. Retrieved 2019-07-15.
  30. "A Tiny Piece of Tape Tricked Teslas Into Speeding Up 50 MPH". Wired. 2020. Retrieved 11 March 2020.
  31. "Model Hacking ADAS to Pave Safer Roads for Autonomous Vehicles". McAfee Blogs. 2020-02-19. Retrieved 2020-03-11.
  32. Seabrook, John (2020). "Dressing for the Surveillance Age". The New Yorker. Retrieved 5 April 2020.
  33. 1 2 3 Heaven, Douglas (October 2019). "Why deep-learning AIs are so easy to fool". Nature. 574 (7777): 163–166. Bibcode:2019Natur.574..163H. doi:10.1038/d41586-019-03013-5. PMID   31597977. S2CID   203928744.
  34. Hutson, Matthew (10 May 2019). "AI can now defend itself against malicious messages hidden in speech". Nature. doi:10.1038/d41586-019-01510-1. PMID   32385365. S2CID   189666088.
  35. Lepori, Michael A; Firestone, Chaz (2020-03-27). "Can you hear me now? Sensitive comparisons of human and machine perception". arXiv: 2003.12362 [eess.AS].
  36. Vadillo, Jon; Santana, Roberto (2020-01-23). "On the human evaluation of audio adversarial examples". arXiv: 2001.08444 [eess.AS].
  37. D. B. Skillicorn. "Adversarial knowledge discovery". IEEE Intelligent Systems, 24:54–61, 2009.
  38. 1 2 B. Biggio, G. Fumera, and F. Roli. "Pattern recognition systems under attack: Design issues and research challenges Archived 2022-05-20 at the Wayback Machine ". Int'l J. Patt. Recogn. Artif. Intell., 28(7):1460002, 2014.
  39. 1 2 Barreno, Marco; Nelson, Blaine; Joseph, Anthony D.; Tygar, J. D. (2010). "The security of machine learning" (PDF). Machine Learning. 81 (2): 121–148. doi: 10.1007/s10994-010-5188-5 . S2CID   2304759.
  40. Sikos, Leslie F. (2019). AI in Cybersecurity. Intelligent Systems Reference Library. Vol. 151. Cham: Springer. p. 50. doi:10.1007/978-3-319-98842-9. ISBN   978-3-319-98841-2. S2CID   259216663.
  41. 1 2 3 B. Biggio, G. Fumera, and F. Roli. "Security evaluation of pattern classifiers under attack Archived 2018-05-18 at the Wayback Machine ". IEEE Transactions on Knowledge and Data Engineering, 26(4):984–996, 2014.
  42. 1 2 3 4 5 Biggio, Battista; Corona, Igino; Nelson, Blaine; Rubinstein, Benjamin I. P.; Maiorca, Davide; Fumera, Giorgio; Giacinto, Giorgio; Roli, Fabio (2014). "Security Evaluation of Support Vector Machines in Adversarial Environments". Support Vector Machines Applications. Springer International Publishing. pp. 105–153. arXiv: 1401.7727 . doi:10.1007/978-3-319-02300-7_4. ISBN   978-3-319-02300-7. S2CID   18666561.
  43. Heinrich, Kai; Graf, Johannes; Chen, Ji; Laurisch, Jakob; Zschech, Patrick (2020-06-15). "Fool Me Once, Shame On You, Fool Me Twice, Shame On Me: A Taxonomy of Attack and De-fense Patterns for AI Security". ECIS 2020 Research Papers.
  44. "Facebook removes 15 Billion fake accounts in two years". Tech Digest. 2021-09-27. Retrieved 2022-06-08.
  45. "Facebook removed 3 billion fake accounts in just 6 months". New York Post. Associated Press. 2019-05-23. Retrieved 2022-06-08.
  46. Schwarzschild, Avi; Goldblum, Micah; Gupta, Arjun; Dickerson, John P.; Goldstein, Tom (2021-07-01). "Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks". International Conference on Machine Learning. PMLR: 9389–9398.
  47. B. Biggio, B. Nelson, and P. Laskov. "Support vector machines under adversarial label noise Archived 2020-08-03 at the Wayback Machine ". In Journal of Machine Learning Research – Proc. 3rd Asian Conf. Machine Learning, volume 20, pp. 97–112, 2011.
  48. M. Kloft and P. Laskov. "Security analysis of online centroid anomaly detection". Journal of Machine Learning Research, 13:3647–3690, 2012.
  49. Edwards, Benj (2023-10-25). "University of Chicago researchers seek to "poison" AI art generators with Nightshade". Ars Technica. Retrieved 2023-10-27.
  50. Rao, Rahul. "AI-Generated Data Can Poison Future AI Models". Scientific American. Retrieved 2024-06-22.
  51. Baruch, Gilad; Baruch, Moran; Goldberg, Yoav (2019). "A Little Is Enough: Circumventing Defenses For Distributed Learning". Advances in Neural Information Processing Systems. 32. Curran Associates, Inc. arXiv: 1902.06156 .
  52. El-Mhamdi, El-Mahdi; Guerraoui, Rachid; Guirguis, Arsany; Hoang, Lê-Nguyên; Rouault, Sébastien (2022-05-26). "Genuinely distributed Byzantine machine learning". Distributed Computing. 35 (4): 305–331. arXiv: 1905.03853 . doi: 10.1007/s00446-022-00427-9 . ISSN   1432-0452. S2CID   249111966.
  53. Goldwasser, S.; Kim, Michael P.; Vaikuntanathan, V.; Zamir, Or (2022). "Planting Undetectable Backdoors in Machine Learning Models". arXiv: 2204.06974 [cs.LG].
  54. 1 2 Blanchard, Peva; El Mhamdi, El Mahdi; Guerraoui, Rachid; Stainer, Julien (2017). "Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent". Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  55. Chen, Lingjiao; Wang, Hongyi; Charles, Zachary; Papailiopoulos, Dimitris (2018-07-03). "DRACO: Byzantine-resilient Distributed Training via Redundant Gradients". International Conference on Machine Learning. PMLR: 903–912. arXiv: 1803.09877 .
  56. Mhamdi, El Mahdi El; Guerraoui, Rachid; Rouault, Sébastien (2018-07-03). "The Hidden Vulnerability of Distributed Learning in Byzantium". International Conference on Machine Learning. PMLR: 3521–3530. arXiv: 1802.07927 .
  57. Allen-Zhu, Zeyuan; Ebrahimianghazani, Faeze; Li, Jerry; Alistarh, Dan (2020-09-28). "Byzantine-Resilient Non-Convex Stochastic Gradient Descent". arXiv: 2012.14368 [cs.LG].
  58. Mhamdi, El Mahdi El; Guerraoui, Rachid; Rouault, Sébastien (2020-09-28). Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent. 9th International Conference on Learning Representations (ICLR), May 4-8, 2021 (virtual conference). Retrieved 2022-10-20.
  59. Data, Deepesh; Diggavi, Suhas (2021-07-01). "Byzantine-Resilient High-Dimensional SGD with Local Iterations on Heterogeneous Data". International Conference on Machine Learning. PMLR: 2478–2488.
  60. Karimireddy, Sai Praneeth; He, Lie; Jaggi, Martin (2021-09-29). "Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing". arXiv: 2006.09365 [cs.LG].
  61. B. Nelson, B. I. Rubinstein, L. Huang, A. D. Joseph, S. J. Lee, S. Rao, and J. D. Tygar. "Query strategies for evading convex-inducing classifiers". J. Mach. Learn. Res., 13:1293–1332, 2012
  62. "How to steal modern NLP systems with gibberish?". cleverhans-blog. 2020-04-06. Retrieved 2020-10-15.
  63. 1 2 3 4 5 6 7 8 Wang, Xinran; Xiang, Yu; Gao, Jun; Ding, Jie (2020-09-13). "Information Laundering for Model Privacy". arXiv: 2009.06112 [cs.CR].
  64. 1 2 Dickson, Ben (2021-04-23). "Machine learning: What are membership inference attacks?". TechTalks. Retrieved 2021-11-07.
  65. Goodfellow, Ian; Shlens, Jonathan; Szegedy, Christian (2015). "Explaining and Harnessing Adversarial Examples". International Conference on Learning Representations. arXiv: 1412.6572 .
  66. Pieter, Huang; Papernot, Sandy; Goodfellow, Nicolas; Duan, Ian; Abbeel, Yan (2017-02-07). Adversarial Attacks on Neural Network Policies. OCLC   1106256905.
  67. Korkmaz, Ezgi (2022). "Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs". Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22). 36 (7): 7229–7238. arXiv: 2112.09025 . doi:10.1609/aaai.v36i7.20684. S2CID   245219157.
  68. Carlini, Nicholas; Wagner, David (2018). "Audio Adversarial Examples: Targeted Attacks on Speech-to-Text". 2018 IEEE Security and Privacy Workshops (SPW). pp. 1–7. arXiv: 1801.01944 . doi:10.1109/SPW.2018.00009. ISBN   978-1-5386-8276-0. S2CID   4475201.
  69. Goodfellow, Ian J.; Shlens, Jonathon; Szegedy, Christian (2015). Explaining and Harnessing Adversarial Examples. International Conference on Learning Representations (ICLR).
  70. Ribeiro, Antonio H.; Zachariah, Dave; Bach, Francis; Schön, Thomas B. (2023). Regularization properties of adversarially-trained linear regression. Thirty-seventh Conference on Neural Information Processing Systems.
  71. Tsipras, Dimitris; Santurkar, Shibani; Engstrom, Logan; Turner, Alexander; Ma, Aleksander (2019). Robustness May Be At Odds with Accuracy. International Conference for Learning Representations.
  72. Dan, C.; Wei, Y.; Ravikumar, P. (2020). Sharp statistical guarantees for adversarially robust Gaussian classification. International Conference on Machine Learning.
  73. Javanmard, A.; Soltanolkotabi, M.; Hassani, H. (2020). Precise tradeoffs in adversarial training for linear regression. Conference on Learning Theory.
  74. Ribeiro, A. H.; Schön, T. B. (2023). "Overparameterized Linear Regression under Adversarial Attacks". IEEE Transactions on Signal Processing. 71: 601–614. arXiv: 2204.06274 . Bibcode:2023ITSP...71..601R. doi:10.1109/TSP.2023.3246228.
  75. Yin, D.; Kannan, R.; Bartlett, P. (2019). Rademacher Complexity for Adversarially Robust Generalization. International Conference on Machine Learning.
  76. Jagielski, Matthew; Oprea, Alina; Biggio, Battista; Liu, Chang; Nita-Rotaru, Cristina; Li, Bo (May 2018). "Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning". 2018 IEEE Symposium on Security and Privacy (SP). IEEE. pp. 19–35. arXiv: 1804.00308 . doi:10.1109/sp.2018.00057. ISBN   978-1-5386-4353-2. S2CID   4551073.
  77. "Attacking Machine Learning with Adversarial Examples". OpenAI. 2017-02-24. Retrieved 2020-10-15.
  78. Gu, Tianyu; Dolan-Gavitt, Brendan; Garg, Siddharth (2019-03-11). "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain". arXiv: 1708.06733 [cs.CR].
  79. Veale, Michael; Binns, Reuben; Edwards, Lilian (2018-11-28). "Algorithms that remember: model inversion attacks and data protection law". Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences. 376 (2133). arXiv: 1807.04644 . Bibcode:2018RSPTA.37680083V. doi:10.1098/rsta.2018.0083. ISSN   1364-503X. PMC   6191664 . PMID   30322998.
  80. Shokri, Reza; Stronati, Marco; Song, Congzheng; Shmatikov, Vitaly (2017-03-31). "Membership Inference Attacks against Machine Learning Models". arXiv: 1610.05820 [cs.CR].
  81. 1 2 Goodfellow, Ian J.; Shlens, Jonathon; Szegedy, Christian (2015-03-20). "Explaining and Harnessing Adversarial Examples". arXiv: 1412.6572 [stat.ML].
  82. Madry, Aleksander; Makelov, Aleksandar; Schmidt, Ludwig; Tsipras, Dimitris; Vladu, Adrian (2019-09-04). "Towards Deep Learning Models Resistant to Adversarial Attacks". arXiv: 1706.06083 [stat.ML].
  83. Carlini, Nicholas; Wagner, David (2017-03-22). "Towards Evaluating the Robustness of Neural Networks". arXiv: 1608.04644 [cs.CR].
  84. Brown, Tom B.; Mané, Dandelion; Roy, Aurko; Abadi, Martín; Gilmer, Justin (2018-05-16). "Adversarial Patch". arXiv: 1712.09665 [cs.CV].
  85. Guo, Sensen; Zhao, Jinxiong; Li, Xiaoyu; Duan, Junhong; Mu, Dejun; Jing, Xiao (2021-04-24). "A Black-Box Attack Method against Machine-Learning-Based Anomaly Network Flow Detection Models". Security and Communication Networks. 2021. e5578335. doi: 10.1155/2021/5578335 . ISSN   1939-0114.
  86. 1 2 Gomes, Joao (2018-01-17). "Adversarial Attacks and Defences for Convolutional Neural Networks". Onfido Tech. Retrieved 2021-10-23.
  87. Guo, Chuan; Gardner, Jacob; You, Yurong; Wilson, Andrew Gordon; Weinberger, Kilian (2019-05-24). "Simple Black-box Adversarial Attacks". Proceedings of the 36th International Conference on Machine Learning. PMLR: 2484–2493. arXiv: 1905.07121 .
  88. Kilian Weinberger. On the importance of deconstruction in machine learning research. ML-Retrospectives @ NeurIPS 2020, 2020. https://slideslive.com/38938218/the-importance-of-deconstruction
  89. 1 2 3 4 5 Andriushchenko, Maksym; Croce, Francesco; Flammarion, Nicolas; Hein, Matthias (2020). "Square Attack: A Query-Efficient Black-Box Adversarial Attack via Random Search". In Vedaldi, Andrea; Bischof, Horst; Brox, Thomas; Frahm, Jan-Michael (eds.). Computer Vision – ECCV 2020. Lecture Notes in Computer Science. Vol. 12368. Cham: Springer International Publishing. pp. 484–501. arXiv: 1912.00049 . doi:10.1007/978-3-030-58592-1_29. ISBN   978-3-030-58592-1. S2CID   208527215.
  90. 1 2 3 4 5 6 7 Chen, Jianbo; Jordan, Michael I.; Wainwright, Martin J. (2019), HopSkipJumpAttack: A Query-Efficient Decision-Based Attack, arXiv: 1904.02144 , retrieved 2021-10-25
  91. Andriushchenko, Maksym; Croce, Francesco; Flammarion, Nicolas; Hein, Matthias (2020-07-29). "Square Attack: a query-efficient black-box adversarial attack via random search". arXiv: 1912.00049 [cs.LG].
  92. "Black-box decision-based attacks on images". KejiTech. 2020-06-21. Retrieved 2021-10-25.
  93. 1 2 Goodfellow, Ian J.; Shlens, Jonathon; Szegedy, Christian (2015-03-20). "Explaining and Harnessing Adversarial Examples". arXiv: 1412.6572 [stat.ML].
  94. 1 2 "Adversarial example using FGSM | TensorFlow Core". TensorFlow. Retrieved 2021-10-24.
  95. Tsui, Ken (2018-08-22). "Perhaps the Simplest Introduction of Adversarial Examples Ever". Medium. Retrieved 2021-10-24.
  96. Corona-Figueroa, Abril; Bond-Taylor, Sam; Bhowmik, Neelanjan; Gaus, Yona Falinie A.; Breckon, Toby P.; Shum, Hubert P. H.; Willcocks, Chris G. (2023). Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers. IEEE/CVF. arXiv: 2308.14152 .
  97. 1 2 Carlini, Nicholas; Wagner, David (2017-03-22). "Towards Evaluating the Robustness of Neural Networks". arXiv: 1608.04644 [cs.CR].
  98. "carlini wagner attack". richardjordan.com. Retrieved 2021-10-23.
  99. Plotz, Mike (2018-11-26). "Paper Summary: Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods". Medium. Retrieved 2021-10-23.
  100. Kishor Datta Gupta; Akhtar, Zahid; Dasgupta, Dipankar (2021). "Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks". SN Computer Science. 2 (5): 383. arXiv: 2007.00337 . doi:10.1007/s42979-021-00773-8. ISSN   2662-995X. S2CID   220281087.
  101. O. Dekel, O. Shamir, and L. Xiao. "Learning to classify with missing and corrupted features". Machine Learning, 81:149–178, 2010.
  102. Liu, Wei; Chawla, Sanjay (2010). "Mining adversarial patterns via regularized loss minimization" (PDF). Machine Learning. 81: 69–83. doi: 10.1007/s10994-010-5199-2 . S2CID   17497168.
  103. B. Biggio, G. Fumera, and F. Roli. "Evade hard multiple classifier systems Archived 2015-01-15 at the Wayback Machine ". In O. Okun and G. Valentini, editors, Supervised and Unsupervised Ensemble Methods and Their Applications, volume 245 of Studies in Computational Intelligence, pages 15–38. Springer Berlin / Heidelberg, 2009.
  104. B. I. P. Rubinstein, P. L. Bartlett, L. Huang, and N. Taft. "Learning in a large function space: Privacy- preserving mechanisms for svm learning". Journal of Privacy and Confidentiality, 4(1):65–100, 2012.
  105. M. Kantarcioglu, B. Xi, C. Clifton. "Classifier Evaluation and Attribute Selection against Active Adversaries". Data Min. Knowl. Discov., 22:291–335, January 2011.
  106. Chivukula, Aneesh; Yang, Xinghao; Liu, Wei; Zhu, Tianqing; Zhou, Wanlei (2020). "Game Theoretical Adversarial Deep Learning with Variational Adversaries". IEEE Transactions on Knowledge and Data Engineering. 33 (11): 3568–3581. doi:10.1109/TKDE.2020.2972320. hdl: 10453/145751 . ISSN   1558-2191. S2CID   213845560.
  107. Chivukula, Aneesh Sreevallabh; Liu, Wei (2019). "Adversarial Deep Learning Models with Multiple Adversaries". IEEE Transactions on Knowledge and Data Engineering. 31 (6): 1066–1079. doi:10.1109/TKDE.2018.2851247. hdl: 10453/136227 . ISSN   1558-2191. S2CID   67024195.
  108. "TrojAI". www.iarpa.gov. Retrieved 2020-10-14.
  109. Athalye, Anish; Carlini, Nicholas; Wagner, David (2018-02-01). "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Example". arXiv: 1802.00420v1 [cs.LG].
  110. He, Warren; Wei, James; Chen, Xinyun; Carlini, Nicholas; Song, Dawn (2017-06-15). "Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong". arXiv: 1706.04701 [cs.LG].