Taxonomy of Attacks, Defenses, and Consequences in Adversarial Machine Learning

A detailed report about the taxonomy of attacks, defenses, and consequences in adversarial machine learning.

Five attack types

Data Access Attacks

Data access attacks occur during the training phase of a model, when attackers gain access to some or all of the training data. This access enables them to build a substitute model, which they can use to test the effectiveness of candidate adversarial inputs before using them against the target model in later testing stages. As a result, attackers can craft inputs that mislead the model and degrade its performance in practical applications.
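A minimal sketch of the substitute-model idea, assuming the attacker has obtained a random 30% slice of a toy 2-D training set. The data, the nearest-centroid substitute, and the helper names (`make_point`, `fit_centroids`, `predict`) are hypothetical illustrations, not part of any real system.

```python
import random

random.seed(0)

def make_point(label):
    """Hypothetical training example: class 0 near (0, 0), class 1 near (3, 3)."""
    center = 3.0 * label
    return (center + random.gauss(0, 0.5), center + random.gauss(0, 0.5)), label

training_data = [make_point(i % 2) for i in range(200)]  # the victim's private data
leaked = random.sample(training_data, k=60)               # the attacker's partial copy

def fit_centroids(samples):
    """Nearest-centroid substitute model: the mean point of each class."""
    sums = {}
    for (x, y), label in samples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    """Return the label of the centroid closest to `point`."""
    px, py = point
    return min(centroids, key=lambda c: (px - centroids[c][0]) ** 2 + (py - centroids[c][1]) ** 2)

substitute = fit_centroids(leaked)

# Offline testing: walk a class-0 point toward class 1 in small steps and
# keep the first candidate that the substitute misclassifies.
start = (0.0, 0.0)
for step in range(1, 50):
    candidate = (start[0] + 0.1 * step, start[1] + 0.1 * step)
    if predict(substitute, candidate) == 1:
        break
print("first candidate that fools the substitute:", candidate)
```

Because the search runs only against the substitute, the attacker can try many candidates without querying, or alerting, the real model.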

Poisoning Attacks

Poisoning attacks also occur during the training phase and can be indirect or direct. In indirect poisoning, attackers cannot access the preprocessed data the model is trained on, so they tamper with the raw data before it is preprocessed. In direct poisoning, attackers alter the training data itself through data injection or data manipulation, or tamper with the model through logic corruption. Either way, the model learns from corrupted inputs, which makes its results inaccurate or unreliable in practical applications.
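A toy illustration of direct poisoning by data injection, assuming a 1-D classifier that places its decision threshold halfway between the two class means. The data and the helper names (`sample`, `train_threshold`, `accuracy`) are made up for illustration; real poisoning attacks target far richer models.

```python
import random
import statistics

random.seed(1)

def sample(label, n):
    """Clean 1-D data: class 0 around 0.0, class 1 around 4.0 (illustrative)."""
    center = 4.0 * label
    return [(random.gauss(center, 1.0), label) for _ in range(n)]

def train_threshold(data):
    """Toy classifier: threshold halfway between the two class means."""
    mean0 = statistics.mean(x for x, y in data if y == 0)
    mean1 = statistics.mean(x for x, y in data if y == 1)
    return (mean0 + mean1) / 2

def accuracy(threshold, data):
    # Predict class 1 when x is above the threshold.
    return sum((x > threshold) == (y == 1) for x, y in data) / len(data)

clean_train = sample(0, 100) + sample(1, 100)
test_set = sample(0, 100) + sample(1, 100)

# Direct poisoning by injection: 40 points far to the right, labeled 0,
# drag the class-0 mean (and therefore the threshold) toward class 1.
poison = [(random.uniform(8.0, 10.0), 0) for _ in range(40)]

t_clean = train_threshold(clean_train)
t_poisoned = train_threshold(clean_train + poison)
print(f"clean:    threshold={t_clean:.2f} acc={accuracy(t_clean, test_set):.2f}")
print(f"poisoned: threshold={t_poisoned:.2f} acc={accuracy(t_poisoned, test_set):.2f}")
```

The attacker never touches the test data; the damage comes entirely from what the model learned during training.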

Evasion Attacks

Evasion attacks occur during the testing phase of the model. Attackers apply small perturbations to an input so that the model misclassifies it. Crafting these perturbations typically relies on gradient-based search algorithms such as L-BFGS, the Fast Gradient Sign Method (FGSM), or the Jacobian-based Saliency Map Attack (JSMA). These algorithms search for small perturbations that cause a large change in the model's loss function, leading to misclassification. The goal of an evasion attack is to make the model fail to correctly recognize or classify these adversarial samples.
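A minimal FGSM sketch against a toy logistic-regression model with made-up weights (`w`, `b`). For this model, the gradient of the cross-entropy loss with respect to the input is (p - y) * w, so FGSM moves each feature by `epsilon` in the sign direction of that gradient.

```python
import math

# Hypothetical trained logistic-regression model (weights are made up).
w = [2.0, -3.0, 1.0]
b = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x):
    """Model's probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, epsilon):
    """Fast Gradient Sign Method for this model: the input gradient of the
    cross-entropy loss is (p - y) * w; step epsilon along its sign."""
    p = predict_proba(x)
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad)]

x, y = [0.5, 0.2, 0.1], 1  # a correctly classified class-1 input
x_adv = fgsm(x, y, epsilon=0.3)
print(f"clean p(class 1) = {predict_proba(x):.3f}")      # 0.731
print(f"adv   p(class 1) = {predict_proba(x_adv):.3f}")  # 0.310
```

With these illustrative numbers, no feature changes by more than 0.3, yet the prediction flips from class 1 to class 0.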

Oracle Attacks

In oracle attacks, the attacker uses the model's application programming interface (API) to submit inputs and observe the outputs. Even without knowing the model's composition or structure, the attacker can use the observed input-output pairs to train a substitute model whose behavior closely resembles that of the target model. This substitute can then be used to generate adversarial samples for evasion attacks.
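A sketch of the oracle workflow, assuming the target is reachable only through a function that returns a label. `target_api` is a hypothetical stand-in for a remote prediction API, and the substitute is a simple perceptron chosen for brevity.

```python
import random

random.seed(2)

def target_api(x):
    """Hypothetical remote model: callers see only the returned label."""
    return 1 if 1.5 * x[0] - 0.8 * x[1] + 0.2 > 0 else 0

def random_input():
    return [random.uniform(-2, 2), random.uniform(-2, 2)]

# 1. Query the oracle and record input-output pairs.
queries = [random_input() for _ in range(500)]
labels = [target_api(q) for q in queries]

# 2. Train a substitute (a perceptron) on the recorded pairs.
w, b, lr = [0.0, 0.0], 0.0, 0.1

def substitute(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(50):  # epochs over the query log
    for x, y in zip(queries, labels):
        err = y - substitute(x)  # 0 if correct, +1 or -1 if wrong
        w = [w[0] + lr * err * x[0], w[1] + lr * err * x[1]]
        b += lr * err

# 3. Check how closely the substitute mimics the target on fresh inputs.
fresh = [random_input() for _ in range(1000)]
agreement = sum(substitute(x) == target_api(x) for x in fresh) / len(fresh)
print(f"substitute agrees with target on {agreement:.1%} of fresh inputs")
```

The attacker can then run white-box attacks such as FGSM on the substitute and submit the resulting inputs to the real API.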

Extraction Attacks

In extraction attacks, the attacker extracts model parameters or structure from the model's predictions, typically by observing the probability values the model returns for each class. The purpose of an extraction attack is to replicate or reconstruct the target model, allowing the attacker to understand how the model works and to design more effective attacks.

Defense mechanisms

Data access attacks

Data access attacks involve unauthorized access to, or acquisition of, training data. Traditional security controls, such as access control and data encryption, can be used to block such malicious access.

Poisoning attacks

Methods to prevent poisoning attacks include data cleaning and robust statistics. Data cleaning identifies and removes harmful samples by measuring their impact on classification performance. Robust statistical methods use constraints and regularization techniques to reduce the distortion that tampered data causes in the learned model.
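A toy sketch of both ideas against mislabeled injected points, assuming those points are statistical outliers. The robust-statistics part swaps each class mean for the median. The data-cleaning part is a simple outlier filter based on the median absolute deviation (MAD); it is a swapped-in stand-in for the performance-impact test described above, not that exact method. All data and names (`sample`, `train_threshold`, `sanitize`) are illustrative.

```python
import random
import statistics

random.seed(1)

def sample(label, n):
    """Clean 1-D data: class 0 around 0.0, class 1 around 4.0 (illustrative)."""
    center = 4.0 * label
    return [(random.gauss(center, 1.0), label) for _ in range(n)]

def train_threshold(data, center=statistics.mean):
    """Toy classifier: threshold halfway between the two class centers.
    `center` selects the estimator (mean, or the robust median)."""
    c0 = center([x for x, y in data if y == 0])
    c1 = center([x for x, y in data if y == 1])
    return (c0 + c1) / 2

def sanitize(data, k=3.0):
    """Outlier-based cleaning: drop points more than k scaled MADs
    from their class median."""
    kept = []
    for label in sorted({y for _, y in data}):
        xs = [x for x, y in data if y == label]
        med = statistics.median(xs)
        mad = 1.4826 * statistics.median([abs(x - med) for x in xs])
        kept.extend((x, y) for x, y in data if y == label and abs(x - med) <= k * mad)
    return kept

clean = sample(0, 100) + sample(1, 100)
poison = [(random.uniform(8.0, 10.0), 0) for _ in range(40)]  # mislabeled injections
poisoned = clean + poison

t_clean = train_threshold(clean)
t_mean = train_threshold(poisoned)                              # no defense
t_median = train_threshold(poisoned, center=statistics.median)  # robust statistics
t_sanitized = train_threshold(sanitize(poisoned))               # data cleaning
for name, t in [("clean", t_clean), ("mean", t_mean), ("median", t_median), ("sanitized", t_sanitized)]:
    print(f"{name:>9}: threshold = {t:.2f}")
```

The median moves only slightly when 40 of 140 class-0 points are poisoned, while the mean is dragged far toward class 1; filtering the outliers first brings the threshold back near its clean value.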

Evasion attacks

Methods to defend against evasion attacks include adversarial training, gradient masking, defensive distillation, ensemble methods, feature squeezing, and robustness improvements such as autoencoder-based input reformers. Adversarial training adds inputs that contain adversarial perturbations, together with their correct labels, to the training data, which improves the model's resistance to adversarial samples. Gradient masking tries to improve robustness by hiding or flattening the model's gradients, reducing its apparent sensitivity to small input changes so that gradient-based attacks have little signal to follow. Defensive distillation and ensemble methods aim to increase resistance by training smoother or more diverse models, respectively.
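A minimal sketch of adversarial training, with its assumptions stated plainly: the learner is a closed-form linear classifier built from class means and per-feature variances (chosen only to keep the example dependency-free), the attack is FGSM with a hypothetical budget `EPS`, and adversarial examples are generated once against the standard model rather than regenerated throughout training, as is common with neural networks. The toy data has one robust feature and one feature that is predictive but easy to flip within `EPS`.

```python
import random
import statistics

random.seed(3)
EPS = 0.5  # attacker's L-infinity budget (illustrative)

def make_data(n):
    """Toy 2-D data with labels +1/-1. Feature 0 is widely separated (robust);
    feature 1 is very predictive, but its class offset (0.3) is below EPS."""
    data = []
    for _ in range(n):
        y = random.choice([-1, 1])
        x = [2.0 * y + random.gauss(0, 1.0), 0.3 * y + random.gauss(0, 0.1)]
        data.append((x, y))
    return data

def fit(data):
    """Linear model f(x) = w.x + b from class means and pooled per-feature
    variances (a Gaussian naive Bayes / LDA-style classifier)."""
    w, mid = [], []
    for j in range(2):
        pos = [x[j] for x, y in data if y == 1]
        neg = [x[j] for x, y in data if y == -1]
        var = (statistics.pvariance(pos) + statistics.pvariance(neg)) / 2
        w.append((statistics.mean(pos) - statistics.mean(neg)) / var)
        mid.append((statistics.mean(pos) + statistics.mean(neg)) / 2)
    b = -sum(wj * mj for wj, mj in zip(w, mid))
    return w, b

def score(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fgsm(model, x, y):
    """FGSM for a linear model: the sign of the loss gradient w.r.t. x is
    -y * sign(w), so each feature moves EPS against the true class."""
    w, _ = model
    return [xj - EPS * y * ((wj > 0) - (wj < 0)) for xj, wj in zip(x, w)]

def accuracy(model, data, attack=False):
    hits = 0
    for x, y in data:
        if attack:
            x = fgsm(model, x, y)  # white-box attack on this same model
        hits += (score(model, x) > 0) == (y == 1)
    return hits / len(data)

train, test = make_data(1000), make_data(1000)
standard = fit(train)

# Adversarial training: add FGSM versions of the training inputs with their
# correct labels, then refit.
augmented = train + [(fgsm(standard, x, y), y) for x, y in train]
robust = fit(augmented)

for name, m in [("standard", standard), ("adv-trained", robust)]:
    print(f"{name:>11}: clean acc {accuracy(m, test):.2f}, "
          f"FGSM acc {accuracy(m, test, attack=True):.2f}")
```

On this toy data the refit model should give up a little clean accuracy in exchange for much higher accuracy under FGSM, because it stops leaning heavily on the easily flipped feature.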

Oracle attacks

Strategies to defend against oracle attacks include limiting the amount of information the model outputs, so that attackers cannot collect enough data to train an effective substitute model.
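A sketch of output limiting, assuming a hypothetical internal model (`full_model_output`) wrapped by a public endpoint (`public_api`). The wrapper exposes either only the top class or coarsely rounded probabilities; both give an attacker fewer bits per query, whether the goal is training a substitute model or inverting scores.

```python
import math

def full_model_output(x):
    """Hypothetical internal model: softmax probabilities over 3 classes."""
    logits = [0.9 * x, 0.1 * x + 0.5, -0.4 * x]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def public_api(x, mode="label", decimals=1):
    """What outside callers actually see.
    mode="label": only the predicted class index.
    mode="rounded": probabilities coarsened to `decimals` places."""
    probs = full_model_output(x)
    if mode == "label":
        return max(range(len(probs)), key=lambda i: probs[i])
    if mode == "rounded":
        return [round(p, decimals) for p in probs]
    raise ValueError(f"unknown mode: {mode!r}")

print(full_model_output(2.0))           # full-precision scores stay internal
print(public_api(2.0))                  # 0
print(public_api(2.0, mode="rounded"))  # [0.7, 0.2, 0.1]
```

Returning labels only is the most restrictive choice; rounding is a compromise for clients that genuinely need confidence values.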

Extraction attacks

Randomization mechanisms that achieve differential privacy can defend against extraction attacks by ensuring that the model's output reveals little additional information about any individual record in the training data. Differential privacy is achieved by adding calibrated random noise, for example to the data, the training process, or the outputs, but this usually comes at the cost of some prediction accuracy.
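The core randomization idea can be sketched with the Laplace mechanism on a simple counting query. This is a toy under stated assumptions: the records are synthetic, the names (`private_count`, `is_senior`) are hypothetical, and differentially private model training (for example, adding noise to gradients) is considerably more involved. The loop shows the accuracy cost: a smaller privacy parameter `epsilon` means more noise.

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale): the difference of two i.i.d. exponential
    variables with mean `scale` has exactly this distribution."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon):
    """Laplace mechanism for a counting query. Adding or removing one record
    changes the count by at most 1 (sensitivity 1), so noise with scale
    1/epsilon gives epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

def is_senior(age):
    return age > 65

random.seed(4)
ages = [random.randint(18, 90) for _ in range(1000)]  # synthetic records
true_answer = sum(1 for a in ages if is_senior(a))

for eps in (0.1, 1.0, 5.0):
    errors = [abs(private_count(ages, is_senior, eps) - true_answer) for _ in range(200)]
    print(f"epsilon={eps}: mean absolute error {sum(errors) / len(errors):.2f}")
```

Since the noise has scale 1/epsilon, the mean absolute error should come out close to 10, 1, and 0.2 for the three settings.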

Homomorphic encryption is another feasible method: it allows computations to be performed directly on encrypted data, protecting personal information without ever decrypting it.
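As an illustration of computing on encrypted data, here is a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is a swapped-in, partially homomorphic scheme rather than a fully homomorphic one, the names (`encrypt`, `decrypt`, `add_encrypted`) are hypothetical, and the hard-coded primes are far too small to be secure.

```python
import math
import random

# Toy key material: two known primes, far too small for real security.
P, Q = 104729, 1299709
N = P * Q
N_SQ = N * N
G = N + 1                                          # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # modular inverse; valid since G = N + 1

def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness r."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c):
    u = pow(c, LAM, N_SQ)
    return ((u - 1) // N) * MU % N  # L(u) = (u - 1) / N, then multiply by MU

def add_encrypted(c1, c2):
    """Homomorphic addition: Enc(a) * Enc(b) mod N^2 decrypts to a + b (mod N)."""
    return (c1 * c2) % N_SQ

salaries = [52000, 61000, 47000]  # hypothetical private values
ciphertexts = [encrypt(s) for s in salaries]

# A server holding only ciphertexts can still compute the encrypted total.
encrypted_total = encrypt(0)
for c in ciphertexts:
    encrypted_total = add_encrypted(encrypted_total, c)

print(decrypt(encrypted_total))  # prints 160000
```

Only the key holder can decrypt the total; the party doing the addition never sees an individual salary.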

Reference

A Taxonomy and Terminology of Adversarial Machine Learning, Elham T., Kevin J. B., Michael H., Andres D. M., Julian T. S., March 8, 2023, National Institute of Standards and Technology, doi:10.6028/NIST.AI.100-2e2023.ipd, https://csrc.nist.gov/pubs/ai/100/2/e2023/ipd

FreelyTomorrow

Updated on Apr 23, 2024