Taxonomy of Attacks, Defenses, and Consequences in Adversarial Machine Learning
A detailed report about the taxonomy of attacks, defenses, and consequences in adversarial machine learning.
Five attack types
Data Access Attacks
These attacks occur during the training phase of a model, when attackers gain access to some or all of the training data. With that data they can build a substitute model and use it to test whether candidate inputs are likely to succeed before using them against the target model in the testing phase. The attack does not alter the target model by itself, but it makes later attacks more effective and so undermines the model's reliability in practical applications.
Poisoning Attacks
Poisoning attacks also occur during the training phase and can be indirect or direct. In indirect poisoning, attackers have no access to the processed data, so they tamper with the raw data before it is preprocessed. In direct poisoning, attackers alter the training data through data injection or data manipulation, or tamper with the model itself through logic corruption. These attacks corrupt what the model learns, producing inaccurate or unreliable results in practical applications.
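As an illustration of direct poisoning by data injection, here is a minimal sketch in Python. The nearest-centroid classifier, the toy points, and the names fit, predict, and probe are all assumptions made for this example and do not come from the NIST report. It shows only that a few injected, mislabeled points can change the prediction for an unmodified input.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def fit(data):
    """Nearest-centroid 'training': compute one centroid per label."""
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist2(model[y]))

# Hypothetical clean training set: class 0 near the origin, class 1 near (4, 4).
clean = [([0.0, 0.0], 0), ([1.0, 0.0], 0), ([0.0, 1.0], 0),
         ([4.0, 4.0], 1), ([5.0, 4.0], 1), ([4.0, 5.0], 1)]

# Injected points: located in class-1 territory but labeled 0.
poison = [([6.0, 6.0], 0)] * 3

probe = [3.0, 3.0]
clean_model = fit(clean)
poisoned_model = fit(clean + poison)

print(predict(clean_model, probe))     # label from the clean model
print(predict(poisoned_model, probe))  # label after poisoning
```

The injected points drag the class-0 centroid toward the probe, so the same input is assigned a different label even though the probe itself was never modified.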
Evasion Attacks
Evasion attacks occur during the testing phase of the model. Attackers cause the model to misclassify by applying small perturbations to its inputs. This typically involves gradient-based search algorithms such as L-BFGS, FGSM (Fast Gradient Sign Method), or JSMA (Jacobian-based Saliency Map Attack). These algorithms search for small perturbations that cause large changes in the model's loss function, leading to misclassification. The goal of an evasion attack is to make the model unable to correctly recognize or classify adversarial samples.
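To make the gradient-based idea concrete, the following is a minimal FGSM sketch against a logistic regression model, written in plain Python. The weights, the input, and the step size eps are hypothetical values chosen for illustration. FGSM moves every feature by eps in the direction of the sign of the loss gradient with respect to the input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(dLoss/dx).

    For logistic regression with cross-entropy loss,
    dLoss/dx_i = (p - y) * w_i, where p is the predicted probability.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and input (true label 1).
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, 0.2, 0.1], 1

x_adv = fgsm(w, b, x, y, eps=0.2)
print(predict_proba(w, b, x))      # above 0.5: classified as class 1
print(predict_proba(w, b, x_adv))  # below 0.5: classified as class 0
```

No feature changes by more than eps, yet the predicted class flips. L-BFGS and JSMA search for perturbations differently, but pursue the same goal.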
Oracle Attacks
Oracle attacks use an application programming interface (API) to submit inputs to a model and observe its outputs. Even without knowing the composition or structure of the model, attackers can use the observed input-output pairs to train a substitute model whose behavior closely resembles that of the target. The substitute can then be used to generate adversarial samples for evasion attacks.
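The substitute-model idea can be sketched as follows. The oracle function below is a local stand-in for a remote prediction API, and its decision rule, the query distribution, and the perceptron learner are all assumptions made for this example. The attacker code only ever calls oracle(x) and never reads its internals.

```python
import random

def oracle(x):
    """Stand-in for a remote prediction API. The attacker sees only the label."""
    return 1 if 1.5 * x[0] - 2.0 * x[1] + 0.25 > 0 else 0

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_substitute(queries, labels, epochs=50, lr=0.1):
    """Perceptron trained purely on observed input-output pairs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(queries, labels):
            err = y - predict(w, b, x)
            if err != 0:
                mistakes += 1
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if mistakes == 0:
            break  # substitute already matches every observed label
    return w, b

rng = random.Random(0)

# Step 1: query the oracle on attacker-chosen inputs.
queries = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
labels = [oracle(x) for x in queries]

# Step 2: fit a substitute model on the collected pairs.
w, b = train_substitute(queries, labels)

# Step 3: measure how often the substitute agrees with the target on fresh inputs.
fresh = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
agreement = sum(predict(w, b, x) == oracle(x) for x in fresh) / len(fresh)
print(f"agreement with target: {agreement:.2%}")
```

Once agreement is high, adversarial samples crafted against the substitute have a good chance of transferring to the target.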
Extraction Attacks
In this type of attack, the attacker extracts the parameters or structure of a model from its predictions. This typically involves observing the probability values the model returns for each class. The purpose of an extraction attack is to replicate or reconstruct the target model, allowing attackers to understand how it works and to design more effective attack strategies.
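For a simple model family, returned probabilities can reveal the parameters exactly. The sketch below assumes the target is a logistic regression that returns its class-1 probability. The API function name and the secret weights are hypothetical. Because logit(p) = w·x + b, querying the zero vector yields b, and querying each unit vector yields w_i + b.

```python
import math

# Hidden from the attacker; used only inside the simulated API.
_SECRET_W, _SECRET_B = [0.8, -1.2, 2.0], 0.3

def api_predict_proba(x):
    """Simulated prediction API that returns the class-1 probability."""
    z = sum(w * xi for w, xi in zip(_SECRET_W, x)) + _SECRET_B
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the sigmoid function."""
    return math.log(p / (1.0 - p))

def extract_parameters(n_features):
    """Recover (w, b) with n_features + 1 queries."""
    b = logit(api_predict_proba([0.0] * n_features))
    w = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0
        w.append(logit(api_predict_proba(unit)) - b)
    return w, b

w_hat, b_hat = extract_parameters(3)
print(w_hat, b_hat)  # matches the secret parameters up to floating-point error
```

Real models are rarely this simple, but the example shows why full-precision probability outputs are valuable to an attacker.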
Defense mechanisms
Data access attacks
Data access attacks involve unauthorized access to, or acquisition of, training data. Traditional security measures such as access control and data encryption can be used to prevent them.
Poisoning attacks
Methods to prevent poisoning attacks include data sanitization and robust statistics. Data sanitization identifies and removes harmful samples by measuring the impact of each sample on classification performance. Robust statistical methods use constraints and regularization techniques to reduce the distortion that tampered data causes in the learned model.
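A very simple way to see the robust-statistics idea is a filter built on the median and the median absolute deviation (MAD), which are far less sensitive to injected values than the mean. This is a swapped-in illustration, not the impact-based sanitization described above. The function name sanitize, the cutoff k, and the sample values are assumptions for this sketch.

```python
import statistics

def sanitize(values, k=3.0):
    """Keep values within k * MAD of the median.

    The median and MAD barely move when a minority of values is tampered with,
    so the injected values stand out. If MAD is 0, only values equal to the
    median are kept.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) <= k * mad]

# Hypothetical feature column with two injected values (9.0 and 10.0).
values = [1.0, 1.2, 0.9, 1.1, 1.0, 9.0, 10.0]
kept = sanitize(values)

print(statistics.mean(values))  # mean pulled upward by the injected values
print(kept)                     # injected values removed
print(statistics.mean(kept))    # mean of the remaining values
```

A filter like this only catches poison that looks like an outlier. Carefully crafted poison that stays close to the clean distribution needs stronger methods.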
Evasion attacks
Methods to defend against such attacks include adversarial training, gradient masking, defensive distillation, ensemble methods, feature squeezing, and model robustness improvements such as reformers/autoencoders. Adversarial training adds inputs that contain adversarial perturbations but carry the correct labels to the training data, which strengthens the model's resistance to adversarial samples. Gradient masking improves robustness by reducing the sensitivity of the model to small changes in its input. Defensive distillation and ensemble methods aim to enhance resistance by training smoother or more diverse models.
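Adversarial training can be sketched as a training loop that, in every epoch, crafts perturbed copies of the training inputs against the current model and trains on them with their correct labels. The example below uses logistic regression and FGSM for the perturbations. The data, eps, learning rate, and function names are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Perturb x by eps in the direction that increases the loss."""
    p = sigmoid(dot(w, x) + b)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def adversarial_train(data, eps=0.3, epochs=100, lr=0.5):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        # Craft perturbed copies against the current model; labels stay correct.
        augmented = data + [(fgsm(w, b, x, y, eps), y) for x, y in data]
        for x, y in augmented:
            g = sigmoid(dot(w, x) + b) - y  # dLoss/dz for cross-entropy
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Hypothetical two-class training set.
data = [([-1.0, -0.8], 0), ([-0.9, -1.1], 0), ([1.0, 0.9], 1), ([0.8, 1.2], 1)]
w, b = adversarial_train(data)

for x, y in data:
    x_adv = fgsm(w, b, x, y, eps=0.3)
    print(y, sigmoid(dot(w, x) + b) > 0.5, sigmoid(dot(w, x_adv) + b) > 0.5)
```

The printed rows compare the true label with the prediction on the clean input and on an FGSM-perturbed input of the same strength used during training.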
Oracle attacks
Strategies to defend against oracle attacks include limiting the amount of information the model outputs, so that attackers cannot collect enough data to train an effective substitute model.
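One way to limit output information is to wrap the model behind an endpoint that returns only the top label with a coarsely rounded confidence and that caps the number of queries per client. The class name HardenedEndpoint, its parameters, and the fake model below are hypothetical and serve only to illustrate the idea.

```python
class HardenedEndpoint:
    """Wraps a model so that each response leaks less information."""

    def __init__(self, predict_proba, max_queries=1000, decimals=1):
        self.predict_proba = predict_proba  # callable returning {label: probability}
        self.max_queries = max_queries      # per-client query budget
        self.decimals = decimals            # rounding applied to the confidence
        self.counts = {}

    def query(self, client_id, x):
        used = self.counts.get(client_id, 0)
        if used >= self.max_queries:
            raise PermissionError("query budget exhausted")
        self.counts[client_id] = used + 1

        probs = self.predict_proba(x)
        label = max(probs, key=probs.get)
        # Return only the top label and a rounded confidence,
        # not the full probability vector.
        return {"label": label, "confidence": round(probs[label], self.decimals)}

def fake_model(x):
    """Stand-in for a real classifier; always returns the same distribution."""
    return {"cat": 0.6312, "dog": 0.3011, "bird": 0.0677}

endpoint = HardenedEndpoint(fake_model, max_queries=2)
print(endpoint.query("alice", [0.0]))
```

Rounding and budgets raise the cost of building a substitute model but do not make it impossible, since labels alone still carry information.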
Extraction attacks
Randomization mechanisms can be used to achieve differential privacy as a defense against such attacks, ensuring that the model's output does not leak additional information about individual records in the training data. Differential privacy is achieved by applying randomization to the data or to the computation performed on it, but this may come at the cost of reduced prediction accuracy.
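The Laplace mechanism is a standard randomization mechanism for differential privacy and gives a feel for the privacy and accuracy trade-off. The sketch below applies it to a counting query. The records, the predicate, and the epsilon value are assumptions for illustration. A count changes by at most 1 when one record is added or removed, so noise with scale 1/epsilon is used.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with rate
    1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Counting query with epsilon-differential privacy (sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
ages = [23, 35, 41, 29, 52, 61, 38, 47]  # hypothetical training records

noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
print(noisy)  # true count is 4; each call returns a differently perturbed value
```

A smaller epsilon means more noise and stronger privacy, and a larger epsilon means more accurate answers. This is the accuracy cost mentioned above.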
In addition, homomorphic encryption is a feasible defense: it allows computations to be performed directly on encrypted data, so personal information stays protected because the data never has to be decrypted.
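To show what computing on encrypted data means, here is a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts produces a ciphertext of the sum of the plaintexts. Paillier is chosen here only as a well-known example and is not named in the source. The primes are far too small for real use, and the function names are hypothetical. The code requires Python 3.8 or later for the modular inverse via pow.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy key generation. Real keys use primes of 1024 bits or more."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (valid since g = n + 1)
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) under public key n with generator g = n + 1."""
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)

rng = random.Random(0)
n, private_key = keygen()

c1 = encrypt(n, 20, rng)
c2 = encrypt(n, 22, rng)
c_sum = add_encrypted(n, c1, c2)  # computed without ever decrypting c1 or c2

total = decrypt(n, private_key, c_sum)
print(total)  # 42
```

The party that adds the ciphertexts never sees 20, 22, or 42. Only the holder of the private key can read the result.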
Reference
A Taxonomy and Terminology of Adversarial Machine Learning, Elham.T, Kevin.J.B, Michael.H, Andres.D.M, Julian.T.S, March 8, 2023, National Institute of Standards and Technology, doi:10.6028/NIST.AI.100-2e2023.ipd, https://csrc.nist.gov/pubs/ai/100/2/e2023/ipd