Offensive AI Compilation
A curated list of useful resources that cover Offensive AI.
📁 Contents 📁
- 🚫 Abuse 🚫
- 🔧 Use 🔧
- 📊 Surveys 📊
- 🗣 Contributors 🗣
- ©️ License ©️
🚫 Abuse 🚫
Exploiting the vulnerabilities of AI models.
🧠 Adversarial Machine Learning 🧠
Adversarial Machine Learning is the field responsible for assessing the weaknesses of machine learning models and providing countermeasures.
⚡ Attacks ⚡
Attacks fall into four types: extraction, inversion, poisoning and evasion.
🔒 Extraction 🔒
An extraction attack tries to steal the parameters and hyperparameters of a model by making requests that maximize the information extracted.
Depending on the adversary's knowledge of the model, white-box and black-box attacks can be performed.
In the simplest white-box case, when the adversary has full knowledge of the model architecture (e.g., a logistic regression with a sigmoid output), the responses to the requests yield a system of linear equations that can be easily solved.
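The white-box case can be illustrated with a minimal sketch, assuming the victim is a logistic regression that returns raw confidence scores. Applying the inverse sigmoid (logit) to each response gives one linear equation in the unknown weights and bias; querying the zero vector and the unit vectors makes that system trivial to solve. The names `make_victim` and `extract_logistic_regression`, and the example weights, are hypothetical and not taken from any of the linked papers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # Inverse of the sigmoid: turns a confidence back into w.x + b.
    return math.log(p / (1.0 - p))

def make_victim(weights, bias):
    """Hypothetical prediction API that exposes only a confidence score."""
    def predict(x):
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return predict

def extract_logistic_regression(predict, n_features):
    # Query the zero vector: logit(f(0)) = b.
    bias = logit(predict([0.0] * n_features))
    weights = []
    for i in range(n_features):
        # Query the unit vector e_i: logit(f(e_i)) = w_i + b.
        e_i = [0.0] * n_features
        e_i[i] = 1.0
        weights.append(logit(predict(e_i)) - bias)
    return weights, bias

# Assumed secret parameters, used only to build the toy victim.
victim = make_victim([0.5, -1.25, 2.0], 0.3)
stolen_w, stolen_b = extract_logistic_regression(victim, 3)
print(stolen_w, stolen_b)
```

With `n_features + 1` requests the parameters are recovered up to floating-point error, which is why full-precision confidence scores are the weak point in this setting.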
In the generic case, where there is insufficient knowledge of the model, a substitute model is used. This model is trained on the requests made to the original model and the responses it returns, so that it imitates the functionality of the original one.
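A rough sketch of the substitute-model idea, assuming a black-box victim that returns only hard labels. The victim here is a hypothetical hidden linear classifier, and the substitute is a small logistic regression trained with plain SGD on the query/label pairs. The query budget, learning rate and function names are illustrative assumptions.

```python
import math
import random

def victim_predict(x):
    """Hypothetical black-box API: returns only a hard label (0 or 1)."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.5 > 0 else 0

def train_substitute(queries, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression substitute with plain SGD."""
    w = [0.0] * len(queries[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(queries, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # keep exp() numerically safe
            err = 1.0 / (1.0 + math.exp(-z)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def substitute_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

rng = random.Random(0)
queries = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(300)]
labels = [victim_predict(x) for x in queries]  # each label costs one request
w, b = train_substitute(queries, labels)

# Fidelity: how often the substitute agrees with the victim on fresh inputs.
fresh = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(1000)]
fidelity = sum(substitute_predict(w, b, x) == victim_predict(x) for x in fresh) / len(fresh)
print(f"fidelity: {fidelity:.3f}")
```

Fidelity (agreement with the victim) rather than accuracy on ground truth is measured here, since the adversary's goal is to imitate the original model.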
⚠️ Limitations ⚠️
- Training a substitute model is equivalent (in many cases) to training a model from scratch.
- Very computationally intensive.
- The adversary has limitations on the number of requests before being detected.
🛡️ Defensive actions 🛡️
- Rounding of output values.
- Use of differential privacy.
- Use of ensembles.
- Use of specific defenses.
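As a small illustration of the first defensive action, the sketch below wraps a prediction function so that callers only see rounded confidences. The wrapper name `with_rounded_outputs` and the fixed probabilities returned by `raw_predict` are assumptions made for the example.

```python
def with_rounded_outputs(predict, decimals=1):
    """Wrap a prediction function so callers only see coarse confidences."""
    def guarded_predict(x):
        return [round(p, decimals) for p in predict(x)]
    return guarded_predict

def raw_predict(x):
    # Hypothetical model output: fixed class probabilities for illustration.
    return [0.731058, 0.268942]

guarded = with_rounded_outputs(raw_predict, decimals=1)
print(guarded([0.0]))  # prints [0.7, 0.3]
```

Coarser outputs give the adversary less information per request, at the cost of less precise scores for legitimate users.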
🔗 Useful links 🔗
- Stealing Machine Learning Models via Prediction APIs
- Stealing Hyperparameters in Machine Learning
- Knockoff Nets: Stealing Functionality of Black-Box Models
- Model Extraction Warning in MLaaS Paradigm
- Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data
- Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks
- Stealing Neural Networks via Timing Side Channels
- Model Stealing Attacks Against Inductive Graph Neural Networks
- High Accuracy and High Fidelity Extraction of Neural Networks
- Poisoning Web-Scale Training Datasets is Practical
- Polynomial Time Cryptanalytic Extraction of Neural Network Models
- Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models
- Awesome Data Poisoning And Backdoor Attacks: A curated list of papers & resources linked to data poisoning, backdoor attacks and defenses against them.
- BackdoorBox: An Open-sourced Python Toolbox for Backdoor Attacks and Defenses.
⬅️ Inversion (or inference) ⬅️
Inversion attacks are intended to reverse the information flow of a machine learning model.
They give an adversary knowledge about the model that was not explicitly intended to be shared.
They can reveal the training data or other information, such as statistical properties of the model.
Three types are possible:
- Membership Inference Attack (MIA): An adversary attempts to determine whether a sample was used as part of the training set.
- Property Inference Attack (PIA): An adversary aims to extract statistical properties that were not explicitly encoded as features during the training phase.
- Reconstruction: An adversary tries to reconstruct one or more samples from the training set and/or their corresponding labels. Also called inversion.
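The membership inference case can be sketched with a simple confidence-threshold rule, assuming the model tends to be more confident on samples it was trained on. The threshold and the confidence values below are made up for illustration and are not measurements from any real model.

```python
def infer_membership(confidence, threshold=0.9):
    """Guess 'member' when the model is very confident on the true label."""
    return confidence >= threshold

# Illustrative, made-up confidences on the true label (not real measurements).
member_conf = [0.99, 0.97, 0.95, 0.88]
non_member_conf = [0.91, 0.72, 0.60, 0.55]

guesses = [infer_membership(c) for c in member_conf + non_member_conf]
truth = [True] * len(member_conf) + [False] * len(non_member_conf)
attack_accuracy = sum(g == t for g, t in zip(guesses, truth)) / len(truth)
print(attack_accuracy)
```

Stronger attacks in the linked papers train shadow models to learn this decision rule instead of fixing a threshold by hand.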
🛡️ Defensive actions 🛡️
- Use of advanced cryptography. Countermeasures include differential privacy, homomorphic encryption and secure multiparty computation.
- Use of regularization techniques such as Dropout, due to the relationship between overfitting and privacy.
- Model compression has been proposed as a defense against reconstruction attacks.
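To give a feel for differential privacy, here is a minimal sketch of the Laplace mechanism applied to a counting query. It only illustrates the noise-adding idea; protecting a trained model usually relies on other techniques (for example, noisy gradient updates), which are not shown. The function names, the toy records and the value of `epsilon` are assumptions.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential samples is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Counting query (sensitivity 1) released with Laplace(1/epsilon) noise."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 61, 38]  # toy records, for illustration only
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(noisy)
```

A smaller `epsilon` means more noise and stronger privacy; adding or removing a single record changes the distribution of the answer only slightly.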
🔗 Useful links 🔗
- Membership Inference Attacks Against Machine Learning Models
- Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
- Machine Learning Models that Remember Too Much
- ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models
- Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
- LOGAN: Membership Inference Attacks Against Generative Models
- Overfitting, robustness, and malicious algorithms: A study of potential causes of privacy risk in machine learning
- Comprehensive Privacy Analysis of Deep Learning: Stand-alone and Federated Learning under Passive and Active White-box Inference Attacks
- Inference Attacks Against Collaborative Learning
- The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
- Towards the Science of Security and Privacy in Machine Learning
- MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples
- Extracting Training Data from Large Language Models
- Property Inference Attacks on Fully Connected Neural Networks using Permutation Invariant Representations
- Extracting Training Data from Diffusion Models
- High-resolution image reconstruction with latent diffusion models from human brain activity
- Stealing and evading malware classifiers and antivirus at low false positive conditions
- Realistic fingerprint presentation attacks based on an adversarial approach
- Active Adversarial Tests: Increasing Confidence in Adversarial Robustness Evaluations.
- GPT Jailbreak Status: Updates on the status of jailbreaking the OpenAI GPT language model.
💉 Poisoning 💉
Poisoning attacks aim to corrupt the training set so that the machine learning model loses accuracy.
Such an attack is difficult to detect when it is performed on the training data, and it can propagate to every model that uses the same training data.
The adversary seeks either to destroy the availability of the model by modifying the decision boundary, which produces incorrect predictions, or to create a backdoor in the model. In the latter case, the model behaves correctly (returning the desired predictions) in most cases, except for certain inputs specially crafted by the adversary that produce undesired results. The adversary can thus manipulate the results of the predictions and launch future attacks.
🔓 Backdoors 🔓
BadNets are the simplest type of backdoor in a machine learning model. Moreover, BadNets can persist in a model even if it is retrained for a different task than the original one (transfer learning).
It is important to note that public pre-trained models may contain backdoors.
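A BadNets-style backdoor can be sketched as follows, assuming images are plain 2D lists of pixel values. A small trigger patch is stamped on a fraction of the training samples, which are then relabeled to the adversary's target class. The patch size, poisoning rate and function names are illustrative assumptions, not the exact procedure from the BadNets paper.

```python
import random

def stamp_trigger(image, value=1.0, size=2):
    """Return a copy of a 2D image with a size x size patch in the bottom-right corner."""
    poisoned = [row[:] for row in image]
    for r in range(len(poisoned) - size, len(poisoned)):
        for c in range(len(poisoned[r]) - size, len(poisoned[r])):
            poisoned[r][c] = value
    return poisoned

def poison_dataset(images, labels, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them."""
    rng = random.Random(seed)
    n_poison = int(len(images) * rate)
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, chosen

# Toy dataset: twenty blank 4x4 images with labels 0..2.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(20)]
labels = [i % 3 for i in range(20)]
p_images, p_labels, chosen = poison_dataset(images, labels, target_label=7, rate=0.25)
print(sorted(chosen))
```

A model trained on `p_images` and `p_labels` would be expected to behave normally on clean inputs while associating the corner patch with the target label.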
🛡️ Defensive actions 🛡️
- Detection of poisoned data, along with the use of data sanitization.
- Robust training methods.
- Specific defenses:
  - Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks
  - STRIP: A Defence Against Trojan Attacks on Deep Neural Networks
  - Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
  - ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation
  - DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks
  - Defending Neural Backdoors via Generative Distribution Modeling
  - A Comprehensive Survey on Backdoor Attacks and Their Defenses in Face Recognition Systems
  - DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models
🔗 Useful links 🔗
- Poisoning Attacks against Support Vector Machines
- Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
- Trojaning Attack on Neural Networks
- Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
- Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
- Spectral Signatures in Backdoor Attacks
- Latent Backdoor Attacks on Deep Neural Networks
- Regula Sub-rosa: Latent Backdoor Attacks on Deep Neural Networks
- Hidden Trigger Backdoor Attacks
- Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
- TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems
- Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization
- When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks
- [Certified Defenses for Data Poisoning