Foundation Models Security
Jailbreak defense, adversarial robustness, and internal safeguards for LLMs and multimodal models.
Goals
To protect foundation models against adversarial manipulation, jailbreaks, and misuse through principled internal and external defenses.
Overview
This research focuses on the security of large language and multimodal foundation models. Work includes internal neuron-level defenses against jailbreaks, adversarial robustness, and threat and anomaly detection in AI systems.
Key objectives
- Build internal defenses against multimodal jailbreak attacks
- Develop robust models for adversarial AI environments
- Detect and mitigate security threats in deployed systems
- Formalize AI security as a learning problem
Key topics
- Multimodal jailbreak defense
- Adversarial robustness for LLMs
- Internal neuron-level safeguards
- Threat detection and anomaly analysis
Papers in this direction
Does a Hybrid Space-Aware Randomized Defense Improve Empirical and Certified Adversarial Robustness?
Dhar, J, Pandey, MK, Bozorgtabar, B, Le, T, Yao, L
International Conference on Machine Learning (ICML)
f-Divergence Self-Play for Tabular Anomaly Detection via Large Language Models
Tran, VH, Ngo, VL, Nguyen, D, Nguyen, T, Nguyen, P, Harandi, M, Le, T
International Conference on Machine Learning (ICML)
Causal-aware Anomaly Detection for Tabular Data
Nguyen, D, Nguyen, TAH, Le, TD, Venkatesh, S, Le, T, Gupta, S
International Conference on Machine Learning (ICML)
RoNE: Robust Neurons Enable Internal Defenses Against Multimodal Jailbreak
Tran, TL, Phan, H, Kanan, C, Le, T
Annual Meeting of the Association for Computational Linguistics (ACL)