Foundation Models Security

Jailbreak defense, adversarial robustness, and internal safeguards for LLMs and multimodal models.

Goals

To protect foundation models against adversarial manipulation, jailbreaks, and misuse through principled internal and external defenses.

Overview

This research direction studies the security of large language models and multimodal foundation models. The work spans neuron-level internal defenses against jailbreaks, adversarial robustness, and threat detection in AI systems.

Key topics

Multimodal jailbreak defense
Adversarial robustness for LLMs
Internal neuron-level safeguards
Threat detection and anomaly analysis

News

Jun 2026
Paper ... accepted at ...

Papers in this direction

RoNE: Robust Neurons Enable Internal Defenses Against Multimodal Jailbreak
TL Tran, H Phan, C Kanan, T Le
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)

Causal-aware Anomaly Detection for Tabular Data
D Nguyen, TAH Nguyen, TD Le, S Venkatesh, T Le, S Gupta
International Conference on Machine Learning (ICML)

f-Divergence Self-Play for Tabular Anomaly Detection via Large Language Models
VH Tran, VL Ngo, D Nguyen, T Nguyen, P Nguyen, M Harandi, T Le
International Conference on Machine Learning (ICML)

Does a Hybrid Space-Aware Randomized Defense Improve Empirical and Certified Adversarial Robustness?
J Dhar, MK Pandey, B Bozorgtabar, T Le, L Yao
International Conference on Machine Learning (ICML)

Co-Steer: Cross-Modal Collaborative Steering for Jailbreaking MLLMs
Jingmin Zhu, Rollin Omari, Tamas Abraham, Junae Kim, Amardeep Kaur, Trung Le, Dinh Phung, Qiuhong Ke
European Conference on Computer Vision (ECCV 2026)
