Foundation Models Security

Jailbreak defense, adversarial robustness, and internal safeguards for LLMs and multimodal models.

The goal is to protect foundation models against adversarial manipulation, jailbreaks, and misuse through principled internal and external defenses.

Overview

This research focuses on the security of large language models and multimodal foundation models. The work covers internal neuron-level defenses against jailbreaks, adversarial robustness, and threat detection in AI systems.

Key objectives

  • Build internal defenses against multimodal jailbreak attacks
  • Develop models that remain robust in adversarial environments
  • Detect and mitigate security threats in deployed systems
  • Formalize AI security as a learning problem

Key topics

  • Multimodal jailbreak defense
  • Adversarial robustness for LLMs
  • Internal neuron-level safeguards
  • Threat detection and anomaly analysis

Papers in this direction

  • 2026

    Does a Hybrid Space-Aware Randomized Defense Improve Empirical and Certified Adversarial Robustness?

    Dhar, J, Pandey, MK, Bozorgtabar, B, Le, T, Yao, L

    International Conference on Machine Learning (ICML)

  • 2026

    f-Divergence Self-Play for Tabular Anomaly Detection via Large Language Models

    Tran, VH, Ngo, VL, Nguyen, D, Nguyen, T, Nguyen, P, Harandi, M, Le, T

    International Conference on Machine Learning (ICML)

  • 2026

    Causal-aware Anomaly Detection for Tabular Data

    Nguyen, D, Nguyen, TAH, Le, TD, Venkatesh, S, Le, T, Gupta, S

    International Conference on Machine Learning (ICML)

  • 2026

    RoNE: Robust Neurons Enable Internal Defenses Against Multimodal Jailbreak

    Tran, TL, Phan, H, Kanan, C, Le, T

    Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)