Foundation Distillation

Knowledge distillation, Wasserstein transfer, and efficient student models from large teachers.

Goals

To compress the knowledge of large foundation models into smaller, faster student models while preserving performance on downstream tasks.

Overview

This direction studies how to distill large language, vision-language, and generative models into efficient students. Research spans Wasserstein knowledge distillation, hierarchical relational distillation, and data-free black-box transfer.

Key topics

Knowledge distillation for LLMs and VLMs
Wasserstein and optimal-transport distillation
Hierarchical relational distillation
Data-free and black-box distillation
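
For readers new to the area, the first topic can be illustrated with the classic temperature-scaled distillation loss: the KL divergence between softened teacher and student output distributions, scaled by the squared temperature. The sketch below is a generic textbook formulation, not the method of any paper listed on this page; the function names `softmax` and `kd_loss` and the default temperature are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, times T^2.

    Hypothetical helper for illustration only. It assumes both logit
    lists cover the same classes in the same order, and that the
    student probabilities do not underflow to zero.
    """
    p = softmax(teacher_logits, temperature)  # softened teacher
    q = softmax(student_logits, temperature)  # softened student
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two softened distributions diverge. Optimal-transport variants replace the KL term with a transport cost between the distributions.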

News

Jun 2026
Paper ... accepted at ...

Papers in this direction

HieRD: Hierarchical Relational Distillation for Vision-Language Embedding Models
V Le, N Hong Dang, T Vu, LN Van, DA Nguyen, T Le
International Conference on Machine Learning (ICML)

MCW-KD: Multi-Cost Wasserstein Knowledge Distillation for Large Language Models
HT Vuong, T Le, Q Tran, LN Van, T Le
Proceedings of the AAAI Conference on Artificial Intelligence

Diverse Image Priors for Black-box Data-free Knowledge Distillation
TN Vo, D Nguyen, T Le, K Do, S Gupta
International Conference on Computer Communication and the Internet (ICCCI)
