Foundation Distillation
Knowledge distillation, Wasserstein transfer, and efficient student models from large teachers.
Goals
Compress the knowledge of large foundation models into smaller, faster students while preserving performance on downstream tasks.
Overview
This direction studies how to distill large language, vision-language, and generative models into efficient students. Research spans Wasserstein knowledge distillation, hierarchical relational distillation, and data-free black-box transfer.
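As a point of reference for the methods in this direction, the classic logit-based distillation objective can be sketched as follows. This is a minimal illustration of the standard temperature-scaled KL loss, not the method of any paper listed here; the function names `softmax` and `kd_loss` and the default temperature of 2.0 are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration: the T^2 factor is the usual
    convention that keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # softened teacher distribution
    q = softmax(student_logits, temperature)  # softened student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    print(kd_loss(teacher, teacher))          # student matches teacher
    print(kd_loss(teacher, [0.5, 1.0, 4.0]))  # student disagrees with teacher
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge. A higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of less likely classes.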
Key objectives
- Develop principled Wasserstein and relational distillation methods
- Enable data-free and black-box knowledge transfer
- Distill vision-language and multimodal embedding models
- Preserve teacher capability in compact student architectures
Key topics
- Knowledge distillation for LLMs and VLMs
- Wasserstein and optimal-transport distillation
- Hierarchical relational distillation
- Data-free and black-box distillation
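To make the Wasserstein topic above concrete, the sketch below computes the 1-Wasserstein distance between two equal-sized sets of scalar values, for example one output dimension collected from a teacher and a student over the same batch. It relies on the known fact that, in one dimension with uniform weights, the optimal transport plan matches sorted samples. The function name `wasserstein_1d` is hypothetical, and this is not the multi-cost formulation of MCW-KD, only the simplest instance of the underlying distance.

```python
def wasserstein_1d(teacher_values, student_values):
    """1-Wasserstein distance between two equal-sized empirical distributions.

    Assumes scalar samples with uniform weights. In 1D the optimal
    coupling pairs the i-th smallest teacher value with the i-th
    smallest student value, so the distance is the mean absolute gap.
    """
    if len(teacher_values) != len(student_values):
        raise ValueError("both sample sets must have the same size")
    if not teacher_values:
        raise ValueError("sample sets must be non-empty")
    t_sorted = sorted(teacher_values)
    s_sorted = sorted(student_values)
    gaps = [abs(t - s) for t, s in zip(t_sorted, s_sorted)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Student outputs are the teacher outputs shifted by 1.0.
    print(wasserstein_1d([0.0, 2.0], [1.0, 3.0]))  # 1.0
```

Unlike a KL-based loss, this distance stays finite and informative when the two distributions do not overlap, which is one common motivation for optimal-transport objectives in distillation.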
Papers in this direction
Diverse Image Priors for Black-box Data-free Knowledge Distillation
Vo, TN, Nguyen, D, Le, T, Do, K, Gupta, S
International Conference on Computer Communication and the Internet (ICCCI)
MCW-KD: Multi-Cost Wasserstein Knowledge Distillation for Large Language Models
Vuong, HT, Le, T, Tran, Q, Van, LN, Le, T
Proceedings of the AAAI Conference on Artificial Intelligence
HieRD: Hierarchical Relational Distillation for Vision-Language Embedding Models
Le, V, Dang, N Hong, Vu, T, Van, LN, Nguyen, DA, Le, T
International Conference on Machine Learning (ICML)