Covered by 1 source · 1 report · Medium impact

New Study Introduces Dispersion Loss to Mitigate Embedding Condensation in Small LMs

Aggregated by BrevFeed ai · updated 1h ago

Researchers have identified a geometric issue called embedding condensation in smaller language models, where token embeddings collapse into a tight subspace. To address this, they propose a new training objective called dispersion loss, which aims to enhance model expressivity by counteracting this phenomenon.

Key points

Embedding condensation affects smaller language models more severely than larger ones.
A new training objective, dispersion loss, is introduced to counteract it.
Knowledge distillation does not resolve condensation.

Identification of Embedding Condensation

The study describes embedding condensation: small language models tend to collapse token embeddings into similar directions, so that the embeddings occupy a narrow cone. The effect is more pronounced in smaller models than in their larger counterparts.
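One way to make the "narrow cone" picture concrete is to measure how similar embedding directions are to one another. The sketch below computes the mean pairwise cosine similarity of a set of vectors. This metric, the function names, and the toy vectors are illustrative assumptions and are not taken from the study; values near 1 would suggest condensed embeddings, while values near or below 0 would suggest directions that are spread out.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D vectors (hypothetical, not data from the study).
condensed = [[1.0, 0.1], [1.0, 0.0], [1.0, -0.1]]   # nearly the same direction
dispersed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # spread-out directions

print(mean_pairwise_cosine(condensed))
print(mean_pairwise_cosine(dispersed))
```

On real models the same idea would be applied to token embeddings taken from a chosen layer, typically with a tensor library rather than pure Python.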

Features of Embedding Condensation

The study reports that embedding condensation is reproducible under controlled conditions and persists across various model configurations. Condensation is already present at model initialization. Pre-training can alleviate it, but knowledge distillation does not resolve it.

Introduction of Dispersion Loss

To counter embedding condensation, the researchers introduce a new training objective, dispersion loss. It is designed to counteract embedding collapse so that small models can make fuller use of their representational capacity.
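The summary does not give the formula for dispersion loss, so the following is only a sketch of what a dispersion-style regularizer could look like. It assumes a generic form, the log-mean-exp of pairwise cosine similarities, added to a task loss with a small weight. The names `dispersion_penalty` and `total_loss`, the `temperature` and `weight` parameters, and the toy vectors are all hypothetical; the study's actual objective may differ.

```python
import math
from itertools import combinations


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dispersion_penalty(embeddings, temperature=0.5):
    """Log-mean-exp of pairwise cosine similarities (assumed form).

    The value is high when embeddings point in similar directions and
    lower when their directions are spread out.
    """
    unit = [normalize(v) for v in embeddings]
    sims = [sum(a * b for a, b in zip(u, v)) for u, v in combinations(unit, 2)]
    return math.log(sum(math.exp(s / temperature) for s in sims) / len(sims))


def total_loss(task_loss, embeddings, weight=0.1):
    """Task loss plus a weighted dispersion penalty (hypothetical weighting)."""
    return task_loss + weight * dispersion_penalty(embeddings)


# Toy 2-D vectors (hypothetical, not data from the study).
condensed = [[1.0, 0.1], [1.0, 0.0], [1.0, -0.1]]
dispersed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(total_loss(2.0, condensed))
print(total_loss(2.0, dispersed))
```

With the same task loss, the condensed set receives the larger total, so minimizing this quantity would push embeddings toward more varied directions. In practice such a term would be computed on batches of embeddings inside an automatic-differentiation framework.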

Implications for Language Model Training

Understanding and addressing embedding condensation matters for improving the expressivity of smaller transformer models. Dispersion loss could benefit applications that rely on language models, especially those where model size is constrained.

This summary was generated by AI from the outlets' reporting listed below. It is not independently verified and may contain errors; check the original sources.

Primary sources

GitHub ChenLiu-1996/LM-Dispersion
arXiv 2602.00217
arXiv 2506.09027
arXiv 2312.10794

Reporting from

Hacker News Front Page: Dispersion loss counteracts embedding condensation in small language models