
JetBrains Launches Mellum2: 12B Mixture-of-Experts AI Model

Aggregated by BrevFeed ai · updated 4d ago


JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts model optimized for natural language and coding tasks. It activates only a fraction of its parameters per token and runs inference more than 2x faster than similarly sized models, which positions it for high-throughput AI applications.

Key points

Mellum2 uses a 12B parameter Mixture-of-Experts architecture.
Activates 2.5B parameters per token for improved efficiency.
Available under the Apache 2.0 license on Hugging Face.

Overview of Mellum2

Mellum2 is a 12-billion-parameter Mixture-of-Experts (MoE) model developed by JetBrains. It is designed for natural language and coding tasks and is optimized for low-latency inference, which makes it suitable for production environments.

Performance and Efficiency

Because it is a Mixture-of-Experts model, Mellum2 activates only 2.5 billion of its 12 billion parameters for each token, roughly a fifth of the total. This reduces per-token compute relative to a dense model, which uses every parameter on every token. Benchmarked against similarly sized models, Mellum2 delivers competitive performance at more than twice the inference speed.
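To make the sparse-activation idea concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind Mixture-of-Experts layers. Only the 12B and 2.5B figures come from the reporting. The expert count, the value of k, the router scores, and the toy scalar "experts" are hypothetical choices for illustration, and nothing here should be read as Mellum2's actual architecture.

```python
import math

TOTAL_PARAMS = 12e9    # total parameters (from the article)
ACTIVE_PARAMS = 2.5e9  # parameters activated per token (from the article)


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Hypothetical toy setup: 4 scalar "experts", 2 of which run per token.
toy_experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
toy_logits = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for one token

if __name__ == "__main__":
    print(f"{active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")  # 20.8%
    print(top_k_route(toy_logits, k=2))
    print(moe_forward(1.0, toy_experts, toy_logits, k=2))
```

In this toy setup only experts 1 and 3 run for the token, so half of the expert computation is skipped. The same principle, applied at scale, is what lets an MoE model keep a large total parameter count while paying the per-token cost of a much smaller one.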

Use Cases and Applications

Mellum2 is suited to routing, retrieval-augmented generation (RAG), summarization, and coding features. Its fast inference makes it a fit for latency-sensitive tasks in real-time applications.

Availability and Licensing

Mellum2 is available for download from Hugging Face under the Apache 2.0 license. The accompanying technical report describes the architecture, training setup, and evaluation methodology in full.

This summary was generated by AI from the outlets' reporting listed below. It is not independently verified and may contain errors; check the original sources.

Reporting from

Hugging Face Blog — Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains 31d ago →