From Hugging Face Blog · 29 stories
Gemma 4 12B Boosts Multimodal AI Processing on Laptops
Google DeepMind introduced Gemma 4 12B, a new encoder-free multimodal AI model that enables advanced processing on laptops with minimal memory. Its architecture eliminates separate multimodal encoders, making audio and visual input processing more efficient. Collaboration with Cerebras and Hugging Face enhances real-time speech-to-speech capabilities, improving applications like voice assistants.
ScarfBench Launches as New AI Benchmark for Java Framework Migration
ScarfBench provides a new open benchmark to evaluate AI agents on Enterprise Java framework migrations. It focuses on ensuring successful builds, deployments, and behavior preservation across major Java ecosystems like Spring and Jakarta EE, addressing gaps in existing AI-assisted modernization efforts.
Hugging Face Integrates Every Eval Ever for Model Reporting
Hugging Face has integrated the Every Eval Ever (EEE) JSON schema into its Community Evals to standardize AI evaluation reporting. This collaboration aims to enhance trust and comparability in model performance, addressing inconsistencies in evaluation results reported across multiple formats.
Hybrid models outperform transformers in predicting meaning-rich tokens
Experiments revealed that hybrid models, like Olmo Hybrid, predict meaning-rich tokens better than transformers. However, on simple repetitive tokens, transformers maintain an edge, indicating differing strengths in architectural approaches.
NVIDIA NeMo AutoModel Enhances Fine-Tuning for Generative AI Models
NVIDIA launched NeMo AutoModel, a tool for fine-tuning generative AI models. It achieves up to 3.7x faster training and reduces GPU memory use by up to 32%, making it easier for developers to implement advanced models without extensive code changes.
Launch of FFASR Leaderboard to Benchmark ASR in Real-World Conditions
Treble Technologies and Hugging Face introduced the FFASR Leaderboard to evaluate Automatic Speech Recognition (ASR) models under far-field conditions. This community-driven benchmark aims to address the significant gap in performance between traditional clean-speech evaluations and real-world usage scenarios involving background noise and reverberation.
IBM's CUGA Offers Lightweight Framework for Building Agentic Apps
IBM has released CUGA, a Configurable Generalist Agent harness that simplifies the development of agentic applications by automating orchestration and state management. With two dozen example applications provided, developers can create functional agents quickly without extensive groundwork, increasing efficiency in building machine learning applications.
Transformers.js improves browser-based AI model management with Cross-Origin Storage API
Transformers.js now integrates the proposed Cross-Origin Storage API to manage AI model resources more efficiently. This change reduces redundant downloads of commonly used models across different web applications, addressing issues related to cache storage and data usage.
PP-OCRv6: New OCR Model on Hugging Face with 50-Language Support
PaddleOCR has launched PP-OCRv6, a new OCR model with capabilities in 50 languages and scalability from 1.5M to 34.5M parameters. The model improves text detection and recognition accuracy compared to its predecessor, PP-OCRv5, making it suitable for a variety of real-world OCR applications.
Local Models Triaged Issues in OpenClaw Repository for Free
In June 2026, local AI models were used to efficiently triage issues in the OpenClaw repository. This method allows for real-time notifications and reduces costs associated with cloud-based models, highlighting the growing importance of local AI implementation.
MosaicLeaks addresses privacy risks in deep research agents with new training method
MosaicLeaks reveals privacy vulnerabilities in deep research agents that combine private documents and web searches, leading to potential leakage of sensitive information. The proposed Privacy-Aware Deep Research (PA-DR) method improves task accuracy and cuts full-information leakage from 34.0% to 9.9%.
Benchmarking agent-driven software models with transformer tools
A new benchmarking approach evaluates the efficiency of coding agents in software development, focusing on task completion rather than just final output. This shift highlights the importance of designing libraries for effective agent interaction, emphasizing the need for clear APIs and documentation.
GLM-5.2 Launches with Advanced Long-Horizon Coding Capabilities
GLM-5.2 introduces a 1M-token context window that improves performance on long-horizon coding tasks. The model features enhanced coding capabilities and architecture improvements that significantly reduce computational costs while maintaining performance, marking it as a competitive player in the open-source sector.
New ARD Specification Enables Dynamic Agent Searches Across Tools
The Agentic Resource Discovery (ARD) specification has been developed collaboratively by major tech companies to allow agents to discover tools at runtime. This move shifts from a static model requiring pre-installed capabilities to dynamic, intent-based searches, enhancing the ability of agents to access and utilize a broader range of tools effectively.
Migrating CI from GitHub to Hugging Face Jobs for Enhanced Performance
Trackio has migrated its CI from GitHub Actions to Hugging Face Jobs, achieving a 30% reduction in CPU CI time and enabling GPU testing. This step is significant for improving efficiency and expanding testing capabilities in machine learning projects.
OpenEnv Gains Support from Major AI Organizations for Open Source Development
OpenEnv has transitioned to an open-source model coordinated by leading AI organizations such as Meta-PyTorch and Microsoft. This move aims to improve agent training efficiency across various AI harnesses and environments, fostering collaboration within the AI community.
Nemotron 3.5 Enhances Multimodal Content Safety with Custom Policies
Nemotron 3.5 introduces customizable multimodal safety integration, considering user prompts, images, and responses simultaneously. This update captures policy violations emerging from interaction, enhancing deployments across various global languages and industries.
Direct Preference Optimization Reduces Text Degeneration in OCR Models
DharmaOCR introduces Direct Preference Optimization (DPO) to combat text degeneration in OCR models. The second training stage reduced degeneration rates by an average of 59.4%, addressing a significant limitation of supervised fine-tuning.
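For readers unfamiliar with DPO, the standard objective for a single preference pair can be sketched as follows. This is the generic DPO loss, not DharmaOCR's training code; the function name, the `beta` value, and the example log-probabilities are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for one (chosen, rejected) pair.

    Inputs are summed sequence log-probabilities under the trainable policy
    and a frozen reference model. beta=0.1 is an assumed, illustrative value.
    """
    # How much more the policy likes each output than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probs: the policy favors the clean transcription (chosen)
# over a degenerate, repetitive one (rejected) more than the reference does.
loss_when_preferring_chosen = dpo_loss(-5.0, -10.0, -7.0, -7.0)
loss_when_indifferent = dpo_loss(-7.0, -7.0, -7.0, -7.0)
print(loss_when_preferring_chosen < loss_when_indifferent)
```

The loss falls as the policy raises the chosen output's likelihood relative to the rejected one, which is how a preference stage can push a model away from degenerate outputs.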
Holo3.1 Released with Local Execution and Enhanced Performance
Holo3.1 has been released, featuring enhanced robustness for local and mobile environments, quantized checkpoints for local inference, and improved performance across various deployment frameworks. This release addresses the challenges of deployment flexibility and performance consistency in diverse operational settings.
JetBrains Launches Mellum2: 12B Mixture-of-Experts AI Model
JetBrains has released Mellum2, a 12 billion-parameter Mixture-of-Experts model optimized for natural language and coding tasks. With efficient parameter activation and over 2x faster inference compared to similar models, Mellum2 is positioned for high-throughput AI applications.
Analysis of AI Specialization and Its Emergence as a Key Principle
A recent analysis highlights the inevitability of specialization in effective AI systems, drawing on various domains. It argues that focused AI systems outperform general models, consistent with findings in optimization theory and evolutionary biology.
Exploring Alternatives to LoRA in Parameter-Efficient Fine-Tuning
The article investigates alternatives to LoRA, the predominant technique in parameter-efficient fine-tuning (PEFT). It highlights the potential of PEFT techniques to reduce memory requirements for model fine-tuning and mentions the development of the PEFT library by Hugging Face, which supports various methods and improves accessibility.
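As background on why LoRA reduces memory, a rough parameter count helps. The sketch below assumes the usual LoRA setup, in which a frozen weight matrix W of shape (d_out, d_in) is adapted by a low-rank product B @ A. The layer size and rank are hypothetical values chosen for illustration, not figures from the article.

```python
def full_trainable_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters when W is frozen and only B (d_out x rank)
    and A (rank x d_in) are learned."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, for illustration only.
d_out, d_in, rank = 4096, 4096, 8
full = full_trainable_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4%}")
```

Because optimizer state and gradients are kept only for the trainable parameters, shrinking that count is the main source of the memory savings.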
DiScoFormer model estimates density and score for data distributions
The DiScoFormer model estimates both the density and score of data distributions in a single forward pass. This model improves upon existing methods by allowing for high-dimensional data analysis without the need for retraining, addressing challenges in density estimation and score matching.
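To make the two quantities concrete: the density is p(x), and the score is the gradient of log p(x) with respect to x. The sketch below is not DiScoFormer. It only evaluates both quantities for a one-dimensional Gaussian, where they are known in closed form, and checks the score against a finite difference. All function names are illustrative.

```python
import math


def gaussian_log_density(x, mu=0.0, sigma=1.0):
    """log p(x) for a 1-D Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def gaussian_score(x, mu=0.0, sigma=1.0):
    """Closed-form score: d/dx log p(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma ** 2


def numerical_score(x, mu=0.0, sigma=1.0, eps=1e-5):
    """Central finite-difference estimate of the score, used as a sanity check."""
    upper = gaussian_log_density(x + eps, mu, sigma)
    lower = gaussian_log_density(x - eps, mu, sigma)
    return (upper - lower) / (2 * eps)


x, mu, sigma = 1.5, 1.0, 2.0
print(math.exp(gaussian_log_density(x, mu, sigma)))  # density at x
print(gaussian_score(x, mu, sigma))                  # score at x
print(abs(gaussian_score(x, mu, sigma) - numerical_score(x, mu, sigma)) < 1e-6)
```

The score points toward higher-density regions, which is why it is zero at the mean and negative to its right.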
Hugging Face simplifies vLLM server setup with single command
Hugging Face introduced a command to run a vLLM server easily, facilitating model testing and evaluation. This command allows users to quickly deploy models and interact with them via the OpenAI API using Hugging Face infrastructure.
Hugging Face Enhances CLI and Adopts Weekly Releases for Improved Efficiency
Hugging Face has updated its command-line interface (CLI) to serve both human users and AI agents, optimizing token usage. It has also shifted to a weekly release schedule for the huggingface_hub Python client to accelerate the delivery of fixes and features. These changes enhance CLI efficiency and streamline the release process.
Strands Robots SDK integrates LeRobot for seamless robot task management
The Strands Robots SDK now integrates LeRobot hardware and simulations, streamlining task management for robots. Users can record, test, and deploy robot tasks with fewer tools, enhancing workflow efficiency across multiple robots.
Agent Creates 3D Paris Gallery Using Hugging Face Spaces
A coding agent utilized Hugging Face Spaces to create a web gallery featuring 3D Gaussian models of Paris monuments without manually engaging with image or 3D tools. This illustrates a shift towards modular software construction where AI integrates existing components easily.
Introduction of MCP Tools for Reachy Mini Enhances Remote Functionality
The Reachy Mini now supports remote tools through an MCP canary Space, allowing the addition of external functionalities like weather queries. This update enhances the robot's interactivity and potential use cases without modifying the core app directly.
Profiling in PyTorch: Expanding to Fused MLP with nn.Linear
The second part of the 'Profiling in PyTorch' series introduces the use of nn.Linear to create a Multilayer Perceptron (MLP) block. This change highlights how to efficiently profile and optimize deep learning models in PyTorch by leveraging GPU capabilities.