Manticore Search 27.1.5 Features 14x Faster ONNX Model Execution

Aggregated by BrevFeed dev · updated 15h ago

Manticore Search 27.1.5 introduces a new ONNX Runtime backend that runs embedding models about 14 times faster than the previous SentenceTransformers/Candle pipeline. Throughput rises from 5 to 11 documents per second on the old path to 70 to 230 documents per second on the new one, which benefits any application that generates embeddings at insert time.

Key points

New ONNX Runtime backend added in Manticore Search 27.1.5
Execution speed increased by ~14x over previous method
Document processing speeds range from 70 to 230 docs/sec
Single-insert latency reduced to ~14 ms under single-client load

Introduction of ONNX Runtime Backend

Manticore Search 27.1.5 ships a new ONNX Runtime backend designed to speed up the execution of embedding models. The ONNX format is compatible with popular models such as MiniLM and E5 and provides a lightweight, efficient path for model inference.

Performance Improvements Achieved

The new backend averages 14 times the speed of the previous pipeline, which used SentenceTransformers and Candle. In testing, the old method processed between 5 and 11 documents per second, while the new ONNX path processes between 70 and 230 documents per second across various configurations.
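As a rough sanity check on these figures, the sketch below computes the smallest and largest speedup the two quoted ranges allow. The summary does not say which old configuration pairs with which new one, so the endpoint pairing here is an assumption, and the names `OLD_RANGE`, `NEW_RANGE`, and `speedup_bounds` are illustrative only.

```python
# Throughput ranges quoted in the summary, in documents per second.
OLD_RANGE = (5.0, 11.0)    # SentenceTransformers/Candle pipeline
NEW_RANGE = (70.0, 230.0)  # ONNX Runtime backend


def speedup_bounds(old, new):
    """Return the (smallest, largest) new/old ratio two ranges permit."""
    old_lo, old_hi = old
    new_lo, new_hi = new
    # Smallest ratio: slowest new result against fastest old result.
    # Largest ratio: fastest new result against slowest old result.
    return new_lo / old_hi, new_hi / old_lo


lo, hi = speedup_bounds(OLD_RANGE, NEW_RANGE)
print(f"speedup lies between {lo:.1f}x and {hi:.1f}x")
# Pairing the two low ends (an assumption) gives 70 / 5 = 14x.
print(f"low end vs low end: {NEW_RANGE[0] / OLD_RANGE[0]:.0f}x")
```

The reported average of about 14x falls inside these bounds, which is all this check can confirm without the per-configuration numbers.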

Impact on Latency and Throughput

Latency has also dropped significantly: single-insert times measured approximately 14 milliseconds under single-client load and around 56 milliseconds under higher concurrency. Both are a major improvement over the previous Candle implementation, which often exceeded 200 milliseconds.
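The latency and throughput figures can be related with Little's law (throughput equals requests in flight divided by latency). The sketch below is a back-of-the-envelope check, not a reproduction of the benchmark: the function name and the concurrency value of 8 are hypothetical, since the summary does not state how many clients were used.

```python
def throughput_docs_per_sec(latency_ms, in_flight=1):
    """Little's law: throughput = requests in flight / latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return in_flight * 1000.0 / latency_ms


# One client issuing inserts back to back at ~14 ms each.
single = throughput_docs_per_sec(14.0)
print(f"single client: {single:.1f} docs/sec")

# Hypothetical: 8 requests in flight at ~56 ms each.
concurrent = throughput_docs_per_sec(56.0, in_flight=8)
print(f"8 in flight:   {concurrent:.1f} docs/sec")
```

A single client at about 14 ms per insert works out to roughly 71 docs/sec, which is consistent with the low end of the reported 70 to 230 docs/sec range.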

Optimizations in Implementation

Two optimizations account for most of the gain: turning off intra_op spinning in ONNX Runtime and no longer batching documents, both of which improved parallelism and efficiency. With higher batch sizes and single-threaded clients, throughput peaks at 233 docs per second.
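For readers unfamiliar with the spinning setting, the sketch below shows how it can be disabled through ONNX Runtime's Python API. This is an illustration only: the summary does not show Manticore's own code or the language it uses to call ONNX Runtime. The config key `session.intra_op.allow_spinning` and the `add_session_config_entry` method are part of ONNX Runtime; `NO_SPIN_CONFIG` and `apply_session_config` are hypothetical helper names, and `model.onnx` is a placeholder path.

```python
# ONNX Runtime session-config key controlling whether intra-op worker
# threads busy-wait (spin) for new work. "0" disables spinning.
NO_SPIN_CONFIG = {"session.intra_op.allow_spinning": "0"}


def apply_session_config(options, entries):
    """Copy key/value config entries onto a SessionOptions-like object."""
    for key, value in entries.items():
        options.add_session_config_entry(key, value)
    return options


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # optional dependency for this demo
    except ImportError:
        print("onnxruntime not installed; skipping demo")
    else:
        opts = apply_session_config(ort.SessionOptions(), NO_SPIN_CONFIG)
        # Placeholder model path, not a file named in the article:
        # session = ort.InferenceSession("model.onnx", sess_options=opts)
        print(opts.get_session_config_entry("session.intra_op.allow_spinning"))
```

Spinning trades CPU time for lower wake-up latency, so disabling it frees cores when many inference calls run concurrently. Whether that trade-off pays off depends on the workload.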

Conclusion

The new ONNX Runtime backend in Manticore Search both streamlines embedding model execution and substantially raises the throughput of applications that use these models. The update is most relevant to teams that rely on Manticore for scalable search and data processing.

This summary was generated by AI from the outlets' reporting listed below. It is not independently verified and may contain errors; check the original sources.

Primary sources

GitHub manticoresoftware/manticoresearch
GitHub manticoresoftware/columnar
GitHub manticoresoftware/manticoresearch-php
GitHub manticoresoftware/manticoresearch-python
GitHub manticoresoftware/manticoresearch-python-asyncio
GitHub manticoresoftware/manticoresearch-javascript

Reporting from

Hacker News Front Page: "14× faster embeddings: how we rebuilt the ONNX path in Manticore" (18h ago)