Manticore Search 27.1.5 adds an ONNX Runtime backend that runs embedding models roughly 14 times faster than the previous SentenceTransformers/Candle pipeline. The speedup allows much higher document processing rates for applications that rely on embedding models.
The backend ships with Manticore Search 27.1.5 and is designed to speed up embedding model execution. The ONNX format is compatible with popular models such as MiniLM and E5 and provides a lightweight, efficient path for model inference.
On average, the new backend is 14 times faster than the previous pipeline, which used SentenceTransformers and Candle. In testing, the old method processed between 5 and 11 documents per second, while the new ONNX path handles 70 to 230 documents per second across various configurations.
Latency also fell substantially: single-insert times measured approximately 14 milliseconds under low load and around 56 milliseconds under higher concurrency. Both figures are a major improvement over the previous Candle implementation, which often exceeded 200 milliseconds.
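The throughput figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it pairs the low ends and the high ends of the two quoted ranges, which is an assumption, since the reporting does not say which configurations correspond to each other. The helper names `speedup` and `ms_per_doc` are hypothetical.

```python
# Figures quoted in the text, in documents per second.
OLD_DOCS_PER_SEC = (5, 11)     # SentenceTransformers/Candle pipeline
NEW_DOCS_PER_SEC = (70, 230)   # ONNX Runtime path


def speedup(old: float, new: float) -> float:
    """Ratio of new throughput to old throughput."""
    return new / old


def ms_per_doc(docs_per_sec: float) -> float:
    """Average time spent per document, in milliseconds, at a given throughput."""
    return 1000.0 / docs_per_sec


# Assumption: low end is compared with low end, high end with high end.
low_speedup = speedup(OLD_DOCS_PER_SEC[0], NEW_DOCS_PER_SEC[0])
high_speedup = speedup(OLD_DOCS_PER_SEC[1], NEW_DOCS_PER_SEC[1])

if __name__ == "__main__":
    print(f"speedup, low ends:  {low_speedup:.1f}x")
    print(f"speedup, high ends: {high_speedup:.1f}x")
    for rate in OLD_DOCS_PER_SEC + NEW_DOCS_PER_SEC:
        print(f"{rate:>4} docs/s -> {ms_per_doc(rate):.1f} ms per document")
```

Under that pairing, the low ends give a 14x ratio, in line with the quoted average. Note that `ms_per_doc` is an average derived from throughput, not a measured single-insert latency, so it should not be read as a substitute for the latency numbers reported above.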
Key optimizations included turning off intra_op spinning and no longer batching documents, which improved parallelism and efficiency. With larger batch sizes and single-threaded clients, throughput peaks at up to 233 documents per second.
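For readers unfamiliar with the spinning setting, here is a minimal sketch of how intra-op spinning can be disabled through ONNX Runtime's Python API. This is not Manticore's code, and the reporting does not say how Manticore applies the setting internally. The session config key `session.intra_op.allow_spinning` and the `add_session_config_entry` method come from ONNX Runtime itself; the names `SESSION_CONFIG` and `build_session_options` are hypothetical, and the `onnxruntime` package is assumed to be installed before calling the function.

```python
# ONNX Runtime session config entries; values are passed as strings.
# "0" asks intra-op worker threads not to busy-wait (spin) for new work.
SESSION_CONFIG = {
    "session.intra_op.allow_spinning": "0",
}


def build_session_options():
    """Return onnxruntime.SessionOptions with the entries above applied.

    The import is deferred so that this module loads even where
    onnxruntime is not installed.
    """
    import onnxruntime as ort

    options = ort.SessionOptions()
    for key, value in SESSION_CONFIG.items():
        options.add_session_config_entry(key, value)
    return options
```

The returned options object would then be passed to `onnxruntime.InferenceSession(model_path, sess_options=options)`, where `model_path` points at an exported ONNX model. With spinning disabled, idle worker threads stop busy-waiting, which may free CPU time for other concurrent requests at some cost in per-call wake-up latency.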
The ONNX Runtime backend in Manticore Search both streamlines embedding model execution and significantly raises the throughput of applications that use these models. The update is likely to matter most to teams using Manticore for scalable search and data processing.
This summary was generated by AI from the outlets' reporting listed below. It is not independently verified and may contain errors; check the original sources.