AMD's MI355X GPU has reached a throughput of 2626 tokens per second at a cost significantly lower than NVIDIA's Blackwell. This development is notable as the demand for inference capabilities continues to rise, highlighting AMD's potential role in the competitive AI hardware market.
AMD's MI355X GPU achieved an aggregate throughput of 2626 tokens per second per node at 2.4 requests per second (rps). That is roughly 80% of the throughput measured on NVIDIA's B200, yet the MI355X setup is more than twice as cost-effective. The results point to AMD's growing competitiveness in AI hardware as demand for inference climbs sharply.
On average, AMD's MI355X costs roughly 2.75 times less than NVIDIA's B300 GPU. With GPU prices rising, that gap makes it a viable alternative for organizations looking for efficient inference hardware.
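The "more than twice as cost-effective" figure can be sanity-checked from the two ratios quoted above. The following is a minimal sketch, assuming the 80% throughput ratio and the 2.75x price ratio describe the same comparison; the summary names the B200 for one and the B300 for the other, so this is an approximation rather than the original analysis. The function and constant names are illustrative.

```python
# Relative figures taken from the summary. Treating them as one comparison
# is an assumption, since the text names both the B200 and the B300.
MI355X_TOKENS_PER_SEC = 2626   # aggregate per node, at 2.4 rps
THROUGHPUT_VS_NVIDIA = 0.80    # MI355X throughput as a fraction of the NVIDIA result
PRICE_RATIO = 2.75             # NVIDIA price divided by MI355X price


def relative_cost_effectiveness(throughput_fraction: float, price_ratio: float) -> float:
    """Tokens per dollar on the MI355X divided by tokens per dollar on the NVIDIA part."""
    return throughput_fraction * price_ratio


# Throughput the NVIDIA node would need for 2626 tokens/s to be 80% of it.
implied_nvidia_tokens_per_sec = MI355X_TOKENS_PER_SEC / THROUGHPUT_VS_NVIDIA
advantage = relative_cost_effectiveness(THROUGHPUT_VS_NVIDIA, PRICE_RATIO)

print(f"Implied NVIDIA throughput: {implied_nvidia_tokens_per_sec:.1f} tokens/s per node")
print(f"MI355X tokens per dollar: {advantage:.2f}x the NVIDIA figure")
```

Under these assumptions the product is 0.8 × 2.75 = 2.2, which is consistent with the "over twice" claim.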
Even with competitive hardware, getting optimal performance out of AMD GPUs often takes substantial engineering effort. Initial setup of the MI355X / ROCm stack can be cumbersome, frequently requiring weeks to build and optimize for the latest models, which delays deployment.
The implementation quantized the base bf16 GLM-5.2 model to MXFP4, and the resulting setup was confirmed lossless by several measures. The choice of inference framework was also critical: sglang was selected for its lower friction and compatibility, whereas other frameworks did not support MXFP4 effectively.
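To illustrate what MXFP4 quantization does to a block of weights, here is a minimal pure-Python sketch based on the OCP Microscaling layout as commonly described: 32 elements share one power-of-two scale, and each element is stored as a 4-bit E2M1 value. It is not the quantization pipeline used in the reporting; the function names and the sample weights are hypothetical, and tie-breaking is simplified.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (the sign is a separate bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # elements sharing one scale in MXFP4
E2M1_EMAX = 2    # exponent of the largest E2M1 value: 6 = 1.5 * 2**2


def quantize_block(block):
    """Return (shared power-of-two scale, list of signed E2M1 values)."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)
    # Power-of-two scale chosen so the largest element lands near the top
    # of the E2M1 range.
    scale = 2.0 ** (math.floor(math.log2(max_abs)) - E2M1_EMAX)
    quantized = []
    for x in block:
        # Clamp to the largest representable magnitude, then round to nearest.
        magnitude = min(abs(x) / scale, E2M1_VALUES[-1])
        # Simplification: ties go to the smaller magnitude.
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - magnitude))
        quantized.append(math.copysign(nearest, x))
    return scale, quantized


def dequantize_block(scale, quantized):
    """Reconstruct approximate values from the shared scale and E2M1 elements."""
    return [scale * q for q in quantized]


# Hypothetical sample weights, one block's worth.
weights = [0.01 * i - 0.16 for i in range(BLOCK_SIZE)]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale}, max abs error={max_error:.4f}")
```

Individual elements are rounded, so the format is not bit-exact. "Lossless" in the summary presumably refers to measured model quality, which this sketch does not evaluate.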