Covered by 1 source · 1 report · Medium impact

AMD MI355X Achieves High Inference Rates at Lower Cost than NVIDIA Blackwell

Aggregated by BrevFeed ai · updated 1h ago


AMD's MI355X GPU has reached a throughput of 2626 tokens per second per node serving GLM-5.2, at over 2x lower cost than NVIDIA's Blackwell. The result is notable as demand for inference capacity continues to rise, and it points to a larger role for AMD in the competitive AI hardware market.

Key points

AMD MI355X delivers 2626 tok/s/node at 2.4 rps.
Performance per dollar is over 2x better than NVIDIA Blackwell.
The ROCm stack often needs weeks of engineering work to build and optimize for the latest models.

Performance Comparison

AMD's MI355X GPU achieved an aggregate throughput of 2626 tokens per second per node at 2.4 requests per second (rps). This is approximately 80% of the throughput measured on NVIDIA's B200, but over twice as cost-effective. The results underscore AMD's increasing competitiveness in AI hardware as demand for inference grows rapidly.
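As a rough way to read the two figures together, dividing throughput by request rate gives the average number of tokens handled per request. The sketch below is illustrative only: it assumes the 2.4 rps figure is measured per node at steady state, like the throughput, and the summary does not say whether the token count includes prompt tokens. The function name is hypothetical.

```python
def tokens_per_request(tokens_per_second: float, requests_per_second: float) -> float:
    """Average tokens per request implied by a steady-state throughput and request rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return tokens_per_second / requests_per_second


# Figures from the summary; assumes both are per node (an assumption, not stated).
average = tokens_per_request(2626, 2.4)
print(f"{average:.0f} tokens per request")
```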

Cost Advantage

On average, AMD's MI355X costs approximately 2.75 times less than NVIDIA's B300 GPU. This cost difference makes it a viable alternative for organizations seeking efficient inference as GPU prices rise.
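The "over twice as cost-effective" claim can be sanity-checked by multiplying the throughput ratio by the price ratio. This is a minimal sketch, not the source's method: the summary quotes the throughput ratio against the B200 and the price ratio against the B300, so the code assumes the two ratios can be combined against a single Blackwell baseline. The function and parameter names are hypothetical.

```python
def relative_perf_per_dollar(throughput_ratio: float, price_ratio: float) -> float:
    """Performance per dollar of AMD relative to NVIDIA.

    throughput_ratio: AMD throughput divided by NVIDIA throughput.
    price_ratio: NVIDIA price divided by AMD price.
    """
    return throughput_ratio * price_ratio


# 0.80 and 2.75 come from the summary; combining them is an assumption.
advantage = relative_perf_per_dollar(0.80, 2.75)
print(f"{advantage:.2f}x performance per dollar")
```

Under that assumption the product is about 2.2, which is consistent with the "over 2x" figure in the headline.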

Optimization Challenges

Even with competitive hardware, achieving optimal performance on AMD GPUs often requires substantial engineering effort. Initial setup of the MI355X / ROCm stack can be cumbersome, and it frequently takes weeks to build and optimize for the latest models, which delays deployment.

Quantization and Framework Choices

The implementation quantized the base bf16 GLM-5.2 model to MXFP4, and the resulting setup was reported to be lossless. The choice of inference framework was also critical: sglang was selected for its lower friction and compatibility, whereas other frameworks did not support MXFP4 effectively.
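For readers unfamiliar with MXFP4, the sketch below shows the general idea of the format as defined in the OCP Microscaling specification: values are grouped into blocks of 32, each block shares one power-of-two scale, and each element is stored as a 4-bit E2M1 float. It is a simplified pure-Python illustration, not the quantization pipeline used in the source. The function names are hypothetical, and ties are rounded toward the smaller magnitude rather than following the specification's rounding rules.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32   # elements sharing one scale in MXFP4
E2M1_EMAX = 2     # largest E2M1 exponent: 6.0 == 1.5 * 2**2


def quantize_block(block):
    """Return (shared_exponent, fp4_values) for one block of up to 32 floats."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0.0] * len(block)
    # frexp gives max_abs == m * 2**e with 0.5 <= m < 1, so floor(log2) == e - 1.
    _, e = math.frexp(max_abs)
    shared_exp = (e - 1) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    values = []
    for x in block:
        magnitude = min(abs(x) / scale, FP4_E2M1_VALUES[-1])  # clamp to 6.0
        nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(v - magnitude))
        values.append(math.copysign(nearest, x))
    return shared_exp, values


def dequantize_block(shared_exp, values):
    """Reconstruct approximate floats from a shared exponent and FP4 values."""
    scale = 2.0 ** shared_exp
    return [v * scale for v in values]


# Usage: round-trip a small block and inspect the per-element error.
weights = [0.3, -1.2, 0.05, 2.0]
exp, q = quantize_block(weights)
restored = dequantize_block(exp, q)
print(exp, restored)
```

The round trip is not bit-exact for arbitrary values, so "lossless" in this context most plausibly refers to model quality on evaluations rather than to exact weight reconstruction.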

This summary was generated by AI from the outlets' reporting listed below. It is not independently verified and may contain errors; check the original sources.

Reporting from

Hacker News Front Page: GLM5.2 on AMD MI355X at 2626 tok/s/node at over 2x lower cost than Blackwell (4h ago)