Instead of making one chip do all the work, the system splits inference into two phases: Trainium3 handles the "prefill" (processing the entire input prompt in one pass), while Cerebras chips take care of the "decode" (generating output tokens one at a time).
This handoff between chips relies on custom networking designed for low-latency responses.
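The split works because the two phases stress hardware differently: prefill is one compute-heavy pass over the whole prompt, while decode is a sequential loop that repeatedly reads the cached attention state. The toy Python sketch below illustrates that structure only; the class names (`PrefillWorker`, `DecodeWorker`, `KVCache`) are illustrative assumptions, not an actual AWS or Cerebras API.

```python
# Toy sketch of disaggregated inference: a prefill worker processes the
# full prompt once and hands off a KV cache; a decode worker then
# generates tokens one step at a time from that cache. All names here
# are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class KVCache:
    # A real cache holds per-layer key/value tensors; here we just
    # record the tokens processed so far.
    tokens: list = field(default_factory=list)

class PrefillWorker:
    """Stage 1: runs once over the whole prompt (compute-bound)."""
    def run(self, prompt_tokens):
        return KVCache(tokens=list(prompt_tokens))

class DecodeWorker:
    """Stage 2: emits one token per step (memory-bandwidth-bound)."""
    def step(self, cache):
        # Toy "model": the next token is the current sequence length.
        next_token = len(cache.tokens)
        cache.tokens.append(next_token)
        return next_token

def generate(prompt_tokens, max_new_tokens):
    cache = PrefillWorker().run(prompt_tokens)  # prefill happens once
    decoder = DecodeWorker()
    return [decoder.step(cache) for _ in range(max_new_tokens)]

print(generate([10, 11, 12], 3))  # -> [3, 4, 5]
```

In a real disaggregated deployment the two workers run on different hardware, so the KV cache must be shipped across the interconnect after prefill, which is why the custom networking matters.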
Cerebras's WSE-3 chip is seriously powerful: with 4 trillion transistors and 900,000 AI cores, a CS-3 system can generate up to 1,800 tokens per second on Llama 3.1 8B, roughly 20 times faster than comparable NVIDIA GPU-based solutions on some models.
Meanwhile, AWS's Trainium3 handles the compute-heavy prefill stage, a role well suited to its high-throughput design.
Amazon says Trainium3, along with the future Trainium4, is expected to lead in price-performance versus merchant GPUs.