AWS and Cerebras Collaboration Aims to Set a New Standard for AI Inference Speed and Performance in the Cloud
Deployed in AWS data centers and accessed through Amazon Bedrock
Key Takeaways
- Fastest inference coming soon: AWS and Cerebras are partnering to deliver the fastest AI inference available through Amazon Bedrock, launching in the next couple of months.
- Industry-leading speed and performance: With AWS Trainium optimized for prefill and Cerebras CS-3 optimized for decode, this innovative integrated system will provide unmatched performance and speed for AI inference.
- Pioneering cloud collaboration: AWS is the first cloud provider for Cerebras's disaggregated inference solution, available exclusively through Amazon Bedrock.
This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20260313406341/en/
“Inference is where AI delivers real value to customers, but speed remains a critical bottleneck for demanding workloads like real-time coding assistance and interactive applications,” said
“Partnering with AWS to build a disaggregated inference solution will bring the fastest inference to a global customer base,” said
How It Works
The Trainium + CS-3 solution enables “inference disaggregation,” a technique that separates AI inference into two stages: prompt processing, or “prefill,” and output generation, or “decode.” These two stages have profoundly different computational characteristics. Prefill is natively parallel, computationally intensive, and requires only moderate memory bandwidth. Decode, on the other hand, is inherently serial, computationally light, and memory-bandwidth intensive. Because each output token must be generated sequentially, decode typically represents the majority of inference time.
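The asymmetry between the two stages is easier to see in code. The toy NumPy sketch below is purely illustrative (it is not AWS or Cerebras code, and all shapes and weights are made up): it builds a prompt's key/value cache in one parallel pass, then generates output tokens one at a time, re-reading the growing cache at every step.

```python
import numpy as np

def attention(q, k, v):
    """Toy scaled dot-product attention of query rows q over cached k/v."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 64
w_k, w_v = rng.normal(size=(d, d)), rng.normal(size=(d, d))
prompt = rng.normal(size=(512, d))          # 512 prompt-token embeddings

# Prefill: every prompt token is known up front, so the key/value cache
# can be built in one large matrix multiply (parallel and compute-bound).
k_cache = prompt @ w_k
v_cache = prompt @ w_v

# Decode: each output token depends on the one before it, so generation
# is a serial loop. Every step re-reads the entire, growing KV cache,
# which is why decode is memory-bandwidth bound.
hidden = prompt[-1:]                        # (1, d) state for the next token
for _ in range(128):                        # generate 128 output tokens
    hidden = attention(hidden, k_cache, v_cache)   # toy "model step"
    k_cache = np.vstack([k_cache, hidden @ w_k])   # cache grows each step
    v_cache = np.vstack([v_cache, hidden @ w_v])
```

With real prompts running to thousands of tokens, the gap between the single batched prefill pass and the token-by-token decode loop only widens.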
Because each stage presents a different computational challenge, each benefits from a different compute architecture, connected by low-latency, high-bandwidth EFA networking. By strategically disaggregating the inference problem, with Trainium optimized for prefill and the Cerebras CS-3 optimized for decode, each stage can be served by hardware specialized for its demands.
Built on the AWS Nitro System — the foundation of AWS's secure, high-performance cloud infrastructure — the new solution will ensure that Cerebras CS-3 systems and Trainium-powered instances operate with the same security, isolation, and operational consistency customers expect from AWS.
AWS Trainium for Prefill and Cerebras CS-3 for Decode
Trainium is AWS's purpose-built AI chip. Its highly parallel compute makes it well suited to the compute-intensive prefill stage.
Cerebras' CS-3 is the world's fastest AI inference system. It delivers thousands of times greater memory bandwidth than the fastest GPU. As reasoning models now represent a majority of inference compute and generate more tokens per request as they “think” through problems, the need to accelerate this portion of the workflow has grown accordingly. OpenAI, Cognition, Mistral, and others use Cerebras to accelerate their most demanding workloads, especially agentic coding, where developer productivity is constrained by inference speed.
In the disaggregated solution, CS-3 will be fully dedicated to decoding acceleration, enabling dramatically higher capacity for fast output tokens. With Trainium handling prefill, the CS-3 handling decode operations, and high-speed EFA networking connecting them, each processor will deliver maximum token capacity for its focused part of the workload.
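As a rough illustration of how such a pipeline could be orchestrated, the sketch below uses hypothetical PrefillWorker and DecodeWorker classes standing in for the Trainium and CS-3 stages. The actual integration, scheduling, and cache-transfer mechanics are not publicly documented; this is a minimal sketch of the disaggregation pattern itself.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Attention key/value state: produced by prefill, grown by decode."""
    keys: list
    values: list

class PrefillWorker:
    """Stand-in for the Trainium stage: one compute-bound pass over the prompt."""
    def prefill(self, prompt_tokens):
        # Process the whole prompt in parallel and emit the KV cache.
        return KVCache(keys=list(prompt_tokens), values=list(prompt_tokens))

class DecodeWorker:
    """Stand-in for the CS-3 stage: serial, bandwidth-bound token generation."""
    def decode(self, cache, max_tokens):
        output = []
        for _ in range(max_tokens):
            # Each step reads the full cache, then appends to it: the
            # serial, memory-bandwidth-intensive part of inference.
            next_token = sum(cache.keys) % 50_000   # toy next-token rule
            output.append(next_token)
            cache.keys.append(next_token)
            cache.values.append(next_token)
        return output

# Orchestration: prefill runs on one system, the resulting KV cache is
# shipped across the interconnect (EFA, in the system described above),
# and decode streams output tokens from the other system.
cache = PrefillWorker().prefill([101, 2023, 2003, 102])
print(DecodeWorker().decode(cache, max_tokens=8))
```

The design point the sketch captures is that the only state crossing the boundary between the two systems is the KV cache produced by prefill, which is why a low-latency, high-bandwidth link such as EFA matters.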
This press release contains forward-looking statements, including statements regarding the expected benefits of our products and the transaction described herein. These statements are subject to risks and uncertainties that could cause actual results to differ materially. Neither we nor any other person assumes responsibility for the accuracy and completeness of forward-looking statements. The forward-looking statements included in this press release relate only to events and information as of the date hereof. Cerebras undertakes no obligation to update or revise any forward-looking statement as a result of new information, future events or otherwise, except as otherwise required by law.
View source version on businesswire.com: https://www.businesswire.com/news/home/20260313406341/en/
Media Contact
pr@zmcommunications.com
Source: