DigitalOcean Launches Inference Engine with New Capabilities for Production AI, Including Inference Router for Efficient Scaling of Agentic Workloads
Built alongside early design partners, the Inference Engine gives AI developers unified control over performance, cost, and scale — with customers reporting up to 67% lower inference costs.
DigitalOcean’s Inference Engine is built around four core capabilities: Inference Router, Batch Inference, Serverless Inference, and Dedicated Inference, giving development teams a single engine to match every workload type to the right performance and cost profile, without stitching together separate providers.
New Capabilities: Built for How AI Actually Runs in Production
Inference Router is designed to solve one of the biggest inefficiencies in agentic AI: sending every request to the most expensive model. With Inference Router, AI builders can define a model pool, describe tasks and priorities in natural language mapped to the models in that pool, and optimize each request for cost and latency. Powered by DigitalOcean’s purpose-built MoE (Mixture of Experts) router model, Inference Router matches each request to the right model, helping teams improve performance and unit economics without building or managing routing infrastructure themselves. Customers like LawVo are already benefiting from this new capability.
"
Dedicated Inference delivers predictable performance and exceptional unit economics for teams running high-scale, sustained workloads, with reserved capacity that eliminates the variability of shared infrastructure.
Serverless Inference provides a single API key to access dozens of models, with scale-to-zero elasticity and the industry’s first off-peak pricing, giving teams instant access to leading open-source models without managing infrastructure or paying for idle capacity.
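As a concrete illustration of the "single API key, many models" pattern, here is a minimal sketch using an OpenAI-compatible client. The base URL and model names are placeholders, not confirmed values; consult DigitalOcean's documentation for the actual endpoint and model identifiers.

```python
import os
from openai import OpenAI

# Minimal sketch: assumes the serverless endpoint is OpenAI-compatible.
# The base_url and model names below are placeholders, not confirmed values.
client = OpenAI(
    base_url="https://inference.example.digitalocean.com/v1",  # placeholder
    api_key=os.environ["DO_INFERENCE_KEY"],  # one key for all hosted models
)

# Switching models is just a string change; no per-model infrastructure.
for model in ["llama-3.3-70b-instruct", "mistral-small"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
    )
    print(model, "->", reply.choices[0].message.content)
```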
Batch Inference reduces the cost of offline AI workloads by 50% through asynchronous execution, built-in retries, and a guaranteed 24-hour completion window. Batch Inference is purpose-built for workloads where real-time response isn't required but reliability is critical.
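The release does not specify the batch submission interface; the sketch below assumes a JSONL upload-and-poll flow similar to other batch inference APIs. Every endpoint and field name here is hypothetical.

```python
import json
import os
import time
import requests

# Hypothetical sketch: every endpoint and field name below is an assumption,
# not a documented DigitalOcean Batch Inference API.
API_BASE = "https://api.example-inference.digitalocean.com/v1"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['DO_INFERENCE_KEY']}"}

# One request per JSONL line, the common shape for offline batch jobs.
with open("batch_input.jsonl", "w") as f:
    for i, doc in enumerate(["first document ...", "second document ..."]):
        row = {
            "custom_id": f"summarize-{i}",
            "body": {
                "model": "llama-3.3-70b-instruct",
                "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
            },
        }
        f.write(json.dumps(row) + "\n")

# Submit the job. Retries and the 24-hour completion window are server-side
# guarantees, so the client only polls for a terminal state.
with open("batch_input.jsonl", "rb") as f:
    job = requests.post(f"{API_BASE}/batches", files={"file": f}, headers=HEADERS).json()

while True:
    status = requests.get(f"{API_BASE}/batches/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(60)  # offline workload: a one-minute poll interval is plenty
print(status["status"])
```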
“Most teams building agentic systems today make a single model decision and apply it uniformly across their workflows. They default to a frontier model and pay the generalization tax: premium prices and higher latency for work that often does not require the most expensive closed-source model. Inference Router is the essential AI middleware that removes that tax by intelligently matching requests to the right model based on task, context, and developer-defined preferences. The result is a smarter operating model for inference, one that gives developers more control over quality, speed, and cost while helping AI-native builders move faster and build more durable businesses on DigitalOcean.”
Performance Benchmarks: Independent Validation
The new Inference Engine was built around three core advances: hardware and software integrations, including vLLM, TensorRT, and SGLang, to maximize token throughput; request-path and model-level optimizations that improve unit economics without compromising quality; and distributed scaling designed for the bursty, uneven demands of production AI applications.
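Of these components, vLLM is open source, so the throughput technique it contributes can be illustrated directly. The sketch below is a generic vLLM offline-batching example using vLLM's public API; it says nothing about DigitalOcean's internal configuration, and the model name is an arbitrary common choice.

```python
# Generic illustration of the throughput-oriented serving stack named above.
# This uses vLLM's public offline API; it is not DigitalOcean's internal setup.
from vllm import LLM, SamplingParams

# vLLM's continuous batching packs many requests onto the GPU at once,
# which is where most of the token-throughput gains come from.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [f"Write a haiku about request #{i}." for i in range(64)]
outputs = llm.generate(prompts, params)  # batched in a single engine pass
for out in outputs[:3]:
    print(out.outputs[0].text.strip())
```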
Customers Report Significant Cost and Performance Gains
The Inference Engine was co-developed with early design partners running real production workloads, and the results are already showing up at scale.
"In healthcare AI, a node going down isn't just an SLA issue, it impacts patient experience. We've pressed
View source version on businesswire.com: https://www.businesswire.com/news/home/20260428279648/en/
Investor Relations
investors@digitalocean.com
Media Relations
press@digitalocean.com
Source: DigitalOcean