Penguin Solutions’ OriginAI Factory Platform Delivers Optimized Performance for AI Inference
Breakthrough KV cache technology provides low-latency, high-throughput inference for AI, accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition and NVIDIA B300 GPUs
This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20260316410520/en/
Penguin Solutions has expanded its OriginAI portfolio with solutions that address the need for more GPU memory, handling larger context sizes and higher concurrency while meeting the low-latency demands of enterprise-scale AI inference.
OriginAI inference solutions are designed leveraging NVIDIA RTX PRO 6000 Blackwell Server Edition and NVIDIA B300 GPUs.
“Penguin Solutions operationalizes and optimizes AI inferencing by delivering the performance, scalability, and reliability required to realize fully actionable insight and discovery,” said
Penguin’s MemoryAI™ KV Cache Server Matched with NVIDIA GPUs Optimizes OriginAI Solutions for Scalable AI Inference
Penguin Solutions’ OriginAI solutions also offer the flexibility to incorporate Penguin’s CXL-based MemoryAI KV cache server, designed to support customers’ KV cache strategies by expanding KV cache capacity. This enables low-latency, high-concurrency inference and extended context lengths for the most demanding applications. Penguin’s MemoryAI KV cache server, which is compatible with the NVIDIA Dynamo framework, provides cost-efficient, optimized infrastructure for the next wave of AI deployment.
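To make the memory pressure concrete: the KV cache a transformer model must retain during inference grows linearly with both context length and the number of concurrent requests. The sketch below estimates that footprint from first principles; the model dimensions are illustrative assumptions, not a specific Penguin or NVIDIA configuration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Estimate total KV cache size in bytes for a decoder-only transformer.

    Each layer stores a key tensor and a value tensor, each shaped
    [batch, kv_heads, seq_len, head_dim], hence the factor of 2.
    """
    return 2 * layers * batch * kv_heads * seq_len * head_dim * dtype_bytes

# Illustrative assumption: a 70B-class model (80 layers, 8 KV heads under
# grouped-query attention, 128-dim heads) serving 64 concurrent requests
# at a 32k-token context, with FP16 (2-byte) cache entries.
total = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                       seq_len=32_768, batch=64, dtype_bytes=2)
gib = total / 2**30  # → 640.0 GiB for the cache alone
```

At roughly 640 GiB for this single shared service, the cache alone exceeds the on-board memory of several GPUs, which is why offloading it to a CXL-attached memory tier can extend concurrency and context length without adding accelerators.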
OriginAI AI factory solutions also include Penguin’s ICE ClusterWare™ cluster management software.
The OriginAI portfolio offers a range of configurations to address diverse customer needs. NVIDIA RTX PRO 6000-based architecture targets enterprise-class copilots, retrieval-augmented generation (RAG) systems, code assistance, and document summarization, delivering a lower acquisition cost, flexible deployment, and power-efficient performance for mid-sized models. NVIDIA B300-based architecture is designed for enterprise-wide AI platforms, long-context assistants, frontier model hosting, and agentic workloads, providing massive memory bandwidth and future-proof scalability for large, shared services.
Enterprise Inference for Financial Services, Healthcare, and Retail
OriginAI inference architectures provide the flexibility to scale out and avoid overprovisioning by combining expert infrastructure design with meticulous in-factory builds and on-site deployment. This approach enables enterprises, cloud service providers (CSPs), and neoclouds to cost-efficiently deploy infrastructure tailored to their use cases and inference applications at scale. For example:
- Financial Services: AI-driven applications in financial services, such as fraud detection and algorithmic or high-frequency trading, require ultra-low latency to process transactions in real time, optimize trading opportunities, and ensure security.
- Healthcare: Precision in AI-powered diagnostics, patient monitoring, voice-enabled applications, and real-time medical translations depends on minimal latency to deliver timely and accurate insights, often in life-critical situations.
- Retail: AI-driven personalization, inventory management, and agentic decision-making systems enable real-time customer engagement and operational efficiency, helping businesses stay competitive.
AI is reshaping how organizations achieve efficiency, accuracy, and innovation.
To learn more, explore Penguin Solutions’ OriginAI inference solutions or visit booth #1031 at the
MemoryAI, OriginAI, and ICE ClusterWare are trademarks or registered trademarks of Penguin Solutions.
About Penguin Solutions
The most transformative technological advancements are often the hardest to deploy and optimize.
In addition to our AI capabilities,
For more information, visit www.penguinsolutions.com.
PR Contact
Maureen O’Leary
Corporate Communications
1-602-330-6846
pr@penguinsolutions.com
Source: Penguin Solutions