ShengShu Launches Vidu Q3 Reference-to-Video with Expanded Visual and Audio Capabilities
Built for storytelling, Vidu Q3 Reference-to-Video enables creators to generate high-quality videos by referencing and combining a wide range of inputs—subjects, environments, costumes, props, and visual styles—within a single workflow. The result is significantly improved creative control, consistency, and efficiency.
The release expands capabilities across visual effects, audio, and scene composition. It supports six types of cinematic visual effects: particle systems, fluid simulation, dynamic motion, camera movement, transitions, and lighting, enabling more expressive visual outputs. In parallel, the model enhances audio generation with five categories of sound capabilities: ambient sound, motion-driven audio, atmospheric layers, foley effects, and emotion-driven cues, producing more natural and expressive results. Together, these improvements enable greater scene diversity and more immersive, production-ready video outputs.
Designed for use cases including short-form series, animation, film and television, as well as advertising and e-commerce, Vidu Q3 Reference-to-Video enables faster production of high-quality video content for both creators and enterprises. This performance is further reflected in third-party benchmarks: Vidu Q3 ranked No.1 on the first global Reference-to-Video leaderboard released by SuperCLUE.
Built on the Vidu Q3 model foundation, the release has been fully integrated across ShengShu's product ecosystem, including
The release reflects ShengShu's broader progress in advancing its world model capabilities across digital environments.
In parallel with the product release, ShengShu announced it has raised
The funding will support ShengShu's broader vision of building a general world model that bridges the digital and physical worlds. The company is advancing both its World Generation Model (WGM), which powers digital content creation through the Vidu model family, and its World Action Model (WAM), designed for physical-world interaction. Together, these systems aim to enable unified modeling, prediction, and action across environments.
ShengShu is among the first globally to pursue a unified world model architecture that connects digital and physical domains. At the core of this system is its Foundation World Model, which underpins both WGM and WAM.
Within this framework, the Vidu model family focuses on content generation and interaction in digital environments. It supports synchronized audio-visual generation, extended video duration, strong temporal and spatial consistency, and cinematic-quality visuals. Its proprietary reference-based video generation capability addresses consistency challenges in multi-subject video production.
At launch, Vidu Q3 ranked No.1 globally on the benchmark published by
Vidu is available to global developers, creators, and enterprises through both MaaS (Vidu API platform) and SaaS offerings, and has been integrated into
Dr.
"At its core, a world model gives AI a unified way to represent and predict the real world.
Video plays a critical role in this, as it naturally captures time, space, motion, and causality.
By building a unified model architecture, we aim to connect perception and action—creating a complete loop from understanding the world, to generating it, to acting within it, and ultimately making the world model a true bridge between the digital and physical worlds."
View original content to download multimedia: https://www.prnewswire.com/news-releases/shengshu-launches-vidu-q3-reference-to-video-with-expanded-visual-and-audio-capabilities-302740489.html
SOURCE ShengShu Technology