Adding audio-to-video capability to our LTX-2 model lets you generate video with your chosen sound as the starting point.
Audio-to-video is now live in LTX and in ElevenLabs, our exclusive launch partner, with broader access rolling out January 27.
This isn’t text-to-video with audio added on. It starts with sound. Audio becomes the control layer, with voice, music, and sound effects shaping timing, motion, and performance from the very first frame, not decorating visuals after the fact. The result is a faster, more seamless way for creators to build beautiful video directly from their audio concepts, including voice and music-driven content.
The structure of the video emerges from the audio itself. Speech cadence determines pacing. Musical energy influences motion and camera behavior. Scene changes happen where the sound demands them, not where a prompt guesses they should go.
This is particularly powerful early in the creative process, when teams are trying to explore ideas quickly and get a sense of how something feels before committing to polish.
Solving a Longstanding Challenge
For years, video tools have treated audio as a separate workstream. Even advanced generative systems bring sound in late, after scenes, shots, and motion are already decided.
If you want visuals that actually match a voice or a piece of music, you’re forced to translate sound into something else: prompts, timestamps, camera notes, or edits after the fact.
These workarounds are so common that we’ve stopped questioning them. But the approach breaks down quickly.
Audio already contains intent. It carries timing, emphasis, rhythm, and emotion. When it isn’t allowed to lead, videos feel less natural.
Audio-to-video starts from a simple idea: stop translating sound, and let it drive generation directly.
Launching with ElevenLabs
We’re launching audio-to-video with ElevenLabs, a global leader in AI audio research and deployment, as our exclusive launch partner during the initial release window.
ElevenLabs’ technology makes it possible to create state-of-the-art audio that tells a story, and with LTX-2 that audio can now be seamlessly turned into a full visual story by generating the video layer.
“Exclusively providing our users with LTX’s unmatched audio to video generative capabilities enables our community to tap into their incredible creativity, and build professional-grade videos quickly. We are extremely excited about this partnership with Lightricks because we have always believed that AI should empower creators to quickly and easily get past technical roadblocks to achieve their full vision.”
Luke Harries, Growth at ElevenLabs
“As we expand audio-to-video capabilities to supercharge the creative process, ElevenLabs, a global leader in AI audio, is a natural partner. Starting the creative process from sound gives creators precise control over pace, performance, and structure, an approach long used in animation and now becoming accessible across all video creation.”
Daniel Berkovitz, Chief Product Officer at Lightricks
Built for Real Workflows
Audio-to-video is available in LTX and ElevenLabs Image & Video as of January 20, with API and open-source access arriving January 27.
Users provide an audio file — voice, dialogue, music, or sound effects — as the primary input. An optional image can anchor a character or scene, and a short text prompt can guide visual style, but audio remains in control.
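As a rough illustration of that input model, here is a minimal sketch of a single generation request. The endpoint URL, field names, and authentication shown are hypothetical placeholders for illustration, not the actual LTX API surface:

```python
import requests

# Hypothetical endpoint and field names -- illustrative only,
# not the documented LTX API.
API_URL = "https://api.example.com/v1/audio-to-video"

with open("narration.mp3", "rb") as audio, open("hero.png", "rb") as image:
    response = requests.post(
        API_URL,
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={
            "audio": audio,   # primary input: drives timing, motion, pacing
            "image": image,   # optional: anchors a character or scene
        },
        data={
            # optional: a short text prompt guiding visual style
            "prompt": "warm cinematic lighting, handheld camera",
        },
        timeout=600,
    )

response.raise_for_status()
with open("clip.mp4", "wb") as out:
    out.write(response.content)
```

Note the inversion of the usual hierarchy: the text prompt is the optional, secondary input, while the audio file is required.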
The output is a single Full HD video clip whose length and motion are driven by the audio. For longer sequences, clips can be chained together, allowing teams to build full videos modularly without abandoning the audio-first approach.
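The chaining idea can be sketched the same way: split the soundtrack into segments, generate one clip per segment, and concatenate the results. The generate_clip helper below is a hypothetical stand-in for the call sketched above, and ffmpeg’s concat demuxer is one assumed way to join the clips:

```python
import subprocess
from pathlib import Path

def generate_clip(audio_path: Path, out_path: Path) -> None:
    """Placeholder for one audio-to-video generation call (see sketch above)."""
    ...

# Assume the soundtrack has already been split into ordered segments.
segments = sorted(Path("segments").glob("*.mp3"))
clips = []
for i, seg in enumerate(segments):
    clip = Path(f"clip_{i:03d}.mp4")
    generate_clip(seg, clip)
    clips.append(clip)

# Join the clips with ffmpeg's concat demuxer.
Path("clips.txt").write_text("".join(f"file '{c}'\n" for c in clips))
subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "clips.txt",
     "-c", "copy", "full_video.mp4"],
    check=True,
)
```

Because each clip’s length is driven by its own audio segment, the joined video stays aligned with the full soundtrack by construction.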
This is infrastructure, not a demo, designed for platforms, developers, and studios building products and pipelines, not just experimenting.
LTX Empowering Creators
Audio-to-video is part of the broader LTX ecosystem.
Lightricks builds the underlying models and infrastructure, with LTX as the access point for developers and creators. LTX applies the same technology to real creative workflows, helping shape how these tools are used in practice.
Each layer informs the others — research, platform, and production moving together.
For platforms, true audio-led control enables new audio-first products without complex orchestration. For builders, it reduces the gap between intent and output. For creative teams, it makes sound something you can start with, not fix later.
Sound has always done the hard work of storytelling. Now it finally gets to lead.










