Lightricks Launches Audio-to-Video Generation with Exclusive ElevenLabs Partnership

Lightricks introduces audio-to-video generation with LTX, launching exclusively with ElevenLabs to let sound drive video from the first frame.


Adding audio-to-video generation to our LTX-2 model lets you generate video with your chosen sound as the starting point.

Audio-to-video is now live in LTX, with ElevenLabs as the exclusive launch partner; broader access rolls out January 27.

This isn’t text-to-video with audio added on. It starts with sound. Audio becomes the control layer, with voice, music, and sound effects shaping timing, motion, and performance from the very first frame, not decorating visuals after the fact. The result is a faster, more seamless way for creators to build beautiful video directly from their audio concepts, including voice and music-driven content.

The structure of the video emerges from the audio itself. Speech cadence determines pacing. Musical energy influences motion and camera behavior. Scene changes happen where the sound demands them, not where a prompt guesses they should go.

This is particularly powerful early in the creative process, when teams are trying to explore ideas quickly and get a sense of how something feels before committing to polish.

Solving a Longstanding Challenge

For years, video tools have treated audio as a separate workstream. Even advanced generative systems bring sound in late, after scenes, shots, and motion are already decided.

If you want visuals that actually match a voice or a piece of music, you’re forced to translate sound into something else: prompts, timestamps, camera notes, or edits after the fact.

These workarounds are so common that we've stopped questioning them. But the approach breaks down quickly.

Audio already contains intent. It carries timing, emphasis, rhythm, and emotion. When it isn’t allowed to lead, videos feel less natural.

Audio-to-video starts from a simple idea: stop translating sound, and let it drive generation directly.

Launching with ElevenLabs

We’re launching audio-to-video with ElevenLabs, a global leader in AI audio research and deployment, as our exclusive launch partner during the initial release window.

ElevenLabs’ technology makes it possible to create state-of-the-art audio that tells a story, and with LTX-2 that audio can now seamlessly become a full visual story through a generated video layer.

“Exclusively providing our users with LTX’s unmatched audio to video generative capabilities enables our community to tap into their incredible creativity, and build professional-grade videos quickly. We are extremely excited about this partnership with Lightricks because we have always believed that AI should empower creators to quickly and easily get past technical roadblocks to achieve their full vision.”

Luke Harries, Growth at ElevenLabs

“As we expand audio-to-video capabilities to supercharge the creative process, ElevenLabs, a global leader in AI audio, is a natural partner. Starting the creative process from sound gives creators precise control over pace, performance, and structure - an approach long used in animation and now becoming accessible across all video creation.”

Daniel Berkovitz, Chief Product Officer at Lightricks

Built for Real Workflows

Audio-to-video will be available in LTX and ElevenLabs Image & Video on January 20th, with API and open-source access following on January 27.

Users provide an audio file — voice, dialogue, music, or sound effects — as the primary input. An optional image can anchor a character or scene, and a short text prompt can guide visual style, but audio remains in control.

The output is a single Full HD video clip whose length and motion are driven by the audio. For longer sequences, clips can be chained together, allowing teams to build full videos modularly without abandoning the audio-first approach.
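To make the chaining workflow above concrete, here is a minimal sketch of how a pipeline might split a longer audio track into per-clip windows before generating each segment. This is illustrative only: the function name and the fixed clip length are assumptions, not part of the LTX API.

```python
# Hypothetical sketch of the clip-chaining step described above.
# plan_clip_segments and the 10-second cap are illustrative assumptions,
# not the real LTX interface.

def plan_clip_segments(audio_duration_s: float, max_clip_s: float = 10.0) -> list[tuple[float, float]]:
    """Split an audio track into consecutive (start, end) windows, one per generated clip.

    Each window is at most max_clip_s seconds long; the final window
    absorbs whatever remains, so the full track is always covered.
    """
    segments = []
    start = 0.0
    while start < audio_duration_s:
        end = min(start + max_clip_s, audio_duration_s)
        segments.append((start, end))
        start = end
    return segments

# A 25-second voiceover becomes three windows: 0-10s, 10-20s, and 20-25s.
# Each window would drive one generated clip; the clips are then concatenated.
print(plan_clip_segments(25.0))
```

A real integration would pass each window's audio slice (plus the optional anchor image and style prompt) to the generation endpoint, then concatenate the resulting clips in order.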

This is infrastructure, not a demo, designed for platforms, developers, and studios building products and pipelines, not just experimenting.

LTX Empowering Creators

Audio-to-video is part of the broader LTX ecosystem.

Lightricks builds the underlying models and infrastructure, with LTX as the access point for developers and creators. LTX applies the same technology to real creative workflows, helping shape how these tools are used in practice.

Each layer informs the others — research, platform, and production moving together.

For platforms, true audio-led control enables new audio-first products without complex orchestration. For builders, it reduces the gap between intent and output. For creative teams, it makes sound something you can start with, not fix later.

Sound has always done the hard work of storytelling. Now it finally gets to lead.
