LTX 2.3 is live in LTX Studio

LTX 2.3 brings native 4K portrait video, keyframe control, rebuilt audio, and 4x better prompt understanding to LTX Studio. Five structural upgrades.

This release touches every part of the generation pipeline—visual output, audio, image-to-video, prompt understanding, and format support. The changes are structural, not cosmetic. A rebuilt VAE, a 4x larger text connector, a completely reworked audio pipeline, keyframe control, and native 4K portrait video. Less time fixing outputs. More time using them.

Here’s what’s new and what it means for your work.

Native 4K portrait video, trained from scratch

Vertical video is no longer a workaround in LTX Studio. With LTX 2.3, you generate portrait-orientation content natively, at up to 4K resolution.

Previous versions required generating in landscape and cropping or reformatting in post to get a usable vertical output. That added steps, reduced quality at the edges, and meant your vertical content was always derived from something designed for a different format. The result was rarely as sharp or well-composed as content built for vertical from the start.

LTX 2.3 was trained on portrait-orientation data. The model understands vertical composition at the architecture level. Set the resolution in Studio, generate, and you have a 4K vertical clip ready for TikTok, Instagram Reels, YouTube Shorts, or any mobile-first format.
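If you drive generation from a script rather than the Studio UI, a portrait clip is simply a request for a vertical resolution. The sketch below is illustrative only: the endpoint URL, field names, auth scheme, and response shape are assumptions for the example, not LTX Studio's documented API.

```python
# Hypothetical sketch of a native 4K portrait generation request.
# Endpoint, field names, and response shape are illustrative
# assumptions, not a documented LTX Studio API.
import requests

API_URL = "https://api.example.com/v1/generations"  # placeholder URL

payload = {
    "model": "ltx-2.3",
    "prompt": "Barista pouring latte art, morning light, shallow focus",
    "resolution": "2160x3840",   # taller than wide: native 4K portrait
    "duration_seconds": 6,
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=300,
)
resp.raise_for_status()
print(resp.json().get("video_url"))  # assumed response field
```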

For creative teams that run regular content programs across platforms, this is not a minor addition. Vertical has been the default format for social for years. Generating it natively, at 4K, without a reformatting step baked into every project, removes friction from a task that repeats constantly.

If short-form social content is a regular deliverable, you will feel this change immediately.

Keyframe control: direct the motion, not just the model

The most-requested feature in LTX Studio history is live.

Keyframe control lets you define motion between specific points in your video. Instead of generating a clip and hoping the movement lands where you want it, you set the frames that matter and the model fills in the motion between them with consistency.

In practice, this means you are directing the output rather than prompting it. Set a keyframe at the moment a product comes into focus. Set another at the end of a camera movement. The model respects those anchors and generates the motion in between accordingly. The result is a clip that behaves the way you intended, not just one that happens to look good.

Start and end frame support is included in this release. You can anchor the opening image and the closing image of a clip separately, defining both the entry point and the exit point. This is particularly useful for ad sequences, storyboard panels, and any content where you need a precise start and a precise finish without regenerating until something happens to work.

For creative teams building multi-shot sequences or working from a detailed creative board, keyframe control changes the production dynamic. You are composing. Not guessing.

The control is available directly inside LTX Studio as part of the image-to-video workflow. No additional tools, no external pipeline steps required.
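For teams that script their generations, the same idea can be expressed as conditioning inputs. The payload below is a hypothetical sketch: the parameter names (start_frame, end_frame, keyframes) are illustrative assumptions, not a published schema. Conceptually, each entry pins an image to a point on the clip's timeline, and the model fills in the motion between the anchors.

```python
# Hypothetical sketch of keyframe-conditioned image-to-video.
# Parameter names are illustrative assumptions, not a published
# LTX Studio schema. Each anchor pins an image to a point on the
# timeline; the model generates consistent motion between them.
payload = {
    "model": "ltx-2.3",
    "prompt": "Slow push-in on a perfume bottle on a marble counter",
    "start_frame": "frames/opening.png",   # anchors the entry point
    "end_frame": "frames/closing.png",     # anchors the exit point
    "keyframes": [
        # anchor the moment the product comes into focus
        {"time_seconds": 2.0, "image": "frames/product_focus.png"},
    ],
    "duration_seconds": 5,
}
```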

Image-to-video: more reliable, more usable output

Keyframe control and start/end frame are the headline I2V changes in LTX 2.3, but the underlying training improvements matter just as much for day-to-day production.

Previous versions occasionally produced frozen frames, static pans, or unexpected cuts on complex inputs. For teams running production pipelines, this created a real inefficiency: generating the same clip several times to get something usable, then deciding whether an acceptable output was worth keeping or whether spending more time generating was the right call. That decision loop eats time.

LTX 2.3 reworks the training to address these specific issues. Frozen videos are significantly reduced. Static pans appear less often. Unexpected cuts are fewer. Visual consistency from the input frame to the final output is stronger across the board.

Up to 60% fewer unusable outputs. For teams generating at volume, that reduction compounds across every project and every client. Less time waiting for something workable. More time editing, refining, and delivering.
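To put a rough number on how that compounds, here is a back-of-envelope calculation. The 30% baseline unusable rate is an assumed figure for the illustration, not a number from the release notes.

```python
# Back-of-envelope illustration of a 60% reduction in unusable outputs.
# The 30% baseline is an assumption for this example, not a published
# figure from the LTX 2.3 release.
baseline_unusable = 0.30                             # assumed: 3 in 10 need a redo
improved_unusable = baseline_unusable * (1 - 0.60)   # up to 60% fewer -> 0.12

clips_per_month = 200
saved_regens = clips_per_month * (baseline_unusable - improved_unusable)
print(f"~{saved_regens:.0f} fewer regenerations per month")  # ~36
```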

A completely rebuilt audio pipeline

Audio in LTX 2.3 is not a patch. It is a rebuild.

The training set was filtered to remove silence, noise, and artifacts that were producing inconsistent results in previous versions. A new vocoder replaces the previous one. Together, these changes produce audio generation that is cleaner, more consistent, and better aligned with what is happening on screen.

3x fewer artifacts than the previous checkpoint. For teams producing branded video content, that reduction has a direct workflow impact: fewer audio correction passes, fewer times reaching for a separate audio editor to fix something that should not have needed fixing.

The improvement applies to both text-to-video and audio-to-video workflows. Whether you are generating from a text prompt or feeding in an audio track, the output is cleaner and more intentional.

For creative directors and producers working on content that will be published or presented to clients, audio quality is often the detail audiences notice first when it is off. With LTX 2.3, that risk is substantially lower.

Sharper outputs. Less post-production cleanup.

LTX 2.3 ships with a rebuilt VAE architecture and an updated latent space, both trained on higher-quality data. The difference shows in every output.

Fine textures, on-screen text, and edge detail are better preserved through the generation process. In previous versions, these elements had a tendency to soften, particularly at lower resolutions or in outputs with a lot of fine detail in the frame. Compensating for that often meant an additional sharpening pass in post, which added time and sometimes introduced its own issues.

With 2.3, that sharpening step should be necessary less often. The model preserves more of the fine detail from the source material and generates it with higher fidelity through the pipeline.

For creative teams, this shows up most clearly in outputs with detailed product shots, branded overlays with text, architectural or interior scenes, or any content where texture and edge clarity determine whether an output is publishable. It also means the gap between your reference image and your generated output is smaller. What you put in maps more accurately to what you get out.

4x better prompt understanding

The text connector in LTX 2.3 is 4x larger than the previous version, with an improved architecture that bridges prompt encoding and the generation model more accurately.

What changes in practice: complex prompts work better. If you have been writing shorter, simplified prompts to get consistent outputs, try being more specific. Multi-subject scenes, spatial positioning, particular lighting conditions, specific stylistic directions: the model interprets all of these more accurately now, with less drift from what you actually typed.

This is especially useful for creative teams working from detailed briefs or precise creative direction. Getting output that reflects a specific, detailed prompt, rather than a generalized approximation of it, reduces the number of generation attempts needed to reach something usable. That compounds quickly across a full project or a high-volume content program.

Think about how you have been prompting. You have likely been editing your inputs to fit what you know the model handles well. With 2.3, you can write the prompt that actually describes what you want. Give it context. Give it specifics. It handles them.
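To make that concrete, here is a hypothetical before-and-after. Both prompts are illustrative examples of the shift in prompting style, not outputs from any benchmark.

```python
# The trimmed-down prompt you may have defaulted to with earlier versions:
simple_prompt = "A chef plating food in a kitchen"

# The prompt you actually wanted to write. With the 4x larger text
# connector, multi-subject layout, spatial positioning, lighting, and
# stylistic direction are all interpreted more accurately:
detailed_prompt = (
    "Two chefs in a stainless-steel kitchen: the chef on the left plates "
    "a dessert in close-up while the chef on the right blurs into the "
    "background. Warm tungsten key light from the right, cool window "
    "fill from the left. Handheld camera, shallow depth of field."
)
```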

Every improvement in this release was built for teams using Studio in real production workflows. The difference is in the outputs. Open a project and try it—you’ll feel it fastest in your I2V work and anything with detailed textures or complex prompts.

Frequently Asked Questions

How is LTX 2.3 different from LTX 2?

This isn't a parameter bump. LTX 2.3 rebuilds core architecture across visual output, audio, image-to-video, and prompt understanding. The result is sharper output, cleaner audio, and more predictable results from the first generation.

How much better is prompt understanding?

The text connector is 4x larger. Complex, multi-subject prompts and detailed stylistic direction are interpreted more accurately, with less drift from what you typed.

What improved in audio?

The audio pipeline is a full rebuild: 3x fewer artifacts, tighter AV sync, and fewer correction passes across both text-to-video and audio-to-video workflows.

Does LTX 2.3 support vertical video?

Yes. Native portrait video up to 4K, trained on portrait-orientation data. Not a crop. Set the resolution in Studio and generate.

Will outputs look sharper?

Yes. The rebuilt VAE and updated latent space preserve fine textures, on-screen text, and edge detail better through the pipeline — less softening, less post-production cleanup needed.
