AI dubbing | P Foundation

The problem

A dubbing chain, scattered across specialists

Traditional dubbing is spread across a transcription house, a translator working against timecodes, voice talent in a booth, an audio engineer rebuilding the mix, and a graphics operator re-versioning the lower thirds, with weeks of round trips between them. AI dubbing folds that chain into one surface where the transcript, the timed translation, the generated voices, the stems, and the final picture are all views of the same project.

How it works

From source to dubbed cut

A project starts from a video already in the media library and ends with versioned, downloadable cuts. The AI drafts each stage, and nothing advances until an editor has reviewed it. Every long-running stage renders in the background, and the editor re-attaches to it when you return.

1
Transcript
Three AI passes prepare the source: speaker-diarized transcription, separation of the audio into clean speech and background stems, and a vision pass that inventories every piece of burned-in text. The transcript is an editable document, and every segment carries a treatment: lip-sync, dub, CC, or discard.
2
Translation
Each line is translated against its slot duration and the target language's natural speaking rate, with the phrasing compressed or expanded until it can be spoken in the time available. Expressive delivery tags carry through and steer the synthesis.
3
Voices and mix
Each line is synthesized in its assigned voice and tempo-fitted to its slot. The compile places every take at its true timecode, ducks the original under voice-over segments, keeps the original voice on CC segments, and lays the background bed underneath, producing a master mix plus stems.
4
On-screen graphics
Each language version translates the on-screen text inventory. Translated lower thirds render through HTML templates into transparent animated clips, which you can preview before they reach a final cut.
5
Optional lip-sync
An AI model can re-animate the speakers’ mouths to match the dubbed audio, driven per voice stem and touching only the frames where someone is actually being dubbed. Long renders are resume-safe end to end.
6
The final cut
Pick a base (compiled audio over the source picture, or a lip-synced video) and a graphics version. The studio combines them without re-running anything upstream, and every cut records exactly which versions it is built on.
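The tempo fitting in step 3 comes down to a ratio between a synthesized take and its slot. The sketch below is illustrative only: the `fit_take` helper and the 0.8 to 1.25 stretch limits are hypothetical choices for this example, not the studio's documented settings.

```python
from dataclasses import dataclass

# Hypothetical stretch limits: how far a take may be sped up or slowed down
# before the line is better sent back to translation than stretched further.
MIN_RATE = 0.8
MAX_RATE = 1.25


@dataclass
class TakeFit:
    rate: float        # playback rate: > 1 speeds the take up, < 1 slows it down
    needs_refit: bool  # True if the take cannot fill its slot within the limits


def fit_take(take_seconds: float, slot_seconds: float) -> TakeFit:
    """Playback rate that makes a take last exactly as long as its slot."""
    if take_seconds <= 0 or slot_seconds <= 0:
        raise ValueError("durations must be positive")
    rate = take_seconds / slot_seconds  # playing at `rate` lasts slot_seconds
    clamped = min(max(rate, MIN_RATE), MAX_RATE)
    # If the rate had to be clamped, the take will not fill its slot exactly.
    return TakeFit(rate=clamped, needs_refit=clamped != rate)


slot = 4.4  # seconds
for take in (5.0, 6.0):
    print(take, fit_take(take, slot))
```

Under these assumed limits, a 5.0-second take fits the 4.4-second slot at about 1.14x, while a 6.0-second take would need roughly 1.36x, so it is flagged rather than distorted.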

Translation built for the clock

A translation that fits the time it has

Dubbing translation is timing work as much as language work. Each line is generated against its slot duration and the target language's natural speaking rate, and provenance tracking marks first the translation, then its audio, as stale the moment something upstream changes.

English source
We worked through the night to reach this agreement.
Word for word (runs long)
Nous avons travaillé pendant toute la nuit afin de parvenir à cet accord.
Fitted to the clock (fits)
On a travaillé toute la nuit pour cet accord.

The segment has a 4.4-second slot. A word-for-word rendering runs past it, so the studio compresses the phrasing until it fits. Arabic and Persian compress differently again, and their right-to-left scripts are handled throughout.
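The fit check in this example can be approximated with a speaking-rate estimate. This sketch uses word count as a crude proxy for spoken duration; the `estimated_seconds` and `fits_slot` helpers and the 2.5 words-per-second French rate are assumptions for illustration, not measured values or the studio's actual method.

```python
# Hypothetical speaking rate in words per second; a real system would use
# measured, per-language rates (and likely syllables rather than words).
WORDS_PER_SECOND = {"fr": 2.5}


def estimated_seconds(text: str, lang: str) -> float:
    """Crude spoken-duration estimate: word count divided by speaking rate."""
    return len(text.split()) / WORDS_PER_SECOND[lang]


def fits_slot(text: str, lang: str, slot_seconds: float) -> bool:
    """True if the estimated duration does not exceed the slot."""
    return estimated_seconds(text, lang) <= slot_seconds


SLOT = 4.4  # the segment's slot, in seconds
literal = "Nous avons travaillé pendant toute la nuit afin de parvenir à cet accord."
fitted = "On a travaillé toute la nuit pour cet accord."

for label, text in (("word for word", literal), ("fitted", fitted)):
    secs = estimated_seconds(text, "fr")
    verdict = "fits" if fits_slot(text, "fr", SLOT) else "runs long"
    print(f"{label}: {secs:.1f} s -> {verdict}")
```

With these assumed numbers, the 13-word literal rendering estimates at 5.2 seconds and runs past the slot, while the 9-word fitted line estimates at 3.6 seconds and fits.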

Per-segment treatments

Every line gets the treatment it needs

The transcript is editable, and each segment carries the treatment it will receive. Statements that must stay authentic keep their original voice; overlapping speech and chatter are cut. The dubbed timeline is built from those choices.

Example timeline: Lip-sync | Dub | Lip-sync | CC | Discard

Lip-sync: Generated voice replaces the slot; mouths can be re-animated to match.
Dub: Generated voice plays over the original, which ducks underneath.
CC: Original voice kept as recorded; nothing is synthesized.
Discard: Cut from the dubbed timeline entirely.
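One way to picture how the four treatments drive the mix and the optional lip-sync pass is a lookup from treatment to mix behaviour. This is a hypothetical sketch: `Treatment`, `MixPlan`, `lip_sync_frames`, and the 0.2 duck gain are illustrative names and values, not the studio's API or settings.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    LIP_SYNC = "lip-sync"
    DUB = "dub"
    CC = "cc"
    DISCARD = "discard"


@dataclass(frozen=True)
class MixPlan:
    in_timeline: bool       # does the segment appear in the dubbed cut at all?
    synthesize: bool        # is a generated voice produced for it?
    original_gain: float    # level of the original speech (1.0 = as recorded)
    reanimate_mouths: bool  # eligible for the optional lip-sync pass


# Hypothetical duck level for voice-over segments; not a documented value.
DUCK_GAIN = 0.2

PLANS = {
    # Original speech removed; the background bed comes from its own stem.
    Treatment.LIP_SYNC: MixPlan(True, True, 0.0, True),
    Treatment.DUB: MixPlan(True, True, DUCK_GAIN, False),
    Treatment.CC: MixPlan(True, False, 1.0, False),
    Treatment.DISCARD: MixPlan(False, False, 0.0, False),
}


def lip_sync_frames(segments, fps):
    """Frame ranges the lip-sync pass touches: lip-sync segments only.

    Start is floored and end is ceiled so each range covers its whole segment.
    """
    return [
        (math.floor(start * fps), math.ceil(end * fps))
        for start, end, treatment in segments
        if PLANS[treatment].reanimate_mouths
    ]


timeline = [
    (0.0, 2.0, Treatment.LIP_SYNC),
    (2.0, 5.5, Treatment.DUB),
    (5.5, 7.0, Treatment.LIP_SYNC),
    (7.0, 9.0, Treatment.CC),
    (9.0, 9.5, Treatment.DISCARD),
]
print(lip_sync_frames(timeline, fps=25))
```

At 25 frames per second this timeline yields the ranges (0, 50) and (137, 175); the dub, CC, and discarded segments are never re-animated.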

One source, many languages

A foundation that stops moving once you build on it

Lock the source

Creating the first language version locks the source transcript permanently. Every translation downstream is anchored to that timing, and the foundation stops shifting under the work built on it.

Independent versions

Each language version lives its own life: its own translations, voices, audio, graphics, and cuts, fully independent of its siblings.
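The locking and independence rules can be sketched as a small data model. Everything here (`Project`, `LanguageVersion`, `SourceLockedError`, and their methods) is a hypothetical illustration of the behaviour described above, not the studio's implementation.

```python
from dataclasses import dataclass, field


class SourceLockedError(Exception):
    """Raised when someone tries to edit a locked source transcript."""


@dataclass
class LanguageVersion:
    language: str
    # Segment id -> translated text; voices, audio, graphics, and cuts
    # would also live here, separately for each version.
    translations: dict = field(default_factory=dict)


@dataclass
class Project:
    transcript: dict  # segment id -> (start, end, text)
    versions: dict = field(default_factory=dict)
    source_locked: bool = False

    def edit_segment(self, seg_id, start, end, text):
        if self.source_locked:
            raise SourceLockedError("source transcript is locked")
        self.transcript[seg_id] = (start, end, text)

    def add_version(self, language):
        # Creating the first language version locks the source permanently.
        self.source_locked = True
        version = LanguageVersion(language)
        self.versions[language] = version
        return version


project = Project({1: (0.0, 4.4, "We worked through the night to reach this agreement.")})
fr = project.add_version("fr")
ar = project.add_version("ar")
fr.translations[1] = "On a travaillé toute la nuit pour cet accord."
print(project.source_locked, ar.translations)
```

Because each version owns its own containers, work on the French version leaves the Arabic version untouched, while the shared source timing stays fixed for both.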

A human at every step

The AI drafts each stage; nothing advances until an editor has reviewed and corrected it. Provenance tracking keeps translations and audio from silently drifting out of sync.
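Review gating and provenance tracking can be combined in one check: each artifact stores a fingerprint of the input it was generated from, and it may feed the next stage only if it is reviewed and its input is unchanged. This is a minimal sketch assuming content hashing with SHA-256; `Artifact`, `fingerprint`, `is_stale`, and `can_advance` are hypothetical names, and the studio's actual mechanism may differ.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: str) -> str:
    """Stable hash of an artifact's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class Artifact:
    content: str
    upstream_hash: str  # fingerprint of the input it was generated from
    reviewed: bool = False


def is_stale(artifact: Artifact, upstream_content: str) -> bool:
    """Stale once the input it was generated from has changed."""
    return artifact.upstream_hash != fingerprint(upstream_content)


def can_advance(artifact: Artifact, upstream_content: str) -> bool:
    """A stage feeds the next one only if it is reviewed and still current."""
    return artifact.reviewed and not is_stale(artifact, upstream_content)


source = "We worked through the night to reach this agreement."
translation = Artifact("On a travaillé toute la nuit pour cet accord.",
                       fingerprint(source), reviewed=True)
# "take-001" is a placeholder standing in for the synthesized audio.
audio = Artifact("take-001", fingerprint(translation.content), reviewed=True)

# An editor corrects the translation: its audio is now stale and must be redone.
translation.content = "Nous avons travaillé toute la nuit pour cet accord."
print(can_advance(translation, source), is_stale(audio, translation.content))
```

The translation still advances because its own input is unchanged, but the audio generated from the earlier wording is flagged, so the mismatch cannot slip through unnoticed.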

Runs in the background

Every long-running stage renders in the background with live progress. Leave the page mid-render and the editor re-attaches to the running work when you return.

Current target languages

Arabic
English
French
Persian

Who it is for

Built for multi-language versioning

Broadcast

Broadcasters re-versioning programming

Carry a finished program to other languages without a chain of outside specialists. The transcript, the timed translation, the voices, the mix, and the graphics are views of the same project, and your own editors review every stage.

News

Newsrooms carrying reporting across languages

Take reporting to audiences in other languages with broadcast-ready cuts. Per-segment treatments keep statements in their original voice where they must stay authentic, and right-to-left scripts are handled throughout.

Carry your programming to every audience

AI dubbing is available to select MediaGuard partners as part of the AI inference support the foundation provides. Tell us about your organization and what you produce, and we will take it from there.

Apply through MediaGuard
Talk to us first

One finished video. Broadcast-ready in every language.
