Video Director Mode – Multi-Clip Generation & Merging

Director Mode extends NeuroLink's video generation capability to produce multi-segment videos with seamless AI-generated transitions. Instead of a single clip, you define an array of segments — each with its own prompt and image — and NeuroLink orchestrates the full pipeline: generating each clip, extracting boundary frames, producing transition videos (with individually configurable durations) using Veo 3.1's first-and-last-frame interpolation, and merging everything into one continuous video.

Overview

Director Mode is triggered automatically when you supply an input.segments array to the video generation API. Each segment is a self-documenting { prompt, image } object, mapping cleanly to the pipeline concept of ordered video segments.

graph TD
A["Segment 1: Image A + Prompt 1"] --> G1["Generate Clip 1 (4-8s)"]
B["Segment 2: Image B + Prompt 2"] --> G2["Generate Clip 2 (4-8s)"]
C["Segment 3: Image C + Prompt 3"] --> G3["Generate Clip 3 (4-8s)"]

G1 --> F1["Extract Last Frame of Clip 1"]
G2 --> F2a["Extract First Frame of Clip 2"]
G2 --> F2b["Extract Last Frame of Clip 2"]
G3 --> F3["Extract First Frame of Clip 3"]

F1 --> T1["Generate Transition 1→2"]
F2a --> T1
F2b --> T2["Generate Transition 2→3"]
F3 --> T2

G1 --> M["Merge: Clip1 + Trans1 + Clip2 + Trans2 + Clip3"]
T1 --> M
G2 --> M
T2 --> M
G3 --> M

M --> O["Final Merged Video (MP4)"]

How It Works

  1. Parallel clip generation – All main clips are generated concurrently via Veo 3.1's image-to-video endpoint
  2. Frame extraction – The last frame of clip N and first frame of clip N+1 are extracted from generated video buffers
  3. Transition generation – Veo 3.1 Fast's first-and-last-frame interpolation API generates a transition between each pair of adjacent clips, with individually configurable duration (4, 6, or 8 seconds each)
  4. Sequential merge – Clips and transitions are concatenated: Clip₁ → Trans₁₋₂ → Clip₂ → Trans₂₋₃ → Clip₃ → …
  5. Single output – The merged result is returned as one VideoGenerationResult buffer
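The clip/transition interleaving described in steps 4-5 can be sketched as a pure function. This is illustrative only — buildMergeOrder is not a NeuroLink export:

```typescript
// Illustrative sketch: compute the merge order for N segments.
// Clips alternate with transitions: Clip1, Trans1-2, Clip2, Trans2-3, ..., ClipN.
function buildMergeOrder(segmentCount: number): string[] {
  const order: string[] = [];
  for (let i = 1; i <= segmentCount; i++) {
    order.push(`clip${i}`);
    // N segments produce N-1 transitions; none after the final clip.
    if (i < segmentCount) order.push(`trans${i}-${i + 1}`);
  }
  return order;
}
```

For 3 segments this yields clip1, trans1-2, clip2, trans2-3, clip3 — the exact sequence the merge step concatenates.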

Key Technology: Veo First-and-Last-Frame Interpolation

The transition clips use Veo 3.1's native lastFrame parameter in the predictLongRunning API. Instead of generating from a single image, you provide two images — the first frame and the last frame — and Veo generates a video that smoothly interpolates between them:

{
"instances": [
{
"prompt": "Smooth cinematic transition",
"image": {
"bytesBase64Encoded": "<LAST_FRAME_OF_CLIP_N>",
"mimeType": "image/jpeg"
},
"lastFrame": {
"bytesBase64Encoded": "<FIRST_FRAME_OF_CLIP_N+1>",
"mimeType": "image/jpeg"
}
}
],
"parameters": {
"sampleCount": 1,
"durationSeconds": 6,
"aspectRatio": "16:9",
"resolution": "720p"
}
}

This produces a physically coherent, AI-generated morph — far superior to simple crossfade or dissolve effects. The durationSeconds value is set independently for each transition (from the transitionDurations array), allowing shorter or longer interpolations per segment boundary.

What You Get

  • Multi-segment video – Chain any number of video segments into a single continuous output
  • AI transitions – Per-transition configurable duration (4, 6, or 8 seconds each) generated by Veo 3.1 frame interpolation (not simple crossfades)
  • Parallel generation – Main clips are generated concurrently for faster pipeline execution
  • Mixed image inputs – Each segment's image field accepts a Buffer, file path, URL, or ImageWithAltText
  • Consistent settings – Resolution, aspect ratio, and audio settings apply uniformly across all segments and transitions
  • Per-segment customization – Each segment is a self-contained { prompt, image } object
  • SDK only – Use programmatically via generate() (CLI not supported for Director Mode)

Supported Provider & Model

| Provider | Model | Interpolation Support | Transition Duration | Max Segments |
| --- | --- | --- | --- | --- |
| vertex | veo-3.1 (clips) / veo-3.1-fast (transitions) | First + Last Frame | 4-8s per transition | 10 |

Note: The lastFrame parameter is supported by veo-2.0-generate-001, veo-3.1-generate-001, and veo-3.1-fast-generate-001. NeuroLink uses veo-3.1-generate-001 for main clips and veo-3.1-fast-generate-001 for transition clips (faster generation with minimal quality difference for short interpolations).

Prerequisites

Same as Video Generation prerequisites, plus:

  1. Sufficient quota – Director Mode generates N + (N-1) video operations (N clips + N-1 transitions). Ensure your Vertex AI project has adequate quota.
  2. Adequate timeout – Multi-segment generation takes proportionally longer. Set timeout accordingly (recommended: 5-10 minutes for 3+ segments).
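As a quick sanity check before launching a run, the operation count and a conservative timeout can be estimated from the rules above. estimateDirectorLoad is a hypothetical helper (not part of the SDK) using the worst-case "~3 minutes per call, fully sequential" rule of thumb:

```typescript
// Illustrative helper (not a NeuroLink API): estimate the number of video
// operations and a conservative timeout for an N-segment Director run.
function estimateDirectorLoad(segmentCount: number) {
  const clips = segmentCount;
  const transitions = segmentCount - 1;
  // Worst case: every operation runs sequentially at ~3 minutes each.
  const timeoutMs = (clips + transitions) * 180_000;
  return { operations: clips + transitions, timeoutMs };
}
```

With concurrency enabled the real wall-clock time is usually lower, so this over-provisions the timeout rather than risking a mid-pipeline abort.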

Quick Start

SDK Usage

import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";

const neurolink = new NeuroLink();

// Director Mode: define segments → merged video
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Camera slowly pans across the product on a white table",
image: await readFile("./scene1.jpg"),
},
{
prompt: "Dynamic zoom into product details with dramatic lighting",
image: await readFile("./scene2-detail.jpg"),
},
{
prompt: "Wide shot pulling back to reveal the full scene",
image: await readFile("./scene3-wide.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: {
resolution: "720p",
length: 6, // Per-segment clip duration (reused from VideoOutputOptions)
aspectRatio: "16:9",
audio: true,
},
},
timeout: 600000, // 10 minutes for multi-segment
});

if (result.video) {
await writeFile("director-output.mp4", result.video.data);
console.log(`Total duration: ${result.video.metadata?.duration}s`);
console.log(`Segments: ${result.video.metadata?.segmentCount}`);
}

Using Image URLs

import { NeuroLink } from "@juspay/neurolink";
import { writeFile } from "fs/promises";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Serene sunrise over calm waters",
image: "https://example.com/sunrise.jpg",
},
{
prompt: "Waves crashing on a rocky coastline",
image: "https://example.com/coastline.jpg",
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "1080p", length: 8 },
},
timeout: 600000,
});

if (result.video) {
await writeFile("ocean-director.mp4", result.video.data);
}

Mixed Input Types

Each segment's image field accepts a Buffer, file path, URL, or ImageWithAltText:

const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Product reveal from shadow to light",
image: await readFile("./product-dark.jpg"), // Buffer
},
{
prompt: "360-degree rotation showcasing all angles",
image: "https://cdn.example.com/product-turntable.png", // URL
},
{
prompt: "Final hero shot with brand overlay",
image: { data: await readFile("./hero.jpg"), altText: "Hero" }, // ImageWithAltText
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "1080p", length: 6, aspectRatio: "16:9" },
},
});

Note: Director Mode is SDK-only. CLI support is not available for this generation type. Use the standard --outputMode video CLI flags for single-clip video generation.

Comprehensive Examples

Example 1: Product Commercial (3 Segments)

import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
input: {
segments: [
{
prompt:
"Dramatic reveal: camera sweeps up from a dark surface to unveil the product under a spotlight",
image: await readFile("./product-dark.jpg"),
},
{
prompt:
"Close-up detail shot: camera slowly orbits the product, focusing on texture and craftsmanship",
image: await readFile("./product-detail.jpg"),
},
{
prompt:
"Lifestyle context: camera pulls back to show the product in an elegant room setting",
image: await readFile("./product-lifestyle.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: {
resolution: "1080p",
length: 8,
aspectRatio: "16:9",
audio: true,
},
// Director-specific options
director: {
transitionPrompts: [
"Elegant dissolve with subtle camera drift",
"Smooth pull-back revealing the wider scene",
],
transitionDurations: [4, 6], // Per-transition: first transition 4s, second 6s
},
},
timeout: 600000,
});

if (result.video) {
await writeFile("product-commercial.mp4", result.video.data);

console.log("Director Mode output:", {
totalDuration: result.video.metadata?.duration, // ~34s (3×8s + 4s + 6s)
segmentCount: result.video.metadata?.segmentCount, // 3
transitionCount: result.video.metadata?.transitionCount, // 2
resolution: result.video.metadata?.dimensions,
fileSize: `${(result.video.data.length / 1024 / 1024).toFixed(1)} MB`,
});
}

Example 2: Social Media Story (Portrait, 4 Segments)

import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Morning coffee being poured in slow motion",
image: await readFile("./coffee.jpg"),
},
{
prompt: "Hands wrapping a gift box with a ribbon",
image: await readFile("./wrapping.jpg"),
},
{
prompt: "Gift box placed on a doorstep, camera tilts up",
image: await readFile("./doorstep.jpg"),
},
{
prompt: "Recipient opens door, reaction shot",
image: await readFile("./reaction.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: {
resolution: "1080p",
length: 4,
aspectRatio: "9:16", // Portrait for stories/reels
audio: true,
},
director: {
transitionPrompts: [
"Quick, energetic swipe transition",
"Fast zoom through a blur into the next scene",
"Snap cut with motion blur connecting the moments",
],
transitionDurations: [4, 6, 8], // Each transition can have its own duration
},
},
timeout: 900000,
});

if (result.video) {
await writeFile("story.mp4", result.video.data);
// Total: 4×4s clips + transitions (4s + 6s + 8s) = 34 seconds
}

Example 3: AI-Driven Storyboard

import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";

const neurolink = new NeuroLink();

// Step 1: Use AI to generate a storyboard from a concept
const storyboard = await neurolink.generate({
input: {
text: `Create a 3-scene storyboard for a 30-second product commercial for a luxury watch.
Return a JSON array of objects with "scene" (number), "prompt" (video generation prompt),
and "imageDescription" (what the input image should show).`,
},
provider: "vertex",
model: "gemini-2.5-flash",
output: { format: "json" },
});

const scenes = JSON.parse(storyboard.content);

// Step 2: Generate video using the AI storyboard
const watchImages = [
await readFile("./watch-closeup.jpg"),
await readFile("./watch-wrist.jpg"),
await readFile("./watch-lifestyle.jpg"),
];

const result = await neurolink.generate({
input: {
segments: scenes.map((s: { prompt: string }, i: number) => ({
prompt: s.prompt,
image: watchImages[i],
})),
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "1080p", length: 8, aspectRatio: "16:9" },
director: {
transitionPrompts: [
"Cinematic slow dissolve with depth of field shift",
"Smooth pan transitioning to the next scene",
],
},
},
timeout: 600000,
});

if (result.video) {
await writeFile("ai-storyboard.mp4", result.video.data);
}

Example 4: Batch Director Mode

import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile, readdir } from "fs/promises";
import path from "path";

type StoryConfig = {
name: string;
segments: Array<{ prompt: string; imagePath: string }>;
};

async function batchDirectorGenerate(stories: StoryConfig[]) {
const neurolink = new NeuroLink();
const results = [];

for (const story of stories) {
console.log(`Generating: ${story.name}`);

try {
const segments = await Promise.all(
story.segments.map(async (seg) => ({
prompt: seg.prompt,
image: await readFile(seg.imagePath),
})),
);

const result = await neurolink.generate({
input: {
segments,
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "720p", length: 6 },
},
timeout: 600000,
});

if (result.video) {
const outputPath = `./output/${story.name}.mp4`;
await writeFile(outputPath, result.video.data);
results.push({ name: story.name, output: outputPath, success: true });
}
} catch (error) {
results.push({
name: story.name,
success: false,
error: error instanceof Error ? error.message : String(error),
});
}
}

return results;
}

// Usage
const results = await batchDirectorGenerate([
{
name: "product-A",
segments: [
{ prompt: "Hero reveal", imagePath: "./a-hero.jpg" },
{ prompt: "Feature showcase", imagePath: "./a-feature.jpg" },
{ prompt: "Call to action", imagePath: "./a-cta.jpg" },
],
},
{
name: "product-B",
segments: [
{ prompt: "Unboxing experience", imagePath: "./b-unbox.jpg" },
{ prompt: "In-use demonstration", imagePath: "./b-demo.jpg" },
],
},
]);

console.table(results);

Example 5: Error Handling in Director Mode

⚠️ Full-job retry warning: The generateWithRetry function below retries the entire neurolink.generate() call on any retriable VideoError. This means all segments and transitions are re-generated from scratch, incurring full cost each attempt ($10-60+ depending on settings). This is appropriate only for transient failures (e.g., rate limits) where partial results are not recoverable.

Note that Director Mode already handles transition failures gracefully — failed transitions fall back to hard cuts rather than failing the pipeline (see Partial Failure Handling). Only fatal errors like DIRECTOR_CLIP_FAILED or DIRECTOR_MERGE_FAILED propagate as VideoError. Keep this in mind when deciding whether a full-job retry is warranted.

Preferred approach: Once per-segment resume semantics are available, prefer retrying at the clip/transition level rather than re-running the entire pipeline. Until then, if you use full-job retry, keep maxRetries low (1-2) and restrict retries to rate-limit or timeout errors to control costs.

import { NeuroLink, VideoError } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";

const neurolink = new NeuroLink();

// WARNING: Each retry re-runs ALL segments via neurolink.generate(),
// incurring full API cost. See note above.
async function generateWithRetry(maxRetries = 2) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Product introduction",
image: await readFile("./intro.jpg"),
},
{
prompt: "Feature highlight",
image: await readFile("./feature.jpg"),
},
{
prompt: "Brand closing",
image: await readFile("./closing.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "720p", length: 6 },
},
timeout: 600000,
});

if (result.video) {
await writeFile("output.mp4", result.video.data);
return result;
}

throw new Error("No video in result");
} catch (error) {
if (error instanceof VideoError) {
console.error(
`Attempt ${attempt} failed [${error.code}]:`,
error.message,
);

// Don't retry configuration or validation errors
if (
error.category === "configuration" ||
error.category === "validation" ||
error.category === "permission"
) {
throw error;
}

// Retry on rate limits and transient failures only.
// Be aware: this re-runs the full Director pipeline (all segments + transitions).
if (error.retriable && attempt < maxRetries) {
const backoff = Math.pow(2, attempt) * 5000;
console.log(
`Retrying entire Director pipeline in ${backoff / 1000}s (attempt ${attempt + 1}/${maxRetries})...`,
);
await new Promise((r) => setTimeout(r, backoff));
continue;
}
}

throw error;
}
}
}

Type Definitions

Director Mode Input (Extended GenerateOptions)

Director Mode introduces a DirectorSegment type and adds a segments field to GenerateOptions.input:

/**
* A single segment in Director Mode, representing one video clip.
*/
type DirectorSegment = {
/** Prompt describing the video content for this segment */
prompt: string;
/** Input image for this segment (Buffer, URL string, file path, or ImageWithAltText) */
image: Buffer | string | ImageWithAltText;
};

type GenerateOptions = {
input: {
/** Prompt for standard (single-clip) mode */
text: string;

/** Standard mode images */
images?: Array<Buffer | string | ImageWithAltText>;

/**
* Director Mode segments. When provided, Director Mode is activated automatically.
* Each segment contains its own prompt and image — no need for separate text/images arrays.
* Must contain 2-10 segments.
*/
segments?: DirectorSegment[];

// ... other existing fields
};

output?: {
mode?: "text" | "video" | "ppt";

/**
* Video output options. In Director Mode, `video.length` controls the
* per-segment clip duration (4, 6, or 8 seconds). There is no separate
* `segmentDurationSeconds` field — this single field applies uniformly
* to all segments to avoid duplication and ambiguity.
*/
video?: VideoOutputOptions;

/** Director Mode configuration (only used when input.segments is provided) */
director?: DirectorModeOptions;
};

// ... other existing fields
};

DirectorModeOptions

type DirectorModeOptions = {
/**
* Prompts for generating transition clips (array of N-1 entries for N segments).
* transitionPrompts[i] is used for the transition between segment i and segment i+1.
* If provided, must contain exactly N-1 prompts where N is the number of segments.
*
* **When omitted:** The pipeline auto-generates a default prompt for each transition:
* `"Smooth cinematic transition between scenes"`. This produces a generic but
* visually coherent interpolation. For narrative-driven videos, explicit prompts
* that describe the desired camera movement or visual flow are recommended.
*/
transitionPrompts?: string[];

/**
* Duration of each transition clip in seconds (array of N-1 entries for N segments).
* transitionDurations[i] sets the duration for the transition between segment i and segment i+1.
* Each value must be 4, 6, or 8 (4 recommended for seamless feel).
* If omitted, all transitions default to 4 seconds.
* @default [4, 4, ...] (all 4s)
*/
transitionDurations?: Array<4 | 6 | 8>;

/**
* Maximum number of clips to generate concurrently (1-3).
* Lower values reduce API load; higher values speed up generation.
* @default 3
*/
concurrency?: 1 | 2 | 3;
};
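The constraints documented above can be expressed as a small validation sketch. This mirrors the rules but is illustrative — the actual checks live in validateDirectorModeInput() and may differ in detail:

```typescript
// Illustrative validation sketch mirroring the documented rules.
function checkDirectorOptions(
  segmentCount: number,
  opts: {
    transitionPrompts?: string[];
    transitionDurations?: number[];
    concurrency?: number;
  },
): string[] {
  const errors: string[] = [];
  if (segmentCount < 2 || segmentCount > 10) {
    errors.push("Director Mode requires 2-10 segments");
  }
  const expected = segmentCount - 1; // one transition per adjacent pair
  if (opts.transitionPrompts && opts.transitionPrompts.length !== expected) {
    errors.push(
      `Expected ${expected} transition prompts, got ${opts.transitionPrompts.length}`,
    );
  }
  if (opts.transitionDurations) {
    if (opts.transitionDurations.length !== expected) {
      errors.push(
        `Expected ${expected} transition durations, got ${opts.transitionDurations.length}`,
      );
    }
    opts.transitionDurations.forEach((d, i) => {
      if (![4, 6, 8].includes(d)) {
        errors.push(`Invalid transition duration at index ${i}. Use 4, 6, or 8`);
      }
    });
  }
  if (
    opts.concurrency !== undefined &&
    (opts.concurrency < 1 || opts.concurrency > 3)
  ) {
    errors.push("Concurrency must be between 1 and 3");
  }
  return errors;
}
```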

Extended VideoGenerationResult (Director Mode)

type VideoGenerationResult = {
data: Buffer;
mediaType: "video/mp4" | "video/webm";
metadata?: {
// Standard fields
duration?: number;
dimensions?: { width: number; height: number };
model?: string;
provider?: string;
aspectRatio?: string;
audioEnabled?: boolean;
processingTime?: number;

// Director Mode fields (present when Director Mode is used)
/** Number of main segments in the video */
segmentCount?: number;
/** Number of transition clips generated */
transitionCount?: number;
/** Duration of each main clip in seconds */
clipDuration?: number;
/** Durations of each transition in seconds (one per transition) */
transitionDurations?: number[];
/** Per-segment metadata */
segments?: Array<{
index: number;
duration: number;
processingTime: number;
}>;
/** Per-transition metadata */
transitions?: Array<{
fromSegment: number;
toSegment: number;
duration: number;
processingTime: number;
}>;
};
};

Architecture & Implementation

Pipeline Flow

User Input (N segments, each with prompt + image)


┌─────────────────────────────────┐
│ 1. Validation │
│ - Validate each segment has │
│ prompt and image │
│ - Enforce segment limit (≤10) │
└─────────────────┬───────────────┘


┌─────────────────────────────────┐
│ 2. Parallel Clip Generation │
│ - Generate N clips via │
│ generateVideoWithVertex() │
│ - Respect concurrency limit │
└─────────────────┬───────────────┘


┌─────────────────────────────────┐
│ 3. Frame Extraction │
│ - Extract last frame from │
│ clip[i] (decode MP4 → JPEG) │
│ - Extract first frame from │
│ clip[i+1] │
└─────────────────┬───────────────┘


┌─────────────────────────────────┐
│ 4. Transition Generation │
│ - For each pair (i, i+1): │
│ Call Veo with image (last │
│ frame) + lastFrame (first │
│ frame of next clip) │
│ - Per-transition duration │
│ - Sequential (depends on step │
│ 3 output) │
└─────────────────┬───────────────┘


┌─────────────────────────────────┐
│ 5. Video Merge │
│ - Concatenate: │
│ clip1 + trans1 + clip2 + │
│ trans2 + ... + clipN │
│ - Re-mux to single MP4 │
└─────────────────┬───────────────┘


VideoGenerationResult
(merged buffer + metadata)

Dependency DAG

The pipeline has a per-pair dependency structure — each transition depends only on its two adjacent clips, not on all clips globally. Understanding this DAG is essential for maximizing parallelism without race conditions:

Phase 1 – Clip Generation (parallel, subject to concurrency limit):

  Clip₁     Clip₂     Clip₃    ...    ClipN
    │         │         │              │
    ▼         ▼         ▼              ▼

Phase 2 – Frame Extraction (per-clip, runs as soon as a clip completes):

  lastFrame₁   firstFrame₂     lastFrame₂   firstFrame₃   ...   firstFrameN
      │             │              │             │                   │
      └──────┬──────┘              └──────┬──────┘                   │
             ▼                            ▼                          ▼

Phase 3 – Transition Generation (per-pair, each depends on its two boundary frames):

  (once Clip₁ & Clip₂ done)    (once Clip₂ & Clip₃ done)
         Trans₁₋₂                     Trans₂₋₃           ...   Trans₍N₋₁₎₋N
            │                            │                          │
            └──────────────┬─────────────┴──────────────────────────┘
                           ▼

Phase 4 – Sequential Merge (must wait for ALL clips and transitions):

  Clip₁ → Trans₁₋₂ → Clip₂ → Trans₂₋₃ → Clip₃ → ... → ClipN
                           │
                           ▼
                  Final Merged Video

Key constraint: Each transition Trans₍ᵢ₎₋₍ᵢ₊₁₎ depends only on Clip₍ᵢ₎ and Clip₍ᵢ₊₁₎ — specifically, the last frame of Clip₍ᵢ₎ and the first frame of Clip₍ᵢ₊₁₎. Once those two clips complete and their boundary frames are extracted (Phase 2), the transition may begin independently, without waiting for other clips to finish. Multiple transitions whose input clips are ready can run in parallel (subject to the concurrency limit). Phase 4 (merge) remains strictly sequential and must wait for all clips and transitions to complete before concatenation.
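The per-pair dependency rule can be modeled with plain promises. This is a simplified scheduling sketch — the actual directorPipeline.ts also enforces the concurrency limit and handles failures:

```typescript
// Illustrative DAG scheduler: each transition starts as soon as its two
// adjacent clips resolve, without waiting for any other clip.
async function runDirectorDag<T>(
  clipTasks: Array<() => Promise<T>>,
  makeTransition: (prev: T, next: T) => Promise<T>,
): Promise<T[]> {
  // Phase 1: kick off all clips (concurrency limiting omitted for brevity).
  const clips = clipTasks.map((task) => task());
  // Phase 3: transition i awaits only clip i and clip i+1.
  const transitions = clips.slice(0, -1).map(async (prev, i) => {
    const [a, b] = await Promise.all([prev, clips[i + 1]]);
    return makeTransition(a, b);
  });
  // Phase 4: merge waits for everything, interleaving clips and transitions.
  const [doneClips, doneTransitions] = await Promise.all([
    Promise.all(clips),
    Promise.all(transitions),
  ]);
  return doneClips.flatMap((clip, i) =>
    i < doneTransitions.length ? [clip, doneTransitions[i]] : [clip],
  );
}
```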

Technology Dependencies

Frame Extraction (frameExtractor.ts)

Frame extraction supports two strategies — a native FFmpeg binary (recommended) and an experimental @ffmpeg/ffmpeg WASM path:

  • Native binary (recommended): Delegates to a system-installed or bundled ffmpeg binary. This is the most reliable option for traditional servers, containers, and most serverless runtimes (e.g., AWS Lambda layers). Detection is configurable via:
    • FFMPEG_PATH environment variable (explicit path to the binary), or
    • NeuroLink config option video.ffmpegPath, or
    • Automatic PATH lookup as a last resort.
  • WASM path (experimental): Uses @ffmpeg/ffmpeg (FFmpeg compiled to WebAssembly) for environments where a native binary is unavailable. Caveats:
    • Bundle size: The WASM binary adds ~25-30 MB to deployment artifacts.
    • Startup overhead: First invocation incurs a cold-start penalty (~1-3s) for WASM compilation.
    • Runtime compatibility: Not all edge runtimes support WASM with sufficient memory (e.g., Cloudflare Workers has a 128 MB limit). Verify compatibility with your target platform before relying on this path.
    • Node.js threading: @ffmpeg/ffmpeg ≥0.12 requires SharedArrayBuffer, which needs --experimental-shared-memory or specific HTTP headers (Cross-Origin-Embedder-Policy) in some environments.
  • Operation: Decodes MP4 → seeks to target frame → encodes to JPEG.
  • Performance: First/last frame extraction from a 4-8s clip completes in <100ms (native) or <500ms (WASM).
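Under the native strategy, boundary-frame extraction reduces to building an ffmpeg argument list. The sketch below is illustrative (the actual frameExtractor.ts command construction may differ); -frames:v, -sseof, -update, and -q:v are standard ffmpeg flags:

```typescript
// Illustrative sketch of the native-binary strategy: build the ffmpeg
// argument list for extracting a boundary frame as JPEG.
function frameExtractionArgs(
  inputPath: string,
  outputPath: string,
  which: "first" | "last",
): string[] {
  return which === "first"
    ? // Grab the very first decoded frame; -q:v 2 = high JPEG quality.
      ["-i", inputPath, "-frames:v", "1", "-q:v", "2", outputPath]
    : // -sseof -0.1 seeks to ~0.1s before end-of-file; -update 1 keeps
      // overwriting the output so the last decoded frame wins.
      ["-sseof", "-0.1", "-i", inputPath, "-update", "1", "-q:v", "2", outputPath];
}
```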

Video Merging (videoMerger.ts)

Video concatenation uses the same dual-strategy approach (native binary preferred, WASM experimental):

  • Method: FFmpeg concat demuxer for lossless MP4 concatenation (no re-encoding when codecs match).
  • Operation: Creates a concat list → runs ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4.
  • Re-encoding fallback: If clips have mismatched codecs (unlikely since all come from Veo), falls back to re-encoding with H.264.
  • WASM note: The WASM runtime handles file I/O via an in-memory filesystem (FS), avoiding disk writes — but memory usage scales with total video size, which can be significant for multi-segment merges.
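The concat-demuxer list file the merge step feeds to ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4 can be built with a few lines. This is a sketch, not the actual videoMerger.ts code; note the concat demuxer's quote-escaping rules for paths:

```typescript
// Illustrative sketch: build the concat-demuxer list file content.
// Each line is `file '<path>'`; a single quote inside a path is escaped
// as '\'' per the demuxer's shell-like quoting rules.
function buildConcatList(paths: string[]): string {
  return paths
    .map((p) => `file '${p.replace(/'/g, `'\\''`)}'`)
    .join("\n");
}
```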

Dependency: @ffmpeg/ffmpeg and @ffmpeg/util are optional peer dependencies for the experimental WASM path. Install with pnpm add @ffmpeg/ffmpeg @ffmpeg/util only if you cannot provide a native FFmpeg binary. A native ffmpeg binary (installed via your OS package manager, Docker layer, or Lambda layer) is the recommended approach for production deployments.

Before committing to an FFmpeg strategy, validate your chosen implementation with real-world testing on your target runtime (Lambda, ECS, edge, etc.) to confirm compatibility, cold-start behavior, and memory limits.


Transition Generation: Veo API Request

Each transition clip uses the first-and-last-frame Veo endpoint. The request body includes both image (the last frame of the preceding clip) and lastFrame (the first frame of the following clip):

const transitionRequestBody = {
instances: [
{
prompt: transitionPrompts[i], // i-th transition prompt (0-indexed)
image: {
bytesBase64Encoded: lastFrameOfClipN, // Last frame of preceding clip
mimeType: "image/jpeg",
},
lastFrame: {
bytesBase64Encoded: firstFrameOfClipN1, // First frame of following clip
mimeType: "image/jpeg",
},
},
],
parameters: {
sampleCount: 1,
durationSeconds: transitionDurations[i], // Per-transition duration (4, 6, or 8)
aspectRatio: "16:9", // Matches main clips
resolution: "720p", // Matches main clips
generateAudio: true, // Matches main clips
},
};

This uses the same predictLongRunning → fetchPredictOperation polling flow as standard video generation, with the addition of the lastFrame field.
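The polling flow itself follows a standard long-running-operation pattern. The sketch below injects the transport as plain functions for clarity — the real handler's structure may differ:

```typescript
// Illustrative long-running-operation polling loop: start the operation,
// then re-fetch it by name until `done` is true.
type Operation = { name: string; done?: boolean; response?: unknown };

async function pollUntilDone(
  start: () => Promise<Operation>,            // e.g. POST :predictLongRunning
  fetchOperation: (name: string) => Promise<Operation>, // e.g. :fetchPredictOperation
  intervalMs = 10_000,
): Promise<Operation> {
  let op = await start();
  while (!op.done) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    op = await fetchOperation(op.name);
  }
  return op;
}
```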

Implementation Files

| File | Purpose |
| --- | --- |
| src/lib/adapters/video/vertexVideoHandler.ts | Extended with generateTransitionWithVertex() and lastFrame support |
| src/lib/adapters/video/directorPipeline.ts | Director Mode orchestrator: clip generation, frame extraction, transition generation, merging |
| src/lib/adapters/video/frameExtractor.ts | Extract first/last frames from MP4 buffers via FFmpeg (native binary preferred; experimental @ffmpeg/ffmpeg WASM path) |
| src/lib/adapters/video/videoMerger.ts | Concatenate video buffers into a single MP4 via the FFmpeg concat demuxer (lossless when codecs match) |
| src/lib/types/multimodal.ts | DirectorSegment, DirectorModeOptions type definitions |
| src/lib/types/generateTypes.ts | Extended GenerateOptions input with segments field |
| src/lib/core/baseProvider.ts | Director Mode detection and routing in handleVideoGeneration() |
| src/lib/utils/parameterValidation.ts | validateDirectorModeInput() validation |

Key Functions

  • generateTransitionWithVertex(firstFrame, lastFrame, prompt, options) – Generates a transition clip using Veo 3.1 Fast's first-and-last-frame API
  • extractFirstFrame(videoBuffer) – Extracts the first frame from a video buffer as JPEG
  • extractLastFrame(videoBuffer) – Extracts the last frame from a video buffer as JPEG
  • mergeVideoBuffers(buffers) – Concatenates multiple MP4 buffers into one
  • executeDirectorPipeline(segments, options) – Full Director Mode orchestrator
  • validateDirectorModeInput(options) – Validates segment structure, count, image types, etc.

Configuration & Best Practices

Duration Calculation

Note: In Director Mode, output.video.length controls the duration of each main segment clip. There is no separate segmentDurationSeconds field — the existing VideoOutputOptions.length is reused to avoid duplication. All segments share the same clip duration; per-segment duration variance is not currently supported (use different Director Mode calls if needed).

Each transition can have its own duration, so the total is the sum of all clip durations plus the sum of all individual transition durations:

| Segments | Clip Duration | Transition Durations | Total Duration |
| --- | --- | --- | --- |
| 2 | 6s | [4s] | 16s (2×6 + 4) |
| 3 | 8s | [4s, 6s] | 34s (3×8 + 4 + 6) |
| 4 | 4s | [4s, 6s, 8s] | 34s (4×4 + 4 + 6 + 8) |
| 5 | 6s | [4s, 6s, 4s, 8s] | 52s (5×6 + 4 + 6 + 4 + 8) |
| N | Ds | [T₁, T₂, …, T₍ₙ₋₁₎] | N×D + Σ Tᵢ |
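The formula in the last row translates directly to code. totalDirectorDuration is an illustrative helper, not an SDK function:

```typescript
// Illustrative helper matching the table above: total duration is the sum of
// all clip durations plus the sum of the per-transition durations.
function totalDirectorDuration(
  segmentCount: number,
  clipLength: number,
  transitionDurations: number[],
): number {
  return (
    segmentCount * clipLength +
    transitionDurations.reduce((sum, d) => sum + d, 0)
  );
}
```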

API Call Count

| Segments | Main Clips | Transition Clips | Total API Calls |
| --- | --- | --- | --- |
| 2 | 2 | 1 | 3 |
| 3 | 3 | 2 | 5 |
| 5 | 5 | 4 | 9 |
| 10 | 10 | 9 | 19 |

Worst-Case Analysis (10 Segments at Maximum Settings)

The 10-segment limit balances capability with practical constraints:

| Metric | Value | Calculation |
| --- | --- | --- |
| Total API calls | 19 | 10 clips + 9 transitions |
| Wall-clock time (concurrency=3) | ~18 minutes | ceil(10/3) × 3 min (clips) + ceil(9/3) × 2 min (transitions) |
| Wall-clock time (concurrency=1) | ~48 minutes | 10 × 3 min + 9 × 2 min (fully sequential) |
| Total video duration | ~152 seconds | 10 × 8s (clips) + 9 × 8s (transitions, worst case) |
| Burst quota required | 3-10 concurrent | Depends on concurrency setting |

Why 10? Beyond 10 segments, single-pipeline wall-clock time exceeds 30 minutes and costs grow proportionally. For longer productions, chain multiple Director Mode calls and concatenate the outputs externally, or use the upcoming Batch Director API.

Best Practices

1. Prompt Engineering for Transitions

// ❌ Generic / unhelpful
const badTransition = "Make a transition";

// ✅ Describe the camera movement or visual flow
const goodTransition =
"Camera smoothly drifts right, transitioning between scenes";

// ✅ Match the scene context
const contextualTransition =
"Focus shifts from foreground to background as light changes";

// ✅ Use per-transition prompts for narrative coherence
const perTransition = [
"Camera follows a path from the garden into the house",
"Time-lapse of light changing from day to dusk",
];

2. Image Preparation for Smooth Transitions

// Best results when adjacent segments share visual characteristics:
// - Similar color palette
// - Compatible lighting conditions
// - Related subject matter

// The Veo interpolation works best when:
// 1. Both frames have similar aspect ratios to the target
// 2. The visual distance between frames is moderate (not extreme jumps)
// 3. Images are well-exposed and sharp

3. Timeout Configuration

// Rule of thumb: ~2-3 minutes per video generation call
// For N segments: timeout ≥ (N + (N-1)) × 180000 ms (worst case sequential)
// With concurrency=3: timeout ≥ ceil(N/3) × 180000 + (N-1) × 180000

// 3 segments: ~10 minutes
const threeSegments = { timeout: 600000 };

// 5 segments: ~15 minutes
const fiveSegments = { timeout: 900000 };

// 10 segments: ~30 minutes
const tenSegments = { timeout: 1800000 };

4. Cost Optimization

| Strategy | Impact | Trade-off |
| --- | --- | --- |
| Use 720p for drafts | ~20% lower cost | Lower visual quality |
| Use 4s clips for previews | ~50% lower cost | Shorter segments |
| Limit to 3-5 segments | Fewer API calls | Shorter total video |
| Use veo-3.1-fast for main clips too | Faster generation | Slightly lower quality |
| Reduce concurrency to 1 | Lower burst quota | Longer wall-clock time |

Pricing reference: Look up current per-second rates for Veo 3.1 (main clips) and Veo 3.1 Fast (transitions) on the Vertex AI Generative AI pricing page. Rates vary by resolution (720p vs 1080p) and model variant.

Error Handling & Validation

Director Mode Validation Rules

| Parameter | Validation | Error Message |
| --- | --- | --- |
| input.segments | Must be an array with 2-10 entries | Director Mode requires 2-10 segments |
| input.segments[i].prompt | Must be a non-empty string | Segment X requires a non-empty prompt |
| input.segments[i].image | Must be Buffer, string (URL/path), or ImageWithAltText | Segment X requires a valid image (Buffer, URL, path, or ImageWithAltText) |
| transitionPrompts | Optional; if provided, length must be N-1 | Expected X transition prompts, got Y |
| transitionDurations | Optional; if provided, array of N-1 values, each 4, 6, or 8 | Expected N-1 transition durations, got X / Invalid transition duration at index X. Use 4, 6, or 8 |
| director.concurrency | Must be 1-3 | Concurrency must be between 1 and 3 |
| Segment limit | Max 10 segments | Director Mode supports up to 10 segments |

Partial Failure Handling

Director Mode uses a differentiated failure strategy depending on which pipeline stage fails:

| Failure Type | Behavior | Rationale |
| --- | --- | --- |
| Main clip generation fails | Pipeline fails immediately with DIRECTOR_CLIP_FAILED error. Returns metadata about which segments succeeded (for debugging). | A missing segment cannot be meaningfully recovered — the final video would have a gap. |
| Frame extraction fails | Retry extraction once. If the retry fails, skip the affected transition and fall back to a hard cut. | Frame extraction is a local CPU operation; transient failures are rare but possible with corrupted buffers. |
| Transition generation fails | Skip the failed transition and concatenate adjacent clips directly (hard cut). Log a warning. | A missing transition degrades quality but produces a valid video. The user can re-run with a simpler transition prompt. |
| Video merge fails | Pipeline fails with DIRECTOR_MERGE_FAILED error. Returns individual clip buffers in error.context.clipBuffers for manual recovery. | Merge failure is non-recoverable within the pipeline, but individual clips are still valuable. |

Design rationale: Main clip failures are fatal because there's no sensible way to fill a segment gap. Transition failures are non-fatal because a hard cut (direct concatenation) is a valid — if less polished — editing technique. This mirrors how professional video editors treat transitions as optional polish, not structural requirements.

Partial result metadata: When transitions fall back to hard cuts, the result metadata indicates which transitions were skipped:

```typescript
// If a transition clip fails, the pipeline can:
// 1. Skip the transition (hard cut between clips)
// 2. Retry the transition with a simpler prompt
// 3. Fail the entire operation (default for main clip failures)

const result = await neurolink.generate({
  input: {
    segments: [
      { prompt: "Scene 1", image: img1 },
      { prompt: "Scene 2", image: img2 },
      { prompt: "Scene 3", image: img3 },
    ],
  },
  provider: "vertex",
  model: "veo-3.1",
  output: {
    mode: "video",
    video: { resolution: "720p", length: 6 },
    director: {
      transitionPrompts: [
        "Smooth cinematic transition between scenes",
        "Gentle camera drift connecting the moments",
      ],
    },
  },
});

// If transition 1→2 fails but clips succeed, the result may contain
// a hard cut at that point. Check metadata for details:
if (result.video?.metadata?.transitions) {
  for (const tx of result.video.metadata.transitions) {
    if (!tx.duration) {
      console.warn(
        `Transition ${tx.fromSegment}→${tx.toSegment} used a hard cut`,
      );
    }
  }
}
```

Error Types

```typescript
// Director-specific error codes (extend VIDEO_ERROR_CODES)
const DIRECTOR_ERROR_CODES = {
  /** Invalid segment structure (missing prompt or image) */
  SEGMENT_MISMATCH: "DIRECTOR_SEGMENT_MISMATCH",
  /** Too many segments requested */
  SEGMENT_LIMIT_EXCEEDED: "DIRECTOR_SEGMENT_LIMIT_EXCEEDED",
  /** A main clip generation call failed (fatal) */
  CLIP_FAILED: "DIRECTOR_CLIP_FAILED",
  /** Frame extraction from clip failed */
  FRAME_EXTRACTION_FAILED: "DIRECTOR_FRAME_EXTRACTION_FAILED",
  /** Transition clip generation failed (non-fatal, falls back to hard cut) */
  TRANSITION_FAILED: "DIRECTOR_TRANSITION_FAILED",
  /** Video merge/concatenation failed */
  MERGE_FAILED: "DIRECTOR_MERGE_FAILED",
  /** Pipeline timeout (overall) */
  PIPELINE_TIMEOUT: "DIRECTOR_PIPELINE_TIMEOUT",
};
```
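A caller can branch on these codes to decide how to recover. The sketch below assumes errors carry `code` and `context` fields as described in the failure-handling table; `succeededSegments` is an illustrative field name, while `clipBuffers` comes from the merge-failure description above.

```typescript
// Illustrative error shape; the actual SDK error class may differ.
interface DirectorError {
  code: string;
  context?: { clipBuffers?: unknown[]; succeededSegments?: number[] };
}

// Map each Director error code to a recovery action.
function handleDirectorError(err: DirectorError): string {
  switch (err.code) {
    case "DIRECTOR_CLIP_FAILED":
      // Fatal: a segment is missing; retry only the failed ones.
      return `retry segments not in [${err.context?.succeededSegments ?? []}]`;
    case "DIRECTOR_MERGE_FAILED":
      // Clips survived; recover the buffers for a manual merge.
      return `recover ${err.context?.clipBuffers?.length ?? 0} clip buffers`;
    case "DIRECTOR_PIPELINE_TIMEOUT":
      return "increase timeout or reduce segment count";
    default:
      return "unhandled; rethrow";
  }
}
```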

Comparison: Standard vs Director Mode

| Feature | Standard Video Generation | Director Mode |
| --- | --- | --- |
| Input format | `input.text` + `input.images` | `input.segments` array (2-10 `DirectorSegment` objects) |
| Output | Single clip (4-8s) | Merged multi-segment video |
| Transitions | N/A | AI-generated clips with per-transition duration (4-8s) |
| API calls | 1 | N + (N-1) calls |
| Veo API feature | `image` only | `image` + `lastFrame` |
| Processing time | 1-3 minutes | 5-30 minutes (depends on segment count) |
| Max duration | 8 seconds | ~152s (10×8s clips + 9×8s transitions max) |
| Concurrency | N/A | Up to 3 parallel clip generations |
| Error recovery | All-or-nothing | Fatal for clips; hard-cut fallback for transitions |
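The derived figures in the table follow directly from the segment count: N main-clip calls plus N-1 transition calls, and a duration ceiling of N clips plus N-1 transitions at 8 seconds each.

```typescript
// Sanity checks for the comparison table's derived rows.
const apiCalls = (n: number) => n + (n - 1); // = 2n - 1
const maxDurationSec = (n: number) => n * 8 + (n - 1) * 8;

apiCalls(10); // 19 calls at the 10-segment maximum
maxDurationSec(10); // 152 — the ~152s ceiling quoted above
```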

Troubleshooting

| Symptom | Cause | Solution |
| --- | --- | --- |
| "Segment mismatch" error | Missing `prompt` or `image` in a segment | Ensure each segment has both `prompt` and `image` |
| Transition looks jarring | Large visual gap between adjacent clips | Use visually similar images; improve the transition prompt |
| Pipeline timeout | Too many segments or high resolution | Reduce segment count, use 720p, or increase the timeout |
| Rate limit errors | Too many concurrent API calls | Reduce `director.concurrency` to 1-2 |
| Frame extraction fails | Corrupted video buffer | Retry the failed clip generation |
| Audio discontinuity at transitions | Each clip has independently generated audio | Expected behavior; transition clips bridge the audio gap |
| "Segment limit exceeded" | More than 10 segments provided | Split into multiple Director Mode calls |
| High cost | Many high-resolution segments | Use 720p and 4s clips for drafts; upgrade for final output |

Debug Mode

```typescript
// Enable debug logging to trace the Director pipeline
const neurolink = new NeuroLink({
  debug: true,
  logLevel: "verbose",
});

// Or via environment variable:
// export NEUROLINK_DEBUG=true

// Debug output shows:
// - Segment validation results
// - Per-clip generation start/completion
// - Frame extraction timing
// - Transition generation details
// - Merge operation status
```

Limitations

| Limitation | Description | Workaround |
| --- | --- | --- |
| Max 10 segments | API and processing constraints | Chain multiple Director Mode calls |
| Fixed transition model | Transitions always use `veo-3.1-fast`; not configurable | N/A |
| No custom audio | Audio is AI-generated for each clip independently | Post-process with external audio editing tools |
| Sequential transitions | Transitions must wait for clip frames to be extracted | Inherent to the pipeline (frames depend on clips) |
| MP4 output only | Merged output is always MP4 | Convert with ffmpeg post-generation if needed |
| Vertex AI only | Veo models are Vertex-exclusive | No alternative providers currently |
| Processing time | Multi-segment generation is inherently slower | Use concurrency and lower settings for drafts |
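For the 10-segment cap, "chain multiple Director Mode calls" can mean splitting a long storyboard into overlapping chunks, running each chunk separately, and merging the outputs externally (e.g. with ffmpeg). This is a hedged sketch of that chunking step only; `DirectorSegment` is illustrative shorthand for `{ prompt, image }`, and the one-segment overlap is a design choice so adjacent chunks end and start on the same scene, keeping the external merge point visually continuous.

```typescript
type DirectorSegment = { prompt: string; image: unknown };

// Split into chunks of up to `maxPerCall` segments, overlapping by one
// segment so each chunk's last scene is the next chunk's first scene.
function chunkSegments(
  segments: DirectorSegment[],
  maxPerCall = 10,
): DirectorSegment[][] {
  const chunks: DirectorSegment[][] = [];
  for (let i = 0; i < segments.length - 1; i += maxPerCall - 1) {
    chunks.push(segments.slice(i, i + maxPerCall));
  }
  return chunks;
}
```

When merging the per-chunk videos, trim the duplicated boundary clip from one side so the shared scene is not played twice.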

Next: Video Generation | Multimodal Chat