Video Director Mode – Multi-Clip Generation & Merging
Director Mode extends NeuroLink's video generation capability to produce multi-segment videos with seamless AI-generated transitions. Instead of a single clip, you define an array of segments — each with its own prompt and image — and NeuroLink orchestrates the full pipeline: generating each clip, extracting boundary frames, producing transition videos (with individually configurable durations) using Veo 3.1's first-and-last-frame interpolation, and merging everything into one continuous video.
Overview
Director Mode is triggered automatically when you supply an input.segments array to the video generation API. Each segment is a self-documenting { prompt, image } object, mapping cleanly to the pipeline concept of ordered video segments.
graph TD
A["Segment 1: Image A + Prompt 1"] --> G1["Generate Clip 1 (4-8s)"]
B["Segment 2: Image B + Prompt 2"] --> G2["Generate Clip 2 (4-8s)"]
C["Segment 3: Image C + Prompt 3"] --> G3["Generate Clip 3 (4-8s)"]
G1 --> F1["Extract Last Frame of Clip 1"]
G2 --> F2a["Extract First Frame of Clip 2"]
G2 --> F2b["Extract Last Frame of Clip 2"]
G3 --> F3["Extract First Frame of Clip 3"]
F1 --> T1["Generate Transition 1→2"]
F2a --> T1
F2b --> T2["Generate Transition 2→3"]
F3 --> T2
G1 --> M["Merge: Clip1 + Trans1 + Clip2 + Trans2 + Clip3"]
T1 --> M
G2 --> M
T2 --> M
G3 --> M
M --> O["Final Merged Video (MP4)"]
How It Works
- Parallel clip generation – All main clips are generated concurrently via Veo 3.1's image-to-video endpoint
- Frame extraction – The last frame of clip N and first frame of clip N+1 are extracted from generated video buffers
- Transition generation – Veo 3.1 Fast's first-and-last-frame interpolation API generates a transition between each pair of adjacent clips, with individually configurable duration (4, 6, or 8 seconds each)
- Sequential merge – Clips and transitions are concatenated: Clip₁ → Trans₁₋₂ → Clip₂ → Trans₂₋₃ → Clip₃ → …
- Single output – The merged result is returned as a single VideoGenerationResult buffer
Key Technology: Veo First-and-Last-Frame Interpolation
The transition clips use Veo 3.1's native lastFrame parameter in the predictLongRunning API. Instead of generating from a single image, you provide two images — the first frame and the last frame — and Veo generates a video that smoothly interpolates between them:
{
"instances": [
{
"prompt": "Smooth cinematic transition",
"image": {
"bytesBase64Encoded": "<LAST_FRAME_OF_CLIP_N>",
"mimeType": "image/jpeg"
},
"lastFrame": {
"bytesBase64Encoded": "<FIRST_FRAME_OF_CLIP_N+1>",
"mimeType": "image/jpeg"
}
}
],
"parameters": {
"sampleCount": 1,
"durationSeconds": 6,
"aspectRatio": "16:9",
"resolution": "720p"
}
}
This produces a physically coherent, AI-generated morph — far superior to simple crossfade or dissolve effects. The durationSeconds value is set independently for each transition (from the transitionDurations array), allowing shorter or longer interpolations per segment boundary.
What You Get
- Multi-segment video – Chain any number of video segments into a single continuous output
- AI transitions – Per-transition configurable duration (4, 6, or 8 seconds each) generated by Veo 3.1 frame interpolation (not simple crossfades)
- Parallel generation – Main clips are generated concurrently for faster pipeline execution
- Mixed image inputs – Each segment's image field accepts a Buffer, file path, URL, or ImageWithAltText
- Consistent settings – Resolution, aspect ratio, and audio settings apply uniformly across all segments and transitions
- Per-segment customization – Each segment is a self-contained { prompt, image } object
- SDK only – Use programmatically via generate() (CLI is not supported for Director Mode)
Supported Provider & Model
| Provider | Model | Interpolation Support | Transition Duration | Max Segments |
|---|---|---|---|---|
| vertex | veo-3.1 (clips) / veo-3.1-fast (transitions) | First + Last Frame | 4-8s per transition | 10 |
Note: The lastFrame parameter is supported by veo-2.0-generate-001, veo-3.1-generate-001, and veo-3.1-fast-generate-001. NeuroLink uses veo-3.1-generate-001 for main clips and veo-3.1-fast-generate-001 for transition clips (faster generation with minimal quality difference for short interpolations).
Prerequisites
Same as Video Generation prerequisites, plus:
- Sufficient quota – Director Mode generates N + (N-1) video operations (N clips + N-1 transitions). Ensure your Vertex AI project has adequate quota.
- Adequate timeout – Multi-segment generation takes proportionally longer. Set timeout accordingly (recommended: 5-10 minutes for 3+ segments).
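The quota and timeout arithmetic above can be captured in a small planning helper. This is an illustrative sketch, not part of the NeuroLink SDK — the `estimateDirectorJob` name and the 3-minutes-per-operation heuristic are assumptions:

```typescript
// Hypothetical planning helper (not part of the NeuroLink API).
// Applies the documented N + (N - 1) operation count and a conservative
// worst-case-sequential timeout budget.
type DirectorJobEstimate = {
  apiCalls: number; // N clips + (N - 1) transitions
  timeoutMs: number; // worst-case sequential budget
};

function estimateDirectorJob(
  segmentCount: number,
  msPerOperation = 180_000, // assumed ~3 minutes per video generation call
): DirectorJobEstimate {
  if (segmentCount < 2 || segmentCount > 10) {
    throw new Error("Director Mode requires 2-10 segments");
  }
  const apiCalls = segmentCount + (segmentCount - 1);
  return { apiCalls, timeoutMs: apiCalls * msPerOperation };
}
```

For example, a 3-segment job yields 5 operations and a 15-minute worst-case budget — matching the "3 segments: ~10 minutes" guidance once parallelism is factored in.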
Quick Start
SDK Usage
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
// Director Mode: define segments → merged video
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Camera slowly pans across the product on a white table",
image: await readFile("./scene1.jpg"),
},
{
prompt: "Dynamic zoom into product details with dramatic lighting",
image: await readFile("./scene2-detail.jpg"),
},
{
prompt: "Wide shot pulling back to reveal the full scene",
image: await readFile("./scene3-wide.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: {
resolution: "720p",
length: 6, // Per-segment clip duration (reused from VideoOutputOptions)
aspectRatio: "16:9",
audio: true,
},
},
timeout: 600000, // 10 minutes for multi-segment
});
if (result.video) {
await writeFile("director-output.mp4", result.video.data);
console.log(`Total duration: ${result.video.metadata?.duration}s`);
console.log(`Segments: ${result.video.metadata?.segmentCount}`);
}
Using Image URLs
import { NeuroLink } from "@juspay/neurolink";
import { writeFile } from "fs/promises";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Serene sunrise over calm waters",
image: "https://example.com/sunrise.jpg",
},
{
prompt: "Waves crashing on a rocky coastline",
image: "https://example.com/coastline.jpg",
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "1080p", length: 8 },
},
timeout: 600000,
});
if (result.video) {
await writeFile("ocean-director.mp4", result.video.data);
}
Mixed Input Types
Each segment's image field accepts a Buffer, file path, URL, or ImageWithAltText:
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Product reveal from shadow to light",
image: await readFile("./product-dark.jpg"), // Buffer
},
{
prompt: "360-degree rotation showcasing all angles",
image: "https://cdn.example.com/product-turntable.png", // URL
},
{
prompt: "Final hero shot with brand overlay",
image: { data: await readFile("./hero.jpg"), altText: "Hero" }, // ImageWithAltText
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "1080p", length: 6, aspectRatio: "16:9" },
},
});
Note: Director Mode is SDK-only. CLI support is not available for this generation type. Use the standard --outputMode video CLI flags for single-clip video generation.
Comprehensive Examples
Example 1: Product Commercial (3 Segments)
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: {
segments: [
{
prompt:
"Dramatic reveal: camera sweeps up from a dark surface to unveil the product under a spotlight",
image: await readFile("./product-dark.jpg"),
},
{
prompt:
"Close-up detail shot: camera slowly orbits the product, focusing on texture and craftsmanship",
image: await readFile("./product-detail.jpg"),
},
{
prompt:
"Lifestyle context: camera pulls back to show the product in an elegant room setting",
image: await readFile("./product-lifestyle.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: {
resolution: "1080p",
length: 8,
aspectRatio: "16:9",
audio: true,
},
// Director-specific options
director: {
transitionPrompts: [
"Elegant dissolve with subtle camera drift",
"Smooth pull-back revealing the wider scene",
],
transitionDurations: [4, 6], // Per-transition: first transition 4s, second 6s
},
},
timeout: 600000,
});
if (result.video) {
await writeFile("product-commercial.mp4", result.video.data);
console.log("Director Mode output:", {
totalDuration: result.video.metadata?.duration, // ~34s (3×8s + 4s + 6s)
segmentCount: result.video.metadata?.segmentCount, // 3
transitionCount: result.video.metadata?.transitionCount, // 2
resolution: result.video.metadata?.dimensions,
fileSize: `${(result.video.data.length / 1024 / 1024).toFixed(1)} MB`,
});
}
Example 2: Social Media Story (Portrait, 4 Segments)
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Morning coffee being poured in slow motion",
image: await readFile("./coffee.jpg"),
},
{
prompt: "Hands wrapping a gift box with a ribbon",
image: await readFile("./wrapping.jpg"),
},
{
prompt: "Gift box placed on a doorstep, camera tilts up",
image: await readFile("./doorstep.jpg"),
},
{
prompt: "Recipient opens door, reaction shot",
image: await readFile("./reaction.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: {
resolution: "1080p",
length: 4,
aspectRatio: "9:16", // Portrait for stories/reels
audio: true,
},
director: {
transitionPrompts: [
"Quick, energetic swipe transition",
"Fast zoom through a blur into the next scene",
"Snap cut with motion blur connecting the moments",
],
transitionDurations: [4, 6, 8], // Each transition can have its own duration
},
},
timeout: 900000,
});
if (result.video) {
await writeFile("story.mp4", result.video.data);
// Total: 4×4s clips + transitions (4s + 6s + 8s) = 34 seconds
}
Example 3: AI-Driven Storyboard
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
// Step 1: Use AI to generate a storyboard from a concept
const storyboard = await neurolink.generate({
input: {
text: `Create a 3-scene storyboard for a 30-second product commercial for a luxury watch.
Return a JSON array of objects with "scene" (number), "prompt" (video generation prompt),
and "imageDescription" (what the input image should show).`,
},
provider: "vertex",
model: "gemini-2.5-flash",
output: { format: "json" },
});
const scenes = JSON.parse(storyboard.content);
// Step 2: Generate video using the AI storyboard
const watchImages = [
await readFile("./watch-closeup.jpg"),
await readFile("./watch-wrist.jpg"),
await readFile("./watch-lifestyle.jpg"),
];
const result = await neurolink.generate({
input: {
segments: scenes.map((s: { prompt: string }, i: number) => ({
prompt: s.prompt,
image: watchImages[i],
})),
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "1080p", length: 8, aspectRatio: "16:9" },
director: {
transitionPrompts: [
"Cinematic slow dissolve with depth of field shift",
"Smooth pan transitioning to the next scene",
],
},
},
timeout: 600000,
});
if (result.video) {
await writeFile("ai-storyboard.mp4", result.video.data);
}
Example 4: Batch Director Mode
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile, readdir } from "fs/promises";
import path from "path";
type StoryConfig = {
name: string;
segments: Array<{ prompt: string; imagePath: string }>;
};
async function batchDirectorGenerate(stories: StoryConfig[]) {
const neurolink = new NeuroLink();
const results = [];
for (const story of stories) {
console.log(`Generating: ${story.name}`);
try {
const segments = await Promise.all(
story.segments.map(async (seg) => ({
prompt: seg.prompt,
image: await readFile(seg.imagePath),
})),
);
const result = await neurolink.generate({
input: {
segments,
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "720p", length: 6 },
},
timeout: 600000,
});
if (result.video) {
const outputPath = `./output/${story.name}.mp4`;
await writeFile(outputPath, result.video.data);
results.push({ name: story.name, output: outputPath, success: true });
}
} catch (error) {
results.push({
name: story.name,
success: false,
error: error instanceof Error ? error.message : String(error),
});
}
}
return results;
}
// Usage
const results = await batchDirectorGenerate([
{
name: "product-A",
segments: [
{ prompt: "Hero reveal", imagePath: "./a-hero.jpg" },
{ prompt: "Feature showcase", imagePath: "./a-feature.jpg" },
{ prompt: "Call to action", imagePath: "./a-cta.jpg" },
],
},
{
name: "product-B",
segments: [
{ prompt: "Unboxing experience", imagePath: "./b-unbox.jpg" },
{ prompt: "In-use demonstration", imagePath: "./b-demo.jpg" },
],
},
]);
console.table(results);
Example 5: Error Handling in Director Mode
⚠️ Full-job retry warning: The generateWithRetry function below retries the entire neurolink.generate() call on any retriable VideoError. This means all segments and transitions are re-generated from scratch, incurring full cost on each attempt ($10-60+ depending on settings). This is appropriate only for transient failures (e.g., rate limits) where partial results are not recoverable.
Note that Director Mode already handles transition failures gracefully — failed transitions fall back to hard cuts rather than failing the pipeline (see Partial Failure Handling). Only fatal errors like DIRECTOR_CLIP_FAILED or DIRECTOR_MERGE_FAILED propagate as VideoError. Keep this in mind when deciding whether a full-job retry is warranted.
Preferred approach: Once per-segment resume semantics are available, prefer retrying at the clip/transition level rather than re-running the entire pipeline. Until then, if you use full-job retry, keep maxRetries low (1-2) and restrict retries to rate-limit or timeout errors to control costs.
import { NeuroLink, VideoError } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
// WARNING: Each retry re-runs ALL segments via neurolink.generate(),
// incurring full API cost. See note above.
async function generateWithRetry(maxRetries = 2) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Product introduction",
image: await readFile("./intro.jpg"),
},
{
prompt: "Feature highlight",
image: await readFile("./feature.jpg"),
},
{
prompt: "Brand closing",
image: await readFile("./closing.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "720p", length: 6 },
},
timeout: 600000,
});
if (result.video) {
await writeFile("output.mp4", result.video.data);
return result;
}
throw new Error("No video in result");
} catch (error) {
if (error instanceof VideoError) {
console.error(
`Attempt ${attempt} failed [${error.code}]:`,
error.message,
);
// Don't retry configuration or validation errors
if (
error.category === "configuration" ||
error.category === "validation" ||
error.category === "permission"
) {
throw error;
}
// Retry on rate limits and transient failures only.
// Be aware: this re-runs the full Director pipeline (all segments + transitions).
if (error.retriable && attempt < maxRetries) {
const backoff = Math.pow(2, attempt) * 5000;
console.log(
`Retrying entire Director pipeline in ${backoff / 1000}s (attempt ${attempt + 1}/${maxRetries})...`,
);
await new Promise((r) => setTimeout(r, backoff));
continue;
}
}
throw error;
}
}
}
Type Definitions
Director Mode Input (Extended GenerateOptions)
Director Mode introduces a DirectorSegment type and adds a segments field to GenerateOptions.input:
/**
* A single segment in Director Mode, representing one video clip.
*/
type DirectorSegment = {
/** Prompt describing the video content for this segment */
prompt: string;
/** Input image for this segment (Buffer, URL string, file path, or ImageWithAltText) */
image: Buffer | string | ImageWithAltText;
};
type GenerateOptions = {
input: {
/** Prompt for standard (single-clip) mode */
text: string;
/** Standard mode images */
images?: Array<Buffer | string | ImageWithAltText>;
/**
* Director Mode segments. When provided, Director Mode is activated automatically.
* Each segment contains its own prompt and image — no need for separate text/images arrays.
* Must contain 2-10 segments.
*/
segments?: DirectorSegment[];
// ... other existing fields
};
output?: {
mode?: "text" | "video" | "ppt";
/**
* Video output options. In Director Mode, `video.length` controls the
* per-segment clip duration (4, 6, or 8 seconds). There is no separate
* `segmentDurationSeconds` field — this single field applies uniformly
* to all segments to avoid duplication and ambiguity.
*/
video?: VideoOutputOptions;
/** Director Mode configuration (only used when input.segments is provided) */
director?: DirectorModeOptions;
};
// ... other existing fields
};
DirectorModeOptions
type DirectorModeOptions = {
/**
* Prompts for generating transition clips (array of N-1 entries for N segments).
* transitionPrompts[i] is used for the transition between segment i and segment i+1.
* If provided, must contain exactly N-1 prompts where N is the number of segments.
*
* **When omitted:** The pipeline auto-generates a default prompt for each transition:
* `"Smooth cinematic transition between scenes"`. This produces a generic but
* visually coherent interpolation. For narrative-driven videos, explicit prompts
* that describe the desired camera movement or visual flow are recommended.
*/
transitionPrompts?: string[];
/**
* Duration of each transition clip in seconds (array of N-1 entries for N segments).
* transitionDurations[i] sets the duration for the transition between segment i and segment i+1.
* Each value must be 4, 6, or 8 (4 recommended for seamless feel).
* If omitted, all transitions default to 4 seconds.
* @default [4, 4, ...] (all 4s)
*/
transitionDurations?: Array<4 | 6 | 8>;
/**
* Maximum number of clips to generate concurrently (1-3).
* Lower values reduce API load; higher values speed up generation.
* @default 3
*/
concurrency?: 1 | 2 | 3;
};
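The defaulting behavior documented for transitionDurations can be sketched as a small resolver. The `resolveTransitionDurations` helper below is illustrative (not an actual SDK export); it mirrors the documented rules — N-1 entries for N segments, all defaulting to 4 seconds when omitted:

```typescript
// Illustrative sketch of the documented defaults for transitionDurations.
type TransitionDuration = 4 | 6 | 8;

function resolveTransitionDurations(
  segmentCount: number,
  provided?: TransitionDuration[],
): TransitionDuration[] {
  const expected = segmentCount - 1; // one transition per segment boundary
  if (provided === undefined) {
    // Documented default: every transition is 4 seconds.
    return Array(expected).fill(4) as TransitionDuration[];
  }
  if (provided.length !== expected) {
    throw new Error(
      `Expected ${expected} transition durations, got ${provided.length}`,
    );
  }
  return provided;
}
```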
Extended VideoGenerationResult (Director Mode)
type VideoGenerationResult = {
data: Buffer;
mediaType: "video/mp4" | "video/webm";
metadata?: {
// Standard fields
duration?: number;
dimensions?: { width: number; height: number };
model?: string;
provider?: string;
aspectRatio?: string;
audioEnabled?: boolean;
processingTime?: number;
// Director Mode fields (present when Director Mode is used)
/** Number of main segments in the video */
segmentCount?: number;
/** Number of transition clips generated */
transitionCount?: number;
/** Duration of each main clip in seconds */
clipDuration?: number;
/** Durations of each transition in seconds (one per transition) */
transitionDurations?: number[];
/** Per-segment metadata */
segments?: Array<{
index: number;
duration: number;
processingTime: number;
}>;
/** Per-transition metadata */
transitions?: Array<{
fromSegment: number;
toSegment: number;
duration: number;
processingTime: number;
}>;
};
};
Architecture & Implementation
Pipeline Flow
User Input (N segments, each with prompt + image)
│
▼
┌─────────────────────────────────┐
│ 1. Validation │
│ - Validate each segment has │
│ prompt and image │
│ - Enforce segment limit (≤10) │
└─────────────────┬───────────────┘
│
▼
┌─────────────────────────────────┐
│ 2. Parallel Clip Generation │
│ - Generate N clips via │
│ generateVideoWithVertex() │
│ - Respect concurrency limit │
└─────────────────┬───────────────┘
│
▼
┌─────────────────────────────────┐
│ 3. Frame Extraction │
│ - Extract last frame from │
│ clip[i] (decode MP4 → JPEG) │
│ - Extract first frame from │
│ clip[i+1] │
└─────────────────┬───────────────┘
│
▼
┌─────────────────────────────────┐
│ 4. Transition Generation │
│ - For each pair (i, i+1): │
│ Call Veo with image (last │
│ frame) + lastFrame (first │
│ frame of next clip) │
│ - Per-transition duration │
│ - Sequential (depends on step │
│ 3 output) │
└─────────────────┬───────────────┘
│
▼
┌─────────────────────────────────┐
│ 5. Video Merge │
│ - Concatenate: │
│ clip1 + trans1 + clip2 + │
│ trans2 + ... + clipN │
│ - Re-mux to single MP4 │
└─────────────────┬───────────────┘
│
▼
VideoGenerationResult
(merged buffer + metadata)
Dependency DAG
The pipeline has a per-pair dependency structure — each transition depends only on its two adjacent clips, not on all clips globally. Understanding this DAG is essential for maximizing parallelism without race conditions:
Phase 1 – Clip Generation (parallel, subject to concurrency limit):
Clip₁ Clip₂ Clip₃ ... ClipN
│ │ │ │
▼ ▼ ▼ ▼
Phase 2 – Frame Extraction (per-clip, runs as soon as a clip completes):
lastFrame₁ firstFrame₂ lastFrame₂ firstFrame₃ ... firstFrameN
│ │ │ │ │
└──────┬──────┘ └──────┬──────┘ │
▼ ▼ │
Phase 3 – Transition Generation (per-pair, each depends on its two boundary frames):
(once Clip₁ & Clip₂ done) (once Clip₂ & Clip₃ done)
Trans₁₋₂ Trans₂₋₃ ... Trans₍N₋₁₎₋N
│ │ │
└───────────┴───────────────┴───────────────────────┘
                            │
▼
Phase 4 – Sequential Merge (must wait for ALL clips and transitions):
Clip₁ → Trans₁₋₂ → Clip₂ → Trans₂₋₃ → Clip₃ → ... → ClipN
│
▼
Final Merged Video
Key constraint: Each transition Trans₍ᵢ₎₋₍ᵢ₊₁₎ depends only on Clip₍ᵢ₎ and Clip₍ᵢ₊₁₎ — specifically, the last frame of Clip₍ᵢ₎ and the first frame of Clip₍ᵢ₊₁₎. Once those two clips complete and their boundary frames are extracted (Phase 2), the transition may begin independently, without waiting for other clips to finish. Multiple transitions whose input clips are ready can run in parallel (subject to the concurrency limit). Phase 4 (merge) remains strictly sequential and must wait for all clips and transitions to complete before concatenation.
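The per-pair dependency rule maps directly onto promises: each transition awaits only its two adjacent clip promises. The sketch below illustrates this scheduling shape — `runDirectorDag`, `generateClip`, and `generateTransition` are stand-in names for illustration, not the actual directorPipeline.ts functions, and the real pipeline additionally caps concurrency:

```typescript
// Illustrative DAG scheduler: transition i starts as soon as clips i and
// i+1 finish, without waiting for the rest of the pipeline.
async function runDirectorDag<Clip, Trans>(
  segmentCount: number,
  generateClip: (i: number) => Promise<Clip>,
  generateTransition: (prev: Clip, next: Clip, i: number) => Promise<Trans>,
): Promise<{ clips: Clip[]; transitions: Trans[] }> {
  // Phase 1: kick off all clip generations.
  const clipPromises = Array.from({ length: segmentCount }, (_, i) =>
    generateClip(i),
  );

  // Phase 3: each transition depends only on its two boundary clips.
  const transitionPromises = Array.from(
    { length: segmentCount - 1 },
    async (_, i) => {
      const [prev, next] = await Promise.all([
        clipPromises[i],
        clipPromises[i + 1],
      ]);
      return generateTransition(prev, next, i);
    },
  );

  // Phase 4 (merge) must wait for everything.
  return {
    clips: await Promise.all(clipPromises),
    transitions: await Promise.all(transitionPromises),
  };
}
```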
Technology Dependencies
Frame Extraction (frameExtractor.ts)
Frame extraction supports two strategies — a native FFmpeg binary (recommended) and an experimental @ffmpeg/ffmpeg WASM path:
- Native binary (recommended): Delegates to a system-installed or bundled ffmpeg binary. This is the most reliable option for traditional servers, containers, and most serverless runtimes (e.g., AWS Lambda layers). Detection is configurable via:
  - FFMPEG_PATH environment variable (explicit path to the binary), or
  - NeuroLink config option video.ffmpegPath, or
  - Automatic PATH lookup as a last resort.
- WASM path (experimental): Uses @ffmpeg/ffmpeg (FFmpeg compiled to WebAssembly) for environments where a native binary is unavailable. Caveats:
  - Bundle size: The WASM binary adds ~25-30 MB to deployment artifacts.
  - Startup overhead: First invocation incurs a cold-start penalty (~1-3s) for WASM compilation.
  - Runtime compatibility: Not all edge runtimes support WASM with sufficient memory (e.g., Cloudflare Workers has a 128 MB limit). Verify compatibility with your target platform before relying on this path.
  - Node.js threading: @ffmpeg/ffmpeg ≥0.12 requires SharedArrayBuffer, which may need specific flags or HTTP headers (Cross-Origin-Embedder-Policy) in some environments.
- Operation: Decodes MP4 → seeks to target frame → encodes to JPEG.
- Performance: First/last frame extraction from a 4-8s clip completes in <100ms (native) or <500ms (WASM).
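For the native-binary strategy, boundary-frame extraction reduces to building the right ffmpeg argument list. The sketch below is illustrative only — `buildFrameExtractArgs` is a hypothetical helper, and the actual frameExtractor.ts flags may differ:

```typescript
// Illustrative sketch: ffmpeg arguments for extracting one boundary
// frame as JPEG via a native binary. Not the actual frameExtractor.ts.
function buildFrameExtractArgs(
  inputPath: string,
  outputPath: string,
  which: "first" | "last",
): string[] {
  // -frames:v 1 → emit a single frame; -q:v 2 → high-quality JPEG.
  const common = ["-frames:v", "1", "-q:v", "2", outputPath];
  return which === "first"
    ? ["-i", inputPath, ...common]
    : // -sseof -0.1 seeks relative to end-of-file, landing on the last frame.
      ["-sseof", "-0.1", "-i", inputPath, ...common];
}

// Usage sketch: spawn("ffmpeg", buildFrameExtractArgs("clip1.mp4", "last.jpg", "last"))
```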
Video Merging (videoMerger.ts)
Video concatenation uses the same dual-strategy approach (native binary preferred, WASM experimental):
- Method: FFmpeg concat demuxer for lossless MP4 concatenation (no re-encoding when codecs match).
- Operation: Creates a concat list → runs ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4.
- Re-encoding fallback: If clips have mismatched codecs (unlikely since all come from Veo), falls back to re-encoding with H.264.
- WASM note: The WASM runtime handles file I/O via an in-memory filesystem (FS), avoiding disk writes — but memory usage scales with total video size, which can be significant for multi-segment merges.
Dependency: @ffmpeg/ffmpeg and @ffmpeg/util are optional peer dependencies for the experimental WASM path. Install with pnpm add @ffmpeg/ffmpeg @ffmpeg/util only if you cannot provide a native FFmpeg binary. A native ffmpeg binary (installed via your OS package manager, Docker layer, or Lambda layer) is the recommended approach for production deployments.
Before committing to an FFmpeg strategy, validate your chosen implementation with real-world testing on your target runtime (Lambda, ECS, edge, etc.) to confirm compatibility, cold-start behavior, and memory limits.
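The concat demuxer's input file is just a list of `file` directives in playback order (clip, transition, clip, …). A minimal sketch — the `buildConcatList` name is hypothetical, not the actual videoMerger.ts API:

```typescript
// Illustrative sketch: build the concat-demuxer list file content,
// interleaving clips and transitions in playback order.
function buildConcatList(
  clipPaths: string[],
  transitionPaths: string[],
): string {
  if (transitionPaths.length !== clipPaths.length - 1) {
    throw new Error("Need exactly N-1 transitions for N clips");
  }
  const ordered: string[] = [];
  clipPaths.forEach((clip, i) => {
    ordered.push(clip);
    if (i < transitionPaths.length) {
      ordered.push(transitionPaths[i]);
    }
  });
  // One `file '<path>'` directive per line, for: ffmpeg -f concat -safe 0 ...
  return ordered.map((p) => `file '${p}'`).join("\n");
}
```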
Transition Generation: Veo API Request
Each transition clip uses the first-and-last-frame Veo endpoint. The request body includes both image (first frame = last frame of previous clip) and lastFrame (last frame = first frame of next clip):
const transitionRequestBody = {
instances: [
{
prompt: transitionPrompts[i], // i-th transition prompt (0-indexed)
image: {
bytesBase64Encoded: lastFrameOfClipN, // Last frame of preceding clip
mimeType: "image/jpeg",
},
lastFrame: {
bytesBase64Encoded: firstFrameOfClipN1, // First frame of following clip
mimeType: "image/jpeg",
},
},
],
parameters: {
sampleCount: 1,
durationSeconds: transitionDurations[i], // Per-transition duration (4, 6, or 8)
aspectRatio: "16:9", // Matches main clips
resolution: "720p", // Matches main clips
generateAudio: true, // Matches main clips
},
};
This uses the same predictLongRunning → fetchPredictOperation polling flow as standard video generation, with the addition of the lastFrame field.
Implementation Files
| File | Purpose |
|---|---|
| src/lib/adapters/video/vertexVideoHandler.ts | Extended with generateTransitionWithVertex() and lastFrame support |
| src/lib/adapters/video/directorPipeline.ts | Director Mode orchestrator: clip generation, frame extraction, transition generation, merging |
| src/lib/adapters/video/frameExtractor.ts | Extract first/last frames from MP4 buffers via a native FFmpeg binary (preferred) or the experimental @ffmpeg/ffmpeg WASM path |
| src/lib/adapters/video/videoMerger.ts | Concatenate video buffers into a single MP4 via the FFmpeg concat demuxer (lossless when codecs match) |
| src/lib/types/multimodal.ts | DirectorSegment, DirectorModeOptions type definitions |
| src/lib/types/generateTypes.ts | Extended GenerateOptions input with segments field |
| src/lib/core/baseProvider.ts | Director Mode detection and routing in handleVideoGeneration() |
| src/lib/utils/parameterValidation.ts | validateDirectorModeInput() validation |
Key Functions
- generateTransitionWithVertex(firstFrame, lastFrame, prompt, options) – Generates a transition clip using Veo 3.1 Fast's first-and-last-frame API
- extractFirstFrame(videoBuffer) – Extracts the first frame from a video buffer as JPEG
- extractLastFrame(videoBuffer) – Extracts the last frame from a video buffer as JPEG
- mergeVideoBuffers(buffers) – Concatenates multiple MP4 buffers into one
- executeDirectorPipeline(segments, options) – Full Director Mode orchestrator
- validateDirectorModeInput(options) – Validates segment structure, count, image types, etc.
Configuration & Best Practices
Duration Calculation
Note: In Director Mode, output.video.length controls the duration of each main segment clip. There is no separate segmentDurationSeconds field — the existing VideoOutputOptions.length is reused to avoid duplication. All segments share the same clip duration; per-segment duration variance is not currently supported (use different Director Mode calls if needed).
Each transition can have its own duration, so the total is the sum of all clip durations plus the sum of all individual transition durations:
| Segments | Clip Duration | Transition Durations | Total Duration |
|---|---|---|---|
| 2 | 6s | [4s] | 16s (2×6 + 4) |
| 3 | 8s | [4s, 6s] | 34s (3×8 + 4 + 6) |
| 4 | 4s | [4s, 6s, 8s] | 34s (4×4 + 4 + 6 + 8) |
| 5 | 6s | [4s, 6s, 4s, 8s] | 52s (5×6 + 4 + 6 + 4 + 8) |
| N | Ds | [T₁, T₂, …, T₍ₙ₋₁₎] | N×D + Σ Tᵢ |
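The formula in the last table row can be written as a one-liner. The `totalDirectorDuration` name is illustrative, not an SDK export:

```typescript
// Total duration per the table: N×D + Σ Tᵢ (clips plus transitions).
function totalDirectorDuration(
  segmentCount: number,
  clipSeconds: number,
  transitionSeconds: number[],
): number {
  if (transitionSeconds.length !== segmentCount - 1) {
    throw new Error("Expected N-1 transition durations");
  }
  return (
    segmentCount * clipSeconds +
    transitionSeconds.reduce((sum, t) => sum + t, 0)
  );
}

// Matches the table: 3 segments × 8s + [4s, 6s] = 34s
```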
API Call Count
| Segments | Main Clips | Transition Clips | Total API Calls |
|---|---|---|---|
| 2 | 2 | 1 | 3 |
| 3 | 3 | 2 | 5 |
| 5 | 5 | 4 | 9 |
| 10 | 10 | 9 | 19 |
Worst-Case Analysis (10 Segments at Maximum Settings)
The 10-segment limit balances capability with practical constraints:
| Metric | Value | Calculation |
|---|---|---|
| Total API calls | 19 | 10 clips + 9 transitions |
| Wall-clock time (concurrency=3) | ~18 minutes | ceil(10/3) × 3min (clips) + ceil(9/3) × 2min (transitions) |
| Wall-clock time (concurrency=1) | ~48 minutes | 10 × 3min + 9 × 2min (fully sequential) |
| Total video duration | ~152 seconds | 10 × 8s (clips) + 9 × 8s (transitions, worst case) |
| Burst quota required | 3-10 concurrent | Depends on concurrency setting |
Why 10? Beyond 10 segments, single-pipeline wall-clock time exceeds 30 minutes and costs grow proportionally. For longer productions, chain multiple Director Mode calls and concatenate the outputs externally, or use the upcoming Batch Director API.
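The wall-clock figures in the table above follow from simple batch arithmetic: clips run in batches of `concurrency` at roughly 3 minutes each, then transitions at roughly 2 minutes each. A sketch reproducing that estimate (the helper name and per-call timings are assumptions from this table, not SDK behavior):

```typescript
// Reproduces the worst-case wall-clock arithmetic from the table above.
// Assumes ~3 min per clip and ~2 min per transition, run in batches
// of `concurrency` (a simplification of real scheduling).
function estimateWallClockMinutes(
  segments: number,
  concurrency: 1 | 2 | 3,
  clipMinutes = 3,
  transitionMinutes = 2,
): number {
  const clipBatches = Math.ceil(segments / concurrency);
  const transitionBatches = Math.ceil((segments - 1) / concurrency);
  return clipBatches * clipMinutes + transitionBatches * transitionMinutes;
}

// 10 segments, concurrency 3 → ~18 minutes; concurrency 1 → ~48 minutes
```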
Best Practices
1. Prompt Engineering for Transitions
// ❌ Generic / unhelpful
const badTransition = "Make a transition";
// ✅ Describe the camera movement or visual flow
const goodTransition =
"Camera smoothly drifts right, transitioning between scenes";
// ✅ Match the scene context
const contextualTransition =
"Focus shifts from foreground to background as light changes";
// ✅ Use per-transition prompts for narrative coherence
const perTransition = [
"Camera follows a path from the garden into the house",
"Time-lapse of light changing from day to dusk",
];
2. Image Preparation for Smooth Transitions
// Best results when adjacent segments share visual characteristics:
// - Similar color palette
// - Compatible lighting conditions
// - Related subject matter
// The Veo interpolation works best when:
// 1. Both frames have similar aspect ratios to the target
// 2. The visual distance between frames is moderate (not extreme jumps)
// 3. Images are well-exposed and sharp
3. Timeout Configuration
// Rule of thumb: ~2-3 minutes per video generation call
// For N segments: timeout ≥ (N + (N-1)) × 180000 ms (worst case sequential)
// With concurrency=3: timeout ≥ (ceil(N/3) + ceil((N-1)/3)) × 180000
// 3 segments: ~10 minutes
const threeSegments = { timeout: 600000 };
// 5 segments: ~15 minutes
const fiveSegments = { timeout: 900000 };
// 10 segments: ~30 minutes
const tenSegments = { timeout: 1800000 };
4. Cost Optimization
| Strategy | Impact | Trade-off |
|---|---|---|
| Use 720p for drafts | ~20% lower cost | Lower visual quality |
| Use 4s clips for previews | ~50% lower cost | Shorter segments |
| Limit to 3-5 segments | Fewer API calls | Shorter total video |
| Use veo-3.1-fast for main clips too | Faster generation | Slightly lower quality |
| Reduce concurrency to 1 | Lower burst quota | Longer wall-clock time |
Pricing reference: Look up current per-second rates for Veo 3.1 (main clips) and Veo 3.1 Fast (transitions) on the Vertex AI Generative AI pricing page. Rates vary by resolution (720p vs 1080p) and model variant.
Error Handling & Validation
Director Mode Validation Rules
| Parameter | Validation | Error Message |
|---|---|---|
| input.segments | Must be array with 2-10 entries | Director Mode requires 2-10 segments |
| input.segments[i].prompt | Must be a non-empty string | Segment X requires a non-empty prompt |
| input.segments[i].image | Must be Buffer, string (URL/path), or ImageWithAltText | Segment X requires a valid image (Buffer, URL, path, or ImageWithAltText) |
| transitionPrompts | Optional; if provided, length must be N-1 | Expected X transition prompts, got Y |
| transitionDurations | Optional; if provided, array of N-1 values, each 4, 6, or 8 | Expected N-1 transition durations, got X / Invalid transition duration at index X. Use 4, 6, or 8 |
| director.concurrency | Must be 1-3 | Concurrency must be between 1 and 3 |
| Segment limit | Max 10 segments | Director Mode supports up to 10 segments |
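The validation rules above can be sketched as a standalone function. This is an illustration of the documented rules, not the actual validateDirectorModeInput implementation (the image-type check is simplified here):

```typescript
// Illustrative validator mirroring the documented rules and error text.
type SegmentLike = { prompt: string; image: unknown };

function validateDirectorInput(
  segments: SegmentLike[],
  transitionDurations?: number[],
): void {
  if (!Array.isArray(segments) || segments.length < 2 || segments.length > 10) {
    throw new Error("Director Mode requires 2-10 segments");
  }
  segments.forEach((seg, i) => {
    if (typeof seg.prompt !== "string" || seg.prompt.trim() === "") {
      throw new Error(`Segment ${i} requires a non-empty prompt`);
    }
  });
  if (transitionDurations !== undefined) {
    if (transitionDurations.length !== segments.length - 1) {
      throw new Error(
        `Expected ${segments.length - 1} transition durations, got ${transitionDurations.length}`,
      );
    }
    transitionDurations.forEach((d, i) => {
      if (![4, 6, 8].includes(d)) {
        throw new Error(`Invalid transition duration at index ${i}. Use 4, 6, or 8`);
      }
    });
  }
}
```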
Partial Failure Handling
Director Mode uses a differentiated failure strategy depending on which pipeline stage fails:
| Failure Type | Behavior | Rationale |
|---|---|---|
| Main clip generation fails | Pipeline fails immediately with DIRECTOR_CLIP_FAILED error. Returns metadata about which segments succeeded (for debugging). | A missing segment cannot be meaningfully recovered — the final video would have a gap. |
| Frame extraction fails | Retry extraction once. If retry fails, skip the affected transition and fall back to a hard cut. | Frame extraction is a local CPU operation; transient failures are rare but possible with corrupted buffers. |
| Transition generation fails | Skip the failed transition and concatenate adjacent clips directly (hard cut). Log a warning. | A missing transition degrades quality but produces a valid video. The user can re-run with a simpler transition prompt. |
| Video merge fails | Pipeline fails with DIRECTOR_MERGE_FAILED error. Returns individual clip buffers in error.context.clipBuffers for manual recovery. | Merge failure is non-recoverable within the pipeline, but individual clips are still valuable. |
Design rationale: Main clip failures are fatal because there's no sensible way to fill a segment gap. Transition failures are non-fatal because a hard cut (direct concatenation) is a valid — if less polished — editing technique. This mirrors how professional video editors treat transitions as optional polish, not structural requirements.
Partial result metadata: When transitions fall back to hard cuts, the result metadata indicates which transitions were skipped:
// If a transition clip fails, the pipeline can:
// 1. Skip the transition (hard cut between clips)
// 2. Retry the transition with a simpler prompt
// 3. Fail the entire operation (default for main clip failures)
const result = await neurolink.generate({
input: {
segments: [
{ prompt: "Scene 1", image: img1 },
{ prompt: "Scene 2", image: img2 },
{ prompt: "Scene 3", image: img3 },
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "720p", length: 6 },
director: {
transitionPrompts: [
"Smooth cinematic transition between scenes",
"Gentle camera drift connecting the moments",
],
},
},
});
// If transition 1→2 fails but clips succeed, the result may contain
// a hard cut at that point. Check metadata for details:
if (result.video?.metadata?.transitions) {
for (const tx of result.video.metadata.transitions) {
if (!tx.duration) {
console.warn(
`Transition ${tx.fromSegment}→${tx.toSegment} used hard cut`,
);
}
}
}
Error Types
// Director-specific error codes (extend VIDEO_ERROR_CODES)
const DIRECTOR_ERROR_CODES = {
/** Invalid segment structure (missing prompt or image) */
SEGMENT_MISMATCH: "DIRECTOR_SEGMENT_MISMATCH",
/** Too many segments requested */
SEGMENT_LIMIT_EXCEEDED: "DIRECTOR_SEGMENT_LIMIT_EXCEEDED",
/** A main clip generation call failed (fatal) */
CLIP_FAILED: "DIRECTOR_CLIP_FAILED",
/** Frame extraction from clip failed */
FRAME_EXTRACTION_FAILED: "DIRECTOR_FRAME_EXTRACTION_FAILED",
/** Transition clip generation failed (non-fatal, falls back to hard cut) */
TRANSITION_FAILED: "DIRECTOR_TRANSITION_FAILED",
/** Video merge/concatenation failed */
MERGE_FAILED: "DIRECTOR_MERGE_FAILED",
/** Pipeline timeout (overall) */
PIPELINE_TIMEOUT: "DIRECTOR_PIPELINE_TIMEOUT",
};
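A caller can branch on these codes when a pipeline run throws. The sketch below assumes the thrown error exposes a `code` property and an optional `context` object (the `clipBuffers` field is documented above for merge failures; the `DirectorError` interface and `recoveryAction` helper are hypothetical illustrations):

```typescript
// Hypothetical error shape: the codes are documented above, and merge
// failures expose individual clips in error.context.clipBuffers.
interface DirectorError {
  code: string;
  context?: { clipBuffers?: unknown[] };
}

// Map a failed Director pipeline run to a coarse recovery action.
function recoveryAction(err: DirectorError): "abort" | "save-clips" | "retry" {
  switch (err.code) {
    case "DIRECTOR_CLIP_FAILED":
      return "abort"; // fatal: a missing segment cannot be filled
    case "DIRECTOR_MERGE_FAILED":
      return "save-clips"; // clips in err.context.clipBuffers are still usable
    case "DIRECTOR_PIPELINE_TIMEOUT":
      return "retry"; // retry with fewer segments or a larger timeout
    default:
      return "abort";
  }
}
```

Note that transition failures never reach this handler: per the partial-failure table, they degrade to hard cuts rather than throwing.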
Comparison: Standard vs Director Mode
| Feature | Standard Video Generation | Director Mode |
|---|---|---|
| Input format | input.text + input.images | input.segments array (2-10 DirectorSegment objects) |
| Output | Single clip (4-8s) | Merged multi-segment video |
| Transitions | N/A | AI-generated clips with per-transition duration (4-8s) |
| API calls | 1 | N + (N-1) calls |
| Veo API feature | image only | image + lastFrame |
| Processing time | 1-3 minutes | 5-30 minutes (depends on segment count) |
| Max duration | 8 seconds | ~152s (10×8s clips + 9×8s transitions max) |
| Concurrency | N/A | Up to 3 parallel clip generations |
| Error recovery | All-or-nothing | Fatal for clips, fallback hard cuts for transitions |
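Concretely, the input-format difference from the table looks like this. The request shapes follow the examples elsewhere in this document; the standard-mode field names are taken from the Input format row, and the prompts and image paths are placeholders:

```typescript
// Standard single-clip generation: one prompt, one optional image.
const standardRequest = {
  input: { text: "A drone shot over a coastline", images: ["coast.png"] },
  provider: "vertex",
  model: "veo-3.1",
  output: { mode: "video", video: { resolution: "720p", length: 8 } },
};

// Director Mode: supplying input.segments triggers the multi-clip pipeline.
const directorRequest = {
  input: {
    segments: [
      { prompt: "Drone approaches the coastline", image: "coast1.png" },
      { prompt: "Drone sweeps along the cliffs", image: "coast2.png" },
    ],
  },
  provider: "vertex",
  model: "veo-3.1",
  output: { mode: "video", video: { resolution: "720p", length: 8 } },
};
```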
Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| "Segment mismatch" error | Missing prompt or image in a segment | Ensure each segment has both prompt and image |
| Transition looks jarring | Large visual gap between adjacent clips | Use visually similar images; improve transition prompt |
| Pipeline timeout | Too many segments or high resolution | Reduce segment count, use 720p, or increase timeout |
| Rate limit errors | Too many concurrent API calls | Reduce director.concurrency to 1-2 |
| Frame extraction fails | Corrupted video buffer | Retry the failed clip generation |
| Audio discontinuity at transitions | Each clip has independently generated audio | Expected behavior — transition clips bridge the audio gap |
| "Segment limit exceeded" | More than 10 segments provided | Split into multiple Director Mode calls |
| High cost | Many high-resolution segments | Use 720p and 4s clips for drafts, upgrade for final output |
Debug Mode
// Enable debug logging to trace the Director pipeline
const neurolink = new NeuroLink({
debug: true,
logLevel: "verbose",
});
// Or via environment variable
// export NEUROLINK_DEBUG=true
// Debug output shows:
// - Segment validation results
// - Per-clip generation start/completion
// - Frame extraction timing
// - Transition generation details
// - Merge operation status
Limitations
| Limitation | Description | Workaround |
|---|---|---|
| Max 10 segments | API and processing constraints | Chain multiple Director Mode calls |
| Fixed transition model | Transitions always use veo-3.1-fast; not configurable | N/A |
| No custom audio | Audio is AI-generated for each clip independently | Post-process with external audio editing tools |
| Sequential transitions | Transitions must wait for clip frames to be extracted | Inherent to the pipeline (frames depend on clips) |
| MP4 output only | Merged output is always MP4 | Convert with ffmpeg post-generation if needed |
| Vertex AI only | Veo models are Vertex-exclusive | No alternative providers currently |
| Processing time | Multi-segment is inherently slower | Use concurrency and lower settings for drafts |
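For the 10-segment limit, the chaining workaround can be sketched as splitting a long segment list into overlapping batches. The `chunkSegments` helper below is hypothetical (not part of the SDK), and the resulting videos still need an external concatenation step, e.g. with ffmpeg:

```typescript
interface Segment {
  prompt: string;
  image: string;
}

// Split a long segment list into batches of at most `max` segments.
// Each batch repeats the last segment of the previous one so that the
// external concatenation point falls on a visually identical boundary.
function chunkSegments(segments: Segment[], max = 10): Segment[][] {
  const batches: Segment[][] = [];
  let start = 0;
  while (start < segments.length - 1) {
    batches.push(segments.slice(start, start + max));
    start += max - 1; // overlap by one segment
  }
  return batches;
}
```

Each batch is then submitted as its own Director Mode call, and the per-batch outputs are concatenated in post-production.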
Related Features
- Video Generation – Single-clip video generation (Director Mode builds on this)
- Multimodal Chat – Image and file input capabilities
- Video Analysis – Analyze existing video content
Next: Video Generation | Multimodal Chat