Video Director Mode
Director Mode extends NeuroLink's video generation capability to produce multi-segment videos with seamless AI-generated transitions. Instead of a single clip, you define an array of segments — each with its own prompt and image — and NeuroLink orchestrates the full pipeline: generating each clip, extracting boundary frames, producing transition videos (with individually configurable durations) using Veo 3.1's first-and-last-frame interpolation, and merging everything into one continuous video.
Overview
Director Mode is triggered automatically when you supply an input.segments array to the video generation API. Each segment is a self-documenting { prompt, image } object, mapping cleanly to the pipeline concept of ordered video segments.
graph TD
A["Segment 1: Image A + Prompt 1"] --> G1["Generate Clip 1 (4-8s)"]
B["Segment 2: Image B + Prompt 2"] --> G2["Generate Clip 2 (4-8s)"]
C["Segment 3: Image C + Prompt 3"] --> G3["Generate Clip 3 (4-8s)"]
G1 --> F1["Extract Last Frame of Clip 1"]
G2 --> F2a["Extract First Frame of Clip 2"]
G2 --> F2b["Extract Last Frame of Clip 2"]
G3 --> F3["Extract First Frame of Clip 3"]
F1 --> T1["Generate Transition 1→2"]
F2a --> T1
F2b --> T2["Generate Transition 2→3"]
F3 --> T2
G1 --> M["Merge: Clip1 + Trans1 + Clip2 + Trans2 + Clip3"]
T1 --> M
G2 --> M
T2 --> M
G3 --> M
M --> O["Final Merged Video (MP4)"]
How It Works
- Parallel clip generation – Main clips are generated in parallel, at most 2 at a time (fixed concurrency), via Veo 3.1's image-to-video endpoint. A circuit breaker trips after 2 consecutive failures to avoid wasted API calls
- Frame extraction – The last frame of clip N and first frame of clip N+1 are extracted from generated video buffers (with MP4 ftyp header validation)
- Parallel transition generation – Veo 3.1 Fast's first-and-last-frame interpolation API generates transitions between each pair of adjacent clips in parallel (same concurrency limit), with individually configurable duration (4, 6, or 8 seconds each)
- Sequential merge – Clips and transitions are concatenated: Clip₁ → Trans₁₋₂ → Clip₂ → Trans₂₋₃ → Clip₃ → …
- Single output – The merged result is returned as one VideoGenerationResult buffer
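The merge order in the last two steps can be sketched as a small pure function. This is an illustration only: `buildMergeOrder` and its label strings are hypothetical names, not part of NeuroLink's API.

```typescript
// Hypothetical helper: lists the concatenation order for N clips.
// The labels are illustrative; NeuroLink's internal naming is not documented here.
function buildMergeOrder(segmentCount: number): string[] {
  if (!Number.isInteger(segmentCount) || segmentCount < 1) {
    throw new RangeError("segmentCount must be a positive integer");
  }
  const order: string[] = [];
  for (let i = 1; i <= segmentCount; i++) {
    order.push(`clip-${i}`);
    // A transition sits between clip i and clip i+1, so the last clip has none.
    if (i < segmentCount) {
      order.push(`transition-${i}-${i + 1}`);
    }
  }
  return order;
}

console.log(buildMergeOrder(3).join(" → "));
// clip-1 → transition-1-2 → clip-2 → transition-2-3 → clip-3
```

For N segments the list always has 2N-1 entries, which matches the N clips plus N-1 transitions the pipeline generates.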
Key Technology: Veo First-and-Last-Frame Interpolation
The transition clips use Veo 3.1's native lastFrame parameter in the predictLongRunning API. Instead of generating from a single image, you provide two images — the first frame and the last frame — and Veo generates a video that smoothly interpolates between them:
{
"instances": [
{
"prompt": "Smooth cinematic transition",
"image": {
"bytesBase64Encoded": "<LAST_FRAME_OF_CLIP_N>",
"mimeType": "image/jpeg"
},
"lastFrame": {
"bytesBase64Encoded": "<FIRST_FRAME_OF_CLIP_N+1>",
"mimeType": "image/jpeg"
}
}
],
"parameters": {
"sampleCount": 1,
"durationSeconds": 6,
"aspectRatio": "16:9",
"resolution": "720p"
}
}
This produces an AI-generated morph in which the scene content itself changes from the first frame to the last, rather than a crossfade or dissolve that only blends two fixed pictures. The durationSeconds value is set independently for each transition (from the transitionDurations array), so each segment boundary can use a shorter or longer interpolation.
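As a rough sketch, the request body above could be assembled from two extracted frames as follows. `buildTransitionRequest` and `TransitionRequestOptions` are hypothetical names introduced for illustration, and the defaults simply mirror the values in the JSON example; NeuroLink's internal builder may differ.

```typescript
// Hypothetical option bag; field names follow the JSON example above.
type TransitionRequestOptions = {
  prompt?: string;
  durationSeconds?: 4 | 6 | 8;
  aspectRatio?: string;
  resolution?: string;
};

// Sketch: builds a first-and-last-frame request body from two JPEG frame buffers.
function buildTransitionRequest(
  lastFrameOfClipN: Buffer,
  firstFrameOfNextClip: Buffer,
  options: TransitionRequestOptions = {},
) {
  return {
    instances: [
      {
        prompt: options.prompt ?? "Smooth cinematic transition",
        // The transition starts on the frame where clip N ended...
        image: {
          bytesBase64Encoded: lastFrameOfClipN.toString("base64"),
          mimeType: "image/jpeg",
        },
        // ...and ends on the frame where clip N+1 begins.
        lastFrame: {
          bytesBase64Encoded: firstFrameOfNextClip.toString("base64"),
          mimeType: "image/jpeg",
        },
      },
    ],
    parameters: {
      sampleCount: 1,
      durationSeconds: options.durationSeconds ?? 6,
      aspectRatio: options.aspectRatio ?? "16:9",
      resolution: options.resolution ?? "720p",
    },
  };
}
```

Note the ordering: the `image` field carries the last frame of the earlier clip, while `lastFrame` carries the first frame of the later clip.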
What You Get
- Multi-segment video – Chain 2 to 10 video segments into a single continuous output
- AI transitions – Per-transition configurable duration (4, 6, or 8 seconds each) generated by Veo 3.1 frame interpolation (not simple crossfades)
- Parallel generation – Both main clips and transitions are generated concurrently (fixed concurrency of 2) with a circuit breaker for clip failures
- Mixed image inputs – Each segment's image field accepts a Buffer, file path, URL, or ImageWithAltText
- Consistent settings – Resolution, aspect ratio, and audio settings apply uniformly across all segments and transitions
- Per-segment customization – Each segment is a self-contained { prompt, image } object
- Buffer validation – All video buffers are validated for MP4 ftyp headers before frame extraction and merging
- SDK only – Use programmatically via generate() (CLI not supported for Director Mode)
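The buffer validation item in the list above can be illustrated with a minimal check. This sketch assumes the `ftyp` box is the first box in the file (the common MP4 layout, where bytes 4 to 7 spell `ftyp`); `hasMp4FtypHeader` is a hypothetical name and NeuroLink's actual check may be stricter.

```typescript
// Hypothetical helper: returns true when the buffer starts with an MP4 ftyp box.
function hasMp4FtypHeader(video: Buffer): boolean {
  // A box header needs at least 8 bytes: a 4-byte size followed by a 4-byte type.
  if (video.length < 8) {
    return false;
  }
  return video.toString("ascii", 4, 8) === "ftyp";
}
```

A check like this is cheap to run before handing a buffer to a decoder, so a truncated or non-video API response is rejected early instead of failing during frame extraction.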
Supported Provider & Model
| Provider | Model | Interpolation Support | Transition Duration | Max Segments | Concurrency |
|---|---|---|---|---|---|
| vertex | veo-3.1 (clips) / veo-3.1-fast (transitions) | First + Last Frame | 4, 6, or 8s per transition | 10 | 2 (fixed) |
Note: The lastFrame parameter is supported by veo-2.0-generate-001, veo-3.1-generate-001, and veo-3.1-fast-generate-001. NeuroLink uses veo-3.1-generate-001 for main clips and veo-3.1-fast-generate-001 for transition clips (faster generation with minimal quality difference for short interpolations).
Prerequisites
Same as Video Generation prerequisites, plus:
- Sufficient quota – Director Mode generates N + (N-1) video operations (N clips + N-1 transitions). Ensure your Vertex AI project has adequate quota.
- Adequate timeout – Multi-segment generation takes proportionally longer. Set timeout accordingly (recommended: 5-10 minutes for 3+ segments).
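To make the quota arithmetic concrete, the sketch below counts the operations for a given number of segments. The `minRounds` figure is an assumption of this example, not a documented NeuroLink metric: it is a rough lower bound on sequential batches, assuming clips and transitions run in separate phases at the fixed concurrency of 2. `estimateDirectorWorkload` is a hypothetical name.

```typescript
type DirectorWorkload = {
  clips: number;
  transitions: number;
  operations: number;
  minRounds: number;
};

// Hypothetical helper: counts video operations for N segments.
function estimateDirectorWorkload(segmentCount: number, concurrency = 2): DirectorWorkload {
  if (!Number.isInteger(segmentCount) || segmentCount < 2 || segmentCount > 10) {
    throw new RangeError("Director Mode expects 2-10 segments");
  }
  const clips = segmentCount;
  const transitions = segmentCount - 1;
  // Clips must finish before transitions start, so rounds are counted per phase.
  const minRounds = Math.ceil(clips / concurrency) + Math.ceil(transitions / concurrency);
  return { clips, transitions, operations: clips + transitions, minRounds };
}

console.log(estimateDirectorWorkload(3));
// { clips: 3, transitions: 2, operations: 5, minRounds: 3 }
```

At the 10-segment maximum this gives 19 operations, which is the figure to compare against your Vertex AI quota.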
Quick Start
SDK Usage
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
// Director Mode: define segments → merged video
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Camera slowly pans across the product on a white table",
image: await readFile("./scene1.jpg"),
},
{
prompt: "Dynamic zoom into product details with dramatic lighting",
image: await readFile("./scene2-detail.jpg"),
},
{
prompt: "Wide shot pulling back to reveal the full scene",
image: await readFile("./scene3-wide.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: {
resolution: "720p",
length: 6, // Per-segment clip duration (reused from VideoOutputOptions)
aspectRatio: "16:9",
audio: true,
},
},
timeout: 600000, // 10 minutes for multi-segment
});
if (result.video) {
await writeFile("director-output.mp4", result.video.data);
console.log(`Total duration: ${result.video.metadata?.duration}s`);
console.log(`Segments: ${result.video.metadata?.segmentCount}`);
}
Using Image URLs
import { NeuroLink } from "@juspay/neurolink";
import { writeFile } from "fs/promises";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Serene sunrise over calm waters",
image: "https://example.com/sunrise.jpg",
},
{
prompt: "Waves crashing on a rocky coastline",
image: "https://example.com/coastline.jpg",
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "1080p", length: 8 },
},
timeout: 600000,
});
if (result.video) {
await writeFile("ocean-director.mp4", result.video.data);
}
Mixed Input Types
Each segment's image field accepts a Buffer, file path, URL, or ImageWithAltText:
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Product reveal from shadow to light",
image: await readFile("./product-dark.jpg"), // Buffer
},
{
prompt: "360-degree rotation showcasing all angles",
image: "https://cdn.example.com/product-turntable.png", // URL
},
{
prompt: "Final hero shot with brand overlay",
image: { data: await readFile("./hero.jpg"), altText: "Hero" }, // ImageWithAltText
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "1080p", length: 6, aspectRatio: "16:9" },
},
});
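For logging or pre-flight checks it can help to tell the accepted input kinds apart. The sketch below is one way to do that; the `ImageWithAltText` shape is inferred from the example above (`{ data, altText }`) and is an assumption, as is the `describeImageInput` helper itself. NeuroLink performs its own input resolution internally.

```typescript
// Assumed shape, based on the example above; the real type may carry more fields.
type ImageWithAltText = { data: Buffer; altText?: string };
type SegmentImage = Buffer | string | ImageWithAltText;

type ImageInputKind = "buffer" | "url" | "path" | "image-with-alt-text";

// Hypothetical helper: classifies a segment image by its runtime type.
function describeImageInput(image: SegmentImage): ImageInputKind {
  if (Buffer.isBuffer(image)) {
    return "buffer";
  }
  if (typeof image === "string") {
    // Strings starting with http:// or https:// are treated as URLs, anything else as a file path.
    return /^https?:\/\//i.test(image) ? "url" : "path";
  }
  return "image-with-alt-text";
}
```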
Note: Director Mode is SDK-only. CLI support is not available for this generation type. Use the standard --outputMode video CLI flags for single-clip video generation.
Comprehensive Examples
Example 1: Product Commercial (3 Segments)
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: {
segments: [
{
prompt:
"Dramatic reveal: camera sweeps up from a dark surface to unveil the product under a spotlight",
image: await readFile("./product-dark.jpg"),
},
{
prompt:
"Close-up detail shot: camera slowly orbits the product, focusing on texture and craftsmanship",
image: await readFile("./product-detail.jpg"),
},
{
prompt:
"Lifestyle context: camera pulls back to show the product in an elegant room setting",
image: await readFile("./product-lifestyle.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: {
resolution: "1080p",
length: 8,
aspectRatio: "16:9",
audio: true,
},
// Director-specific options
director: {
transitionPrompts: [
"Elegant dissolve with subtle camera drift",
"Smooth pull-back revealing the wider scene",
],
transitionDurations: [4, 6], // Per-transition: first transition 4s, second 6s
},
},
timeout: 600000,
});
if (result.video) {
await writeFile("product-commercial.mp4", result.video.data);
console.log("Director Mode output:", {
totalDuration: result.video.metadata?.duration, // ~34s (3×8s + 4s + 6s)
segmentCount: result.video.metadata?.segmentCount, // 3
transitionCount: result.video.metadata?.transitionCount, // 2
resolution: result.video.metadata?.dimensions,
fileSize: `${(result.video.data.length / 1024 / 1024).toFixed(1)} MB`,
});
}
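The ~34s figure in the comment above follows from a simple sum: N clips of `length` seconds plus each transition's duration. A minimal sketch of that arithmetic, using the documented default of 4 seconds when `transitionDurations` is omitted (`estimateTotalDuration` is a hypothetical helper, and the result is only an estimate because a failed transition falls back to a hard cut):

```typescript
type ClipSeconds = 4 | 6 | 8;

// Hypothetical helper: expected length of the merged video in seconds.
function estimateTotalDuration(
  segmentCount: number,
  clipLength: ClipSeconds,
  transitionDurations?: ClipSeconds[],
): number {
  if (!Number.isInteger(segmentCount) || segmentCount < 2) {
    throw new RangeError("segmentCount must be an integer of at least 2");
  }
  const transitionCount = segmentCount - 1;
  // Default every transition to 4s when no durations are supplied.
  const durations: ClipSeconds[] =
    transitionDurations ?? new Array<ClipSeconds>(transitionCount).fill(4);
  if (durations.length !== transitionCount) {
    throw new RangeError(`expected ${transitionCount} transition durations`);
  }
  const transitionTotal = durations.reduce<number>((sum, d) => sum + d, 0);
  return segmentCount * clipLength + transitionTotal;
}

console.log(estimateTotalDuration(3, 8, [4, 6])); // 34
```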
Example 2: Social Media Story (Portrait, 4 Segments)
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Morning coffee being poured in slow motion",
image: await readFile("./coffee.jpg"),
},
{
prompt: "Hands wrapping a gift box with a ribbon",
image: await readFile("./wrapping.jpg"),
},
{
prompt: "Gift box placed on a doorstep, camera tilts up",
image: await readFile("./doorstep.jpg"),
},
{
prompt: "Recipient opens door, reaction shot",
image: await readFile("./reaction.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: {
resolution: "1080p",
length: 4,
aspectRatio: "9:16", // Portrait for stories/reels
audio: true,
},
director: {
transitionPrompts: [
"Quick, energetic swipe transition",
"Fast zoom through a blur into the next scene",
"Snap cut with motion blur connecting the moments",
],
transitionDurations: [4, 6, 8], // Each transition can have its own duration
},
},
timeout: 900000,
});
if (result.video) {
await writeFile("story.mp4", result.video.data);
// Total: 4×4s clips + transitions (4s + 6s + 8s) = 34 seconds
}
Example 3: AI-Driven Storyboard
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
// Step 1: Use AI to generate a storyboard from a concept
const storyboard = await neurolink.generate({
input: {
text: `Create a 3-scene storyboard for a 30-second product commercial for a luxury watch.
Return a JSON array of objects with "scene" (number), "prompt" (video generation prompt),
and "imageDescription" (what the input image should show).`,
},
provider: "vertex",
model: "gemini-3-flash-preview",
output: { format: "json" },
});
const scenes = JSON.parse(storyboard.content);
// Step 2: Generate video using the AI storyboard
const watchImages = [
await readFile("./watch-closeup.jpg"),
await readFile("./watch-wrist.jpg"),
await readFile("./watch-lifestyle.jpg"),
];
const result = await neurolink.generate({
input: {
segments: scenes.map((s: { prompt: string }, i: number) => ({
prompt: s.prompt,
image: watchImages[i],
})),
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "1080p", length: 8, aspectRatio: "16:9" },
director: {
transitionPrompts: [
"Cinematic slow dissolve with depth of field shift",
"Smooth pan transitioning to the next scene",
],
},
},
timeout: 600000,
});
if (result.video) {
await writeFile("ai-storyboard.mp4", result.video.data);
}
Example 4: Batch Director Mode
import { NeuroLink } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
type StoryConfig = {
name: string;
segments: Array<{ prompt: string; imagePath: string }>;
};
async function batchDirectorGenerate(stories: StoryConfig[]) {
const neurolink = new NeuroLink();
const results = [];
for (const story of stories) {
console.log(`Generating: ${story.name}`);
try {
const segments = await Promise.all(
story.segments.map(async (seg) => ({
prompt: seg.prompt,
image: await readFile(seg.imagePath),
})),
);
const result = await neurolink.generate({
input: {
segments,
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "720p", length: 6 },
},
timeout: 600000,
});
if (result.video) {
const outputPath = `./output/${story.name}.mp4`;
await writeFile(outputPath, result.video.data);
results.push({ name: story.name, output: outputPath, success: true });
}
} catch (error) {
results.push({
name: story.name,
success: false,
error: error instanceof Error ? error.message : String(error),
});
}
}
return results;
}
// Usage
const results = await batchDirectorGenerate([
{
name: "product-A",
segments: [
{ prompt: "Hero reveal", imagePath: "./a-hero.jpg" },
{ prompt: "Feature showcase", imagePath: "./a-feature.jpg" },
{ prompt: "Call to action", imagePath: "./a-cta.jpg" },
],
},
{
name: "product-B",
segments: [
{ prompt: "Unboxing experience", imagePath: "./b-unbox.jpg" },
{ prompt: "In-use demonstration", imagePath: "./b-demo.jpg" },
],
},
]);
console.table(results);
Example 5: Error Handling in Director Mode
⚠️ Full-job retry warning: The generateWithRetry function below retries the entire neurolink.generate() call on any retriable VideoError. This means all segments and transitions are re-generated from scratch, incurring full cost each attempt ($10-60+ depending on settings). This is appropriate only for transient failures (e.g., rate limits) where partial results are not recoverable.

Note that Director Mode already handles transition failures gracefully: failed transitions fall back to hard cuts rather than failing the pipeline (see Partial Failure Handling). Only fatal errors like DIRECTOR_CLIP_FAILED or DIRECTOR_MERGE_FAILED propagate as VideoError. Keep this in mind when deciding whether a full-job retry is warranted.

Preferred approach: Once per-segment resume semantics are available, prefer retrying at the clip/transition level rather than re-running the entire pipeline. Until then, if you use full-job retry, keep maxRetries low (1-2) and restrict retries to rate-limit or timeout errors to control costs.
import { NeuroLink, VideoError } from "@juspay/neurolink";
import { readFile, writeFile } from "fs/promises";
const neurolink = new NeuroLink();
// WARNING: Each retry re-runs ALL segments via neurolink.generate(),
// incurring full API cost. See note above.
async function generateWithRetry(maxRetries = 2) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const result = await neurolink.generate({
input: {
segments: [
{
prompt: "Product introduction",
image: await readFile("./intro.jpg"),
},
{
prompt: "Feature highlight",
image: await readFile("./feature.jpg"),
},
{
prompt: "Brand closing",
image: await readFile("./closing.jpg"),
},
],
},
provider: "vertex",
model: "veo-3.1",
output: {
mode: "video",
video: { resolution: "720p", length: 6 },
},
timeout: 600000,
});
if (result.video) {
await writeFile("output.mp4", result.video.data);
return result;
}
throw new Error("No video in result");
} catch (error) {
if (error instanceof VideoError) {
console.error(
`Attempt ${attempt} failed [${error.code}]:`,
error.message,
);
// Don't retry configuration, validation, or permission errors
if (
error.category === "configuration" ||
error.category === "validation" ||
error.category === "permission"
) {
throw error;
}
// Retry on rate limits and transient failures only.
// Be aware: this re-runs the full Director pipeline (all segments + transitions).
if (error.retriable && attempt < maxRetries) {
const backoff = Math.pow(2, attempt) * 5000;
console.log(
`Retrying entire Director pipeline in ${backoff / 1000}s (attempt ${attempt + 1}/${maxRetries})...`,
);
await new Promise((r) => setTimeout(r, backoff));
continue;
}
}
throw error;
}
}
}
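Because every retry is expensive, it can be worth pulling the retry decision out of the loop into pure functions that are easy to unit-test. The sketch below mirrors the logic of the example above; `VideoErrorLike`, `shouldRetry`, and `backoffMs` are hypothetical names, and only the `category` and `retriable` fields used in the example are assumed to exist on the error.

```typescript
// Minimal shape of the error fields the example above reads from VideoError.
type VideoErrorLike = { category: string; retriable: boolean };

const NON_RETRIABLE_CATEGORIES = ["configuration", "validation", "permission"];

// Decide whether another full-pipeline attempt is allowed.
function shouldRetry(error: VideoErrorLike, attempt: number, maxRetries: number): boolean {
  if (NON_RETRIABLE_CATEGORIES.includes(error.category)) {
    return false;
  }
  return error.retriable && attempt < maxRetries;
}

// Exponential backoff matching the example: 10s after attempt 1, 20s after attempt 2.
function backoffMs(attempt: number): number {
  return Math.pow(2, attempt) * 5000;
}
```

With `maxRetries = 2`, `attempt` counts total attempts, so the pipeline runs at most twice and the only wait is the 10-second pause after the first failure.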
Type Definitions
Director Mode Input (Extended GenerateOptions)
Director Mode introduces a DirectorSegment type and adds a segments field to GenerateOptions.input:
/**
* A single segment in Director Mode, representing one video clip.
*/
type DirectorSegment = {
/** Prompt describing the video content for this segment */
prompt: string;
/** Input image for this segment (Buffer, URL string, file path, or ImageWithAltText) */
image: Buffer | string | ImageWithAltText;
};
type GenerateOptions = {
input: {
/** Prompt for standard (single-clip) mode */
text: string;
/** Standard mode images */
images?: Array<Buffer | string | ImageWithAltText>;
/**
* Director Mode segments. When provided, Director Mode is activated automatically.
* Each segment contains its own prompt and image — no need for separate text/images arrays.
* Must contain 2-10 segments.
*/
segments?: DirectorSegment[];
// ... other existing fields
};
output?: {
mode?: "text" | "video" | "ppt";
/**
* Video output options. In Director Mode, `video.length` controls the
* per-segment clip duration (4, 6, or 8 seconds). There is no separate
* `segmentDurationSeconds` field — this single field applies uniformly
* to all segments to avoid duplication and ambiguity.
*/
video?: VideoOutputOptions;
/** Director Mode configuration (only used when input.segments is provided) */
director?: DirectorModeOptions;
};
// ... other existing fields
};
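Since each Director Mode call is costly, a caller may want to check the segment array before sending it. The following is a minimal pre-flight sketch based on the constraints stated in the type comments above (2-10 segments, each with a prompt and an image). `validateSegments` and its messages are hypothetical; NeuroLink runs its own validation and may report errors differently.

```typescript
// Loose input shape so the check also works on untyped data (for example parsed JSON).
type SegmentLike = { prompt?: unknown; image?: unknown };

// Hypothetical helper: returns a list of problems, empty when the input looks valid.
function validateSegments(segments: SegmentLike[]): string[] {
  const errors: string[] = [];
  if (segments.length < 2 || segments.length > 10) {
    errors.push(`expected 2-10 segments, got ${segments.length}`);
  }
  segments.forEach((segment, index) => {
    if (typeof segment.prompt !== "string" || segment.prompt.trim() === "") {
      errors.push(`segment ${index}: prompt must be a non-empty string`);
    }
    if (segment.image === undefined || segment.image === null) {
      errors.push(`segment ${index}: image is required`);
    }
  });
  return errors;
}
```

Returning a list rather than throwing on the first problem lets the caller report every issue in one pass.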
DirectorModeOptions
type DirectorModeOptions = {
/**
* Prompts for generating transition clips (array of N-1 entries for N segments).
* transitionPrompts[i] is used for the transition between segment i and segment i+1.
* If provided, must contain exactly N-1 prompts where N is the number of segments.
*
* **When omitted:** The pipeline auto-generates a default prompt for each transition:
* `"Smooth cinematic transition between scenes"`. This produces a generic but
* visually coherent interpolation. For narrative-driven videos, explicit prompts
* that describe the desired camera movement or visual flow are recommended.
*/
transitionPrompts?: string[];
/**
* Duration of each transition clip in seconds (array of N-1 entries for N segments).
* transitionDurations[i] sets the duration for the transition between segment i and segment i+1.
* Each value must be 4, 6, or 8 (4 recommended for seamless feel).
* If omitted, all transitions default to 4 seconds.
* @default [4, 4, ...] (all 4s)
*/
transitionDurations?: Array<4 | 6 | 8>;
};
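The defaulting rules in the comments above can be sketched as a resolver that turns the optional arrays into one concrete entry per transition. `resolveTransitions` is a hypothetical helper; it also assumes that `transitionDurations`, when given, must have exactly N-1 entries, which the source states explicitly only for `transitionPrompts`.

```typescript
type TransitionDuration = 4 | 6 | 8;
type ResolvedTransition = { prompt: string; durationSeconds: TransitionDuration };

// Default prompt quoted from the DirectorModeOptions comment above.
const DEFAULT_TRANSITION_PROMPT = "Smooth cinematic transition between scenes";

function isTransitionDuration(value: number): value is TransitionDuration {
  return value === 4 || value === 6 || value === 8;
}

// Hypothetical helper: applies defaults and validates lengths and values.
function resolveTransitions(
  segmentCount: number,
  options: { transitionPrompts?: string[]; transitionDurations?: number[] } = {},
): ResolvedTransition[] {
  const count = segmentCount - 1;
  const { transitionPrompts, transitionDurations } = options;
  if (transitionPrompts && transitionPrompts.length !== count) {
    throw new RangeError(`expected ${count} transition prompts`);
  }
  if (transitionDurations && transitionDurations.length !== count) {
    throw new RangeError(`expected ${count} transition durations`);
  }
  const resolved: ResolvedTransition[] = [];
  for (let i = 0; i < count; i++) {
    const duration = transitionDurations?.[i] ?? 4; // documented default: 4 seconds
    if (!isTransitionDuration(duration)) {
      throw new RangeError(`transition ${i}: duration must be 4, 6, or 8`);
    }
    resolved.push({
      prompt: transitionPrompts?.[i] ?? DEFAULT_TRANSITION_PROMPT,
      durationSeconds: duration,
    });
  }
  return resolved;
}
```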
Note: Concurrency is fixed internally at 2 parallel Vertex API calls. This is not user-configurable — it balances throughput against API rate limits and is shared across both clip generation and transition generation phases.
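To illustrate what a fixed concurrency of 2 combined with a circuit breaker can look like, here is a generic worker-pool sketch. It is not NeuroLink's implementation: `runWithLimit` is a hypothetical name, "consecutive" is interpreted here as consecutive in completion order, and remaining tasks are recorded as skipped rather than aborting the whole run.

```typescript
type Task<T> = () => Promise<T>;

// Hypothetical helper: runs tasks with at most `limit` in flight and stops
// starting new ones after `maxConsecutiveFailures` failures in a row.
async function runWithLimit<T>(
  tasks: Task<T>[],
  limit = 2,
  maxConsecutiveFailures = 2,
): Promise<Array<T | Error>> {
  const results: Array<T | Error> = new Array(tasks.length);
  let next = 0;
  let consecutiveFailures = 0;

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      if (consecutiveFailures >= maxConsecutiveFailures) {
        // Circuit open: skip the task instead of spending another API call.
        results[index] = new Error("skipped: circuit breaker open");
        continue;
      }
      try {
        results[index] = await tasks[index]();
        consecutiveFailures = 0; // any success closes the streak
      } catch (err) {
        consecutiveFailures++;
        results[index] = err instanceof Error ? err : new Error(String(err));
      }
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Each worker pulls the next unclaimed index, so results stay in input order even though completion order varies.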
Extended VideoGenerationResult (Director Mode)
type VideoGenerationResult = {
data: Buffer;
mediaType: "video/mp4" | "video/webm";
metadata?: {
// Standard fields
duration?: number;
dimensions?: { width: number; height: number };
model?: string;
provider?: string;
aspectRatio?: string;
audioEnabled?: boolean;
processingTime?: number;
// Director Mode fields (present when Director Mode is used)
/** Number of main segments in the video */
segmentCount?: number;
/** Number of transition clips generated */
transitionCount?: number;
/** Duration of each main clip in seconds */
clipDuration?: number;
/** Durations of each transition in seconds (one per transition) */
transitionDurations?: number[];
/** Per-segment metadata */
segments?: Array<{
index: number;
duration: number;
processingTime: number;
}>;
/** Per-transition metadata */
transitions?: Array<{
fromSegment: number;
toSegment: number;
duration: number;
processingTime: number;
}>;
};
};
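One practical use of the Director Mode metadata is detecting transitions that were dropped. The sketch below assumes `transitionCount` reflects only transitions that were actually generated (as its doc comment states), so any shortfall against `segmentCount - 1` corresponds to a hard-cut boundary. `countHardCuts` is a hypothetical helper.

```typescript
// Subset of the metadata fields shown above that this check relies on.
type DirectorCounts = { segmentCount?: number; transitionCount?: number };

// Hypothetical helper: number of segment boundaries without a generated transition.
// Returns undefined when the metadata does not carry Director Mode fields.
function countHardCuts(metadata: DirectorCounts | undefined): number | undefined {
  if (!metadata) {
    return undefined;
  }
  const { segmentCount, transitionCount } = metadata;
  if (segmentCount === undefined || transitionCount === undefined) {
    return undefined;
  }
  const expectedTransitions = Math.max(segmentCount - 1, 0);
  return Math.max(expectedTransitions - transitionCount, 0);
}
```

A non-zero result is a cue to inspect the output before publishing, since the video will contain at least one abrupt cut.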
Architecture & Implementation
Pipeline Flow
User Input (N segments, each with prompt + image)
│
▼
┌─────────────────────────────────┐
│ 1. Validation │
│ - Validate each segment has │
│ prompt and image │
│ - Enforce segment limit (≤10) │
└─────────────────┬───────────────┘
│
▼
┌─────────────────────────────────┐
│ 2. Parallel Clip Generation │
│ - Generate N clips via │
│ generateVideoWithVertex() │
│ - Respect concurrency limit │
└─────────────────┬───────────────┘
│
▼
┌─────────────────────────────────┐
│ 3. Frame Extraction │
│ - Extract last frame from │
│ clip[i] (decode MP4 → JPEG) │
│ - Extract first frame from │
│ clip[i+1] │
└─────────────────┬───────────────┘
│
▼
┌─────────────────────────────────┐
│ 4. Parallel Transition Gen. │
│ - For each pair (i, i+1): │