Amazon SageMaker Provider Guide
Custom model endpoints on AWS SageMaker infrastructure
Version: 9.26.x | Status: General Availability | Streaming: Not Available (see warning below)
Overview
Amazon SageMaker provides managed infrastructure for deploying custom AI model endpoints. Unlike AWS Bedrock (which offers serverless access to foundation models), SageMaker gives you full control over the hosting environment, letting you deploy fine-tuned models, Hugging Face models, JumpStart pre-built models, or entirely custom inference containers.
The SageMaker provider does not support streaming via neurolink.stream(). Calling stream() will throw a SageMakerError with status code 501. Use generate() for all SageMaker requests. Streaming support is planned for a future release.
Key Benefits
- Custom Models: Deploy any model you train or fine-tune
- Hugging Face Hub: One-click deployment of thousands of open-source models
- JumpStart: Pre-built solutions for Llama, Mistral, Falcon, and more
- Full Control: Choose instance types, autoscaling policies, and networking
- AWS Integration: IAM, VPC, CloudWatch, S3
- Enterprise Security: PrivateLink, KMS encryption, VPC isolation
- Batch Inference: Built-in support for processing multiple prompts in parallel
Supported Model Types
| Model Type | Value | Description | Example Use Case |
|---|---|---|---|
| Llama | llama | Meta Llama models deployed via JumpStart or custom | General-purpose, cost-effective |
| Mistral | mistral | Mistral AI models on SageMaker | Coding, European compliance |
| Claude | claude | Anthropic Claude models via custom containers | Complex reasoning |
| Hugging Face | huggingface | Any Hugging Face Hub model via SageMaker containers | NLP, classification, summarization |
| JumpStart | jumpstart | AWS JumpStart pre-built model packages | Quick deployment, managed updates |
| Custom | custom | Any custom inference container or algorithm | Proprietary models, specialized |
Quick Start
1. Deploy a Model Endpoint
Before using the SageMaker provider, you need a running SageMaker endpoint. You can create one through the AWS Console, AWS CLI, or SageMaker SDK.
# Example: Deploy a JumpStart Llama model via AWS CLI
aws sagemaker create-endpoint \
--endpoint-name my-llama-endpoint \
--endpoint-config-name my-llama-config \
--region us-east-1
Or via the AWS Console:
- Open SageMaker Console
- Navigate to Inference > Endpoints
- Create a new endpoint with your model
- Wait for the endpoint status to become InService
2. Configure Environment Variables
# Required: AWS credentials
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
# Required: SageMaker endpoint name
export SAGEMAKER_DEFAULT_ENDPOINT=my-llama-endpoint
# Optional: Region (defaults to us-east-1)
export SAGEMAKER_REGION=us-east-1
# Optional: Model identifier
export SAGEMAKER_MODEL=my-custom-llama
# Optional: Model type for request formatting
export SAGEMAKER_MODEL_TYPE=llama
3. Use with NeuroLink SDK
import { NeuroLink } from "@juspay/neurolink";
const ai = new NeuroLink();
const result = await ai.generate({
input: { text: "Explain quantum computing in simple terms." },
provider: "sagemaker",
});
console.log(result.content);
4. Use with NeuroLink CLI
# Basic generation
neurolink generate "Explain quantum computing" --provider sagemaker
# With specific model name
neurolink generate "Write a haiku" --provider sagemaker --model my-custom-model
Environment Variables
AWS Credentials (Required)
| Variable | Required | Description |
|---|---|---|
| AWS_ACCESS_KEY_ID | Yes | AWS access key ID for authentication |
| AWS_SECRET_ACCESS_KEY | Yes | AWS secret access key for authentication |
| AWS_SESSION_TOKEN | No | Session token for temporary credentials (STS) |
Region Configuration
Region is resolved in priority order:
1. Constructor region parameter (highest priority)
2. SAGEMAKER_REGION environment variable
3. AWS_REGION environment variable
4. "us-east-1" (default)
| Variable | Default | Description |
|---|---|---|
| SAGEMAKER_REGION | - | SageMaker-specific region override |
| AWS_REGION | "us-east-1" | General AWS region |
Endpoint Configuration
Endpoint name is resolved in priority order:
1. SAGEMAKER_DEFAULT_ENDPOINT
2. SAGEMAKER_ENDPOINT_NAME
3. "default-endpoint" (fallback; will fail connectivity checks)
| Variable | Default | Description |
|---|---|---|
| SAGEMAKER_DEFAULT_ENDPOINT | - | Primary endpoint name (recommended) |
| SAGEMAKER_ENDPOINT_NAME | - | Alternate endpoint name variable |
| SAGEMAKER_ENDPOINT | - | Custom AWS service endpoint URL (for VPC/PrivateLink) |
SAGEMAKER_ENDPOINT sets a custom AWS service URL (e.g., a VPC endpoint), while SAGEMAKER_DEFAULT_ENDPOINT and SAGEMAKER_ENDPOINT_NAME set the name of your deployed SageMaker model endpoint.
Model Configuration
Model name is resolved in priority order:
1. SAGEMAKER_MODEL
2. SAGEMAKER_MODEL_NAME
3. "sagemaker-model" (default)
| Variable | Default | Description |
|---|---|---|
| SAGEMAKER_MODEL | "sagemaker-model" | Model identifier |
| SAGEMAKER_MODEL_NAME | - | Alternate model name variable |
| SAGEMAKER_MODEL_TYPE | "custom" | Model type: llama, mistral, claude, huggingface, jumpstart, custom |
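The region, endpoint, and model fallback chains above all follow the same pattern: take the first non-empty value, otherwise use the default. A minimal illustrative sketch of that pattern (the helper name resolveSetting is hypothetical, not the provider's internal API):

```typescript
// Hypothetical helper illustrating the env-var fallback pattern used for
// region, endpoint, and model resolution. Not the provider's actual code.
function resolveSetting(
  candidates: Array<string | undefined>,
  fallback: string,
): string {
  for (const value of candidates) {
    if (value && value.length > 0) return value;
  }
  return fallback;
}

// Model name: SAGEMAKER_MODEL, then SAGEMAKER_MODEL_NAME, then the default.
const modelName = resolveSetting(
  [process.env.SAGEMAKER_MODEL, process.env.SAGEMAKER_MODEL_NAME],
  "sagemaker-model",
);
```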
Request Configuration
| Variable | Default | Description |
|---|---|---|
| SAGEMAKER_CONTENT_TYPE | "application/json" | Content-Type header for requests |
| SAGEMAKER_ACCEPT | "application/json" | Accept header for responses |
| SAGEMAKER_CUSTOM_ATTRIBUTES | - | Custom attributes passed to the endpoint |
| SAGEMAKER_INPUT_FORMAT | "custom" | Input format: huggingface, jumpstart, custom |
| SAGEMAKER_OUTPUT_FORMAT | "custom" | Output format: huggingface, jumpstart, custom |
Generation Defaults
| Variable | Default | Description |
|---|---|---|
| SAGEMAKER_MAX_TOKENS | - | Maximum tokens to generate (model default if unset) |
| SAGEMAKER_TEMPERATURE | - | Temperature for sampling (0.0 - 2.0) |
| SAGEMAKER_TOP_P | - | Top-p (nucleus) sampling (0.0 - 1.0) |
| SAGEMAKER_STOP_SEQUENCES | - | Comma-separated stop sequences |
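Since SAGEMAKER_STOP_SEQUENCES is a single comma-separated string, it must be split before use. A minimal parsing sketch (the helper name is illustrative, not the provider's internal API):

```typescript
// Split a comma-separated stop-sequence string into a clean array.
// Surrounding whitespace and empty entries are dropped.
function parseStopSequences(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Example: SAGEMAKER_STOP_SEQUENCES="</s>, Human:" yields ["</s>", "Human:"]
const stops = parseStopSequences(process.env.SAGEMAKER_STOP_SEQUENCES);
```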
Client Configuration
| Variable | Default | Description |
|---|---|---|
| SAGEMAKER_TIMEOUT | 30000 | Request timeout in milliseconds (1000-300000) |
| SAGEMAKER_MAX_RETRIES | 3 | Maximum retry attempts (0-10) |
SDK Usage
Basic Generation
import { NeuroLink } from "@juspay/neurolink";
const ai = new NeuroLink();
const result = await ai.generate({
input: { text: "Summarize the benefits of serverless architecture." },
provider: "sagemaker",
});
console.log(result.content);
With Configuration Options
const result = await ai.generate({
input: { text: "Write a technical blog post about MLOps." },
provider: "sagemaker",
model: "my-fine-tuned-llama",
temperature: 0.8,
maxTokens: 2000,
});
Testing Connectivity
import { AmazonSageMakerProvider } from "@juspay/neurolink";
const provider = new AmazonSageMakerProvider(
"my-model", // model name
"my-endpoint", // endpoint name
"us-east-1", // region
);
// Test configuration validity
const connectionTest = await provider.testConnection();
console.log("Connected:", connectionTest.connected);
// Get provider info
const info = provider.getSageMakerInfo();
console.log("Endpoint:", info.endpointName);
console.log("Model type:", info.modelType);
console.log("Region:", info.region);
CLI Usage
Basic Commands
# Generate with SageMaker
neurolink generate "Your prompt here" --provider sagemaker
# Use provider alias
neurolink generate "Your prompt here" --provider aws-sagemaker
# Specify model name
neurolink generate "Your prompt here" --provider sagemaker --model my-llama-model
# With temperature
neurolink generate "Creative writing task" --provider sagemaker --temperature 0.9
Loop Mode
# Start interactive session with SageMaker
neurolink loop --provider sagemaker
# Inside loop:
# > set provider sagemaker
# > set model my-fine-tuned-model
# > Explain the transformer architecture
Feature Support
Streaming: NOT Supported
Calling neurolink.stream() with the SageMaker provider will throw a SageMakerError:
SageMaker streaming not yet fully implemented. Coming in next phase.
Error details: Code MODEL_ERROR, HTTP status 501.
Workaround: Use neurolink.generate() instead. If you need streaming behavior in your application, consider using a different provider (e.g., Bedrock, OpenAI) or implement application-level chunking of the generate response.
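If your UI expects incremental output, one application-level option is to chunk the completed generate() result yourself. A minimal sketch (pseudoStream is an illustration, not a NeuroLink API, and this does not reduce latency: the full response must arrive before chunking begins):

```typescript
// Yield a completed response in small pieces, simulating a token stream.
async function* pseudoStream(
  text: string,
  chunkSize = 16,
  delayMs = 0,
): AsyncGenerator<string> {
  for (let i = 0; i < text.length; i += chunkSize) {
    if (delayMs > 0) await new Promise((r) => setTimeout(r, delayMs));
    yield text.slice(i, i + chunkSize);
  }
}

// Usage with the generate() result:
// const result = await ai.generate({ input: { text: "..." }, provider: "sagemaker" });
// for await (const chunk of pseudoStream(result.content)) process.stdout.write(chunk);
```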
Embeddings: NOT Supported
The SageMaker provider does not implement embed() or embedMany(). Calling these methods will throw an error from the base provider. For embeddings on AWS, use the AWS Bedrock provider with Amazon Titan Embeddings or Cohere Embed models.
Tool Use
The SageMaker provider includes tool calling support at the language model level. Tools are converted to a format compatible with SageMaker endpoints. However, tool calling behavior depends entirely on the model deployed behind your endpoint:
- Models that support function calling (e.g., fine-tuned Llama, Claude) should work with NeuroLink's tool system
- Custom models or older model versions may not understand tool call formats
- Test tool calling with your specific endpoint before relying on it in production
Structured Output
The provider supports json_object and json_schema response formats for models that can produce structured JSON. Again, actual support depends on the deployed model's capabilities.
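Whatever format you request, it is worth parsing the reply defensively: models behind custom endpoints sometimes wrap JSON in markdown code fences. A hedged sketch (this helper is illustrative, not a NeuroLink API):

```typescript
// Strip optional markdown code fences, then parse the JSON payload.
// Throws if the remaining text is not valid JSON.
function parseModelJson(raw: string): unknown {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/\s*```$/, "")
    .trim();
  return JSON.parse(cleaned);
}
```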
Batch Inference
The SageMaker language model supports batch processing of multiple prompts with adaptive concurrency control:
- Dynamic concurrency adjustment based on endpoint response times
- Automatic error recovery for individual prompts in a batch
- Configurable concurrency limits
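The batch API itself is internal to the language model, but the pattern of per-prompt error recovery under a concurrency limit is easy to reproduce at the application level. A sketch of that pattern over generate() (runBatch and its default limit are assumptions, not NeuroLink APIs):

```typescript
// Run many prompts with at most `limit` in flight, recovering per-prompt errors
// so one failure does not abort the whole batch.
async function runBatch<T>(
  prompts: string[],
  worker: (prompt: string) => Promise<T>,
  limit = 4,
): Promise<Array<T | Error>> {
  const results: Array<T | Error> = new Array(prompts.length);
  let next = 0;
  async function lane(): Promise<void> {
    while (next < prompts.length) {
      const i = next++;
      try {
        results[i] = await worker(prompts[i]);
      } catch (err) {
        results[i] = err instanceof Error ? err : new Error(String(err));
      }
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, prompts.length) }, lane),
  );
  return results;
}

// Usage sketch:
// const out = await runBatch(prompts, (p) =>
//   ai.generate({ input: { text: p }, provider: "sagemaker" }),
// );
```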
IAM Permissions
Minimum Required Policy
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"sagemaker:InvokeEndpoint",
"sagemaker:InvokeEndpointWithResponseStream"
],
"Resource": "arn:aws:sagemaker:*:ACCOUNT_ID:endpoint/*"
}
]
}
Restrictive Policy (Recommended for Production)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"sagemaker:InvokeEndpoint",
"sagemaker:InvokeEndpointWithResponseStream"
],
"Resource": [
"arn:aws:sagemaker:us-east-1:ACCOUNT_ID:endpoint/my-llama-endpoint",
"arn:aws:sagemaker:us-east-1:ACCOUNT_ID:endpoint/my-mistral-endpoint"
]
}
]
}
Setup via AWS CLI
# Create IAM policy
cat > sagemaker-invoke-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"sagemaker:InvokeEndpoint",
"sagemaker:InvokeEndpointWithResponseStream"
],
"Resource": "arn:aws:sagemaker:*:ACCOUNT_ID:endpoint/*"
}
]
}
EOF
# Create the policy
aws iam create-policy \
--policy-name SageMakerInvokePolicy \
--policy-document file://sagemaker-invoke-policy.json
# Attach to user or role
aws iam attach-user-policy \
--user-name my-user \
--policy-arn arn:aws:iam::ACCOUNT_ID:policy/SageMakerInvokePolicy
EC2 Instance Role
# Create trust policy for EC2
cat > trust-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
# Create role and attach policy
aws iam create-role \
--role-name SageMakerEC2Role \
--assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy \
--role-name SageMakerEC2Role \
--policy-arn arn:aws:iam::ACCOUNT_ID:policy/SageMakerInvokePolicy
Provider Aliases
The SageMaker provider can be referenced by any of these names:
| Alias | Example |
|---|---|
| sagemaker | --provider sagemaker |
| aws-sagemaker | --provider aws-sagemaker |
VPC & Private Connectivity
Custom AWS Service Endpoint
To route SageMaker API calls through a VPC endpoint (PrivateLink), set the SAGEMAKER_ENDPOINT environment variable:
export SAGEMAKER_ENDPOINT=https://vpce-12345678.sagemaker-runtime.us-east-1.vpce.amazonaws.com
VPC Endpoint Setup
# Create VPC endpoint for SageMaker Runtime
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345678 \
--service-name com.amazonaws.us-east-1.sagemaker.runtime \
--route-table-ids rtb-12345678 \
--subnet-ids subnet-12345678 subnet-87654321 \
--security-group-ids sg-12345678
Configuration Validation
The provider validates all configuration at initialization using Zod schemas. If required variables are missing, it throws a descriptive error listing exactly which variables need to be set.
import { checkSageMakerConfiguration } from "@juspay/neurolink";
const check = checkSageMakerConfiguration();
console.log("Configured:", check.configured);
console.log("Issues:", check.issues);
console.log("Summary:", check.summary);
You can also load configuration from a JSON file instead of environment variables:
import { loadConfigurationFromFile } from "@juspay/neurolink";
const config = await loadConfigurationFromFile("./sagemaker-config.json");
Troubleshooting
Common Issues
1. "AWS credentials not configured"
Problem: Missing AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY.
Solution:
# Set credentials
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
# Or configure via AWS CLI
aws configure
2. "SageMaker endpoint not configured"
Problem: No endpoint name provided and the default "default-endpoint" is being used.
Solution:
# Set your endpoint name
export SAGEMAKER_DEFAULT_ENDPOINT=my-model-endpoint
3. "SageMaker streaming not yet fully implemented"
Problem: Called stream() on the SageMaker provider.
Solution: Use generate() instead:
// Will throw an error
const stream = await ai.stream({ ... provider: "sagemaker" });
// Use this instead
const result = await ai.generate({ ... provider: "sagemaker" });
4. "SageMaker request timed out"
Problem: Endpoint did not respond within the timeout period.
Solution:
# Increase timeout (default: 30000ms)
export SAGEMAKER_TIMEOUT=60000
# Also check endpoint status in AWS Console
aws sagemaker describe-endpoint --endpoint-name my-endpoint
5. "ThrottlingException"
Problem: Exceeded SageMaker invocation rate limits.
Solution:
- Check your endpoint's autoscaling configuration
- Increase the number of instances behind your endpoint
- The provider automatically retries throttled requests with exponential backoff (up to SAGEMAKER_MAX_RETRIES attempts)
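The retry schedule can be approximated as capped exponential backoff. A hedged sketch of the delay computation (the baseMs and capMs values here are illustrative, not the provider's actual constants):

```typescript
// Delay before retry `attempt` (0-based): base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// attempt 0 -> 500ms, attempt 1 -> 1000ms, attempt 2 -> 2000ms, capped at 10s.
```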
6. "Endpoint not found"
Problem: The specified endpoint does not exist or is not in the correct region.
Solution:
# List endpoints in your region
aws sagemaker list-endpoints --region us-east-1
# Verify endpoint status
aws sagemaker describe-endpoint \
--endpoint-name my-endpoint \
--region us-east-1
Related Documentation
- AWS Bedrock Provider - Serverless foundation models on AWS (supports streaming and embeddings)
- Provider Setup - General provider configuration
- Hugging Face Provider - Direct Hugging Face Inference API access
Additional Resources
- SageMaker Docs - Official documentation
- SageMaker Pricing - Instance and inference pricing
- SageMaker Console - Manage endpoints
- JumpStart Models - Pre-built model catalog
- Hugging Face on SageMaker - Deploy HF models to SageMaker
Need Help? Join our GitHub Discussions or open an issue.