LM Studio Provider Guide
Run any GGUF model privately on your own hardware — no cloud, no API key required
Overview
LM Studio is a desktop application that lets you download and run thousands of GGUF-format models (Llama, Mistral, Qwen, Phi, Gemma, and many more) locally on macOS, Windows, or Linux. When you start LM Studio's built-in server, it exposes an OpenAI-compatible API at http://localhost:1234/v1.
NeuroLink's lm-studio provider connects to this server and automatically discovers the loaded model by calling /v1/models at request time. You do not need to specify a model name unless you want to pin a specific one.
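Because the endpoint follows the OpenAI wire format, any OpenAI-style client can talk to it directly. For orientation, a raw request looks roughly like this (the model ID is illustrative; LM Studio generally answers with whichever model is loaded):
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-3b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'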
Key Facts
- Runs locally: No data leaves your machine
- No API key needed: LM Studio's server accepts any key (NeuroLink sends a placeholder)
- Auto-discovery: Omit model: and NeuroLink fetches the currently loaded model from /v1/models
- Default base URL: http://localhost:1234/v1
- Vision: Depends on the loaded model (e.g., LLaVA, Qwen-VL, Llama 3.2 Vision variants support images)
- Streaming: Supported
- Tool calling: Depends on the loaded model
Quick Start
1. Download and Start LM Studio
- Download LM Studio from https://lmstudio.ai for your platform.
- Open the app and search for a model in the Discover tab (e.g., Llama 3.2 3B Instruct).
- Click Download and wait for it to complete.
- Go to the Local Server tab (icon that looks like <->).
- Select the model you downloaded and click Start Server.
The server starts on http://localhost:1234 by default.
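You can confirm the server is reachable before wiring up NeuroLink:
# Lists the currently loaded model(s); an empty list means nothing is loaded yet
curl http://localhost:1234/v1/models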
2. Configure Environment (Optional)
No environment variables are required for a default setup. Optionally:
# Override the base URL if you run LM Studio on a non-default port or host
LM_STUDIO_BASE_URL=http://localhost:1234/v1
# Pin a specific model (default: auto-discover from /v1/models)
LM_STUDIO_MODEL=
# API key — only needed if you run LM Studio behind an auth-proxying reverse-proxy
LM_STUDIO_API_KEY=
3. Install NeuroLink
npm install @juspay/neurolink
# or
pnpm add @juspay/neurolink
4. Generate Your First Response
Auto-discovery: NeuroLink calls /v1/models and uses the first loaded model.
import { NeuroLink } from "@juspay/neurolink";
const ai = new NeuroLink();
// Omit `model:` entirely — NeuroLink discovers the loaded model automatically
const result = await ai.generate({
provider: "lm-studio",
input: { text: "Explain the concept of entropy in information theory." },
});
console.log(result.content);
Model Auto-Discovery
When no model is specified (and LM_STUDIO_MODEL is empty), the provider calls GET /v1/models with a 5-second timeout. It picks the first model returned — whichever is currently loaded in LM Studio.
If discovery fails (server not running, no model loaded), the provider falls back to a "local-model" placeholder and logs a warning. The next call will re-attempt discovery, so there is no need to restart your Node process after starting LM Studio.
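The discovery step amounts to a single HTTP call. Here is a minimal sketch of that behaviour, using Node 18+'s built-in fetch (the function name and types are illustrative, not NeuroLink's internals):
// Sketch of the auto-discovery behaviour described above (illustrative only)
async function discoverLoadedModel(
  baseURL = "http://localhost:1234/v1",
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 5_000); // 5-second timeout
  try {
    const res = await fetch(`${baseURL}/models`, { signal: controller.signal });
    const body = (await res.json()) as { data: Array<{ id: string }> };
    // Pick the first model returned, i.e. whatever is currently loaded
    return body.data[0]?.id ?? "local-model";
  } catch {
    // Server unreachable or timed out: fall back to the placeholder.
    // Discovery is re-attempted on the next call.
    return "local-model";
  } finally {
    clearTimeout(timer);
  }
}
In practice you never call this yourself; just omit the field: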
// Auto-discover — best for interactive local development
const result = await ai.generate({
provider: "lm-studio",
input: { text: "Hello!" },
// No `model:` field
});
To pin a specific model, pass it explicitly:
const result = await ai.generate({
provider: "lm-studio",
model: "llama-3.2-3b-instruct", // must match the ID shown in LM Studio
input: { text: "Hello!" },
});
SDK Usage
Basic Generation (Auto-Discover)
import { NeuroLink } from "@juspay/neurolink";
const ai = new NeuroLink();
const result = await ai.generate({
provider: "lm-studio",
input: { text: "Write a haiku about type safety." },
});
console.log(result.content);
Streaming
import { NeuroLink } from "@juspay/neurolink";
const ai = new NeuroLink();
const stream = await ai.stream({
provider: "lm-studio",
input: {
text: "Explain monads in Haskell as if I am a TypeScript developer.",
},
});
for await (const chunk of stream.stream) {
process.stdout.write(chunk);
}
Per-Call Base URL Override
Useful if you run LM Studio on a different machine on your local network, or on a non-default port.
const result = await ai.generate({
provider: "lm-studio",
input: { text: "Hello from across the network!" },
credentials: {
lmStudio: {
baseURL: "http://192.168.1.42:1234/v1",
},
},
});
If your LM Studio server is behind an auth-proxying reverse-proxy (rare), pass the key too:
const result = await ai.generate({
provider: "lm-studio",
input: { text: "Hello" },
credentials: {
lmStudio: {
baseURL: "https://lmstudio.internal.example.com/v1",
apiKey: "bearer-token-for-proxy",
},
},
});
CLI Usage
Basic Commands
# Auto-discover the loaded model
pnpm run cli generate "What is quantum entanglement?" --provider lm-studio
# Use provider aliases
pnpm run cli generate "Hello" --provider lmstudio
pnpm run cli generate "Hello" --provider lms
# Pin a model explicitly
pnpm run cli generate "Summarise the SOLID principles" \
--provider lm-studio \
--model llama-3.2-3b-instruct
# Interactive loop (auto-discovers model on each request)
pnpm run cli loop --provider lm-studio
# Point at a non-default server address
LM_STUDIO_BASE_URL=http://192.168.1.42:1234/v1 \
pnpm run cli generate "Hello from network" --provider lm-studio
Provider Aliases
| Alias | Example |
|---|---|
| lm-studio | --provider lm-studio |
| lmstudio | --provider lmstudio |
| lms | --provider lms |
Configuration Reference
| Environment Variable | Required | Default | Description |
|---|---|---|---|
| LM_STUDIO_BASE_URL | No | http://localhost:1234/v1 | Base URL of the LM Studio server |
| LM_STUDIO_MODEL | No | "" (auto-discover) | Specific model ID to use; leave blank for auto-discovery |
| LM_STUDIO_API_KEY | No | lm-studio (placeholder) | Auth token — only needed for reverse-proxy setups with auth |
Feature Support
| Feature | Supported | Notes |
|---|---|---|
| Text generation | Yes | |
| Streaming | Yes | |
| Tool calling | Model-dependent | Requires a model that understands function-call syntax |
| Vision / images | Model-dependent | Load a vision model (e.g., LLaVA, Qwen-VL) |
| Embeddings | No | Use OpenAI or another embeddings provider |
| Auto-discovery | Yes | Fetches active model from /v1/models at request time |
Troubleshooting
"LM Studio server not reachable"
The server is not running or is on a different URL.
- Open LM Studio and go to the Local Server tab.
- Select a model and click Start Server.
- Confirm the port shown (default: 1234) matches LM_STUDIO_BASE_URL.
# Test reachability
curl http://localhost:1234/v1/models
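A healthy server answers with an OpenAI-style model list. The exact fields vary by LM Studio version, but the shape is roughly:
{
  "object": "list",
  "data": [{ "id": "llama-3.2-3b-instruct", "object": "model" }]
}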
"Load a model in the LM Studio app"
LM Studio's server returned an empty model list. Go to the Local Server tab, select a model from the dropdown, and click the load/start button.
"LM Studio model X is not loaded"
You pinned a specific model ID (LM_STUDIO_MODEL or model: in SDK/CLI), but that model is not loaded in LM Studio. Either load the model in the app or leave the model field blank to use whatever is already loaded.
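If the pin comes from your environment, you can clear it for a single run; per the configuration table above, a blank value re-enables auto-discovery:
# Blank LM_STUDIO_MODEL falls back to auto-discovery for this invocation only
LM_STUDIO_MODEL= pnpm run cli generate "Hello" --provider lm-studio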
"LM Studio request timed out"
Large models on CPU-only machines can be very slow. Try:
- A smaller quantised model (Q4 instead of Q8)
- A model with fewer parameters
- Increasing the timeout via NeuroLink's global timeout settings
Tool calls not working
Not all models support the tool/function-calling format. Load a model that was fine-tuned for instruction following and tool use (e.g., Llama 3.1, Mistral 7B Instruct v0.3). Check the model's documentation on Hugging Face for capability flags.
See Also
- Implementation spec — internal design details and auto-discovery mechanics
- llama.cpp provider — headless alternative using the llama-server binary directly
- Ollama provider — another popular local model runtime
- OpenAI Compatible provider — generic provider for any OpenAI-compatible endpoint
Need Help? Join the GitHub Discussions or open an issue.