# PR Analysis & Commit Plan
## What the PR contains

### Total scope
- 28 modified files (existing core files updated)
- 6 new file groups (4 new provider files + 1 test suite + 1 shell script + docs/)
- ~700 insertions, ~80 deletions in modified files
- ~1000 LOC in new provider files
- ~870 LOC in new test file
- 15 Markdown docs in docs/provider-integration/ (~150KB; 13 new + 2 moved test-result reports)
### Risk level: medium

- Touches public SDK API (new providers are visible at runtime via the `AIProviderName.DEEPSEEK` constant or the string id `"deepseek"`, etc.)
- Modifies shared pricing logic (`pricing.ts` `_default` fallback) — could affect other providers
- Modifies 12 test suite files (changes the shared `PROVIDER_MAX_TOKENS` map and fallback)
- Test changes are backwards-compatible — same tests pass for existing providers
## Files to commit

### A. New provider implementations (4 files, ~1000 LOC)
- `src/lib/providers/deepseek.ts` (≈250 lines)
- `src/lib/providers/nvidiaNim.ts` (≈310 lines, NIM-specific extras + retry-on-400)
- `src/lib/providers/lmStudio.ts` (≈330 lines, `/v1/models` auto-discovery)
- `src/lib/providers/llamaCpp.ts` (≈330 lines, `/health` 3x retry + auto-discovery; see the readiness-probe sketch after this list)
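For reference, a minimal sketch of the llama.cpp readiness probe described above (`/health` polled up to 3 times before auto-discovery proceeds). The helper name and backoff cadence are assumptions for illustration, not the repo's actual code:

```typescript
// Hypothetical readiness probe: poll llama.cpp's /health endpoint up to
// `attempts` times with a linear backoff before giving up.
async function waitForLlamaCpp(baseURL: string, attempts = 3): Promise<void> {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetch(`${baseURL}/health`);
      if (res.ok) return; // server is up and reports healthy
    } catch {
      // connection refused; server not up yet, fall through to retry
    }
    if (i < attempts) await new Promise((r) => setTimeout(r, 1000 * i));
  }
  throw new Error(`llama.cpp at ${baseURL} failed /health after ${attempts} attempts`);
}
```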
All four:

- Extend `BaseProvider`
- Use `createOpenAI(...).chat()` for the /v1/chat/completions endpoint (NOT /v1/responses)
- Wrap `streamText` with `withClientSpan` for OTEL tracing
- Emit a `neurolink.provider.stream` span with proper `gen_ai.*` attrs
- Use `makeLoggingFetch` to capture upstream non-2xx response bodies
- Implement all 5 `BaseProvider` abstract methods (`executeStream`, `getProviderName`, `getDefaultModel`, `getAISDKModel`, `formatProviderError`)
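A minimal, self-contained sketch of the shared call pattern. The real classes extend `BaseProvider` and route through `withClientSpan` / `makeLoggingFetch`, which are internal helpers not reproduced here; the base URL and model id are illustrative:

```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

// createOpenAI(...).chat() targets /v1/chat/completions, not /v1/responses,
// which is what lets one code path serve all four providers.
const deepseek = createOpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY ?? "",
  baseURL: "https://api.deepseek.com/v1", // illustrative endpoint
});

async function main(): Promise<void> {
  const result = streamText({
    model: deepseek.chat("deepseek-chat"), // illustrative model id
    prompt: "Reply with one short sentence.",
  });
  for await (const chunk of result.textStream) {
    process.stdout.write(chunk);
  }
}

main().catch(console.error);
```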
### B. Core integration changes (10 modified files)
| File | Change |
|---|---|
| `src/lib/factories/providerRegistry.ts` | 4 dynamic-import registrations (per CLAUDE.md rule #1) |
| `src/lib/types/providers.ts` | `NeurolinkCredentials` extended + new `NvidiaNimExtraBody` type |
| `src/lib/constants/enums.ts` | `AIProviderName` enum + 4 model enums |
| `src/lib/constants/contextWindows.ts` | 4 sections with model context windows |
| `src/lib/utils/pricing.ts` | 4 entries + `_default` sentinel as provider-level fallback (see the sketch after this table). Local providers (lm-studio / llamacpp) get `_default` rates of 0 (no upstream USD price), and `hasPricing()` returns false for zero-rate entries so callers correctly treat them as non-billable. |
| `src/lib/utils/providerConfig.ts` | 4 `createXConfig` helpers |
| `src/lib/utils/modelChoices.ts` | `TOP_MODELS_CONFIG` + `MODEL_ENUMS` entries |
| `src/lib/adapters/providerImageAdapter.ts` | `VISION_CAPABILITIES` (vision unsupported) |
| `src/cli/factories/commandFactory.ts` | provider choices in 3 spots |
| `src/lib/providers/index.ts` | barrel exports |
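As referenced in the `pricing.ts` row above, a sketch of how the `_default` sentinel and zero-rate handling could look. The map shape, rate numbers, and function names are illustrative, not the file's actual contents:

```typescript
type Rates = { inputPerMTok: number; outputPerMTok: number };

// Per-provider model → rate tables; `_default` catches unlisted models.
const PRICING: Record<string, Record<string, Rates>> = {
  deepseek: { _default: { inputPerMTok: 0.5, outputPerMTok: 1.5 } }, // placeholder rates
  "lm-studio": { _default: { inputPerMTok: 0, outputPerMTok: 0 } }, // local: no USD price
  llamacpp: { _default: { inputPerMTok: 0, outputPerMTok: 0 } },
};

function findRates(provider: string, model: string): Rates | undefined {
  const table = PRICING[provider];
  return table?.[model] ?? table?.["_default"];
}

function hasPricing(provider: string, model: string): boolean {
  const rates = findRates(provider, model);
  // Zero-rate entries exist only to satisfy the lookup; treat as non-billable.
  return rates !== undefined && (rates.inputPerMTok > 0 || rates.outputPerMTok > 0);
}
```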
### C. Test infrastructure fixes (15 modified files)

The biggest fix — 12 test files used `PROVIDER_MAX_TOKENS[provider] || 8192`. For our local providers (8K context window), this set maxTokens = contextWindow → availableInputTokens = 0 → every memory/context/mcp test failed instantly with "Budget: 0 tokens". The bug predates this PR and silently broke the tests for any provider missing from the map.
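A compact illustration of the arithmetic, with illustrative names (the suites' real helpers differ):

```typescript
// Before the fix: unknown providers fell back to maxTokens = 8192.
const PROVIDER_MAX_TOKENS: Record<string, number> = { openai: 4096 };

function availableInputTokens(provider: string, contextWindow: number): number {
  const maxTokens = PROVIDER_MAX_TOKENS[provider] ?? 8192; // old fallback
  return contextWindow - maxTokens; // 8192 - 8192 = 0 for an 8K local provider
}

// After the fix (`?? 1024`), an unmapped 8K provider keeps 7168 input tokens.
```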
- `test/continuous-test-suite-context.ts` ← Budget=0 fix + Vertex Compaction tests now generic
- `test/continuous-test-suite-evaluation.ts` ← Budget=0 fix + dimension-specific RAGAS judge prompts
- `test/continuous-test-suite-evaluation-scoring.ts` ← Budget=0 fix
- `test/continuous-test-suite-mcp.ts` ← added new providers to map
- `test/continuous-test-suite-mcp-http.ts` ← Budget=0 fix
- `test/continuous-test-suite-media-gen.ts` ← Budget=0 fix
- `test/continuous-test-suite-memory.ts` ← Budget=0 fix
- `test/continuous-test-suite-observability.ts` ← Budget=0 fix + `TEST_TIMEOUT_MS` env var
- `test/continuous-test-suite-ppt.ts` ← Budget=0 fix
- `test/continuous-test-suite-providers.ts` ← Gemini 3 DisableTools generic, Observability Spans Pipeline-A skip
- `test/continuous-test-suite-session-memory-bugs.ts` ← Budget=0 fix
- `test/continuous-test-suite-tts.ts` ← Budget=0 fix
- `test/continuous-test-suite-workflow.ts` ← Budget=0 fix
- `test/continuous-test-suite-client.ts` ← pass `timeout: TEST_CONFIG.timeout` to `createServer`
- `test/continuous-test-suite-credentials.ts` ← `credKeys` extended
### D. New tests + tooling

- `test/continuous-test-suite-new-providers.ts` (NEW, 868 lines, 10 sections × 4 providers)
- `test/run-provider-matrix.sh` (NEW, bash 3.2-compatible matrix runner)
### E. Config (3 files)

- `.env.example` — 4 provider env-var sections with comments
- `package.json` — 4 new test scripts (`test:dynamic`, `test:proxy`, `test:bugfixes`, `test:new-providers`) + `js-yaml` dep
- `pnpm-lock.yaml` — 43-line diff for `js-yaml` + `@types/js-yaml`
### F. Documentation (15 markdown files)

All under `docs/provider-integration/`:

- README.md
- 00-architecture.md
- 01-shared-changes.md
- 02-deepseek.md, 03-nvidia-nim.md, 04-lm-studio.md, 05-llamacpp.md
- 06-testing.md
- 07-implementation-order.md
- 08-feature-matrix.md
- 09-test-suite-spec.md
- 10-test-results-final.md
- 11-test-failure-investigation.md
- 12-pr-analysis.md (this file)
- 13-code-review.md
## Cleanup performed
| Item | Action |
|---|---|
| README.md was clobbered to 1 line (test writeFile tool overwrote it) | Restored from origin/release |
| 100+ test artifact files in repo root (`Turn 5 done.txt`, `Charles_Babbage.txt`, `Haskell programming language`, etc.) | Deleted via `git clean -fd` |
| `test-results-v3..v13/` dirs (per-environment test outputs) | Deleted; only FINAL.md and INVESTIGATION.md kept (moved into `docs/provider-integration/`) |
| `test/run-iter{3..13}.sh` debug runners | Deleted; only `run-provider-matrix.sh` kept |
| Stray test fixtures (`test/data.csv`, `test/output.txt`, `test/test.txt`) | Deleted (writeFile artifacts, not fixtures) |
| `debug-budget.ts` debug script | Deleted |
## Suggested commit plan

### Option A — Single atomic commit (smaller PR, faster review)
feat(providers): integrate DeepSeek, NVIDIA NIM, LM Studio, llama.cpp
- Add 4 OpenAI-compatible providers via Vercel AI SDK's createOpenAI().chat()
- Each provider extends BaseProvider, emits neurolink.provider.stream OTEL spans,
and ships with proper error formatting + validateConfiguration
- Pricing: add _default sentinel as provider-level fallback + symbolic local rates
- Tests: fix maxTokens fallback that defaulted unknown providers to context-overflow
value (8192), making every memory/context/mcp test fail instantly with "Budget: 0"
- Tests: make Gemini 3 DisableTools and Vertex Compaction tests provider-agnostic
- Tests: handle Pipeline A providers (AI SDK + Langfuse OTEL) in Observability Spans
- Tests: dimension-specific RAGAS judge prompts so context-precision actually
evaluates the context (not the answer)
- Add test:dynamic, test:proxy, test:bugfixes scripts
- Install js-yaml runtime dep (proxy Config Loading test fixture)
### Option B — Atomic logical commits (cleaner history, slower review)
1. feat(providers): add DeepSeek provider [+ deepseek.ts + 1 enum + 1 pricing + 1 model + .env.example + commandFactory + barrel]
2. feat(providers): add NVIDIA NIM provider [+ nvidiaNim.ts + same supporting changes]
3. feat(providers): add LM Studio provider [+ lmStudio.ts + supporting]
4. feat(providers): add llama.cpp provider [+ llamaCpp.ts + supporting]
5. fix(pricing): support _default sentinel as fallback [pricing.ts findRates]
6. fix(test): correct maxTokens fallback (8192 → 1024) [13 test files]
7. fix(test): make Gemini 3 + Vertex tests provider-agnostic [providers.ts + context.ts]
8. fix(test): handle Pipeline A providers in observability [providers.ts]
9. fix(test): dimension-specific RAGAS judge prompts [evaluation.ts]
10. fix(test): pass Hono server timeout to client suite [client.ts]
11. chore: add test scripts + js-yaml dep [package.json + lock]
12. test: add provider matrix runner + cross-provider suite [new-providers.ts + run-provider-matrix.sh]
13. docs: provider integration architecture [docs/provider-integration/]
Recommendation: Option B for maintainability. The _default pricing fix and the test maxTokens fix are independently useful (could be backported separately if needed) and easier to revert if anything breaks.
## PR description (proposed)
## Summary
- Integrate 4 new OpenAI-compatible AI providers: DeepSeek, NVIDIA NIM, LM Studio, llama.cpp
- All four use Vercel AI SDK's `createOpenAI().chat()` for /v1/chat/completions
- Includes pricing entries, vision capability flags, model enums, CLI choices, and full
  end-to-end test coverage via `test/continuous-test-suite-new-providers.ts`
- Bonus: 10+ pre-existing test-infrastructure bugs uncovered and fixed during validation
  (the biggest: the maxTokens fallback set unknown providers' max tokens to their full context
  window → 0 input budget for memory/context/mcp tests)
## What's new
- Providers: `deepseek`, `nvidia-nim`, `lm-studio`, `llamacpp`
- Pricing: `_default` sentinel as provider-level fallback (covers all 4 + future additions)
- Tests: full matrix via `test/run-provider-matrix.sh` (9 suites × 4 providers)
- Docs: `docs/provider-integration/` — 15 architecture/implementation markdown files
## Test plan
- [x] `pnpm run check` (TypeScript) — 0 errors
- [x] `pnpm run lint` — 0 errors, 19 pre-existing warnings (max-lines-per-function on long
methods that already existed)
- [x] All 4 providers smoke-tested via direct SDK calls
- [x] 9 test suites × 4 providers via `test/run-provider-matrix.sh` — sub-test pass rate
  ≈99% per provider (see docs/provider-integration/10-test-results-final.md)
- [x] All 4 explicit sub-test failures investigated (see 11-test-failure-investigation.md):
- 2 were real test bugs (RAGAS prompt + Pipeline A skip)
- 1 was the same Budget=0 bug as the memory test
- 1 was env (LM Studio server crashed mid-run; reproducible PASS when up)
## Verification status
| Check | Status |
|---|---|
| `pnpm run check` | ✅ 0 errors, 0 warnings, 3632 files |
| `pnpm run format` | ✅ all files formatted |
| `pnpm run lint` | ✅ 0 errors, 19 pre-existing warnings (none from this PR) |
| `pnpm run build` | ✅ dist/ regenerated successfully |
| Provider matrix run | ✅ ~99% sub-test pass rate per provider after fixes (see 10-test-results-final.md) |
| Real failure investigation | ✅ all 4 explicit sub-test fails investigated (see 11-test-failure-investigation.md) |
| Linter formatting issues | ✅ resolved via `pnpm run format` |
| README clobber repaired | ✅ restored from origin/release |
| Test artifacts cleaned | ✅ ~120 stray writeFile-tool outputs deleted |
| `git status` clean | ✅ only legitimate changes remain (28 modified + 6 new groups) |
## Open questions / items pending user decision
- Commit strategy: Option A (single atomic) or Option B (13 logical commits)?
- Include the test-results-v13 docs in the PR? (Currently moved to `docs/provider-integration/10-*` and `11-*`. They document the iteration trail but aren't strictly needed for the implementation.)
- `run-provider-matrix.sh`: include in the PR or not? It's useful for CI but adds a shell script to `test/`.
- `.env.example`: did we pick reasonable env-var names? (`DEEPSEEK_API_KEY`, `NVIDIA_NIM_API_KEY`, `LM_STUDIO_BASE_URL`, `LLAMACPP_BASE_URL`)
- Should we ship the `(this as unknown as { modelName: string }).modelName = modelToUse;` cast escape in lmStudio.ts and llamaCpp.ts? Resolved: replaced by making `BaseProvider.modelName` mutable and adding `refreshHandlersForModel(model)`. See "Items addressed by this PR" below.
## Items deliberately deferred (NOT in this PR)

- Re-running the 17 untested test suites (`test:bugfixes`, `test:autoresearch-*`, `test:issue-*`, etc.) for the 4 providers. None are currently expected to fail given the test-infra fixes.
- Server keep-alive watchdog for LM Studio in the test runner (operational, not code).
- Image generation / TTS support for these providers (capability gap; not supported by the providers themselves).
## Items addressed by this PR (originally deferred, then included)

- The original "clamp `getOutputReserve` to 80% of the context window" attempt was reverted after review: capping the reserve made `getAvailableInputTokens()` advertise more headroom than the outgoing request actually allocates, letting oversized prompts pass preflight and fail upstream. The active mitigation is the per-suite test fix lowering `PROVIDER_MAX_TOKENS[…] || 8192` to `… || 1024` (12 test files + continuous-test-suite.ts:5099), which keeps unmapped providers within their context window without changing SDK behavior.
- `BaseProvider.modelName` is no longer `readonly`, and a new `refreshHandlersForModel(model)` rebuilds the composed handlers (`MessageBuilder`, `StreamHandler`, `TelemetryHandler`, `GenerationHandler`, `Utilities`) so auto-discovery providers (lm-studio, llamacpp) propagate the resolved model into pricing / span / log metadata. Replaces the earlier `(this as unknown as { modelName: string }).modelName = ...` workaround.
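A hedged sketch of that pattern. The handler wiring is compressed into a comment, and the `/v1/models` response shape follows the OpenAI convention the local servers mimic; class and method bodies are illustrative, not the repo's actual code:

```typescript
class BaseProvider {
  protected modelName: string; // mutable now, no longer `readonly`

  constructor(modelName: string) {
    this.modelName = modelName;
  }

  protected refreshHandlersForModel(model: string): void {
    this.modelName = model;
    // The real method re-instantiates MessageBuilder, StreamHandler,
    // TelemetryHandler, GenerationHandler, and Utilities with `model`
    // so pricing / span / log metadata all see the resolved name.
  }
}

class LmStudioProvider extends BaseProvider {
  // Auto-discovery: ask the local server which model is actually loaded.
  async discoverModel(baseURL: string): Promise<void> {
    const res = await fetch(`${baseURL}/v1/models`);
    const body = (await res.json()) as { data: Array<{ id: string }> };
    const resolved = body.data[0]?.id;
    if (resolved) {
      this.refreshHandlersForModel(resolved); // replaces the old unsafe cast
    }
  }
}
```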