# Claude Proxy Observability
This guide explains how to read the OpenObserve dashboard used to operate the NeuroLink Claude proxy.
## Source Of Truth

- Dashboard definition: `docs/assets/dashboards/neurolink-proxy-observability-dashboard.json`
- Live dashboard title: NeuroLink Proxy Observability
- Default time range: Last 30 minutes
## First-Time Local Setup
For a fresh local setup, use the NeuroLink-owned helper in scripts/observability/ instead of borrowing telemetry files from another repo.
- Optional: copy `scripts/observability/proxy-observability.env.example` to `scripts/observability/proxy-observability.env` only if your local ports or credentials need to differ from the defaults.
- Start the local OpenObserve stack and import the dashboard:

  ```
  neurolink proxy telemetry setup
  ```

- Point the proxy at the dedicated proxy collector before starting it:

  ```
  export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:14318
  ```

The helper defaults to a dedicated collector port set (14317/14318/14333) so it does not fight with another local stack such as Curator. If you changed ports in `proxy-observability.env`, use the OTLP HTTP endpoint printed by the setup command.
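If you want to sanity-check the endpoint value before starting the proxy, a minimal sketch is below. The `parse_otlp_endpoint` helper is hypothetical, not part of the NeuroLink CLI; it only splits the endpoint string so you can verify the host and port match what the setup command printed.

```python
from urllib.parse import urlparse


def parse_otlp_endpoint(endpoint: str) -> tuple[str, int]:
    """Split an OTLP HTTP endpoint like http://localhost:14318 into (host, port).

    Hypothetical helper for local sanity checks; not shipped with NeuroLink.
    """
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unexpected scheme in {endpoint!r}")
    # The standard OTLP/HTTP port is 4318; the helper's dedicated set uses 14318.
    port = parsed.port or (443 if parsed.scheme == "https" else 4318)
    return parsed.hostname, port


host, port = parse_otlp_endpoint("http://localhost:14318")
```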
Useful follow-up commands:

```
neurolink proxy telemetry start
neurolink proxy telemetry stop
neurolink proxy telemetry status
neurolink proxy telemetry logs
neurolink proxy telemetry import-dashboard
```
Repo-local shortcuts are also available:

```
pnpm run proxy:observability:setup
pnpm run proxy:observability:status
```
## What Is Portable vs Instance-Specific

Portable:

- The dashboard query logic
- The stream names listed below
- The proxy log and trace fields used for correlation
- The helper scripts under `scripts/observability/`
Instance-specific:

- OpenObserve URL, login, ports, container names, and volume names
- Compose project name if you intentionally want multiple local stacks in parallel
- Dashboard IDs and owners assigned by the target OpenObserve instance at import time
- The process manager used to run the proxy locally, such as `launchd` on macOS

The helper `scripts/observability/import-openobserve-dashboard.mjs` strips `dashboardId`, `owner`, and `created` from the checked-in JSON before importing it, so the repo file can be reused on a different machine without editing those fields first.
## Active OpenObserve Streams

Use these streams when validating or updating the dashboard:

- Logs: `neurolink_proxy`
- Traces: `neurolink_proxy`
- Metrics: `proxy_requests_total`, `proxy_errors_total`, `proxy_retries_total`, `proxy_request_duration_ms_sum`, `proxy_request_body_bytes_sum`, `proxy_cost_usd_total`, `proxy_tokens_cache_read`, `proxy_tokens_cache_creation`

Do not point dashboard panels at the stale log stream `neurolink_proxy_logs` unless it has been intentionally revalidated.
## Local Log Families And Query Rules

- `~/.neurolink/logs/proxy-YYYY-MM-DD.jsonl` holds final request summaries. These are the rows the dashboard is built around.
- `~/.neurolink/logs/proxy-attempts-YYYY-MM-DD.jsonl` holds per-upstream-attempt diagnostics. Use it when retries or account rotation need debugging.
- `~/.neurolink/logs/proxy-debug-YYYY-MM-DD.jsonl` is the redacted index for captured request and response bodies.
- `~/.neurolink/logs/bodies/YYYY-MM-DD/<request-id>/*.json.gz` stores the corresponding redacted body artifacts.
- In OpenObserve, body captures arrive in the same `neurolink_proxy` log stream with `event.name=proxy.body_capture`, so request panels must filter to request-summary rows, for example `http_method IS NOT NULL`.
- Attempt logs are local-only on purpose. They should help explain retries without inflating dashboard request counts.
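To apply the same request-summary filter to the local JSONL files, a minimal sketch is below. The `load_request_summaries` helper is illustrative, not shipped with the proxy; it mirrors the dashboard rule that body-capture rows lack `http_method`.

```python
import json
from pathlib import Path


def load_request_summaries(path: Path) -> list[dict]:
    """Read a proxy-YYYY-MM-DD.jsonl file and keep only request-summary rows.

    Mirrors the dashboard filter: body-capture events have no http_method,
    so request panels keep only rows where http_method is present.
    """
    rows = []
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("http_method") is not None:
            rows.append(record)
    return rows
```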
## What This Dashboard Should Answer
Use the dashboard to answer seven operational questions:
- Is proxy traffic flowing right now?
- Are users seeing failures, rate limits, or overloaded responses?
- Is latency degrading for everyone, or only for a specific model or account?
- Is fill-first routing concentrating traffic on one account as expected?
- Are OTEL metrics still exporting correctly, or are logs and metrics diverging?
- Is prompt cache reuse healthy, or are we paying too much cache creation cost?
- Which traces should you open when you need request-level debugging?
## How To Read Each Tab

### Traffic & Health

Read this tab first.

- `Requests in Range` tells you whether volume changed.
- `Failed Request Share` gives the top-line user-facing reliability signal.
- `Mean Request Latency (s)` tells you whether users are feeling slowness.
- `Overloaded Responses` helps separate provider saturation from generic failures.
- `Request Trend` and `Requests by Model` explain whether a spike or a model mix shift caused the change.
### Failures & Rate Limits

Use this tab when reliability drops.

- `429 Rate-Limit Responses` means account or upstream rate pressure.
- `Failures by HTTP Status` separates auth issues (`401` and `403`), rate limits (`429`), and transient upstream failures (`5xx`).
- `Failures by Account / Route` shows whether one account or fallback route is poisoning the pool.
- `Failure Trend` tells you whether the issue is a short burst or a sustained incident.
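The bucketing above can be sketched as a small classifier. This is an illustrative mapping, not the dashboard's actual query logic:

```python
def classify_failure(status: int) -> str:
    """Bucket an HTTP status the way the Failures tab separates them.

    Illustrative only; the dashboard implements this in its panel queries.
    """
    if status in (401, 403):
        return "auth"
    if status == 429:
        return "rate_limit"
    if 500 <= status <= 599:
        return "upstream_5xx"
    if status >= 400:
        return "other_4xx"
    return "success"
```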
### Latency & Throughput

Use this tab to judge user experience and saturation.

- `P95 Request Latency (s)` is the best early warning signal for degraded UX.
- `Throughput Trend` paired with `Latency Trend` tells you whether higher traffic is driving slower responses.
- `Mean Latency by Model (s)` and `Mean Latency by Account / Route (s)` isolate whether the slowdown is model-specific or account-specific.
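For local spot checks against the panel, a nearest-rank P95 can be computed from raw latencies as below. This is a sketch; OpenObserve's percentile function may use a different interpolation method, so small discrepancies are expected:

```python
import math


def p95(latencies_s: list[float]) -> float:
    """Nearest-rank P95 over request latencies in seconds (sketch only)."""
    if not latencies_s:
        raise ValueError("no samples")
    ordered = sorted(latencies_s)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method, 1-indexed
    return ordered[rank - 1]
```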
### Accounts & Routing

Use this tab to understand fill-first routing behavior.

- `Requests on Busiest Account / Route` should usually be high because the proxy intentionally fills one account before rotating.
- `Accounts / Routes Used` shows whether the pool is spreading traffic or mostly staying on one account.
- `Failure Share by Account / Route` tells you whether one account or fallback route should be re-authenticated, disabled, or investigated.
- `Tokens by Account / Route (k)` helps explain quota pressure and uneven load.
- When `account_name` is empty, these panels fall back to `account_type` so non-Anthropic routes do not appear as blank pseudo-accounts.
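The `account_name` to `account_type` fallback can be sketched as below. The helper and the `"unknown"` default are illustrative, not the dashboard's exact expression:

```python
def account_label(row: dict) -> str:
    """Label a row by account_name, falling back to account_type when empty.

    Keeps non-Anthropic routes from showing up as blank pseudo-accounts.
    The "unknown" default is an assumption for rows missing both fields.
    """
    name = (row.get("account_name") or "").strip()
    if name:
        return name
    return row.get("account_type") or "unknown"
```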
### Telemetry Cross-Check

Use this tab to validate the OTEL export path itself.

- These panels are shown as per-window OTEL deltas, not raw cumulative counter values.
- `Metric Requests in Range`, `Metric Errors in Range`, and `Metric Retries in Range` should broadly agree with the earlier log-derived charts.
- If `Metric Request Trend (5m)` is flat while `Request Trend (5m)` is moving, the metrics pipeline is broken or delayed.
- If costs or request body volume stop moving here while logs keep arriving, OTEL metrics are unhealthy even if log export still works.
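The per-window delta idea can be sketched as below. This is a simplified model with a naive counter-reset rule (a drop is treated as a restart), which the actual dashboard queries may handle differently:

```python
def windowed_deltas(samples: list[float]) -> list[float]:
    """Turn cumulative counter samples into per-window deltas.

    A drop in the cumulative value is treated as a counter reset (e.g. the
    proxy restarted), so the new sample counts from zero instead of going
    negative. Sketch only; real pipelines may handle resets differently.
    """
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        deltas.append(curr - prev if curr >= prev else curr)
    return deltas
```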
### Tokens, Cache & Cost

Use this tab to understand workload mix and cache behavior.

- `Prompt Tokens (M)` is prompt-side volume in millions: uncached input plus cache writes plus cache reads.
- `Cached Prompt Tokens Reused (M)` is actual cache reuse. These tokens were read from an existing prompt cache entry.
- `Cached Prompt Tokens Written (M)` is cache population. These tokens were written into a new cache entry on that request and can be reused by later requests.
- `Cache Reuse Ratio` is reused cache tokens divided by newly written cache tokens. Values above `1` mean reuse is outpacing cache writes.
- `Mean Total Tokens per Request` is average prompt-side plus output token volume per request, shown as raw tokens.
- `Input vs Output Tokens per Request (5m)` compares average input and output tokens per request as raw tokens, which is easier to read than total prompt-side volume when cache reuse is large.
- `Cache Reuse vs Cache Write Trend (5m, M)` keeps cache movement on its own scale so cache traffic does not flatten the input/output chart.
- `Token Volume by Model (M)` tells you which model families are driving token volume.
- `Top Sessions by Token Volume (M)` helps identify unusually heavy sessions for trace drilldown.
- `Input vs Output Tokens by Account / Route` shows raw token totals by real account or fallback route, with internal final rows excluded.
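The prompt-side sum and the reuse ratio can be sketched from the log token fields as below. This assumes `ai_input_tokens` counts only uncached input, matching the `Prompt Tokens (M)` definition above; the `cache_stats` helper itself is illustrative:

```python
def cache_stats(rows: list[dict]) -> dict:
    """Aggregate the token fields the cache panels are built from (sketch).

    Assumes ai_input_tokens is uncached input only, so prompt-side volume is
    uncached input + cache writes + cache reads, per the panel definition.
    """
    read = sum(r.get("ai_cache_read_tokens", 0) for r in rows)
    written = sum(r.get("ai_cache_creation_tokens", 0) for r in rows)
    uncached = sum(r.get("ai_input_tokens", 0) for r in rows)
    return {
        "prompt_tokens": uncached + written + read,
        # Above 1.0 means reuse is outpacing cache writes.
        "cache_reuse_ratio": read / written if written else None,
    }
```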
### Trace Drilldown

Use this tab after you know there is a problem and need request-level evidence.

- `Slowest Operations by Mean Duration` is the best starting point for deep latency debugging.
- `Span Status Mix` tells you whether failures are surfacing in traces as well as logs.
- `Span Volume by Operation` and `Trace Volume Trend` help confirm whether the trace pipeline matches traffic volume.
## Key Correlation Fields

These fields matter most when moving between logs, metrics, and traces:

- `_timestamp`: event time in OpenObserve
- `request_id`: request-level correlation key in proxy logs
- `trace_id`: cross-signal trace correlation key
- `span_id`: specific span correlation key
- `event.name`: distinguishes request summaries from `proxy.body_capture` debug events in the shared OpenObserve log stream
- `account_name`: which account handled the request
- `ai_model`: which model served the request
- `ai_input_tokens`: prompt/input tokens
- `ai_output_tokens`: completion/output tokens
- `ai_cache_creation_tokens`: tokens spent creating cache entries
- `ai_cache_read_tokens`: tokens served from cache
When a caller injects `traceparent` plus `x-neurolink-session-id` / `x-neurolink-user-id` / `x-neurolink-conversation-id`, the proxy attaches its spans to that upstream trace and preserves session-level attribution across SDK and proxy telemetry.
`ai_cache_creation_tokens` means prompt tokens written into a new cache entry.
`ai_cache_read_tokens` means prompt tokens reused from an existing cache entry.
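The caller-side trace propagation described above can be sketched with a W3C Trace Context header. The `make_traceparent` helper and the example IDs are illustrative; the `00-{trace_id}-{span_id}-{flags}` layout follows the W3C traceparent format:

```python
def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a W3C traceparent header: version-trace_id-parent_id-flags.

    trace_id is 32 lowercase hex chars, span_id is 16; flags 01 = sampled.
    """
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


# Illustrative headers a caller might send to the proxy:
headers = {
    "traceparent": make_traceparent(
        "0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331"
    ),
    "x-neurolink-session-id": "session-123",  # example value
}
```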
All latency and duration panels are shown in whole seconds for faster scanning.
Counts and token-heavy charts default to whole numbers when practical, while ratios, costs, and million/MB rollups are capped at two decimals.
## Common Interpretation Patterns

- Rising `Failed Request Share` with flat traffic usually means a real reliability regression, not just more volume.
- Rising `429 Rate-Limit Responses` with high load on one account usually means the pool is exhausting the primary account as designed.
- Log traffic moving while the telemetry tab is flat means the OTEL metrics path needs attention.
- Rising `Cache Creation Tokens` without matching `Cache Read Tokens` means prompt reuse is weak or the cache is still warming.
- A slow chart on the latency tab plus the same operation on the trace tab gives you the fastest path to a concrete trace investigation.
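The first pattern, rising failure share with flat traffic, can be sketched as a heuristic check over two windows of request-summary rows. Both helpers and the `http_status` field name are assumptions for illustration; the thresholds are arbitrary defaults, not values the dashboard uses:

```python
def failed_share(rows: list[dict]) -> float:
    """Fraction of requests with a failure status (>= 400). Sketch only;
    the http_status field name is an assumption about the log schema."""
    if not rows:
        return 0.0
    failures = sum(1 for r in rows if r.get("http_status", 0) >= 400)
    return failures / len(rows)


def looks_like_regression(prev: list[dict], curr: list[dict],
                          share_jump: float = 0.05,
                          traffic_tolerance: float = 0.5) -> bool:
    """Rising failed share while traffic stays roughly flat suggests a real
    reliability regression rather than a volume-driven artifact."""
    flat = abs(len(curr) - len(prev)) <= traffic_tolerance * max(len(prev), 1)
    return flat and failed_share(curr) - failed_share(prev) >= share_jump
```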