AI Agent Observability Platform
AgentWatch wraps any LangGraph agent with OpenTelemetry, streams every decision into Splunk in real time, detects loops and anomalies automatically, and explains them in plain English using Foundation-Sec.
AI agents fail silently. Without observability, a loop runs forever, tokens spiral, costs explode — and you find out from the invoice.
Three live interfaces running on Railway. No login. No setup. Click and watch your agent's brain in real time.
Four Splunk AI tools in a coherent pipeline — instrument, stream, detect, explain, query.
OpenTelemetry hooks wrap every LangGraph node — capturing LLM calls, tool invocations, token counts, step latency, and errors with full structured context.
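To make the per-node hook concrete, here is a minimal sketch that wraps a node function and records a span-like record with latency, token count, and error status. It uses only the Python standard library as a stand-in for OpenTelemetry; `traced_node`, `SPANS`, and the field names are hypothetical and are not AgentWatch's actual API.

```python
import functools
import time

# Hypothetical in-memory sink; a real setup would export spans via OpenTelemetry.
SPANS = []

def traced_node(name):
    """Wrap a node function and record latency, token count, and errors."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state):
            span = {"node": name, "status": "ok", "error": None}
            start = time.perf_counter()
            try:
                result = fn(state)
                # Assumption: the node reports token usage in its returned state.
                span["tokens"] = result.get("tokens", 0)
                return result
            except Exception as exc:
                span["status"] = "error"
                span["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call leaves a record.
                span["latency_ms"] = (time.perf_counter() - start) * 1000.0
                SPANS.append(span)
        return wrapper
    return decorator

@traced_node("plan")
def plan(state):
    # Toy node: pretends an LLM call consumed 42 tokens.
    return {**state, "tokens": 42}

plan({"question": "hello"})
```

Because the record is appended in the `finally` clause, a node that raises still produces a span with `status` set to `"error"`.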
OpenTelemetry + LangGraph
Events index in real time via HTTP Event Collector with sourcetype agentwatch:otel. Every reasoning step becomes a searchable, structured log. 2,299+ real events confirmed.
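As an illustration of what one event sent to the HTTP Event Collector might look like, the sketch below builds (but does not send) a request carrying the `agentwatch:otel` sourcetype. The host, token, helper name `build_hec_request`, and event fields are placeholders chosen for this example, not values from the project.

```python
import json
import time
import urllib.request

def build_hec_request(event, hec_url, token, sourcetype="agentwatch:otel"):
    """Build (but do not send) an HTTP Event Collector request for one event."""
    payload = {
        "time": event.get("timestamp", time.time()),
        "sourcetype": sourcetype,
        "event": event,
    }
    return urllib.request.Request(
        hec_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder host and token; substitute real values before sending.
req = build_hec_request(
    {"node": "plan", "tokens": 42, "latency_ms": 12.5, "timestamp": 1700000000.0},
    hec_url="https://splunk.example.com:8088/services/collector/event",
    token="00000000-0000-0000-0000-000000000000",
)
body = json.loads(req.data)
```

Passing `req` to `urllib.request.urlopen` would deliver the event. Keeping construction separate from sending makes the payload easy to inspect and test.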
Splunk's native anomalydetection command runs on tool call frequency time-series. Caught a 139-call spike with 99.25% confidence — zero manual thresholds.
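For intuition about threshold-free spike detection, the sketch below flags outlier buckets in a tool-call count series using a modified z-score based on the median absolute deviation. This is a generic stand-in, not the algorithm behind Splunk's anomalydetection command, and the sample counts are invented for illustration.

```python
import statistics

def find_spikes(counts, threshold=3.5):
    """Return indices of buckets whose count is a robust outlier."""
    median = statistics.median(counts)
    mad = statistics.median([abs(c - median) for c in counts])
    if mad == 0:
        # No spread to compare against, so nothing is flagged.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * abs(c - median) / mad > threshold
    ]

calls_per_minute = [4, 5, 3, 6, 4, 139, 5, 4]
spikes = find_spikes(calls_per_minute)  # index 5 is the burst
```

The median-based score is used here because a single large burst barely moves the median, whereas it would inflate a mean and standard deviation enough to hide itself.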
Foundation-Sec-1.1-8B reasons over the anomaly context — what happened, root cause, recommended fix, and severity score. One click from alert to actionable engineering guidance.
Foundation-Sec-1.1-8B
"Show me all loops in the last hour" → auto-generated SPL → results in seconds. No SPL expertise required. The AI Assistant translates intent into precision queries.
Splunk AI Assistant
From infinite loops to silent confidence collapse, AgentWatch surfaces what your logs can't.
Splunk's anomalydetection command monitors tool-call frequency time-series. Catches a 139-call spike at 99.25% confidence — before the API bill arrives.
Per-step token tracking catches unbounded context growth before it hits rate limits. Every LLM call tagged with token count, model, and step ID.
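One way per-step token counts could be checked for unbounded growth is sketched below: it reports the first step where the count has risen for several consecutive steps and also exceeds a budget. The function name, the `min_run` and `budget` defaults, and the sample series are assumptions made for this example.

```python
def unbounded_growth(token_counts, min_run=3, budget=8000):
    """Return the index of the first step where tokens have risen for
    `min_run` consecutive steps and exceed `budget`, else None."""
    run = 0
    for i in range(1, len(token_counts)):
        # Extend the rising streak, or reset it when the count stops growing.
        run = run + 1 if token_counts[i] > token_counts[i - 1] else 0
        if run >= min_run and token_counts[i] > budget:
            return i
    return None

steps = [1200, 2500, 5100, 9800, 19000]
first_bad = unbounded_growth(steps)  # step index 3
```

Requiring both a sustained rise and a budget breach avoids flagging a single large but legitimate prompt.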
Step latency tracked per event in OTel. Progressive slowdowns surface in the trace timeline before they become P99 incidents.
Three.js force-directed visualization of your agent's reasoning, live. Green = healthy. Yellow = warning. Red = anomaly. Click any node to inspect tokens, latency, and trust.
Every node scores 0–100% based on call patterns, token usage, and error rates. Composite score across the agent surface reveals risk at a glance.
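The scoring formula itself is not given here. Purely as an illustration, the sketch below shows one hypothetical way to fold call patterns, token usage, and error rate into a single 0 to 100 value. The weights, the normalisation, and the name `trust_score` are all assumptions.

```python
def trust_score(call_rate_ratio, token_ratio, error_rate,
                weights=(0.4, 0.3, 0.3)):
    """Combine three risk signals into a 0-100 trust score.

    call_rate_ratio: observed calls / expected calls (1.0 = normal)
    token_ratio:     observed tokens / token budget (1.0 = at budget)
    error_rate:      fraction of steps that failed, in [0, 1]
    """
    def clamp(x):
        return max(0.0, min(1.0, x))

    # Each signal becomes a risk in [0, 1]; ratios at or below 1.0 carry no risk.
    risks = (
        clamp(call_rate_ratio - 1.0),
        clamp(token_ratio - 1.0),
        clamp(error_rate),
    )
    # Weights sum to 1.0, so the combined risk also stays in [0, 1].
    risk = sum(w * r for w, r in zip(weights, risks))
    return round(100.0 * (1.0 - risk), 1)

healthy = trust_score(1.0, 0.5, 0.0)
looping = trust_score(5.0, 2.0, 0.25)
```

With these assumed weights, a node behaving normally keeps a full score, while one that is over-calling tools, over budget on tokens, and failing a quarter of its steps drops well below half.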
Unhandled exceptions, tool errors, and confidence collapses — all indexed in Splunk with full stack trace context and structured fields for instant SPL queries.
"Which tools have the lowest trust scores?" → SPL generated → results in seconds. The Splunk AI Assistant bridges intent and query without learning SPL syntax.
Foundation-Sec-1.1-8B reads the anomaly context and returns: what happened, why it happened, severity, and the specific engineering fix. Not generic advice — specific to your run.
3D network view across all agents — 48 nodes, 47 edges, 7 anomaly paths. Anomaly paths highlighted in red. See how agent hubs connect across your system at a glance.
Every number verified from the live agentwatch index. Real events. Real anomalies. Real cost.