AgentWatch — AI Agent Observability Platform

01

Instrument the Agent

OpenTelemetry hooks wrap every LangGraph node, capturing LLM calls, tool invocations, token counts, step latency, and errors with full structured context.

OpenTelemetry + LangGraph
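The wrapping pattern can be sketched in a few lines. This is a standard-library stand-in, not the OpenTelemetry API: the `instrument_node` decorator, the `EVENTS` list, and the event field names are all hypothetical, and a real setup would open a span and hand it to an exporter instead of appending to a list.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory sink (assumption). A real setup would use an
# OpenTelemetry exporter here instead of a plain list.
EVENTS: list[dict[str, Any]] = []


def instrument_node(step: str) -> Callable:
    """Wrap a node function so every call emits one structured event."""

    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @functools.wraps(fn)
        def wrapper(state: dict) -> dict:
            start = time.perf_counter()
            event: dict[str, Any] = {"event_type": "step", "step": step, "status": "ok"}
            try:
                return fn(state)
            except Exception as exc:
                # Record the failure, then re-raise so the agent still sees it.
                event["status"] = "error"
                event["error"] = f"{type(exc).__name__}: {exc}"
                raise
            finally:
                # Latency is recorded on both the success and the error path.
                event["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
                EVENTS.append(event)

        return wrapper

    return decorator


@instrument_node("research")
def research(state: dict) -> dict:
    # Placeholder node body: takes the graph state, returns an updated state.
    return {**state, "notes": "stub"}


@instrument_node("broken")
def broken(state: dict) -> dict:
    raise ValueError("tool failed")
```

Because the event is appended in `finally`, a node that raises still leaves a record behind, which is the property that matters for catching silent failures.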

02

Stream to Splunk HEC

Events are indexed in real time through the HTTP Event Collector (HEC) with sourcetype agentwatch:otel. Every reasoning step becomes a searchable, structured log. 2,299+ real events confirmed.

Splunk MCP Server · HEC
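As a rough illustration of what one HEC event looks like on the wire, the sketch below builds the request without sending it. The `/services/collector/event` path and the `Authorization: Splunk <token>` header are standard HEC conventions; the host, the token, the helper name `build_hec_request`, and the event field names are placeholders assumed for the example.

```python
import json
import time
import urllib.request


def build_hec_request(event: dict, *, hec_url: str, token: str,
                      index: str = "agentwatch",
                      sourcetype: str = "agentwatch:otel") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one HEC event."""
    payload = {
        "time": time.time(),       # event timestamp, epoch seconds
        "index": index,
        "sourcetype": sourcetype,
        "event": event,            # the structured agent event itself
    }
    return urllib.request.Request(
        hec_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and token (assumptions), with hypothetical event fields.
req = build_hec_request(
    {"event_type": "tool_call", "tool": "search_tool", "trust_score": 0.05},
    hec_url="https://splunk.example.com:8088/services/collector/event",
    token="00000000-0000-0000-0000-000000000000",
)
# Sending would be: urllib.request.urlopen(req)
```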

03

Detect Anomalies

Splunk's native anomalydetection command runs on the time series of tool-call frequency. It caught a 139-call spike with 99.25% confidence, with zero manual thresholds.

Splunk AI Toolkit
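To make the idea of a frequency spike concrete, here is a small stand-in detector over per-bucket call counts. It is not what Splunk's anomalydetection command does internally; it uses a median-based (modified z-score) rule chosen only for illustration. The function name, the cutoff, and every count except the 139-call spike are assumptions.

```python
from statistics import median


def flag_spikes(counts: list[int], cutoff: float = 3.5) -> list[int]:
    """Return indices of buckets whose modified z-score exceeds the cutoff."""
    if not counts:
        return []
    med = median(counts)
    # Median absolute deviation: a spread estimate that one spike cannot inflate.
    mad = median(abs(c - med) for c in counts)
    scale = mad if mad > 0 else 1.0  # assumption: fall back to 1 when the series is flat
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * (c - med) / scale > cutoff
    ]


# Illustrative tool-call counts per time bucket; only the 139 comes from the page.
calls_per_bucket = [3, 4, 2, 3, 139, 3, 4]
spikes = flag_spikes(calls_per_bucket)
```

The baseline comes from the median of the series itself, so the spike is judged against typical behavior rather than against a call-count limit chosen per tool.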

04

Explain in Plain English

Foundation-Sec-1.1-8B reasons over the anomaly context and returns what happened, the root cause, a recommended fix, and a severity score. One click from alert to actionable engineering guidance.

Foundation-Sec-1.1-8B

05

Query with Natural Language

"Show me all loops in the last hour" → auto-generated SPL → results in seconds. No SPL expertise required. The AI Assistant translates intent into precision queries.

Splunk AI Assistant

agent_runner.py — loop mode

$ python agent_runner.py --mode loop
# OTel hooks active, events streaming to Splunk HEC
step_start  research     trust=100%
llm_call    research     trust= 96%
tool_call   search_tool  trust= 88%
tool_call   search_tool  trust= 62%
tool_call   search_tool  trust= 44%
tool_call   search_tool  trust= 18%
tool_call   search_tool  trust=  5%

⚠ ANOMALY DETECTED [confidence: 99.25%]
search_tool called 23x in 4.1 seconds
Splunk anomalydetection fired on time-series spike

# Foundation-Sec-1.1-8B root cause analysis:
"Agent stuck at query refinement loop.
No exit condition when search returns empty.
Fix: add empty-result guard at step 3."

✓ Events indexed: 2,299
✓ Anomalies: 342 · Trust: 58.1%
✓ Tokens tracked: 279,993
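The fix named in the root cause analysis, an empty-result guard, might look like the sketch below. The agent's actual refinement step is not shown on this page, so `refine_until_found`, its parameters, and the way the query is refined are hypothetical.

```python
from typing import Callable


def refine_until_found(search: Callable[[str], list], query: str,
                       max_attempts: int = 5) -> list:
    """Retry a search with refined queries, but stop after a bounded number of tries."""
    for attempt in range(1, max_attempts + 1):
        results = search(query)
        if results:
            return results
        # Hypothetical refinement step; a real agent would rewrite the query here.
        query = f"{query} (refined {attempt})"
    # Guard: give up with an empty result instead of looping forever.
    return []


# A search tool that never finds anything, which is the failure case above.
call_log: list[str] = []


def always_empty(q: str) -> list:
    call_log.append(q)
    return []


outcome = refine_until_found(always_empty, "agent observability")
```

With the guard in place, a search that keeps coming back empty costs at most `max_attempts` tool calls.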

// Capabilities

EVERY FAILURE MODE, CAUGHT

From infinite loops to silent confidence collapse — AgentWatch surfaces what your logs can't.

🔁

Loop Detection

Splunk's anomalydetection command monitors the time series of tool-call frequency. It caught a 139-call spike at 99.25% confidence, before the API bill arrived.

📈

Token Spike Alerts

Per-step token tracking catches unbounded context growth before it hits rate limits. Every LLM call tagged with token count, model, and step ID.
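One simple way to express "unbounded context growth" over per-call token counts is shown below. This is an illustrative rule, not AgentWatch's documented check: the function name, the window size, and the growth factor are assumptions, and the sample counts are made up apart from the 847-token call shown elsewhere on this page.

```python
def unbounded_growth(token_counts: list[int], window: int = 4,
                     factor: float = 2.0) -> bool:
    """True if the last `window` calls grew strictly and ended at >= factor x the first."""
    recent = token_counts[-window:]
    if len(recent) < window:
        return False  # not enough history to judge
    strictly_increasing = all(a < b for a, b in zip(recent, recent[1:]))
    return strictly_increasing and recent[-1] >= factor * recent[0]


# Illustrative per-call token counts for one step.
growing = unbounded_growth([800, 847, 1600, 3300, 6900])
steady = unbounded_growth([847, 850, 840, 860])
```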

🐌

Latency Drift

Step latency is tracked per event in OTel. Progressive slowdowns surface in the trace timeline before they become P99 incidents.

🧠

Live Brain Graph

Three.js force-directed visualization of your agent's reasoning, live. Green = healthy. Yellow = warning. Red = anomaly. Click any node to inspect tokens, latency, and trust.

🔢

Trust Scoring

Every node scores 0–100% based on call patterns, token usage, and error rates. Composite score across the agent surface reveals risk at a glance.
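The page does not publish the scoring formula, so the following is only a sketch of how a per-node score and a composite could be combined. The weights, the assumption that each input is a penalty normalized to [0, 1], and both function names are hypothetical. The four sample scores are the ones shown on the event cards on this page.

```python
from statistics import mean

# Hypothetical weights (assumption); AgentWatch's actual formula is not published here.
WEIGHTS = {"call_rate": 0.5, "token_use": 0.2, "error_rate": 0.3}


def node_trust(call_rate: float, token_use: float, error_rate: float) -> float:
    """Combine three penalties, each already normalized to [0, 1], into a trust score."""
    penalty = (
        WEIGHTS["call_rate"] * call_rate
        + WEIGHTS["token_use"] * token_use
        + WEIGHTS["error_rate"] * error_rate
    )
    # Clamp so the score always lands in [0, 1].
    return round(max(0.0, min(1.0, 1.0 - penalty)), 2)


def composite_trust(scores: list[float]) -> float:
    """Unweighted mean of node scores, as one possible composite."""
    return round(mean(scores), 3)


# Scores from the cards: search_tool, calculator_tool, research, synthesis.
overall = composite_trust([0.05, 0.92, 0.85, 0.90])
```

A plain mean lets one collapsed node drag the composite down visibly, which is one reasonable reading of "risk at a glance".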

💀

Silent Failure Capture

Unhandled exceptions, tool errors, and confidence collapses — all indexed in Splunk with full stack trace context and structured fields for instant SPL queries.

🔍

Natural Language SPL

"Which tools have the lowest trust scores?" → SPL generated → results in seconds. The Splunk AI Assistant bridges intent and query, so you never have to learn SPL syntax.

🔬

Foundation-Sec Explainer

Foundation-Sec-1.1-8B reads the anomaly context and returns: what happened, why it happened, severity, and the specific engineering fix. Not generic advice — specific to your run.

🕸

Multi-Agent Topology

3D network view across all agents — 48 nodes, 47 edges, 7 anomaly paths. Anomaly paths highlighted in red. See how agent hubs connect across your system at a glance.

⚠ Anomaly · search_tool

Loop detected — called 23× in 4.1s

trust_score: 0.05 · anomalydetection confidence: 99.25%

✓ Healthy · calculator_tool

28.5 × 1.43 = 40.755B — result returned

trust_score: 0.92 · 1 call · 12ms

🧠 LLM Call · research

847 tokens · gpt-4o-mini · 690ms

trust_score: 0.85

✓ Step · synthesis

duration: 700ms · step complete

trust_score: 0.90 · 0 anomalies

// Real Data — Not Simulated

LIVE RESULTS FROM SPLUNK

Every number verified from the live agentwatch index. Real events. Real anomalies. Real cost.

2,299

Events Indexed

342

Anomalies Caught

58.1%

Avg Trust Score

280K

Tokens Tracked

<$0.01

Est. Cost

🧠

TRY IT YOURSELF

Click "Loop" on the live dashboard and watch AgentWatch catch the anomaly in under one second. No setup. No login. Runs on Railway.

🚀 Launch Live Brain 📊 Agent Ops → 🕸 Topology Map →

