
Shadow Evaluator · Diagnostic Service

Offline routing audits for agent systems.

Qalbun analyzes historical agent logs to identify candidate inference savings, under-evidenced routing decisions, and validation-ready hypotheses — without changing production systems.

Request a Diagnostic · See How It Works

The problem

Common situations where Shadow Evaluator helps

Shadow Evaluator helps when your team spends high-cost model inference on tasks that may not require it, routes to expensive components by default because there is no evidence layer, or cannot measure whether its agent routing is cost-effective. In each case, it tells you what your logs can actually prove.

How it works

Three steps, fully offline.

  1. Step 01

    Profile your logs

    We classify your data into one of four readiness tiers — audit-only, cost-opportunity, outcome-informed, or live-shadow-ready — so every later claim has a known evidence footing.

  2. Step 02

    Run an offline routing audit

    We reconstruct the routing decisions your system actually made and separate observed facts from estimates, with each claim tied to its source data.

  3. Step 03

    Choose the next step

    You receive a written report, an explicit list of evidence gaps, and a recommendation on whether a live shadow pilot is warranted.
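
As a rough illustration of the profiling step, the sketch below maps the signals present in a set of logs to one of the four readiness tiers named above. The tier names come from this page; the `LogSignals` fields and the rules in `readiness_tier` are hypothetical assumptions chosen for illustration, not the criteria Shadow Evaluator actually applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogSignals:
    """Hypothetical summary of which signals a set of logs contains."""
    has_routing_choice: bool   # which component or model was selected
    has_cost_signals: bool     # cost, token counts, or latency per call
    has_outcome_labels: bool   # task outcomes, review labels, success signals


def readiness_tier(signals: LogSignals) -> str:
    """Return a readiness tier name under assumed, illustrative rules."""
    # Assumption: without recorded routing choices, only a basic audit is possible.
    if not signals.has_routing_choice:
        return "audit-only"
    # Assumption: both cost and outcome signals are needed for a live shadow pilot.
    if signals.has_cost_signals and signals.has_outcome_labels:
        return "live-shadow-ready"
    if signals.has_outcome_labels:
        return "outcome-informed"
    if signals.has_cost_signals:
        return "cost-opportunity"
    return "audit-only"


if __name__ == "__main__":
    print(readiness_tier(LogSignals(True, True, False)))
```

Under these assumed rules, logs that record routing choices and token counts but no outcome labels would land in the cost-opportunity tier. The real tier boundaries may differ.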

What you receive

A written, reproducible diagnostic.

  • Data-readiness and claims-budget classification
  • Observed routing and component usage patterns
  • Estimated cost opportunities when your logs support them
  • Under-evidenced decisions and instrumentation gaps
  • Validation-ready hypotheses for a live shadow pilot
  • Reproducibility manifest for auditability

Who it is for

Teams running real routing decisions.

Built for teams running agent systems, model routers, tool-using LLM workflows, or multi-component AI systems with recurring inference decisions.

Best fit: at least 30 days of structured logs covering routing decisions, plus at least one of the following: cost signals, token signals, or downstream outcome labels. Partial coverage is workable: the audit will tell you exactly which claims your data can support and which it cannot.
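
To self-check against the best-fit description before submitting a request, a team could run something like the sketch below over its own logs. The record field names (`timestamp`, `cost_usd`, `tokens`, `outcome`) and the rule that any non-zero coverage counts are assumptions for illustration; only the 30-day figure comes from this page.

```python
from datetime import datetime


def coverage_report(records: list[dict]) -> dict:
    """Summarize time span and per-signal coverage for log records.

    Each record is assumed to carry an ISO 8601 "timestamp" and, optionally,
    "cost_usd", "tokens", and "outcome" fields (hypothetical names).
    """
    if not records:
        return {"span_days": 0, "cost_or_token_coverage": 0.0,
                "outcome_coverage": 0.0, "best_fit": False}

    times = [datetime.fromisoformat(r["timestamp"]) for r in records]
    span_days = (max(times) - min(times)).days
    total = len(records)

    # Fraction of records carrying a cost or token signal.
    cost_cov = sum(
        1 for r in records
        if r.get("cost_usd") is not None or r.get("tokens") is not None
    ) / total
    # Fraction of records carrying an outcome label.
    outcome_cov = sum(1 for r in records if r.get("outcome") is not None) / total

    return {
        "span_days": span_days,
        "cost_or_token_coverage": cost_cov,
        "outcome_coverage": outcome_cov,
        # Assumed threshold: 30+ days and at least some cost/token or outcome data.
        "best_fit": span_days >= 30 and (cost_cov > 0 or outcome_cov > 0),
    }
```

The report returns coverage fractions rather than a single pass or fail, so partial coverage stays visible.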

Request

Request a Diagnostic Review.

Tell us about your system. We’ll respond with scope, data handling, and next steps within two business days.

Privacy: this form is used solely to scope and respond to your request. Submissions are not shared with third parties. Please do not include logs, secrets, credentials, or customer data in any field — we will agree on a transfer method in writing if logs need to change hands.

High-level description only — components, routing decisions, volume. Do not paste logs or sample data.

Do your logs record which component or model was selected for each decision?
Do your logs include cost, token counts, or latency for each call?
Do your logs include task outcomes, review labels, or downstream success signals?