Veval captures real production traces and turns them into a regression baseline. Ship prompt changes with confidence.
Start free →
No credit card required
The Problem
"I updated my prompt and something broke in production — but my tests all passed."
LLM outputs change when prompts change
Traditional unit tests can't catch regressions
You find out when users complain
How It Works
Instrument your LLM agent with the Veval SDK. Every production run is recorded as a trace.
Mark any trace as a regression test. Build a baseline from real production behaviour.
Run your test suite in CI. Veval replays traces and flags when outputs change.
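To make the replay step concrete, here is a minimal sketch of the idea in plain Python. It does not use the Veval SDK: the `Trace` shape, `find_regressions`, and both toy agents are hypothetical names invented for illustration. It also assumes exact string comparison, which is a simplification, since real LLM outputs usually need a looser notion of "changed".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trace:
    """One recorded production run (hypothetical shape, not Veval's schema)."""
    prompt: str
    output: str


def find_regressions(baseline: List[Trace], agent: Callable[[str], str]) -> List[Dict[str, str]]:
    """Replay every baseline prompt and collect the runs whose output changed."""
    changed = []
    for trace in baseline:
        new_output = agent(trace.prompt)
        # Assumption: exact equality stands in for whatever comparison is really used.
        if new_output != trace.output:
            changed.append(
                {"prompt": trace.prompt, "expected": trace.output, "actual": new_output}
            )
    return changed


# Baseline built from traces that were marked as regression tests.
baseline = [
    Trace(prompt="hello", output="HELLO"),
    Trace(prompt="bye", output="BYE"),
]


def old_agent(prompt: str) -> str:
    """Stand-in for the agent that produced the baseline."""
    return prompt.upper()


def new_agent(prompt: str) -> str:
    """Stand-in for the agent after a prompt change; it alters one answer."""
    return "Goodbye" if prompt == "bye" else prompt.upper()


print(find_regressions(baseline, old_agent))
print(find_regressions(baseline, new_agent))
```

In CI, a non-empty result from a check like this would fail the build, so the changed output is reviewed before it reaches users.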
import veval

veval.init(api_key="...")

@veval.trace
def run_agent(prompt: str) -> str:
    ...

Pricing
$0/mo
$XX/mo
$XX/mo
$XX/mo