The trace database
Every session appends its events to trace.sqlite in the session state
directory. The trace is the system's source of truth for what happened:
transcripts are renderings of it, bench scorers grade against it, and any
future learn-from-usage loop mines it. The model never sees it.
Schema
One table, append-only, WAL mode (trace/log.py):
CREATE TABLE events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL, -- UTC ISO-8601
kind TEXT NOT NULL,
payload_json TEXT NOT NULL
);
Payloads are JSON, so new event kinds and new fields are additive: no
migrations, and old traces stay readable. The TraceLog class is the
writer and reader API, though plain sqlite3 works too.
Event kinds
| Kind | Written by | Payload |
|---|---|---|
session_start |
Session.create |
session_id, simulator |
session_resume |
Session.resume |
session_id, simulator |
session_title |
Session.adopt_title |
session_id, title (from the first prompt) |
session_end |
Session.finalize |
none |
message_user |
TurnRunner |
content |
message_reasoning |
recorder middleware | content (the model's reasoning text) |
message_assistant |
recorder middleware | content |
model_usage |
recorder middleware | input_tokens, output_tokens, total_tokens, provider detail fields |
context_compaction |
summarization middleware / /compact |
messages_before, messages_after, manual (when user-triggered) |
tool_call |
recorder middleware | id, name, args |
tool_result |
recorder middleware | tool_call_id, name, content, status |
hitl_request |
TurnRunner |
the pending tool call awaiting approval |
hitl_response |
TurnRunner |
interrupt_id, the decision payload |
artifact |
plot_julia / recapture_plot |
path (relative to the session output dir), mime, caption, tool_call_id, format, size_px, dpi, slot, source_code |
attempt |
record_attempt |
id, parent_id, rationale, parameters_changed, metrics, candidate_path, plot_artifact_path, notes |
The recorder is an agent middleware (trace/recorder.py), so it observes
the same stream the model produces: every model turn and every tool
round-trip, including tool errors (a raised tool exception is recorded as
an error result, not lost).
Two kinds carry the domain structure that makes the trace more than a
chat log. artifact ties every figure to the exact code that produced it.
attempt records one step of a parameter investigation, with a
parent_id so calibration runs form a tree. The investigation report and
the bench's process scorers both read that structure.
Consumers
jutul-agent transcriptrenders the trace as HTML or markdown, with--bundlezipping the referenced artifacts alongside.- Bench scorers read tool calls, arguments, artifacts, and attempts to verify the agent did the work it claims (evaluation).
model_usageevents make token cost per turn and per workflow measurable in real sessions, not just bench runs.
Reading one
from jutul_agent.trace import TraceLog
log = TraceLog(path_to_trace)
for event in log.iter_events():
print(event.timestamp, event.kind, event.payload)
log.close()
Traces live under the state home, at
workspaces/<hash>/sessions/<id>/trace.sqlite (see
configuration), and are plain files: copy one next to a
bug report and the whole session comes with it.