The trace database

Every session appends its events to trace.sqlite in the session state directory. The trace is the system's source of truth for what happened: transcripts are renderings of it, bench scorers grade against it, and any future learn-from-usage loop would mine it. The model never sees it.

Schema

One table, append-only, WAL mode (trace/log.py):

CREATE TABLE events (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp    TEXT NOT NULL,      -- UTC ISO-8601
    kind         TEXT NOT NULL,
    payload_json TEXT NOT NULL
);

Payloads are JSON, so new event kinds and new fields are additive: no migrations, and old traces stay readable. The TraceLog class is the writer and reader API, though plain sqlite3 works too.
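To make the "additive" claim concrete, here is a minimal sketch that uses plain sqlite3 against the schema above. The append_event helper and the client field are hypothetical, introduced only for illustration; they are not part of TraceLog.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp    TEXT NOT NULL,
    kind         TEXT NOT NULL,
    payload_json TEXT NOT NULL
);
"""


def append_event(conn, kind, payload):
    """Hypothetical helper: append one event and return its row id."""
    cur = conn.execute(
        "INSERT INTO events (timestamp, kind, payload_json) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), kind, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")  # a real trace is a file on disk
conn.executescript(SCHEMA)

append_event(conn, "message_user", {"content": "hello"})
# A later writer adds a field (hypothetical); the table definition is untouched.
append_event(conn, "message_user", {"content": "hi again", "client": "tui"})

rows = conn.execute("SELECT kind, payload_json FROM events ORDER BY id").fetchall()
payloads = [json.loads(payload_json) for _, payload_json in rows]
# Reading with .get() keeps events written before the field existed parseable.
clients = [p.get("client") for p in payloads]
print(clients)  # [None, 'tui']
conn.close()
```

Readers that tolerate missing keys are what keep old traces usable as payloads grow.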

Event kinds

| Kind | Written by | Payload |
| --- | --- | --- |
| session_start | Session.create | session_id, simulator |
| session_resume | Session.resume | session_id, simulator |
| session_title | Session.adopt_title | session_id, title (from the first prompt) |
| session_end | Session.finalize | none |
| message_user | TurnRunner | content |
| message_reasoning | recorder middleware | content (the model's reasoning text) |
| message_assistant | recorder middleware | content |
| model_usage | recorder middleware | input_tokens, output_tokens, total_tokens, provider detail fields |
| context_compaction | summarization middleware / /compact | messages_before, messages_after, manual (when user-triggered) |
| tool_call | recorder middleware | id, name, args |
| tool_result | recorder middleware | tool_call_id, name, content, status |
| hitl_request | TurnRunner | the pending tool call awaiting approval |
| hitl_response | TurnRunner | interrupt_id, the decision payload |
| artifact | plot_julia / recapture_plot | path (relative to the session output dir), mime, caption, tool_call_id, format, size_px, dpi, slot, source_code |
| attempt | record_attempt | id, parent_id, rationale, parameters_changed, metrics, candidate_path, plot_artifact_path, notes |

The recorder is an agent middleware (trace/recorder.py), so it observes the same stream the model produces: every model turn and every tool round-trip, including tool errors (a raised tool exception is recorded as an error result, not lost).
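Because each tool_result carries the tool_call_id of the call it answers, a reader can rebuild every round-trip from the table alone. The sketch below does that with plain sqlite3. The tool names, the timestamp, and the literal status value "error" are assumptions made for the example; the source only says that a raised exception is recorded as an error result.

```python
import json
import sqlite3


def pair_tool_calls(conn):
    """Return (call, result) payload pairs; result is None if never answered."""
    pending = {}
    pairs = []
    query = (
        "SELECT kind, payload_json FROM events "
        "WHERE kind IN ('tool_call', 'tool_result') ORDER BY id"
    )
    for kind, payload_json in conn.execute(query):
        payload = json.loads(payload_json)
        if kind == "tool_call":
            pending[payload["id"]] = payload
        else:
            # Match the result to its call; call is None if the call is missing.
            call = pending.pop(payload["tool_call_id"], None)
            pairs.append((call, payload))
    # Calls still pending have no recorded result.
    pairs.extend((call, None) for call in pending.values())
    return pairs


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "timestamp TEXT NOT NULL, kind TEXT NOT NULL, payload_json TEXT NOT NULL)"
)
sample = [  # illustrative events, not taken from a real session
    ("tool_call", {"id": "c1", "name": "run_case", "args": {}}),
    ("tool_result", {"tool_call_id": "c1", "name": "run_case",
                     "content": "boom", "status": "error"}),
    ("tool_call", {"id": "c2", "name": "plot_julia", "args": {}}),
]
for kind, payload in sample:
    conn.execute(
        "INSERT INTO events (timestamp, kind, payload_json) VALUES (?, ?, ?)",
        ("2024-01-01T00:00:00+00:00", kind, json.dumps(payload)),
    )

pairs = pair_tool_calls(conn)
failed = [call["name"] for call, result in pairs
          if call and result and result["status"] == "error"]
print(failed)  # ['run_case']
conn.close()
```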

Two kinds carry the domain structure that makes the trace more than a chat log. artifact ties every figure to the exact code that produced it. attempt records one step of a parameter investigation, with a parent_id so calibration runs form a tree. The investigation report and the bench's process scorers both read that structure.
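As a rough illustration of the tree that parent_id implies, the following sketch groups attempt payloads by parent and renders them as an indented outline. It assumes root attempts have a null (or absent) parent_id and that ids are strings. Neither is stated above, and the sample rationales are invented.

```python
import json
import sqlite3
from collections import defaultdict


def attempts_by_parent(conn):
    """Group attempt payloads under their parent_id, in insertion order."""
    children = defaultdict(list)
    query = "SELECT payload_json FROM events WHERE kind = 'attempt' ORDER BY id"
    for (payload_json,) in conn.execute(query):
        attempt = json.loads(payload_json)
        children[attempt.get("parent_id")].append(attempt)
    return children


def render(children, parent_id=None, depth=0):
    """Depth-first outline: one line per attempt, indented by tree depth."""
    lines = []
    for attempt in children.get(parent_id, []):
        lines.append("  " * depth + f"{attempt['id']}: {attempt.get('rationale', '')}")
        lines.extend(render(children, attempt["id"], depth + 1))
    return lines


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "timestamp TEXT NOT NULL, kind TEXT NOT NULL, payload_json TEXT NOT NULL)"
)
sample = [  # illustrative attempts only
    {"id": "a1", "parent_id": None, "rationale": "baseline"},
    {"id": "a2", "parent_id": "a1", "rationale": "raise permeability"},
    {"id": "a3", "parent_id": "a1", "rationale": "lower porosity"},
    {"id": "a4", "parent_id": "a2", "rationale": "refine permeability"},
]
for attempt in sample:
    conn.execute(
        "INSERT INTO events (timestamp, kind, payload_json) VALUES (?, ?, ?)",
        ("2024-01-01T00:00:00+00:00", "attempt", json.dumps(attempt)),
    )

outline = render(attempts_by_parent(conn))
print("\n".join(outline))
conn.close()
```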

Consumers

  • jutul-agent transcript renders the trace as HTML or markdown, with --bundle zipping the referenced artifacts alongside.
  • Bench scorers read tool calls, arguments, artifacts, and attempts to verify the agent did the work it claims (see evaluation).
  • model_usage events make token cost per turn and per workflow measurable in real sessions, not just bench runs.
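Per-turn token cost can be approximated straight from the table. In this sketch a turn is taken to start at each message_user event, which is an assumption about how turns are delimited rather than a documented rule, and the sample counts are made up.

```python
import json
import sqlite3

FIELDS = ("input_tokens", "output_tokens", "total_tokens")


def usage_per_turn(conn):
    """Sum model_usage counts under the most recent message_user event."""
    turns = []
    query = (
        "SELECT kind, payload_json FROM events "
        "WHERE kind IN ('message_user', 'model_usage') ORDER BY id"
    )
    for kind, payload_json in conn.execute(query):
        payload = json.loads(payload_json)
        if kind == "message_user":
            turns.append({"prompt": payload["content"], **{f: 0 for f in FIELDS}})
        elif turns:  # ignore usage recorded before any user message
            for field in FIELDS:
                turns[-1][field] += payload.get(field) or 0
    return turns


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "timestamp TEXT NOT NULL, kind TEXT NOT NULL, payload_json TEXT NOT NULL)"
)
sample = [  # illustrative numbers only
    ("message_user", {"content": "plot it"}),
    ("model_usage", {"input_tokens": 100, "output_tokens": 20, "total_tokens": 120}),
    ("model_usage", {"input_tokens": 150, "output_tokens": 30, "total_tokens": 180}),
    ("message_user", {"content": "rerun"}),
    ("model_usage", {"input_tokens": 200, "output_tokens": 10, "total_tokens": 210}),
]
for kind, payload in sample:
    conn.execute(
        "INSERT INTO events (timestamp, kind, payload_json) VALUES (?, ?, ?)",
        ("2024-01-01T00:00:00+00:00", kind, json.dumps(payload)),
    )

turns = usage_per_turn(conn)
for turn in turns:
    print(turn["prompt"], turn["total_tokens"])
conn.close()
```

Summing in Python rather than with SQLite's JSON functions keeps the sketch independent of how the local SQLite build was compiled.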

Reading one

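from jutul_agent.trace import TraceLog

# path_to_trace: path to a session's trace.sqlite
log = TraceLog(path_to_trace)
try:
    for event in log.iter_events():
        print(event.timestamp, event.kind, event.payload)
finally:
    log.close()  # release the SQLite handle even if iteration fails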

Traces live under the state home, at workspaces/<hash>/sessions/<id>/trace.sqlite (see configuration), and are plain files: copy one next to a bug report and the whole session comes with it.
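One caveat follows from WAL mode: while a session still has the database open, recent commits may sit in the trace.sqlite-wal sidecar file and not yet in trace.sqlite itself. A possible way to take a consistent single-file copy of a live trace is the standard library's sqlite3 backup API, sketched below. snapshot_trace is a hypothetical helper, not part of jutul-agent, and the temporary directory stands in for a real session directory.

```python
import sqlite3
import tempfile
from pathlib import Path


def snapshot_trace(src_path, dst_path):
    """Hypothetical helper: copy a possibly live trace into one file."""
    src = sqlite3.connect(str(src_path))
    dst = sqlite3.connect(str(dst_path))
    try:
        src.backup(dst)  # copies committed pages, including those in the WAL
    finally:
        dst.close()
        src.close()


with tempfile.TemporaryDirectory() as tmp:
    trace = Path(tmp) / "trace.sqlite"
    copy = Path(tmp) / "bug-report-trace.sqlite"

    # Stand-in for a running session: a WAL-mode writer that stays open.
    writer = sqlite3.connect(str(trace))
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "timestamp TEXT NOT NULL, kind TEXT NOT NULL, payload_json TEXT NOT NULL)"
    )
    writer.execute(
        "INSERT INTO events (timestamp, kind, payload_json) VALUES (?, ?, ?)",
        ("2024-01-01T00:00:00+00:00", "session_start", "{}"),
    )
    writer.commit()

    snapshot_trace(trace, copy)

    reader = sqlite3.connect(str(copy))
    copied_events = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    reader.close()
    writer.close()

print(copied_events)  # 1
```

For a session that has already ended and closed its connection, copying the file directly, as described above, is enough.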