The trace database

Every session appends its events to trace.sqlite in the session state directory. The trace is the system's source of truth for what happened: transcripts are renderings of it, bench scorers grade against it, and any future learn-from-usage loop mines it. The model never sees it.

Schema

One table, append-only, WAL mode (trace/log.py):

CREATE TABLE events (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp    TEXT NOT NULL,      -- UTC ISO-8601
    kind         TEXT NOT NULL,
    payload_json TEXT NOT NULL
);

Payloads are JSON, so new event kinds and new fields are additive: no migrations, and old traces stay readable. The TraceLog class is the writer and reader API, though plain sqlite3 works too.

Event kinds

Kind	Written by	Payload
`session_start`	`Session.create`	`session_id`, `simulator`
`session_resume`	`Session.resume`	`session_id`, `simulator`
`session_title`	`Session.adopt_title`	`session_id`, `title` (from the first prompt)
`session_end`	`Session.finalize`	none
`message_user`	`TurnRunner`	`content`
`message_reasoning`	recorder middleware	`content` (the model's reasoning text)
`message_assistant`	recorder middleware	`content`
`model_usage`	recorder middleware	`input_tokens`, `output_tokens`, `total_tokens`, provider detail fields
`context_compaction`	summarization middleware / `/compact`	`messages_before`, `messages_after`, `manual` (when user-triggered)
`tool_call`	recorder middleware	`id`, `name`, `args`
`tool_result`	recorder middleware	`tool_call_id`, `name`, `content`, `status`
`hitl_request`	`TurnRunner`	the pending tool call awaiting approval
`hitl_response`	`TurnRunner`	`interrupt_id`, the decision payload
`artifact`	`plot_julia` / `recapture_plot`	`path` (relative to the session output dir), `mime`, `caption`, `tool_call_id`, `format`, `size_px`, `dpi`, `slot`, `source_code`
`attempt`	`record_attempt`	`id`, `parent_id`, `rationale`, `parameters_changed`, `metrics`, `candidate_path`, `plot_artifact_path`, `notes`

The recorder is an agent middleware (trace/recorder.py), so it observes the same stream the model produces: every model turn and every tool round-trip, including tool errors (a raised tool exception is recorded as an error result, not lost).

Two kinds carry the domain structure that makes the trace more than a chat log. artifact ties every figure to the exact code that produced it. attempt records one step of a parameter investigation, with a parent_id so calibration runs form a tree. The investigation report and the bench's process scorers both read that structure.

Consumers

jutul-agent transcript renders the trace as HTML or markdown, with --bundle zipping the referenced artifacts alongside.
Bench scorers read tool calls, arguments, artifacts, and attempts to verify the agent did the work it claims (evaluation).
model_usage events make token cost per turn and per workflow measurable in real sessions, not just bench runs.

Reading one

from jutul_agent.trace import TraceLog

log = TraceLog(path_to_trace)
for event in log.iter_events():
    print(event.timestamp, event.kind, event.payload)
log.close()

Traces live under the state home, at workspaces/<hash>/sessions/<id>/trace.sqlite (see configuration), and are plain files: copy one next to a bug report and the whole session comes with it.