The lab: testing and improving the app without a live session
The agent's UI and runtime are normally driven by a real model and a real Julia
process, which makes iterating on them slow: you have to start a session and try
each thing by hand. The lab (jutul_agent.lab) drives the same stack
headlessly with test doubles, so an agent or a developer can render every UI state,
look at it, change something, and re-render, and can profile cold start, all without
a live session.
It is built on the shared doubles in jutul_agent.lab.fakes: a scripted model
and agent, a fake Julia session, and a fake simulator adapter. The test suite uses
the same doubles.
The TUI lab
A scenario is a named recipe of an agent plus a short sequence of interactions that puts the TUI into a state worth looking at. The lab drives one through the real Textual app and captures what it renders.
python -m jutul_agent.lab.tui list
python -m jutul_agent.lab.tui run tool_call --png
python -m jutul_agent.lab.tui all --png -o ./out
Each scenario writes three artifacts under <out>/<name>/:
screen.svg: the screenshot Textual exports.screen.txt: a plain-text view of the screen, recovered from the SVG, for grep and diffs.transcript.md: the session transcript.
With --png it also rasterises the SVG to a PNG with a headless browser (Chrome,
Chromium, or Edge), so an agent can view the UI directly. If no browser is found the
SVG and text are still written.
The self-improvement loop
This is how an agent improves the UI on its own:
python -m jutul_agent.lab.tui run <scenario> --pngand open the PNG.- Spot a problem (clutter, a leaked internal value, bad wrapping, a confusing state).
- Edit the widget or stylesheet under
src/jutul_agent/interfaces/tui/. - Re-render and compare. Run
pytest tests/test_lab.pyto confirm nothing crashes.
The approval card, for example, used to print an internal interrupt id; rendering the
approval scenario made that obvious, and the fix was one line.
Adding a scenario
Add a call to scenario(...) in jutul_agent/lab/scenarios.py with a builder
that returns a scripted agent and a tuple of steps. A step is a prompt string to
submit, or an async callable taking the Textual pilot for key presses. Keep scenarios
small and deterministic.
Live mode (real model, real Julia)
The scripted scenarios are fast and deterministic but cannot show what a live model and a live solve actually render. Live mode runs one prompt through the full stack and captures the screen:
python -m jutul_agent.lab.live "compute the mean of [1.0, 2.0, 3.0] in Julia" \
--workspace testbed/jutuldarcy --model openai:gpt-5.4 --png
It costs API and needs a workspace with an instantiated Julia env. A cold first turn
pays Julia compilation, so it can be slow; raise --timeout (and the settle window)
for heavier prompts, or warm the env first. Everything is wrapped in an overall
timeout so a hung kernel cannot stall a run.
Robustness
The scenarios tagged robustness are the fault and edge cases: a long output, a
unicode-heavy result, a tool error, a tiny terminal, a wide unbroken line, an empty
answer, and approving a pending request. tests/test_lab.py renders every
scenario, so it is a standing guarantee that the TUI does not crash on any of them.
Add a scenario for any new state you want held to that guarantee.
Profiling
Two profilers, for the two kinds of slowness.
Cold start (imports and bringing the app up):
python -m jutul_agent.lab.profile_startup
python -m jutul_agent.lab.profile_startup --module jutul_agent.interfaces.tui
It runs a fresh interpreter under -X importtime (heaviest imports, rolled up per
top-level package) and times the post-import phases (creating a session, building the
agent graph) with the fakes. The cold-start cost is dominated by third-party imports,
chiefly the Anthropic SDK (pulled eagerly by deepagents), then Textual and LangSmith.
A guard test (tests/test_cli_startup.py) keeps the CLI entry from importing the
agent stack, so light commands like --version stay fast.
The hot path (our own Python during a turn):
python -m jutul_agent.lab.profile_turn --scenario long_output --repeat 5
It cProfiles rendering a scenario through the full agent runtime and TUI with the fakes, and reports the jutul-agent functions by self-time. Use it to find a slow render or middleware path, and to catch a regression that makes turns sluggish.
Run any of these before and after a change to measure the effect.