Your First Conversation
What to try, what to expect, and how to read what's happening.
You have a Huxley agent running. This page walks you through a short sequence of conversations that exercises every primitive in the framework, so you can build a feel for what Huxley is doing as you go.
This page assumes the Quickstart is complete and your agent is responding. If not, start there.
Try these, in order
Each prompt below exercises a different part of the framework. Watching the server logs while you do them is the fastest way to develop intuition.
1. An info question
What time is it?
This routes to the system skill's get_current_time tool. The skill returns the time as JSON; the model narrates it back in the persona's voice. This is the simplest possible round trip — one turn, one tool call, no audio side effects, no proactive behavior.
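To make that contract concrete, here is a minimal sketch of what a tool like this boils down to. Huxley's actual registration API isn't shown on this page, so the bare-function shape below is an assumption; the behavior it illustrates is the documented one: the tool returns JSON-serializable data, and the model narrates it.

```python
from datetime import datetime, timezone

# Minimal sketch of a get_current_time tool. How Huxley registers tools
# isn't shown in these docs, so the plain-function form is an assumption;
# the contract is the one described above: return structured data and
# let the model narrate it in the persona's voice.
def get_current_time() -> dict:
    now = datetime.now(timezone.utc)
    return {"iso": now.isoformat(), "timezone": "UTC"}
```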
In your server log you'll see something like:

```
coord.ptt_start turn=t-7f3
coord.tool_dispatch tool=get_current_time skill=system
coord.audio_done turn=t-7f3 duration_ms=1840
```

2. An info question with a "thinking" chime
What's in the news?
The news skill fetches headlines from Google News RSS and weather from Open-Meteo. It also returns a PlaySound side effect: a short chime that plays immediately, while the LLM is still composing its narration.
The chime is the framework's way of saying "I heard you, working on it." It's the simplest version of the audio side effect mechanism — a single PCM blob the framework drops into the audio channel before the model speaks.
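As a sketch of that mechanism: the skill hands back its data plus the side effect, and the framework plays the blob right away. PlaySound is the framework's name from above; the ToolResult wrapper and its fields are stand-ins invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class PlaySound:          # named by the framework; this shape is a guess
    pcm: bytes

@dataclass
class ToolResult:         # stand-in wrapper, invented for this sketch
    data: dict
    side_effects: list = field(default_factory=list)

def fetch_news() -> ToolResult:
    headlines = ["..."]   # stand-in for the Google News RSS fetch
    return ToolResult(
        data={"headlines": headlines},         # the model narrates this
        side_effects=[PlaySound(pcm=b"\x00")], # chime plays immediately
    )
```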
3. A long-form audio stream
This exercise requires the audiobooks skill in your persona. Add audiobooks: {} to your persona.yaml's skills: block, point it at a directory of audio files, and restart the server.
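Something like the following; the library_dir key below is a guess, so check the audiobooks skill's own docs for the real option name:

```yaml
# persona.yaml
skills:
  audiobooks:
    library_dir: /path/to/audiobooks   # option name is a guess; the
                                       # bare form is `audiobooks: {}`
```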
Play something.
Now you're hitting the framework's hardest primitive.
What happens:
The agent asks which book
The model calls search_audiobooks (a search tool) and narrates the matches. You hear the results.
You pick one
The first one.
The agent announces it's starting
Model audio plays: "Starting..."
The book starts playing
A small chime plays first (the book_start earcon). Then the audiobook itself — pumped through ffmpeg, chunked into PCM, streamed through the same WebSocket as the model's voice. Same audio channel, same playback path on the browser.
You can interrupt
Hold the PTT button. The book stops atomically — within 30ms, audio cleared, ffmpeg killed, model interrupted. Your bookmark saves automatically. Resume any time with "Continue where we left off."
This is the factory pattern at work. The skill didn't return audio bytes — it returned an AudioStream(factory=...) side effect. The framework holds onto that factory, waits for the model to finish speaking, then invokes it. That's why the announcement always plays before the book, never overlapping it.
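A sketch of the skill side, under stated assumptions: AudioStream(factory=...) is the shape named above, but the class stub, the ffmpeg flags, and the 24 kHz mono format are guesses for illustration.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Iterator

@dataclass
class AudioStream:  # stand-in; the real class comes from the framework
    factory: Callable[[], Iterator[bytes]]

def play_audiobook(path: str) -> AudioStream:
    def factory() -> Iterator[bytes]:
        # Decode to raw 16-bit mono PCM with ffmpeg (sample rate assumed).
        proc = subprocess.Popen(
            ["ffmpeg", "-i", path, "-f", "s16le", "-ar", "24000", "-ac", "1", "-"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
        )
        while chunk := proc.stdout.read(4096):
            yield chunk  # same channel the model's voice streams through

    # Deferred: the framework calls factory() only after the model's
    # announcement finishes, which is why the two never overlap.
    return AudioStream(factory=factory)
```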
4. A proactive interruption
This exercise requires the timers skill. Add timers: {} to your persona.yaml's skills: block and restart.
Set a timer for 10 seconds.
The timers skill schedules the countdown. Wait 10 seconds, and the agent speaks without you holding the button:
"Time."
This is inject_turn — a skill making the model speak first. Internally, the timer fired, called ctx.inject_turn(message, priority=BLOCK_BEHIND_COMMS), and the framework decided when to surface it (immediately, since nothing else was speaking).
If you'd been listening to a book, the timer would have paused the book, spoken the reminder, then resumed the book. That's the focus channel system arbitrating between CONTENT (the book) and BLOCK_BEHIND_COMMS (the timer).
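A sketch of the timer side: ctx.inject_turn and BLOCK_BEHIND_COMMS are the names quoted above; everything else (the scheduling helper, the priority stand-in) is invented for illustration.

```python
import asyncio

BLOCK_BEHIND_COMMS = "block_behind_comms"  # stand-in for the framework constant

async def set_timer(ctx, seconds: float, message: str = "Time.") -> dict:
    async def fire() -> None:
        await asyncio.sleep(seconds)
        # Ask the framework to make the model speak first. The focus
        # system decides when: immediately if the speaker is idle, or
        # after pausing a CONTENT stream (like a playing audiobook).
        await ctx.inject_turn(message, priority=BLOCK_BEHIND_COMMS)

    asyncio.create_task(fire())  # schedule and return without blocking
    return {"scheduled_for_seconds": seconds}
```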
Reading the logs
Every event Huxley emits has a turn ID — a short UUID for the current conversation round. Structured JSONL logs are opt-in — set HUXLEY_LOG_JSON=true in your .env file first, then tail and filter by turn ID:
```
# .env
HUXLEY_LOG_JSON=true
```

```
tail -f logs/server.jsonl | jq 'select(.turn=="t-7f3")'
```

Without HUXLEY_LOG_JSON, logs appear as human-readable text on stdout, still useful for interactive debugging but not grep-friendly by turn ID.
The most useful events:
| Event | What it tells you |
|---|---|
| `coord.ptt_start` | User pressed the button |
| `coord.ptt_stop` | User released; audio committed to OpenAI |
| `coord.tool_dispatch` | Model called a tool |
| `coord.audio_done` | Model finished speaking |
| `coord.interrupted` | Atomic interrupt: drop everything, drain quickly |
| `focus.acquire` | Some channel just took control of the speaker |
| `focus.patience_expired` | A claim sat pending too long; getting evicted |
| `<skill>.*` | Skill-emitted events (`audiobooks.resolve`, `news.fetched`, etc.) |
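These compose with the jq filter above. For instance, to watch only the focus arbitration during the book-plus-timer exercise (this assumes the event name lands in an `event` field, which is a guess about the JSONL schema):

```
tail -f logs/server.jsonl | jq 'select(.event | startswith("focus."))'
```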
Logs aren't decoration — they're the only way to debug a real-time voice system. We use them in the docs as a teaching tool too.