Turns
The atomic unit of conversation. From button-press to silence, with everything in between.
A turn is one round of conversation. It begins when something requests speech (you press the PTT button, or a skill calls inject_turn) and ends when the agent goes quiet again. Turns are the framework's clock — every event has a turn ID, every skill dispatch happens inside a turn, every interrupt drops a turn.
Understanding turns is the difference between writing a skill that kind of works and writing one that handles interruption, latency, and proactive speech correctly.
The state machine
Every turn walks through this sequence:
Seven states. Five edges that matter to skill authors. The coordinator (server/runtime/src/huxley/turn/coordinator.py) is the sole authority — it owns the state, owns the transitions, and owns the audio sequencing.
Diagram notation: rounded nodes (([STATE])) are terminal — the system rests there. Square nodes ((STATE)) are transient — the coordinator moves through them and keeps going.
What each state means
"Speech before factories" — the guarantee that matters
If a tool returns an AudioStream side effect, the framework does not play the audio immediately. It latches the factory and waits.
When does it fire? At the terminal barrier: after response.done, after no tool needs follow-up, after the model has stopped emitting audio chunks. Only then does the framework call the factory and start streaming.
Why? Because the model's narration of what's about to happen must finish before the thing happens. Imagine:
"Comenzando el primer libro..." [book audio starts]
vs.
[book audio starts] "Comenzando el primer libro..." [overlapping]
The first is conversation. The second is a glitch. The framework's job is to make the first cheap and the second impossible.
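The latch-then-fire ordering can be sketched in a few lines. This is a minimal illustration, not the coordinator's real code: `TurnSketch`, `on_tool_result`, `on_model_audio`, and `on_terminal_barrier` are hypothetical names invented for the example, and the "audio" is just byte strings appended to a list.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Optional

AudioFactory = Callable[[], AsyncIterator[bytes]]


class TurnSketch:
    """Hypothetical stand-in for the coordinator's latch; not Huxley's real class."""

    def __init__(self) -> None:
        self.pending: List[AudioFactory] = []
        self.played: List[bytes] = []

    def on_tool_result(self, factory: Optional[AudioFactory]) -> None:
        # Latch only: the factory is stored here, never called.
        if factory is not None:
            self.pending.append(factory)

    def on_model_audio(self, chunk: bytes) -> None:
        # Model narration plays as it arrives.
        self.played.append(chunk)

    async def on_terminal_barrier(self) -> None:
        # Assumed to be reached after response.done with no tool follow-up left.
        factories, self.pending = self.pending, []
        for factory in factories:
            async for chunk in factory():
                self.played.append(chunk)


async def demo() -> List[bytes]:
    async def book() -> AsyncIterator[bytes]:
        yield b"[book audio]"

    turn = TurnSketch()
    turn.on_tool_result(book)  # the tool returns before narration ends
    turn.on_model_audio(b"Comenzando el primer libro...")
    await turn.on_terminal_barrier()
    return turn.played


result = asyncio.run(demo())
```

Even though the tool result arrives first, the narration chunk precedes the book audio in `result`, because the factory is only invoked at the barrier.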
Interruption, atomically
The interrupt path is the most carefully engineered piece of the framework. When you press PTT during model speech (or while a book is playing, or during a tool call):
1. Drop flag set
Any in-flight audio is marked dropped. Anything mid-write to the client gets discarded. The flag is checked before every outbound chunk, so the producer can't race the interrupt.
2. Pending factories cleared
If audio side effects were latched but hadn't fired yet, they're discarded. The book that would have started playing doesn't.
3. Audio producer stopped
If a stream was running (audiobook, radio, claim speaker source), it's cancelled. The asyncio task wrapping the factory gets cancel(). ffmpeg subprocesses get terminated.
4. Client buffer flushed
A clear message is sent to the browser. The browser drops everything it had queued. You hear silence within ~30ms of pressing the button.
5. OpenAI response cancelled
A response.cancel is sent to the Realtime session. OpenAI stops generating tokens. We don't pay for what we don't hear.
6. State marked INTERRUPTED, then IDLE
The turn is logged as interrupted (so analytics know what happened), state returns to IDLE, and the new PTT can begin.
These six steps run in a fixed order, and each one is independently safe to skip when its precondition is false (no producer running → step 3 is a no-op). The result: interruption works in every state from COMMITTING onward.
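To make the "fixed order, each step skippable" idea concrete, here is a small sketch of the six steps as one coroutine. Everything in it is an assumption made for illustration: the `InterruptSketch` class, its fields, and the log labels are hypothetical, and the client flush and upstream cancel are stand-ins for the real messages.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class InterruptSketch:
    """Hypothetical model of the six-step interrupt path; names are illustrative."""

    state: str = "COMMITTING"
    dropped: bool = False
    pending_factories: List[Callable] = field(default_factory=list)
    producer: Optional[asyncio.Task] = None
    response_active: bool = False
    log: List[str] = field(default_factory=list)

    async def interrupt(self) -> None:
        # 1. Drop flag: outbound writes would check this before every chunk.
        self.dropped = True
        self.log.append("drop_flag")
        # 2. Latched factories that never fired are discarded.
        if self.pending_factories:
            self.pending_factories.clear()
            self.log.append("factories_cleared")
        # 3. Cancel the running producer, if there is one.
        if self.producer is not None and not self.producer.done():
            self.producer.cancel()
            await asyncio.gather(self.producer, return_exceptions=True)
            self.log.append("producer_stopped")
        self.producer = None
        # 4. Stand-in for sending the clear message to the client.
        self.log.append("client_clear")
        # 5. Cancel the upstream response only if one is in flight.
        if self.response_active:
            self.response_active = False
            self.log.append("response_cancel")
        # 6. Record the interruption, then return to rest.
        self.state = "INTERRUPTED"
        self.log.append("interrupted")
        self.state = "IDLE"


async def demo() -> InterruptSketch:
    async def forever() -> None:
        await asyncio.sleep(3600)

    turn = InterruptSketch(response_active=True)
    turn.producer = asyncio.create_task(forever())
    await asyncio.sleep(0)  # let the producer task start
    await turn.interrupt()
    return turn


turn = asyncio.run(demo())
```

In this run nothing was latched, so step 2 leaves no trace in `turn.log` while the other five steps do; the order of the remaining entries is unchanged.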
Skill author implications
Three things follow from how turns work, and skill authors need to internalize all of them:
1. Your factory may run, may not, may be cancelled mid-stream
A skill that returns AudioStream(factory=play_book) is making a promise: "here's how to produce audio, you (the framework) decide when." The framework may invoke the factory, or it may discard it (if the user interrupted before the model finished narrating). The factory may run to completion, or it may get cancelled mid-stream.
Your factory must be safe in all three cases. Use try / finally for cleanup:
async def stream():
    bytes_written = 0
    completed = False
    try:
        async for chunk in audio_source():
            bytes_written += len(chunk)
            yield chunk
        completed = True
    finally:
        # Save bookmark whether we finished or got cancelled mid-way.
        position = 0.0 if completed else start_position + bytes_written / BPS
        await self._storage.save_position(book_id, position)
2. Your factory captures position via closure
Don't store "current position" eagerly. The user might tell the model "skip this chapter" — that creates a new turn, the model calls a control tool, the tool returns a new AudioStream factory pointing at a new position. The framework cancels the old factory and runs the new one. If positions were stored eagerly, the cancellation race would corrupt them.
def play_book(book_id: str, start: float):
    async def stream():
        async for chunk in self._player.stream(book_id, start_position=start):
            yield chunk
    return stream
The closure captures start. The skill object's state never gets out of sync with what's actually playing.
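A self-contained way to see why closures avoid the race: two factories built for the same book hold separate start positions, so dropping the old stream cannot disturb the new one. The sketch below assumes nothing about the real player; `make_factory` and `collect` are hypothetical helpers, and chunks are plain strings so the result is easy to read.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Tuple


def make_factory(book_id: str, start: float) -> Callable[[], AsyncIterator[str]]:
    # Hypothetical stand-in for a skill's play_book helper.
    async def stream() -> AsyncIterator[str]:
        position = start  # private to this invocation, captured from the closure
        for _ in range(3):
            yield f"{book_id}@{position:.1f}"
            position += 1.0

    return stream


async def collect(factory: Callable[[], AsyncIterator[str]], limit: int) -> List[str]:
    out: List[str] = []
    gen = factory()
    async for chunk in gen:
        out.append(chunk)
        if len(out) == limit:
            break
    await gen.aclose()  # mimics the framework dropping a stream early
    return out


async def demo() -> Tuple[List[str], List[str]]:
    old = make_factory("el_principito", 0.0)
    new = make_factory("el_principito", 120.0)  # e.g. after "skip this chapter"
    first = await collect(old, limit=1)  # old stream is dropped after one chunk
    second = await collect(new, limit=3)
    return first, second


first, second = asyncio.run(demo())
```

The new stream starts at its own captured position regardless of how far the old one got, because no shared "current position" attribute exists to be overwritten.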
3. inject_turn enters the same state machine
When a skill calls ctx.inject_turn(...), the framework injects a new turn into the same coordinator. The injected turn races for the speaker against any user PTT. The priority you pass (NORMAL, BLOCK_BEHIND_COMMS, PREEMPT) controls how it races. We cover the priorities in Focus & Channels.
Turn IDs and observability
Every turn has a short UUID. Every log event the framework emits inside a turn carries that ID:
coord.ptt_start turn=t-7f3a state=LISTENING
coord.tool_dispatch turn=t-7f3a tool=play_book skill=audiobooks
coord.audio_done turn=t-7f3a duration_ms=1340
audiobooks.stream_started turn=t-7f3a book_id=el_principito
When something goes wrong, grep by turn ID. The whole story is reconstructable. The framework's full logging convention lives in docs/observability.md in the repo; the short version is: every event you emit from a skill should also include implicit turn context, which ctx.logger injects for you.
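How ctx.logger injects the turn ID is not shown here, but the general technique can be sketched with the standard library: a context variable holds the current turn, and a logging filter copies it onto every record. The names `current_turn` and `TurnContextFilter` and the logger name are assumptions made for this example, and the field order in the output differs slightly from the framework's own log lines.

```python
import io
import logging
from contextvars import ContextVar

# Hypothetical holder for "the turn we are inside"; not the framework's API.
current_turn: ContextVar[str] = ContextVar("current_turn", default="-")


class TurnContextFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Attach the turn ID so the formatter can reference %(turn)s.
        record.turn = current_turn.get()
        return True


buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(message)s turn=%(turn)s"))
handler.addFilter(TurnContextFilter())

logger = logging.getLogger("audiobooks_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# The skill code never mentions the turn ID; it comes from context.
token = current_turn.set("t-7f3a")
logger.info("audiobooks.stream_started book_id=el_principito")
current_turn.reset(token)

lines = buffer.getvalue().splitlines()
```

Because the ID is attached by the filter rather than by the caller, a skill cannot forget it, which is what keeps grep-by-turn reliable.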