huxley

The atomic unit of conversation. From button-press to silence, with everything in between.

A turn is one round of conversation. It begins when something requests speech, either a press of the push-to-talk (PTT) button or a skill calling inject_turn, and ends when the agent goes quiet again. Turns are the framework's clock: every event has a turn ID, every skill dispatch happens inside a turn, and every interrupt drops a turn.

Understanding turns is the difference between writing a skill that kind of works and writing one that handles interruption, latency, and proactive speech correctly.

The state machine

Every turn walks through this sequence:

Seven states. Five edges that matter to skill authors. The coordinator (server/runtime/src/huxley/turn/coordinator.py) is the sole authority — it owns the state, owns the transitions, and owns the audio sequencing.

Diagram notation: rounded nodes (([STATE])) are terminal — the system rests there. Square nodes ((STATE)) are transient — the coordinator moves through them and keeps going.
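A minimal sketch of how the seven states and their transitions might be modeled. The state names are the ones this page uses; the transition table is inferred from the prose here (interrupts are accepted from COMMITTING onward and pass through INTERRUPTED back to IDLE) and is not copied from coordinator.py. `TurnState`, `ALLOWED`, and `can_transition` are hypothetical names, and edges for injected turns are left out.

```python
from enum import Enum, auto


class TurnState(Enum):
    IDLE = auto()                    # nothing is happening
    LISTENING = auto()               # recording user audio
    COMMITTING = auto()              # audio captured, waiting for response
    IN_RESPONSE = auto()             # model is replying
    AWAITING_NEXT_RESPONSE = auto()  # info tool, model needs to narrate the result
    APPLYING_FACTORIES = auto()      # model done, audio side effects firing
    INTERRUPTED = auto()             # atomic stop


S = TurnState

# Assumed transition table, inferred from the text on this page.
ALLOWED: dict[TurnState, set[TurnState]] = {
    S.IDLE: {S.LISTENING},
    S.LISTENING: {S.COMMITTING},
    S.COMMITTING: {S.IN_RESPONSE, S.INTERRUPTED},
    S.IN_RESPONSE: {S.AWAITING_NEXT_RESPONSE, S.APPLYING_FACTORIES, S.INTERRUPTED},
    S.AWAITING_NEXT_RESPONSE: {S.IN_RESPONSE, S.INTERRUPTED},
    S.APPLYING_FACTORIES: {S.IDLE, S.INTERRUPTED},
    S.INTERRUPTED: {S.IDLE},
}


def can_transition(src: TurnState, dst: TurnState) -> bool:
    """Return True if the assumed table permits moving from src to dst."""
    return dst in ALLOWED[src]
```

Keeping the table in one place mirrors the idea of a single authority: nothing else decides whether a move is legal.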

What each state means

"Speech before factories" — the guarantee that matters

If a tool returns an AudioStream side effect, the framework does not play the audio immediately. It latches the factory and waits.

When does it fire? At the terminal barrier: after response.done, after no tool needs follow-up, after the model has stopped emitting audio chunks. Only then does the framework call the factory and start streaming.

Why? Because the model's narration of what's about to happen must finish before the thing happens. Imagine:

"Starting the first book..." [book audio starts]

vs.

[book audio starts] "Starting the first book..." [overlapping]

The first is conversation. The second is a glitch. The framework's job is to make the first cheap and the second impossible.
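The latch-then-fire ordering can be sketched in a few lines. This is an illustration under stated assumptions, not the coordinator's code: `LatchingTurn`, its method names, and `book_audio` are hypothetical, and the terminal barrier is reduced to a single method call.

```python
import asyncio
from typing import AsyncIterator, Callable

Factory = Callable[[], AsyncIterator[bytes]]


class LatchingTurn:
    """Hypothetical stand-in for the coordinator's per-turn bookkeeping."""

    def __init__(self) -> None:
        self.pending: list[Factory] = []
        self.played: list[str] = []  # ordered record of what reached the speaker

    def on_tool_result(self, factory: Factory) -> None:
        # Latch only: the factory is stored, not called.
        self.pending.append(factory)

    def on_model_audio(self, label: str) -> None:
        self.played.append(f"model:{label}")

    async def on_terminal_barrier(self) -> None:
        # Assumed to run after response.done, with no tool follow-up pending
        # and no more model audio chunks arriving.
        factories, self.pending = self.pending, []
        for factory in factories:
            async for chunk in factory():
                self.played.append(f"stream:{chunk.decode()}")


async def book_audio() -> AsyncIterator[bytes]:
    yield b"chapter-1"


async def demo() -> list[str]:
    turn = LatchingTurn()
    turn.on_tool_result(book_audio)    # the tool returns early in the response
    turn.on_model_audio("narration")   # the model is still speaking
    await turn.on_terminal_barrier()   # only now does the factory run
    return turn.played


order = asyncio.run(demo())
```

Even though the tool result arrives before the narration, the stream chunk lands after it in `order`.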

Interruption, atomically

The interrupt path is the most carefully engineered piece of the framework. When you press PTT during model speech (or while a book is playing, or during a tool call):

1. Drop flag set

Any in-flight audio is marked dropped, and anything mid-write to the client is discarded. The flag is checked before every outbound chunk, so the producer can't race the interrupt.

2. Pending factories cleared

If audio side effects were latched but hadn't fired yet, they're discarded. The book that would have started playing doesn't.

3. Audio producer stopped

If a stream was running (audiobook, radio, claim speaker source), it's cancelled. The asyncio task wrapping the factory is cancelled with cancel(), and any ffmpeg subprocesses are terminated.

4. Client buffer flushed

A clear message is sent to the browser, which drops everything it had queued. You hear silence within ~30ms of pressing the button.

5. OpenAI response cancelled

A response.cancel is sent to the Realtime session, and OpenAI stops generating tokens. We don't pay for what we don't hear.

6. State marked INTERRUPTED, then IDLE

The turn is logged as interrupted (so analytics know what happened), the state returns to IDLE, and the new PTT turn can begin.

This is six steps in a fixed order, and each step is independently safe to skip if its precondition is false (no producer running → step 3 is a noop). The result: interruption just works in every state from COMMITTING onward.
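The fixed order with skippable steps can be sketched as follows. Only the ordering and the precondition checks mirror the text; `InterruptSketch` and its fields are hypothetical, and the network sends (the clear message, response.cancel) are replaced by log entries.

```python
import asyncio
from typing import Callable, Optional


class InterruptSketch:
    """Hypothetical coordinator fragment; not the framework's real class."""

    def __init__(self) -> None:
        self.state = "IN_RESPONSE"
        self.dropped = False
        self.pending_factories: list[Callable] = []
        self.producer: Optional[asyncio.Task] = None
        self.response_active = True
        self.log: list[str] = []

    async def interrupt(self) -> None:
        # 1. Drop flag: outbound chunks would check this before every send.
        self.dropped = True
        self.log.append("drop_flag")
        # 2. Discard latched factories that have not fired (skipped if none).
        if self.pending_factories:
            self.pending_factories.clear()
            self.log.append("factories_cleared")
        # 3. Cancel the running producer (skipped if nothing is streaming).
        if self.producer is not None and not self.producer.done():
            self.producer.cancel()
            try:
                await self.producer
            except asyncio.CancelledError:
                pass
            self.log.append("producer_stopped")
        self.producer = None
        # 4. Tell the client to flush its buffer (stand-in for the clear message).
        self.log.append("client_clear")
        # 5. Cancel the model response (stand-in for response.cancel).
        if self.response_active:
            self.response_active = False
            self.log.append("response_cancel")
        # 6. Record the interruption, then return to rest.
        self.state = "INTERRUPTED"
        self.log.append("INTERRUPTED")
        self.state = "IDLE"
        self.log.append("IDLE")


async def demo(with_producer: bool) -> list[str]:
    coord = InterruptSketch()
    coord.pending_factories.append(lambda: None)
    if with_producer:
        coord.producer = asyncio.create_task(asyncio.sleep(3600))
        await asyncio.sleep(0)  # let the producer task start
    await coord.interrupt()
    return coord.log


log_without = asyncio.run(demo(with_producer=False))
log_with = asyncio.run(demo(with_producer=True))
```

With no producer running, step 3 leaves no trace in the log and the remaining steps still run in order.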

Skill author implications

Three consequences of how turns work that you need to internalize:

1. Your factory may run, may not, may be cancelled mid-stream

A skill that returns AudioStream(factory=play_book) is making a promise: "here's how to produce audio, you (the framework) decide when." The framework may invoke the factory, or it may discard it (if the user interrupted before the model finished narrating). The factory may run to completion, or it may get cancelled mid-stream.

Your factory must be safe in all three cases. Use try / finally for cleanup:

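# This fragment assumes book_id, start_position, audio_source, and BPS
# (taken here to mean bytes per second of audio) come from the enclosing scope.
async def stream():
    bytes_written = 0
    completed = False
    try:
        async for chunk in audio_source():
            bytes_written += len(chunk)
            yield chunk
        completed = True
    finally:
        # Runs on normal completion and on cancellation alike, so the bookmark
        # is saved either way. A finished book resets to position 0.0.
        position = 0.0 if completed else start_position + bytes_written / BPS
        await self._storage.save_position(book_id, position)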
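To see the three cases side by side, here is a self-contained sketch that exercises the same try / finally pattern. Everything in it is a stand-in: `save_position`, `audio_source`, `make_stream`, and the tiny `BPS` value are hypothetical, and the mid-stream stop is simulated by closing the generator with aclose() rather than by cancelling a task.

```python
import asyncio

BPS = 4  # hypothetical bytes per second, kept tiny so the arithmetic is easy
saved: dict[str, float] = {}


async def save_position(book_id: str, position: float) -> None:
    saved[book_id] = position


async def audio_source():
    for _ in range(10):
        await asyncio.sleep(0)
        yield b"\x00" * 4  # one "second" of audio per chunk


def make_stream(book_id: str, start_position: float):
    async def stream():
        bytes_written = 0
        completed = False
        try:
            async for chunk in audio_source():
                bytes_written += len(chunk)
                yield chunk
            completed = True
        finally:
            position = 0.0 if completed else start_position + bytes_written / BPS
            await save_position(book_id, position)
    return stream


async def demo() -> None:
    # Case 1: stopped mid-stream after three chunks.
    gen = make_stream("book-a", 10.0)()
    count = 0
    async for _ in gen:
        count += 1
        if count == 3:
            break
    await gen.aclose()  # triggers the finally block

    # Case 2: runs to completion.
    async for _ in make_stream("book-b", 10.0)():
        pass

    # Case 3: the factory is created but never invoked, so nothing is saved.
    make_stream("book-c", 10.0)


asyncio.run(demo())
```

The interrupted book keeps a bookmark three seconds past its start, the finished book resets to zero, and the discarded factory leaves storage untouched.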

2. Your factory captures position via closure

Don't store "current position" eagerly. The user might tell the model "skip this chapter" — that creates a new turn, the model calls a control tool, the tool returns a new AudioStream factory pointing at a new position. The framework cancels the old factory and runs the new one. If positions were stored eagerly, the cancellation race would corrupt them.

def play_book(self, book_id: str, start: float):
    # start is captured by the closure, so each factory carries its own position.
    async def stream():
        async for chunk in self._player.stream(book_id, start_position=start):
            yield chunk
    return stream

The closure captures start. The skill object's state never gets out of sync with what's actually playing.
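A small sketch of why closure capture keeps factories independent. `play_from` and `collect` are hypothetical helpers, and the stream yields positions instead of audio so the effect is visible.

```python
import asyncio


def play_from(start: float):
    # start is bound per call; nothing is written to a shared object.
    async def stream():
        position = start
        for _ in range(3):
            yield position
            position += 1.0
    return stream


async def collect(factory) -> list[float]:
    return [p async for p in factory()]


old = play_from(0.0)    # factory from the original play turn
new = play_from(120.0)  # factory from the later "skip this chapter" turn

old_positions = asyncio.run(collect(old))
new_positions = asyncio.run(collect(new))
```

Creating the second factory does not disturb the first: each one replays from the position it was built with.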

3. inject_turn enters the same state machine

When a skill calls ctx.inject_turn(...), the framework injects a new turn into the same coordinator. The injected turn races for the speaker against any user PTT. The priority you pass (NORMAL, BLOCK_BEHIND_COMMS, PREEMPT) controls how it races. We cover the priorities in Focus & Channels.

Turn IDs and observability

Every turn has a short UUID. Every log event the framework emits inside a turn carries that ID:

coord.ptt_start turn=t-7f3a state=LISTENING
coord.tool_dispatch turn=t-7f3a tool=play_book skill=audiobooks
coord.audio_done turn=t-7f3a duration_ms=1340
audiobooks.stream_started turn=t-7f3a book_id=el_principito

When something goes wrong, grep by turn ID and the whole story is reconstructable. The framework's full logging convention lives in docs/observability.md in the repo. The short version: every event you emit from a skill should carry the turn context too, and ctx.logger injects it for you.
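One way such injection could work is a logging.LoggerAdapter from the Python standard library. This is a sketch of the idea only: `TurnLoggerAdapter` is a hypothetical name, and how ctx.logger is actually implemented is not shown on this page.

```python
import io
import logging


class TurnLoggerAdapter(logging.LoggerAdapter):
    """Appends turn=<id> to every event (hypothetical; ctx.logger may differ)."""

    def process(self, msg, kwargs):
        return f"{msg} turn={self.extra['turn_id']}", kwargs


# Capture output in memory so the result can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s.%(message)s"))

base = logging.getLogger("audiobooks")
base.setLevel(logging.INFO)
base.addHandler(handler)
base.propagate = False

log = TurnLoggerAdapter(base, {"turn_id": "t-7f3a"})
log.info("stream_started book_id=%s", "el_principito")

# The equivalent of grepping by turn ID.
lines = buffer.getvalue().splitlines()
matching = [line for line in lines if "turn=t-7f3a" in line]
```

The skill code never mentions the turn ID, yet every line it emits can be found by searching for it.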

Next

Side effects

What ToolResult.side_effect actually does.

Focus & channels

The arbitration layer that decides who's speaking.

The seven turn states:

IDLE: nothing is happening
LISTENING: recording user audio
COMMITTING: audio captured, waiting for response
IN_RESPONSE: model is replying
AWAITING_NEXT_RESPONSE: info tool, model needs to narrate the result
APPLYING_FACTORIES: model done, audio side effects firing
INTERRUPTED: atomic stop
