Concepts

The vocabulary that makes everything else click.

Everything Huxley does sits on six concepts. Once you have these, the rest of the docs (and the framework's source code) will read like English.

How they fit together

A persona declares which skills to load, what voice to use, and how to behave (its constraints). Each skill exposes tools. When you press the button, the framework starts a turn. The model decides what to say or which tool to call. A tool can return a side effect (a chime, a long-form audio stream, a claim on the speaker), which the framework arbitrates through focus channels.

That's the whole thing.

A worked example, end-to-end

Take a single sentence: "Put on some music."

The persona shapes the prompt

Before any audio leaves your microphone, Huxley has already sent OpenAI a system prompt declared by the persona. That prompt establishes the voice ("warm, slow, Spanish"), the personality ("never refuse a request; offer alternatives"), and the available tools (drawn from each loaded skill).
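To make the composition concrete, here is a minimal sketch of how a persona could be folded into a single prompt. The `Persona` dataclass, `build_system_prompt`, and the `timers` skill are hypothetical names invented for this illustration, not Huxley's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    # Hypothetical shape: the real persona schema may differ.
    voice: str
    constraints: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)


def build_system_prompt(persona: Persona, tools_by_skill: dict[str, list[str]]) -> str:
    """Fold voice, constraints, and the tools of each loaded skill into one prompt."""
    # Only skills the persona loads contribute tools.
    tools = [t for skill in persona.skills for t in tools_by_skill.get(skill, [])]
    lines = [
        f"Voice: {persona.voice}",
        "Constraints: " + "; ".join(persona.constraints),
        "Tools: " + ", ".join(tools),
    ]
    return "\n".join(lines)


persona = Persona(
    voice="warm, slow, Spanish",
    constraints=["never refuse a request; offer alternatives"],
    skills=["radio"],
)
# "timers" is installed in this sketch but not loaded by the persona.
prompt = build_system_prompt(
    persona, {"radio": ["play_station"], "timers": ["set_timer"]}
)
print(prompt)
```

In this sketch, a tool from a skill the persona did not load never reaches the prompt, so the model cannot call it.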

A turn begins

You press the button. The framework starts a Turn with a unique ID. Audio flows from your mic through the WebSocket to OpenAI Realtime.
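As a rough illustration, a turn can be modeled as a small record that is created on the button press and carries its own ID. The source only states that a Turn has a unique ID. The `audio_chunks` field and the `on_button_press` helper are assumptions added for this sketch.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Turn:
    # A fresh random ID per turn; the real ID format is not documented here.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Hypothetical buffer for mic audio captured during this turn.
    audio_chunks: list[bytes] = field(default_factory=list)


def on_button_press() -> Turn:
    """Start a new turn (hypothetical entry point)."""
    return Turn()


first = on_button_press()
second = on_button_press()
first.audio_chunks.append(b"\x00\x01")  # audio belongs to exactly one turn
```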

The model decides

OpenAI hears "put on some music," looks at the available tools, and calls play_station(id="rock_clasico") — a tool exposed by the radio skill.
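One way to picture the routing from a model tool call to a skill is a name-to-handler registry. This is a sketch under assumptions: the `TOOLS` registry, the `tool` decorator, and `dispatch` are hypothetical, and only the `play_station(id="rock_clasico")` call comes from the example above.

```python
from typing import Callable

# Hypothetical registry: tool name -> handler supplied by a skill.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def play_station(id: str) -> str:
    # Stand-in for the radio skill's handler.
    return f"playing {id}"


def dispatch(name: str, arguments: dict) -> str:
    """Route a model tool call to the matching handler."""
    handler = TOOLS.get(name)
    if handler is None:
        return f"unknown tool: {name}"
    return handler(**arguments)


result = dispatch("play_station", {"id": "rock_clasico"})
```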

The skill responds with a side effect

The radio skill returns a ToolResult whose output is "Reproduciendo Rock Clásico" (so the model can narrate it) and whose side_effect is an AudioStream (a factory that, when invoked, will pump radio audio through the WebSocket).
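The shape described above can be sketched as two small dataclasses. `ToolResult`, `output`, `side_effect`, and `AudioStream` are the names used in the text, but the field layout, the `factory` attribute, and the `fake_radio_chunks` generator are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class AudioStream:
    # Assumed layout: a zero-argument factory, so no audio is produced
    # until the framework decides to start the stream.
    factory: Callable[[], Iterator[bytes]]


@dataclass
class ToolResult:
    output: str  # text the model can narrate
    side_effect: Optional[AudioStream] = None


def fake_radio_chunks() -> Iterator[bytes]:
    """Stand-in for real radio audio: three 4-byte chunks."""
    for i in range(3):
        yield bytes([i]) * 4


def play_station(id: str) -> ToolResult:
    return ToolResult(
        output="Reproduciendo Rock Clásico",
        side_effect=AudioStream(factory=fake_radio_chunks),
    )


result = play_station("rock_clasico")
# Nothing has streamed yet; invoking the factory is what starts the audio.
chunks = list(result.side_effect.factory())
```

Returning a factory rather than a live stream keeps the timing decision with the framework, which matters once focus arbitration enters the picture.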

Focus arbitrates

Until now, the DIALOG channel has held the speaker, because the model was about to speak. The audio stream side effect requests the CONTENT channel (priority 300, which ranks below DIALOG). The framework lets the model finish narrating ("Reproduciendo Rock Clásico"), then hands the speaker to the radio stream.
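The handoff can be sketched as a tiny priority arbiter. Two assumptions are baked in and are not confirmed by the text: that a smaller number means higher priority, and that DIALOG's value is 100. Only CONTENT's 300 comes from the description above. `FocusManager` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: smaller number = higher priority.
# CONTENT's 300 is from the text; DIALOG's 100 is a placeholder.
PRIORITY = {"DIALOG": 100, "CONTENT": 300}


@dataclass
class FocusManager:
    # channel name -> whoever currently holds that channel
    holders: dict[str, str] = field(default_factory=dict)

    def acquire(self, channel: str, owner: str) -> None:
        self.holders[channel] = owner

    def release(self, channel: str) -> None:
        self.holders.pop(channel, None)

    def foreground(self) -> Optional[str]:
        """Return the owner of the highest-priority active channel."""
        if not self.holders:
            return None
        top = min(self.holders, key=lambda ch: PRIORITY[ch])
        return self.holders[top]


focus = FocusManager()
focus.acquire("DIALOG", "model narration")
focus.acquire("CONTENT", "radio stream")
during_narration = focus.foreground()  # DIALOG outranks CONTENT

focus.release("DIALOG")  # the model finishes speaking
after_narration = focus.foreground()  # CONTENT now owns the speaker
```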

The constraint check, in the prompt

Because the persona declared never_say_no, a failure does not end in a refusal. If the radio station were unreachable, a well-written skill would return alternatives in its output text, and the model would narrate those instead of a flat "I can't do that."
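A skill that honors this constraint might look like the sketch below. The `reachable` and `catalog` parameters, the extra station names, and the wording of the messages are all hypothetical. The point is only that the failure path still returns something the model can offer.

```python
def play_station(station_id: str, reachable: set[str], catalog: dict[str, str]) -> str:
    """Return narratable text; on failure, offer alternatives instead of refusing."""
    if station_id in reachable:
        return f"Reproduciendo {catalog[station_id]}"

    # Failure path: list the stations that do work, in catalog order.
    alternatives = [name for sid, name in catalog.items() if sid in reachable]
    wanted = catalog.get(station_id, station_id)
    if alternatives:
        return (
            f"{wanted} is unavailable right now. Available instead: "
            + ", ".join(alternatives)
        )
    return "No stations are reachable right now. Offer to retry in a moment."


# Hypothetical catalog; only "Rock Clásico" appears in the example above.
catalog = {
    "rock_clasico": "Rock Clásico",
    "jazz": "Jazz Suave",
    "noticias": "Noticias 24",
}
output = play_station("rock_clasico", reachable={"jazz", "noticias"}, catalog=catalog)
```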

That's six concepts working together for one sentence. Now learn each one properly.
