Concepts
The vocabulary that makes everything else click.
Everything Huxley does sits on six concepts. Once you have these, the rest of the docs (and the framework's source code) will read like English.
Persona
Who the agent is. Voice, language, system prompt, constraints, list of skills. One YAML file.
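To make that concrete, here is a sketch of what a persona file might look like. The field names and values are illustrative assumptions, not Huxley's actual schema:

```yaml
# Hypothetical persona file -- field names are illustrative,
# not the framework's actual schema.
name: dj
voice: warm_slow          # TTS voice preset
language: es
system_prompt: |
  You are a warm, slow-spoken radio host.
skills:                   # which skills to load
  - radio
  - timers
constraints:              # behavioral rules skills opt into
  - never_say_no
  - confirm_destructive
```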
Skill
What the agent can do. A Python class with tools the LLM can call. Reusable across personas.
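As a sketch, a skill might look like this. The class shape and tool-registration convention are assumptions, not Huxley's actual API:

```python
# Hypothetical sketch of a skill -- the convention that each public
# method becomes a callable tool is an assumption, not Huxley's API.
class TimerSkill:
    """A skill is a plain Python class; each tool method can be
    called by the LLM by name, and the class is reusable across personas."""

    def set_timer(self, minutes: int) -> str:
        # Returns text the model can narrate back to the user.
        return f"Timer set for {minutes} minutes"

    def cancel_timer(self) -> str:
        return "Timer cancelled"
```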
Turn
One round of conversation, from button-press to silence. The framework's atomic unit.
Side effect
What a tool does after it returns text — play audio, drop a chime, claim the speaker.
Focus & channels
Who's allowed to make sound right now, and what happens when two things want to.
Constraint
Behavioral rules a persona declares — never_say_no, confirm_destructive — that skills opt into.
How they fit together
A persona declares which skills to load, what voice to use, and how to behave (its constraints). When you press the button, the framework starts a turn. The model decides what to say or which tool to call. Tools return side effects — a chime, a long-form audio stream, a claim on the speaker — that the framework arbitrates through focus channels.
That's the whole thing.
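The flow above can be sketched as a single function. Every name here (`run_turn`, the decision dict shape, the `(output, side_effect)` tuple) is illustrative, not Huxley's real API:

```python
import uuid

def run_turn(model_decide, skills):
    """Hypothetical one-turn flow; all names and shapes are assumptions.

    model_decide maps the user's audio to either plain text or a tool
    call; skills maps tool names to callables returning
    (output_text, optional side-effect factory)."""
    turn_id = uuid.uuid4().hex            # each turn gets a unique ID
    decision = model_decide()
    if decision["type"] == "tool_call":
        tool = skills[decision["name"]]
        output, side_effect = tool(**decision["args"])
        # The model narrates the output first; the side effect
        # (e.g. an audio stream) runs once arbitration allows it.
        if side_effect is not None:
            side_effect()
        return output
    return decision["text"]
```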
A worked example, end-to-end
Take a single sentence: "Put on some music."
The persona shapes the prompt
Before any audio leaves your microphone, Huxley has already sent OpenAI the system prompt declared by the persona. That prompt establishes the voice ("warm, slow, Spanish"), the personality ("never refuse a request — offer alternatives"), and the available tools (drawn from each loaded skill).

A turn begins
You press the button. The framework starts a Turn with a unique ID. Audio flows from your mic through the WebSocket to OpenAI Realtime.
The model decides
OpenAI hears "put on some music," looks at the available tools, and calls play_station(id="rock_clasico") — a tool exposed by the radio skill.
The skill responds with a side effect
The radio skill returns a ToolResult whose output is "Reproduciendo Rock Clásico" (so the model can narrate it) and whose side_effect is an AudioStream (a factory that, when invoked, will pump radio audio through the WebSocket).
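A sketch of that tool: the names `ToolResult` and the side-effect factory mirror the description above, but the concrete signatures are assumptions:

```python
# Illustrative sketch of a tool that returns a side effect.
# The ToolResult shape and the factory convention are assumptions.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolResult:
    output: str                                       # narrated by the model
    side_effect: Optional[Callable[[], str]] = None   # factory, invoked later

def play_station(id: str) -> ToolResult:
    def stream() -> str:
        # In the real framework this would pump radio audio through
        # the WebSocket; here it just reports what it would do.
        return f"streaming {id}"
    return ToolResult(output="Reproduciendo Rock Clásico", side_effect=stream)
```

The factory matters: returning a callable instead of starting audio immediately lets the framework decide *when* the stream may touch the speaker.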
Focus arbitrates
Until now, the speaker was on the DIALOG channel — the model was about to speak. The audio stream side effect requests the CONTENT channel (priority 300, lower than DIALOG). The framework lets the model finish narrating ("Reproduciendo Rock Clásico"), then hands the speaker to the radio stream.
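The arbitration rule can be sketched in a few lines. CONTENT's priority of 300 comes from the text above; DIALOG's value and the class shape are assumptions:

```python
# Minimal sketch of focus arbitration. CONTENT = 300 is from the docs;
# DIALOG = 400 and the Focus API are illustrative assumptions.
DIALOG, CONTENT = 400, 300

class Focus:
    def __init__(self):
        self.holder = None                # (name, priority) or None

    def request(self, name: str, priority: int) -> bool:
        """Grant the speaker if it is free or the requester outranks
        the current holder; otherwise the request waits."""
        if self.holder is None or priority > self.holder[1]:
            self.holder = (name, priority)
            return True
        return False

    def release(self):
        self.holder = None
```

In the worked example: the model's narration holds DIALOG, the radio's CONTENT request is denied until `release()`, and then the stream gets the speaker.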
The constraint check, in the prompt
Because the persona declared never_say_no, even if the radio station were unreachable, a well-written skill returns alternatives in its output text and the model narrates those — never a flat "I can't do that."
That's six concepts working together for one sentence. Now learn each one properly.