Welcome
Build voice agents you own. Bring a persona and skills — Huxley handles the rest.
Huxley is a Python framework for real-time voice agents you actually own. You bring two things: a persona (who the agent is) and a set of skills (what it can do). Huxley handles everything in between — turn coordination, interruption, proactive speech, audio bridging, focus arbitration.
The result: an agent that listens, thinks, and speaks with the warmth and quickness of a real conversation, not the robotic dispatch of a chatbot reading off a screen.
Quickstart
Five minutes from git clone to talking with your agent.
Concepts
The vocabulary that makes everything else click — persona, skill, turn, side effect.
Build a skill
Teach Huxley to do something new in under fifty lines of Python.
Build a persona
Shape the voice, language, and personality of your agent through one YAML file.
What you can build with Huxley
Skills are just Python packages. Anything Python can do, your agent can do — out loud, on demand, in conversation.
Personal content
Audiobooks, music libraries, family photo descriptions, recipes — all retrievable by voice with fuzzy matching built in.
Live information
News, weather, calendars, transit, scores. Cache responses, narrate them in your persona's voice.
Communication
Telegram and similar services — make calls, send messages, receive announcements that interrupt politely.
Reminders & alerts
Timers and proactive messages that survive server restarts and respect when the user is busy.
Smart home
Anything with a Python API or HTTP endpoint. Lights, thermostats, robot vacuums.
Whatever you can imagine
The skill protocol is intentionally tiny. If you can write the Python, Huxley can speak it.
Why people pick Huxley
Huxley is opinionated. The opinions matter.
You own the agent. Huxley runs on your hardware. The OpenAI Realtime API call is yours, billed to your key. Skills run in your Python process. No cloud middleman owns your assistant.
The framework names mechanisms, not use cases. Words like call, reminder, audiobook, emergency don't appear in the framework — they belong to the skills that implement them. The framework gives you turns, side effects, focus channels, and constraints. You compose products from those primitives.
Audio-first. Every event has an audible trail. Earcons mark beginnings and endings. Failures are spoken, never silent. The target user can be blind and never miss what the agent is doing.
The simplest version of every primitive. A skill is a Python class with a name, a tools list, and a handle method. A persona is a YAML file. A side effect is a dataclass. There are no abstractions for hypothetical future problems.
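A skill of that shape might look something like the sketch below. Every name here — the `tools` schema, the `handle` signature, the `Say` side effect — is illustrative, not Huxley's documented API; it only mirrors the description above (a class with a name, a tools list, and a handle method, returning side-effect dataclasses):

```python
# Illustrative sketch only: class names, the tools schema, and the handle
# signature are assumptions, not Huxley's real API.
from dataclasses import dataclass


@dataclass
class Say:
    # A side effect as a plain dataclass: text the agent should speak aloud.
    text: str


class WeatherSkill:
    """A skill: a name, a tools list, and a handle method."""

    name = "weather"
    tools = [
        {
            "name": "get_weather",
            "description": "Report the current weather for a city.",
            "parameters": {"city": {"type": "string"}},
        }
    ]

    def handle(self, tool_name: str, args: dict) -> list:
        # A real skill would call a weather API; this returns a canned line.
        if tool_name == "get_weather":
            return [Say(text=f"It's sunny in {args['city']}.")]
        return []


skill = WeatherSkill()
effects = skill.handle("get_weather", {"city": "Madrid"})
```

The point of the tiny surface is that the framework never needs to know what "weather" means — it only routes tool calls in and speaks side effects out.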
Pick your path
Just want to try it
Five-minute install, talk to AbuelOS, see what the framework feels like.
Want to run it as my own assistant
Pick a persona, configure it, deploy it on a Raspberry Pi or a laptop or wherever.
Want to extend it with a new skill
Write a Python package, register it via entry points, ship it to the world.
Want to make it mine
Different voice, different language, different personality. One YAML file.
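That one YAML file might look something like this; the key names here are illustrative guesses, not Huxley's documented persona schema:

```yaml
# Hypothetical persona file; every key shown is illustrative,
# not a documented schema.
name: AbuelOS
language: es-ES
voice: alloy          # assumed to map to an OpenAI Realtime voice id
personality: |
  Warm, patient, speaks slowly and clearly.
  Repeats important details when asked; never uses jargon.
skills:
  - audiobooks
  - reminders
```

Swapping the language, voice, and personality text is all it should take to get a different agent out of the same framework.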
A word on AbuelOS
Throughout the docs you'll see references to AbuelOS — the canonical persona Huxley ships with, designed for a 90-year-old blind Spanish-speaking user. AbuelOS is not the framework. It's one persona, the same way Vercel's example apps aren't Next.js. We use AbuelOS as the running example because it exercises every primitive: long-form audio playback, proactive messages, accessibility-driven design, multilingual prompts, persona-shaped behavior.
If you build a persona that's nothing like AbuelOS — a tutor, a DJ, a kitchen helper, a cycling coach — that's exactly the point.