Personal voice assistants · Self-hosted Python · MIT

An open-source framework
for personal voice assistants.

Compose yours in Python. Personas in YAML, skills as `pip`-installable packages. Real-time audio, self-hosted.

Install Huxley · See how it works

git clone github.com/ma-r-s/Huxley && uv run huxley
§ 01 — The problem

Every voice platform asks you to
give up something essential.

Ownership. Extensibility. Or the first six months of your life.
Alexa / Google Home · walled garden · cloud-only · certification fees
OpenAI voice mode · one voice, one brain · no self-hosting · no custom skills
Pipecat / LiveKit · blank-slate pipelines · you solve turns, interrupts, audio collisions
Build it yourself · six months of plumbing before the first feature
Huxley · opinionated on turn sequencing and audio flow · open for skills, personas, and clients · self-hosted Python (your box, your keys) · behavioral constraints in YAML (never_say_no, child_safe)
§ 02 — How it works

One coordinator. One audio channel.

The framework sequences every turn through a single speaker. Model voice, tool audio, and skill-claimed bridges never collide.
[Architecture diagram: persona.yaml (identity + skills) configures the Huxley core (coordinator · focus · registry). A voice provider (OpenAI Realtime) supplies audio; skills (Python packages: lights · hue, timers) register via entry-points under huxley.skills; clients (browser · ESP32 · phone) connect over ws://localhost:8765.]
Turn coordination
Model speech always finishes before tool audio. Interrupts are atomic — drop flag, clear queue, flush, cancel.
Skill dispatch
Python packages registered via entry-points. Typed context, namespaced storage, no framework internals leaked.
Proactive speech
Skills call ctx.inject_turn() and Huxley speaks first — without waiting for the user. Timers, alerts, doorbells, inbound messages.
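As a minimal sketch of that flow, here is what a timer skill using proactive speech might look like; the handler signature, `ctx`, and the argument to `inject_turn` are assumptions for illustration, not the real SDK surface:

```python
import asyncio

class TimersSkill:
    """Illustrative only: the handler signature and ctx.inject_turn's
    argument are assumptions, not Huxley's real API."""

    async def handle(self, tool, args, ctx):
        minutes = args["minutes"]
        loop = asyncio.get_running_loop()
        # When the timer fires, speak first instead of waiting for the user.
        loop.call_later(
            minutes * 60,
            lambda: asyncio.ensure_future(
                ctx.inject_turn(f"Timer's up: {minutes} minutes have passed.")
            ),
        )
        return {"ok": True, "minutes": minutes}
```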
Audio bridging
Skills can claim the mic and speaker for full-duplex audio (calls, voice memos). The focus manager prevents collisions with model speech.
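Conceptually, the focus manager is an exclusive claim on the single audio channel. A toy sketch with an asyncio lock (not the real implementation, which tracks richer state):

```python
import asyncio

class AudioFocus:
    """Toy focus manager: whoever holds the lock owns mic + speaker.
    Illustrative only; Huxley's real focus manager is richer."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self.owner = None

    async def claim(self, owner: str):
        await self._lock.acquire()  # blocks until the channel is free
        self.owner = owner

    def release(self):
        self.owner = None
        self._lock.release()

async def demo():
    focus = AudioFocus()
    await focus.claim("model-speech")             # model is talking
    # A skill wanting full-duplex audio must wait its turn.
    claim_task = asyncio.create_task(focus.claim("voice-memo"))
    await asyncio.sleep(0)                        # let the task start (and block)
    assert focus.owner == "model-speech"          # no collision while model speaks
    focus.release()                               # model finishes
    await claim_task                              # now the skill owns the channel
    return focus.owner

print(asyncio.run(demo()))  # voice-memo
```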
Behavioral constraints
never_say_no, confirm_destructive, child_safe — declared in persona.yaml and enforced system-wide. Build for vulnerable users without special code.
Voice provider
OpenAI Realtime today; the architecture leaves room for other providers, but Huxley itself doesn’t train or serve models.
§ 03 — Turn sequencing

You never hear
two voices at once.

The coordinator drains model speech before tool audio starts, holds proactive turns until the channel is free, and makes every interrupt atomic.
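Those sequencing rules can be sketched as a toy coordinator; the real implementation and its API are richer, so treat every name here as illustrative:

```python
from collections import deque

class TurnCoordinator:
    """Toy model of the sequencing rules described above;
    not Huxley's actual coordinator."""

    def __init__(self):
        self.queue = deque()       # pending audio: model speech, then tool audio
        self.spoken = []
        self.interrupted = False   # the "drop flag"

    def enqueue(self, item):
        self.queue.append(item)

    def interrupt(self):
        # Atomic interrupt: drop flag, clear queue; in the real system
        # this also flushes buffered audio and cancels the in-flight turn.
        self.interrupted = True
        self.queue.clear()

    def drain(self):
        # Play everything in order through the single speaker.
        while self.queue and not self.interrupted:
            self.spoken.append(self.queue.popleft())

coord = TurnCoordinator()
coord.enqueue("model: 'Setting timer...'")
coord.enqueue("tool: chime")
coord.drain()
print(coord.spoken)  # model speech first, then the chime
```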
[Timeline, 0 s to 8 s: the user says "Set a timer for 25 minutes"; the model speaks ("Setting timer…"); later a proactive turn ("Timer's up.") and the tool chime play, never overlapping.]
T = 1.5s
User releases. Capture stops cleanly.
T = 4.2s
Model finishes its sentence before the chime plays.
T = 5.3s
Proactive inject waits for the previous turn to drain.
§ 04 — Skill system

Anything that speaks Python
is a skill.

Skills register via Python entry-points. Add a line to persona.yaml, restart — the framework never changes.
6 shipped · 69 designed · ∞ possible
Audio · Audiobooks · huxley-skill-audiobooks
Audio · Radio · huxley-skill-radio
Audio · Podcasts · huxley-skill-podcasts
Audio · Spotify · huxley-skill-spotify
Audio · YouTube Audio · huxley-skill-youtube-audio
Audio · Ambient · huxley-skill-ambient
Audio · Text-to-audio · huxley-skill-text-to-audio
§ 04.1 — Writing one

A skill is a Python package.

Declare tools. Handle calls. Return a ToolResult — optionally with an AudioStream, a PlaySound, or an InputClaim for full-duplex audio. The framework sequences the rest.

# ToolDefinition and ToolResult come from the Huxley skill SDK;
# `hue` stands in for your own light-bridge client.
class LightsSkill:
    name = "lights"
    tools = [ToolDefinition(
        name="set_lights",
        description="Turn the lights on or off.",
        parameters={"on": "boolean"},
    )]

    async def handle(self, tool, args):
        await hue.set(args["on"])
        return ToolResult(output='{"ok": true}')

# pyproject.toml
[project.entry-points."huxley.skills"]
lights = "my_package.skill:LightsSkill"
§ 05 — Today

Built today,
building tomorrow.

Real numbers from the current commit. The framework is small, opinionated, and tested.
678
tests passing
15K
Python LOC
6
first-party skills
2
personas shipped
17
ADRs filed
MIT
license
§ 06 — What's next

Curated, not exhaustive.

The full plan lives in docs/roadmap.md. These four are the next visible moves.

01
Skill SDK cookbook
A third-party author should write a working skill in under 30 minutes with no Huxley-internals knowledge.
P1
02
Per-skill secret interpolation
${HUXLEY_TELEGRAM_TOKEN} in persona.yaml — declare the shape of the secret without storing it. Lands before stage 4.
P1
03
Voice provider abstraction
Extract OpenAI Realtime as one implementation of a VoiceProvider interface. Triggered when a credible second provider exists.
Later
04
ESP32 walkie-talkie client
Replaces the browser as the production client for AbuelOS. Same WebSocket protocol; firmware lives at clients/firmware/.
Firmware
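Roadmap item 02 fits in a few lines. The `${...}` placeholder syntax is from the roadmap; the function below and its behavior are assumptions about how interpolation might work, not shipped code:

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def interpolate_secrets(value: str) -> str:
    """Replace ${ENV_VAR} placeholders with environment values.
    Sketch of the planned behavior; not Huxley's implementation."""
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"missing secret: {name}")
        return os.environ[name]
    return _PLACEHOLDER.sub(_sub, value)

os.environ["HUXLEY_TELEGRAM_TOKEN"] = "123:abc"  # demo value
print(interpolate_secrets("token: ${HUXLEY_TELEGRAM_TOKEN}"))  # token: 123:abc
```

persona.yaml keeps only the shape of the secret; the value stays in the environment.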
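For roadmap item 03, one plausible shape for the extracted interface, sketched as a `typing.Protocol`; every method name here is hypothetical, not the planned Huxley API:

```python
from typing import AsyncIterator, Protocol, runtime_checkable

@runtime_checkable
class VoiceProvider(Protocol):
    """Hypothetical provider interface: illustrative names only."""

    async def connect(self) -> None:
        """Open the realtime session."""
        ...

    async def send_audio(self, pcm: bytes) -> None:
        """Stream captured user audio upstream."""
        ...

    def receive_audio(self) -> AsyncIterator[bytes]:
        """Yield synthesized speech as it arrives."""
        ...

    async def close(self) -> None:
        """Tear down the session."""
        ...
```

Any class implementing these methods would satisfy the protocol structurally, with the OpenAI Realtime integration as the first implementation.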
§ 07 — Personas

One framework.
Every voice you need.

A persona is a YAML file: voice, language, skills, guardrails. AbuelOS is in production today; the others are design examples that show how far the same framework stretches.
A slow, warm voice for an elderly blind user. Audiobooks, radio, Telegram calls. Never says no.
Voice
OpenAI Realtime · "alloy" at 0.85×
Language
es-ES · slow · warm
Skills
audiobooks · radio · news · timers · comms-telegram
Hardware
WebSocket client, mic + speaker
Rule
never_say_no
# personas/abuelos/persona.yaml
name: AbuelOS
voice: alloy
language_code: es
system_prompt: |
  Eres AbuelOS. Hablas despacio, con calma.
  Nunca dices "no puedo". Siempre intentas ayudar.
constraints: [never_say_no, confirm_destructive]
skills:
  audiobooks: { library: /media/audiolibros }
  radio: {}
  news: { location: Madrid, ES }
  timers: {}
  comms-telegram: { api_id: ..., api_hash: ... }
§ 08 — Get started

Five lines.
A voice of your own.

Runs on OpenAI Realtime: roughly $0.06/min listening, $0.24/min speaking. You pay OpenAI directly — Huxley adds no markup. Idle is free.
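As a back-of-envelope check on those rates (the listening/speaking split in the example is an assumption):

```python
# Rates quoted above for OpenAI Realtime; billed by OpenAI, not Huxley.
LISTEN_PER_MIN = 0.06  # $/min while capturing user speech
SPEAK_PER_MIN = 0.24   # $/min while the model speaks

def session_cost(listen_min: float, speak_min: float) -> float:
    """Estimated cost in dollars for one session."""
    return round(listen_min * LISTEN_PER_MIN + speak_min * SPEAK_PER_MIN, 2)

# e.g. a 10-minute conversation, ~6 min listening and ~4 min speaking:
print(session_cost(6, 4))  # 1.32
```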

$ git clone https://github.com/ma-r-s/Huxley.git && cd Huxley
$ echo "HUXLEY_OPENAI_API_KEY=sk-..." > server/runtime/.env
$ uv sync && cd server/runtime && uv run huxley   # terminal 1
$ cd clients/pwa && bun install && bun dev         # terminal 2
$ open http://localhost:5174   # hold the button, speak.
GitHub ↗ · Read the docs