Personal voice assistants · Self-hosted Python · MIT

An open-source framework
for personal voice assistants.

Compose yours in Python. Personas in YAML, skills as `pip`-installable packages. Real-time audio, self-hosted.

Install Huxley · See how it works

git clone github.com/ma-r-s/Huxley && uv run huxley
§ 01 — The problem

Every voice platform asks you to
give up something essential.

Ownership. Extensibility. Or the first six months of your life.
Alexa / Google Home · walled garden · cloud-only · certification fees
OpenAI voice mode · one voice, one brain · no self-hosting · no custom skills
Pipecat / LiveKit · blank-slate pipelines · you solve turns, interrupts, audio collisions
Build it yourself · six months of plumbing before the first feature
Huxley · opinionated on turn sequencing and audio flow · open for skills, personas, and clients · self-hosted Python (your box, your keys) · behavioral constraints in YAML (never_say_no, child_safe)
§ 02 — How it works

One coordinator. One audio channel.

The framework sequences every turn through a single speaker. Model voice, tool audio, and skill-claimed bridges never collide.
[Architecture diagram: persona.yaml (identity + skills) configures the Huxley core (coordinator · focus · registry). A voice provider (OpenAI Realtime) supplies audio; skills (Python packages: lights · hue, timers) register via entry-points under huxley.skills; clients (browser · ESP32 · phone) connect over ws://localhost:8765.]
Turn coordination
Model speech always finishes before tool audio. Interrupts are atomic — drop flag, clear queue, flush, cancel.
Skill dispatch
Python packages registered via entry-points. Typed context, namespaced storage, no framework internals leaked.
Proactive speech
Skills call ctx.inject_turn() and Huxley speaks first — without waiting for the user. Timers, alerts, doorbells, inbound messages.
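As a minimal sketch of that flow, here is what a timer skill using proactive speech might look like; the handler signature, `ctx`, and the argument to `inject_turn` are assumptions for illustration, not the real SDK surface:

```python
import asyncio

class TimersSkill:
    """Illustrative only: the handler signature and ctx.inject_turn's
    argument are assumptions, not Huxley's real API."""

    async def handle(self, tool, args, ctx):
        minutes = args["minutes"]
        loop = asyncio.get_running_loop()
        # When the timer fires, speak first instead of waiting for the user.
        loop.call_later(
            minutes * 60,
            lambda: asyncio.ensure_future(
                ctx.inject_turn(f"Timer's up: {minutes} minutes have passed.")
            ),
        )
        return {"ok": True, "minutes": minutes}
```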
Audio bridging
Skills can claim the mic and speaker for full-duplex audio (calls, voice memos). The focus manager prevents collisions with model speech.
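Conceptually, the focus manager is an exclusive claim on the single audio channel. A toy sketch with an asyncio lock (not the real implementation, which tracks richer state):

```python
import asyncio

class AudioFocus:
    """Toy focus manager: whoever holds the lock owns mic + speaker.
    Illustrative only; Huxley's real focus manager is richer."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self.owner = None

    async def claim(self, owner: str):
        await self._lock.acquire()  # blocks until the channel is free
        self.owner = owner

    def release(self):
        self.owner = None
        self._lock.release()

async def demo():
    focus = AudioFocus()
    await focus.claim("model-speech")             # model is talking
    # A skill wanting full-duplex audio must wait its turn.
    claim_task = asyncio.create_task(focus.claim("voice-memo"))
    await asyncio.sleep(0)                        # let the task start (and block)
    assert focus.owner == "model-speech"          # no collision while model speaks
    focus.release()                               # model finishes
    await claim_task                              # now the skill owns the channel
    return focus.owner

print(asyncio.run(demo()))  # voice-memo
```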
Behavioral constraints
never_say_no, confirm_destructive, child_safe — declared in persona.yaml and enforced system-wide. Build for vulnerable users without special code.
Voice provider
OpenAI Realtime today; the architecture leaves room for other providers, but Huxley itself doesn’t train or serve models.
§ 03 — Turn sequencing

You never hear
two voices at once.

The coordinator drains model speech before tool audio starts, holds proactive turns until the channel is free, and makes every interrupt atomic.
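Those sequencing rules can be sketched as a toy coordinator; the real implementation and its API are richer, so treat every name here as illustrative:

```python
from collections import deque

class TurnCoordinator:
    """Toy model of the sequencing rules described above;
    not Huxley's actual coordinator."""

    def __init__(self):
        self.queue = deque()       # pending audio: model speech, then tool audio
        self.spoken = []
        self.interrupted = False   # the "drop flag"

    def enqueue(self, item):
        self.queue.append(item)

    def interrupt(self):
        # Atomic interrupt: drop flag, clear queue; in the real system
        # this also flushes buffered audio and cancels the in-flight turn.
        self.interrupted = True
        self.queue.clear()

    def drain(self):
        # Play everything in order through the single speaker.
        while self.queue and not self.interrupted:
            self.spoken.append(self.queue.popleft())

coord = TurnCoordinator()
coord.enqueue("model: 'Setting timer...'")
coord.enqueue("tool: chime")
coord.drain()
print(coord.spoken)  # model speech first, then the chime
```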
[Timeline, 0 s to 8 s: the user says "Set a timer for 25 minutes"; the model speaks ("Setting timer…"); later a proactive turn ("Timer's up.") and the tool chime play, never overlapping.]
T = 1.5s
User releases. Capture stops cleanly.
T = 4.2s
Model finishes its sentence before the chime plays.
T = 5.3s
Proactive inject waits for the previous turn to drain.
§ 04 — Skill system

Anything that speaks Python
is a skill.

Skills register via Python entry-points. Add a line to persona.yaml, restart — the framework never changes.
6 shipped · 69 designed · ∞ possible
Audio · Audiobooks · huxley-skill-audiobooks
Audio · Radio · huxley-skill-radio
Audio · Podcasts · huxley-skill-podcasts
Audio · Spotify · huxley-skill-spotify
Audio · YouTube Audio · huxley-skill-youtube-audio
Audio · Ambient · huxley-skill-ambient
Audio · Text-to-audio · huxley-skill-text-to-audio
§ 04.1 — Writing one

A skill is a Python package.

Declare tools. Handle calls. Return a ToolResult — optionally with an AudioStream, a PlaySound, or an InputClaim for full-duplex audio. The framework sequences the rest.

# ToolDefinition and ToolResult come from the Huxley skill SDK;
# `hue` stands in for your own light-bridge client.
class LightsSkill:
    name = "lights"
    tools = [ToolDefinition(
        name="set_lights",
        description="Turn the lights on or off.",
        parameters={"on": "boolean"},
    )]

    async def handle(self, tool, args):
        await hue.set(args["on"])
        return ToolResult(output='{"ok": true}')

# pyproject.toml
[project.entry-points."huxley.skills"]
lights = "my_package.skill:LightsSkill"
§ 05 — Today

Built today,
building tomorrow.

Real numbers from the current commit. The framework is small, opinionated, and tested.
678
tests passing
15K
Python LOC
6
first-party skills
2
personas shipped
17
ADRs filed
MIT
license
§ 06 — What's next

Curated, not exhaustive.

The full plan lives in docs/roadmap.md. These four are the next visible moves.

01
Skill SDK cookbook
A third-party author should write a working skill in under 30 minutes with no Huxley-internals knowledge.
P1
02
Per-skill secret interpolation
${HUXLEY_TELEGRAM_TOKEN} in persona.yaml — declare the shape of the secret without storing it. Lands before stage 4.
P1
03
Voice provider abstraction
Extract OpenAI Realtime as one implementation of a VoiceProvider interface. Triggered when a credible second provider exists.
Later
04
ESP32 walkie-talkie client
Replaces the browser as the production client for AbuelOS. Same WebSocket protocol; firmware lives at clients/firmware/.
Firmware
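Roadmap item 02 fits in a few lines. The `${...}` placeholder syntax is from the roadmap; the function below and its behavior are assumptions about how interpolation might work, not shipped code:

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def interpolate_secrets(value: str) -> str:
    """Replace ${ENV_VAR} placeholders with environment values.
    Sketch of the planned behavior; not Huxley's implementation."""
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"missing secret: {name}")
        return os.environ[name]
    return _PLACEHOLDER.sub(_sub, value)

os.environ["HUXLEY_TELEGRAM_TOKEN"] = "123:abc"  # demo value
print(interpolate_secrets("token: ${HUXLEY_TELEGRAM_TOKEN}"))  # token: 123:abc
```

persona.yaml keeps only the shape of the secret; the value stays in the environment.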
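For roadmap item 03, one plausible shape for the extracted interface, sketched as a `typing.Protocol`; every method name here is hypothetical, not the planned Huxley API:

```python
from typing import AsyncIterator, Protocol, runtime_checkable

@runtime_checkable
class VoiceProvider(Protocol):
    """Hypothetical provider interface: illustrative names only."""

    async def connect(self) -> None:
        """Open the realtime session."""
        ...

    async def send_audio(self, pcm: bytes) -> None:
        """Stream captured user audio upstream."""
        ...

    def receive_audio(self) -> AsyncIterator[bytes]:
        """Yield synthesized speech as it arrives."""
        ...

    async def close(self) -> None:
        """Tear down the session."""
        ...
```

Any class implementing these methods would satisfy the protocol structurally, with the OpenAI Realtime integration as the first implementation.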
§ 07 — Personas

One framework.
Every voice you need.

A persona is a YAML file: voice, language, skills, guardrails. AbuelOS is in production today; the others are design examples that show how far the same framework stretches.
A slow, warm voice for an elderly blind user. Audiobooks, radio, Telegram calls. Never says no.
Voice
OpenAI Realtime · "alloy" at 0.85×
Language
es-ES · slow · warm
Skills
audiobooks · radio · news · timers · comms-telegram
Hardware
WebSocket client, mic + speaker
Rule
never_say_no
# personas/abuelos/persona.yaml
name: AbuelOS
voice: alloy
language_code: es
system_prompt: |
  Eres AbuelOS. Hablas despacio, con calma.
  Nunca dices "no puedo". Siempre intentas ayudar.
constraints: [never_say_no, confirm_destructive]
skills:
  audiobooks: { library: /media/audiolibros }
  radio: {}
  news: { location: Madrid, ES }
  timers: {}
  comms-telegram: { api_id: ..., api_hash: ... }
§ 08 — Get started

Five lines.
A voice of your own.

Runs on OpenAI Realtime: roughly $0.06/min listening, $0.24/min speaking. You pay OpenAI directly — Huxley adds no markup. Idle is free.
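As a back-of-envelope check on those rates (the listening/speaking split in the example is an assumption):

```python
# Rates quoted above for OpenAI Realtime; billed by OpenAI, not Huxley.
LISTEN_PER_MIN = 0.06  # $/min while capturing user speech
SPEAK_PER_MIN = 0.24   # $/min while the model speaks

def session_cost(listen_min: float, speak_min: float) -> float:
    """Estimated cost in dollars for one session."""
    return round(listen_min * LISTEN_PER_MIN + speak_min * SPEAK_PER_MIN, 2)

# e.g. a 10-minute conversation, ~6 min listening and ~4 min speaking:
print(session_cost(6, 4))  # 1.32
```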

$ git clone https://github.com/ma-r-s/Huxley.git && cd Huxley
$ echo "HUXLEY_OPENAI_API_KEY=sk-..." > server/runtime/.env
$ uv sync && cd server/runtime && uv run huxley   # terminal 1
$ cd clients/pwa && bun install && bun dev         # terminal 2
$ open http://localhost:5174   # hold the button, speak.
GitHub ↗ · Read the docs