Quickstart
From git clone to your first conversation in five minutes.
This page gets you from zero to a working voice agent on your laptop. By the end you'll be holding a button in your browser, speaking to your agent, and hearing it speak back.
Prerequisites
You need three things on your machine:
uv (the package manager for the Python side)
bun (the runtime for the dev client)
An OpenAI API key with the Realtime API enabled
If you don't have uv yet: curl -LsSf https://astral.sh/uv/install.sh | sh. If you don't have bun: curl -fsSL https://bun.sh/install | bash. Most paid OpenAI accounts already have Realtime access.
Huxley does not run a free-tier model. The OpenAI Realtime API charges per minute of listening and speaking. Idle sessions are free. For typical hobby use, expect a few cents per conversation.
Five steps
Clone the repo
git clone https://github.com/ma-r-s/Huxley.git
cd Huxley
Drop your API key into .env
echo "HUXLEY_OPENAI_API_KEY=sk-..." > server/runtime/.env
The server loads this file at startup. Pick a persona by adding HUXLEY_PERSONA=abuelos (or basicos) to the .env — the repo ships both, and the loader requires you to choose explicitly when more than one is available.
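The explicit-choice rule can be sketched in a few lines (a hypothetical `load_persona` helper; the real loader's API may differ):

```python
import os

def load_persona(available: list[str]) -> str:
    """Pick a persona. With more than one available, HUXLEY_PERSONA is required."""
    chosen = os.environ.get("HUXLEY_PERSONA")
    if chosen is not None:
        if chosen not in available:
            raise ValueError(f"unknown persona: {chosen}")
        return chosen
    if len(available) == 1:
        return available[0]  # a single candidate is unambiguous
    raise RuntimeError("multiple personas available; set HUXLEY_PERSONA")

# With both repo personas installed, the env var decides:
os.environ["HUXLEY_PERSONA"] = "abuelos"
print(load_persona(["abuelos", "basicos"]))  # abuelos
```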
Install Python and JavaScript dependencies
uv sync # installs the Python workspace
cd clients/pwa && bun install # installs the dev client
cd ../..
uv sync installs the framework, the SDK, and every first-party skill in one shot. The PWA client is a small Vite + React app — it's the dev tool you talk to.
Start the server (terminal 1)
cd server/runtime
uv run huxley
You should see app.starting, persona.loaded name=abuelos, and server.listening port=8765. Leave this running.
Start the client (terminal 2)
cd clients/pwa
bun dev
Open the URL it prints (usually http://localhost:5174). You'll see a one-button page with an animated orb in the middle. The orb is your agent.
Your first conversation
Hold the button in the middle of the page. Speak in Spanish (AbuelOS speaks Spanish by default). Try one of these:
¿Qué hora es?
Cuéntame las noticias.
Quiero escuchar un libro.
Release the button. AbuelOS listens, decides what to do, then speaks back through your speakers. The orb pulses while it thinks, glows while it speaks, and dims when it's listening.
AbuelOS speaks Spanish by default because that's what its persona declares. The framework doesn't know or care about Spanish — switch the persona, and the same agent speaks English, French, or anything else OpenAI supports.
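Conceptually, a persona bundles everything the framework stays agnostic about. A hypothetical shape, just to make the idea concrete (the repo's actual persona format may differ):

```python
# Hypothetical persona description; field names are illustrative only.
abuelos = {
    "name": "abuelos",
    "language": "es",           # the persona, not the framework, picks the language
    "system_prompt": "Eres AbuelOS...",  # written in Spanish, in AbuelOS's voice
}

# Swapping the persona swaps the language; the framework is unchanged.
print(abuelos["language"])  # es
```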
What just happened?
The browser captured your audio
The PWA client opened a microphone, captured 24kHz PCM, and streamed it over a WebSocket to the server while you held the button.
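To get a feel for the data rate: assuming 16-bit mono samples (an assumption — the doc only states 24kHz PCM) and a hypothetical 20 ms chunk size, the numbers work out like this:

```python
SAMPLE_RATE = 24_000     # 24 kHz, as streamed by the PWA client
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM
CHUNK_MS = 20            # hypothetical chunk duration

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE
chunk_bytes = bytes_per_second * CHUNK_MS // 1000

print(bytes_per_second)  # 48000 -- well within what a local WebSocket handles
print(chunk_bytes)       # 960
```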
The server forwarded it to OpenAI Realtime
Huxley's turn coordinator gated the audio (only forwarded while you were holding the button), then pushed it to OpenAI Realtime over its own WebSocket.
OpenAI decided how to respond
Either it answered directly (for "what time is it?") or it called one of AbuelOS's tools (for "tell me the news"). The persona's system prompt — written in Spanish, with AbuelOS's voice — shaped how it responded.
Huxley dispatched the tool call
If a tool was called, the framework routed it to the right skill, awaited the result, and sent the result back to OpenAI for narration.
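In spirit, that dispatch step is a name-to-handler lookup plus an await. A minimal asyncio sketch with a hypothetical skill (not Huxley's actual routing API):

```python
import asyncio

async def get_news() -> str:
    return "headline of the day"  # stand-in for a real news skill

SKILLS = {"get_news": get_news}   # tool name -> skill handler

async def dispatch(tool_name: str) -> str:
    handler = SKILLS.get(tool_name)
    if handler is None:
        raise KeyError(f"no skill handles tool {tool_name!r}")
    result = await handler()      # await the skill's result
    return result                 # sent back to OpenAI for narration

print(asyncio.run(dispatch("get_news")))  # headline of the day
```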
OpenAI streamed audio back
24kHz PCM, chunked, flowing through Huxley back to the browser. The browser played it through your speakers. The whole round trip is usually under a second.
If a tool returned a long-form side effect — like an audiobook stream — Huxley would have queued it to play after the model finished speaking, and routed the audio through the same WebSocket channel. We'll get to that soon.
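That queue-after-speech behavior might look like this in miniature (hypothetical names; it assumes a flag flipped when the model's speech ends):

```python
from collections import deque

class SideEffectQueue:
    """Hold long-form audio until the model finishes speaking, then drain."""

    def __init__(self) -> None:
        self.model_speaking = True
        self.pending: deque[str] = deque()
        self.played: list[str] = []

    def enqueue(self, stream_id: str) -> None:
        if self.model_speaking:
            self.pending.append(stream_id)  # don't talk over the model
        else:
            self.played.append(stream_id)

    def on_speech_done(self) -> None:
        self.model_speaking = False
        while self.pending:
            self.played.append(self.pending.popleft())

q = SideEffectQueue()
q.enqueue("audiobook-stream")
print(q.played)        # [] -- the model is still speaking
q.on_speech_done()
print(q.played)        # ['audiobook-stream']
```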
What to read next
You're up. Now learn how it works.
Your first conversation
What to expect, what to try, how to know it's working.
Core concepts
Persona, skill, turn, side effect — the vocabulary that makes everything else click.