
Quickstart

From git clone to your first conversation in five minutes.

This page gets you from zero to a working voice agent on your laptop. By the end you'll be holding a button in your browser, speaking to your agent, and hearing it speak back.

Prerequisites

You need three things on your machine:

uv
bun
An OpenAI API key with Realtime access

If you don't have uv yet: curl -LsSf https://astral.sh/uv/install.sh | sh. If you don't have bun: curl -fsSL https://bun.sh/install | bash. The OpenAI key needs the Realtime API enabled — most paid accounts have it.

Huxley does not run a free-tier model. The OpenAI Realtime API charges per minute of listening and speaking. Idle sessions are free. For typical hobby use, expect a few cents per conversation.

Five steps

Clone the repo

git clone https://github.com/ma-r-s/Huxley.git
cd Huxley

Drop your API key into .env

echo "HUXLEY_OPENAI_API_KEY=sk-..." > server/runtime/.env

The server loads this file at startup. Pick a persona by adding HUXLEY_PERSONA=abuelos (or basicos) to the .env — the repo ships both, and the loader requires you to choose explicitly when more than one is available.
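With both lines in place, your server/runtime/.env should look something like this (the key value is a placeholder):

```
HUXLEY_OPENAI_API_KEY=sk-...
HUXLEY_PERSONA=abuelos
```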

Install Python and JavaScript dependencies

uv sync                         # installs the Python workspace
cd clients/pwa && bun install   # installs the dev client
cd ../..

uv sync installs the framework, the SDK, and every first-party skill in one shot. The PWA client is a small Vite + React app — it's the dev tool you talk to.

Start the server (terminal 1)

cd server/runtime
uv run huxley

You should see app.starting, persona.loaded name=abuelos, and server.listening port=8765. Leave this running.

Start the client (terminal 2)

cd clients/pwa
bun dev

Open the URL it prints (usually http://localhost:5174). You'll see a one-button page with an animated orb in the middle. The orb is your agent.

Your first conversation

Hold the button in the middle of the page. Speak in Spanish — the AbuelOS persona speaks Spanish by default. Try one of these:

¿Qué hora es? (What time is it?)

Cuéntame las noticias. (Tell me the news.)

Quiero escuchar un libro. (I want to listen to a book.)

Release the button. AbuelOS listens, decides what to do, then speaks back through your speakers. The orb pulses while it thinks, glows while it speaks, and dims when it's listening.

AbuelOS speaks Spanish by default because that's what its persona declares. The framework doesn't know or care about Spanish — switch the persona, and the same agent speaks English, French, or anything else OpenAI supports.

What just happened?

The browser captured your audio

The PWA client opened your microphone, captured 24kHz PCM, and streamed it over a WebSocket to the server while you held the button.
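The audio on the wire is raw PCM, so the framing math is simple. A minimal sketch, assuming 16-bit little-endian samples (an assumption for illustration; the real sample format lives in the client code):

```python
import struct

SAMPLE_RATE = 24_000  # 24 kHz, matching what the client captures

def floats_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    clamped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clamped)}h", *(int(s * 32767) for s in clamped))

# 20 ms at 24 kHz is 480 samples, i.e. 960 bytes of 16-bit PCM per chunk
frame = floats_to_pcm16([0.0] * 480)
```

At that rate, one second of held-button speech is about 48 KB upstream, which is why a plain WebSocket handles it comfortably.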

The server forwarded it to OpenAI Realtime

Huxley's turn coordinator gated the audio (only forwarded while you were holding the button), then pushed it to OpenAI Realtime over its own WebSocket.
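The gating behaves roughly like this. This is a toy sketch with hypothetical names (TurnGate is illustrative, not Huxley's actual coordinator):

```python
class TurnGate:
    """Toy model: forward audio chunks only while the user holds the button."""

    def __init__(self):
        self.held = False
        self.forwarded = []  # stands in for the OpenAI Realtime socket

    def press(self):
        self.held = True

    def release(self):
        self.held = False

    def on_audio(self, chunk: bytes):
        if self.held:  # audio arriving outside a hold is dropped, not buffered
            self.forwarded.append(chunk)
```

Gating on the server rather than the client means stray audio (a mic left hot, a laggy release) never reaches the paid API.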

OpenAI decided how to respond

Either it answered directly (for "what time is it?") or it called one of AbuelOS's tools (for "tell me the news"). The persona's system prompt — written in Spanish, with AbuelOS's voice — shaped how it responded.

Huxley dispatched the tool call

If a tool was called, the framework routed it to the right skill, awaited the result, and sent the result back to OpenAI for narration.
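Dispatch is essentially name-based routing plus an await. A sketch under assumed names (SKILLS, dispatch, and get_time are illustrative, not the framework's API):

```python
import asyncio

async def get_time(args):
    # Hypothetical skill handler, standing in for a real AbuelOS skill
    return {"time": "12:00"}

SKILLS = {"get_time": get_time}  # illustrative registry

async def dispatch(name, args):
    """Route a model tool call to the matching skill and await its result."""
    handler = SKILLS.get(name)
    if handler is None:
        return {"error": f"unknown tool {name}"}
    return await handler(args)

result = asyncio.run(dispatch("get_time", {}))
```

The result dict is what gets sent back to OpenAI, which then narrates it in the persona's voice.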

OpenAI streamed audio back

24kHz PCM, chunked, flowing through Huxley back to the browser. The browser played it through your speakers. The whole round trip is usually under a second.

If a tool returned a long-form side effect — like an audiobook stream — Huxley would have queued it to play after the model finished speaking, and routed the audio through the same WebSocket channel. We'll get to that soon.
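Conceptually the queueing looks like this. A toy model, not Huxley's implementation:

```python
from collections import deque

class PlaybackQueue:
    """Toy model: long-form audio waits until the model's own speech ends."""

    def __init__(self):
        self.speaking = False
        self.pending = deque()
        self.played = []  # stands in for the client-bound WebSocket channel

    def speech_started(self):
        self.speaking = True

    def speech_finished(self):
        self.speaking = False
        while self.pending:  # drain anything queued during speech, in order
            self.played.append(self.pending.popleft())

    def enqueue(self, stream_id):
        if self.speaking:
            self.pending.append(stream_id)  # defer until speech ends
        else:
            self.played.append(stream_id)   # nothing speaking: play now
```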

You're up. Now learn how it works.
