ElevenLabs Conversational AI – API Reference
Saved for Claude Code sessions. Auto-loaded via the .claude/ directory. Last updated: 2026-02-19
1. Agent ID — Where to Find It
In the Dashboard (UI)
- Go to https://elevenlabs.io/app/conversational-ai
- Click on your agent to open it
- The Agent ID is shown in the agent settings page — typically in the URL bar and/or in the agent's "General" settings tab
- URL pattern: https://elevenlabs.io/app/conversational-ai/agents/<AGENT_ID>
- Also visible in the "API" or "Overview" tab of the agent editor (copy button available)
Via API
GET https://api.elevenlabs.io/v1/convai/agents
xi-api-key: YOUR_API_KEY
Returns a list of all agents with their agent_id strings.
Via API (single agent)
GET https://api.elevenlabs.io/v1/convai/agents/{agent_id}
xi-api-key: YOUR_API_KEY
Agent ID Format
- Type: string
- Returned on agent creation via POST /v1/convai/agents/create
- Used as a URL path param and a WebSocket query param throughout the API
2. WebSocket Conversational AI
Connection URL
wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<AGENT_ID>
Regional alternatives:
| Region | URL |
|---|---|
| Default (Global) | wss://api.elevenlabs.io/ |
| US | wss://api.us.elevenlabs.io/ |
| EU | wss://api.eu.residency.elevenlabs.io/ |
| India | wss://api.in.residency.elevenlabs.io/ |
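The regional hosts above combine with the agent_id query param to form the full connection URL. A minimal Python sketch (the region keys are my own naming, not part of the ElevenLabs API):

```python
# Regional hosts mirror the table above; the "global"/"us"/"eu"/"in"
# keys are illustrative names, not an official SDK concept.
REGION_HOSTS = {
    "global": "api.elevenlabs.io",
    "us": "api.us.elevenlabs.io",
    "eu": "api.eu.residency.elevenlabs.io",
    "in": "api.in.residency.elevenlabs.io",
}

def conversation_ws_url(agent_id: str, region: str = "global") -> str:
    """Build the Conversational AI WebSocket URL for a (public) agent."""
    return f"wss://{REGION_HOSTS[region]}/v1/convai/conversation?agent_id={agent_id}"
```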
Authentication
- Public agents: no key required, just the agent_id query param
- Private agents: use a Signed URL (see Section 4) instead of a direct agent_id
- Server-side (backend): pass xi-api-key as an HTTP upgrade header
Headers:
xi-api-key: YOUR_API_KEY
⚠️ Never expose your API key client-side. For browser/mobile apps, use Signed URLs.
3. WebSocket Protocol — Message Reference
Audio Format
- Input (mic → server): PCM 16-bit signed, 16000 Hz, mono, little-endian, Base64-encoded
- Output (server → client): Base64-encoded audio (format specified in conversation_initiation_metadata)
Messages FROM Server (Subscribe / Receive)
conversation_initiation_metadata
Sent immediately after connection. Contains conversation ID and audio format specs.
{
"type": "conversation_initiation_metadata",
"conversation_initiation_metadata_event": {
"conversation_id": "string",
"agent_output_audio_format": "pcm_16000 | mp3_44100 | ...",
"user_input_audio_format": "pcm_16000"
}
}
audio
Agent speech audio chunk.
{
"type": "audio",
"audio_event": {
"audio_base_64": "BASE64_PCM_BYTES",
"event_id": 42
}
}
user_transcript
Transcribed text of what the user said.
{
"type": "user_transcript",
"user_transcription_event": {
"user_transcript": "Hello, how are you?"
}
}
agent_response
The text the agent is saying (arrives in parallel with audio).
{
"type": "agent_response",
"agent_response_event": {
"agent_response": "I'm doing great, thanks!"
}
}
agent_response_correction
Sent after an interruption — shows what was truncated.
{
"type": "agent_response_correction",
"agent_response_correction_event": {
"original_agent_response": "string",
"corrected_agent_response": "string"
}
}
interruption
Signals that a specific audio event was interrupted.
{
"type": "interruption",
"interruption_event": {
"event_id": 42
}
}
ping
Keepalive ping from server. Client must reply with pong.
{
"type": "ping",
"ping_event": {
"event_id": 1,
"ping_ms": 150
}
}
client_tool_call
Requests the client execute a tool (custom tools integration).
{
"type": "client_tool_call",
"client_tool_call": {
"tool_name": "string",
"tool_call_id": "string",
"parameters": {}
}
}
contextual_update
Text context added to conversation state (non-interrupting).
{
"type": "contextual_update",
"contextual_update_event": {
"text": "string"
}
}
vad_score
Voice Activity Detection confidence score (0.0–1.0).
{
"type": "vad_score",
"vad_score_event": {
"vad_score": 0.85
}
}
internal_tentative_agent_response
Preliminary agent text during LLM generation (not final).
{
"type": "internal_tentative_agent_response",
"tentative_agent_response_internal_event": {
"tentative_agent_response": "string"
}
}
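Taken together, incoming frames can be routed on their type field. A minimal Python dispatcher sketch (the returned tuples and handler behavior are illustrative, not part of any SDK):

```python
import base64
import json

def handle_server_message(raw: str, send) -> tuple:
    """Route one server frame by its "type" field.
    `send` is a callable that transmits a JSON string back over the socket."""
    msg = json.loads(raw)
    mtype = msg.get("type")
    if mtype == "ping":
        # Every ping must be answered with a pong echoing the same event_id.
        send(json.dumps({"type": "pong", "event_id": msg["ping_event"]["event_id"]}))
        return ("pong_sent", msg["ping_event"]["event_id"])
    if mtype == "audio":
        # Decode the Base64 payload to raw PCM bytes for the playback layer.
        return ("audio", base64.b64decode(msg["audio_event"]["audio_base_64"]))
    if mtype == "user_transcript":
        return ("transcript", msg["user_transcription_event"]["user_transcript"])
    if mtype == "agent_response":
        return ("agent_text", msg["agent_response_event"]["agent_response"])
    if mtype == "interruption":
        # Stop playback of audio chunks up to this event_id.
        return ("interrupted", msg["interruption_event"]["event_id"])
    return (mtype, msg)  # vad_score, contextual_update, etc. pass through
```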
Messages TO Server (Publish / Send)
user_audio_chunk
Microphone audio data. Send continuously during user speech.
{
"user_audio_chunk": "BASE64_PCM_16BIT_16KHZ_MONO"
}
Audio must be: PCM 16-bit signed, 16000 Hz, mono, little-endian, then Base64-encoded.
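Packing one chunk can be sketched in Python (make_user_audio_chunk is a hypothetical helper; input is assumed to be float samples in [-1, 1] already at 16 kHz mono):

```python
import base64
import json
import struct

def make_user_audio_chunk(samples) -> str:
    """Convert float samples in [-1, 1] (16 kHz mono) to a user_audio_chunk frame."""
    # Clamp and scale to the 16-bit signed range.
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    # "<%dh" = little-endian int16, as the protocol requires.
    pcm = struct.pack("<%dh" % len(ints), *ints)
    return json.dumps({"user_audio_chunk": base64.b64encode(pcm).decode("ascii")})
```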
pong
Reply to server ping to keep connection alive.
{
"type": "pong",
"event_id": 1
}
conversation_initiation_client_data
Override agent configuration at connection time. Send as the first message after the connection opens.
{
"type": "conversation_initiation_client_data",
"conversation_config_override": {
"agent": {
"prompt": { "prompt": "Custom system prompt override" },
"first_message": "Hello! How can I help?",
"language": "en"
},
"tts": {
"voice_id": "string",
"speed": 1.0,
"stability": 0.5,
"similarity_boost": 0.75
}
},
"dynamic_variables": {
"user_name": "Alice",
"session_id": 12345
}
}
Config override ranges:
- tts.speed: 0.7 – 1.2
- tts.stability: 0.0 – 1.0
- tts.similarity_boost: 0.0 – 1.0
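A hedged Python sketch of building this frame while clamping the TTS values to those ranges (the helper name and defaults are illustrative, not an SDK API):

```python
import json

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def build_initiation_frame(prompt=None, first_message=None, language="en",
                           voice_id=None, speed=1.0, stability=0.5,
                           similarity_boost=0.75, dynamic_variables=None) -> str:
    """Build conversation_initiation_client_data, clamping TTS values to the
    documented ranges (speed 0.7-1.2, stability/similarity_boost 0.0-1.0)."""
    agent = {"language": language}
    if prompt is not None:
        agent["prompt"] = {"prompt": prompt}
    if first_message is not None:
        agent["first_message"] = first_message
    tts = {
        "speed": clamp(speed, 0.7, 1.2),
        "stability": clamp(stability, 0.0, 1.0),
        "similarity_boost": clamp(similarity_boost, 0.0, 1.0),
    }
    if voice_id is not None:
        tts["voice_id"] = voice_id
    frame = {
        "type": "conversation_initiation_client_data",
        "conversation_config_override": {"agent": agent, "tts": tts},
    }
    if dynamic_variables:
        frame["dynamic_variables"] = dynamic_variables
    return json.dumps(frame)
```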
client_tool_result
Response to a client_tool_call from the server.
{
"type": "client_tool_result",
"tool_call_id": "string",
"result": "tool output string",
"is_error": false
}
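The call/result pair can be wired together with a local tool registry. A Python sketch (the registry shape and helper name are my own, not an SDK API):

```python
import json

def run_client_tool(call_msg: dict, tools: dict) -> str:
    """Execute a client_tool_call against a registry of local functions
    and build the matching client_tool_result reply."""
    call = call_msg["client_tool_call"]
    try:
        fn = tools[call["tool_name"]]
        result, is_error = str(fn(**call.get("parameters", {}))), False
    except Exception as exc:  # unknown tool or tool failure -> is_error
        result, is_error = str(exc), True
    return json.dumps({
        "type": "client_tool_result",
        "tool_call_id": call["tool_call_id"],  # must echo the server's id
        "result": result,
        "is_error": is_error,
    })
```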
contextual_update
Inject context without interrupting the conversation.
{
"type": "contextual_update",
"text": "User just entered room 4B"
}
user_message
Send a text message (no mic audio needed).
{
"type": "user_message",
"text": "What is the weather like?"
}
user_activity
Signal that user is active (for turn detection in client mode).
{
"type": "user_activity"
}
4. Signed URL (Private Agents)
Used for browser/mobile clients to authenticate without exposing the API key.
Flow
- Backend calls ElevenLabs API to get a temporary signed URL
- Backend returns signed URL to client
- Client opens WebSocket to the signed URL (no API key needed)
Get Signed URL
GET https://api.elevenlabs.io/v1/convai/conversation/get-signed-url?agent_id=<AGENT_ID>
xi-api-key: YOUR_API_KEY
Optional query params:
- include_conversation_id=true — generates a unique conversation ID and prevents URL reuse
- branch_id — targets a specific agent branch
Response:
{
"signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..."
}
Client connects to signed_url directly — no headers needed.
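The backend half of the flow can be sketched with the Python standard library (the helper name is illustrative; the actual fetch is left to urllib/requests since it needs a real key):

```python
import urllib.parse

BASE = "https://api.elevenlabs.io/v1/convai/conversation/get-signed-url"

def signed_url_request(agent_id: str, api_key: str,
                       include_conversation_id: bool = False):
    """Return the (url, headers) pair for the get-signed-url call.
    Run this server-side only: the xi-api-key must never reach the client."""
    params = {"agent_id": agent_id}
    if include_conversation_id:
        params["include_conversation_id"] = "true"
    url = BASE + "?" + urllib.parse.urlencode(params)
    return url, {"xi-api-key": api_key}
```

Fetch the URL with those headers, then return only the response's signed_url field to the client, which opens the WebSocket with no headers at all.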
5. Agents REST API
Base URL: https://api.elevenlabs.io
Auth header: xi-api-key: YOUR_API_KEY
Create Agent
POST /v1/convai/agents/create
Content-Type: application/json
{
"name": "My NPC Agent",
"conversation_config": {
"agent": {
"first_message": "Hello adventurer!",
"prompt": { "prompt": "You are a wise tavern keeper in a fantasy world." },
"language": "en"
}
}
}
Response includes agent_id.
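The request body above can be built with a small Python helper (field names follow the example; the helper itself is illustrative):

```python
import json

def create_agent_payload(name: str, system_prompt: str,
                         first_message: str, language: str = "en") -> str:
    """JSON body for POST /v1/convai/agents/create, mirroring the example above."""
    return json.dumps({
        "name": name,
        "conversation_config": {
            "agent": {
                "first_message": first_message,
                "prompt": {"prompt": system_prompt},
                "language": language,
            }
        },
    })
```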
List Agents
GET /v1/convai/agents?page_size=30&search=&sort_by=created_at&sort_direction=desc
Response:
{
"agents": [
{
"agent_id": "abc123xyz",
"name": "My NPC Agent",
"created_at_unix_secs": 1708300000,
"last_call_time_unix_secs": null,
"archived": false,
"tags": []
}
],
"has_more": false,
"next_cursor": null
}
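Looking up an agent_id by name from this response can be sketched as (helper is illustrative; when has_more is true, fetch the next page using next_cursor, checking the current API docs for the exact cursor query param name):

```python
def find_agent_id(list_response: dict, name: str):
    """Return the agent_id of the first non-archived agent matching `name`,
    or None if no agent on this page matches."""
    for agent in list_response.get("agents", []):
        if agent["name"] == name and not agent.get("archived", False):
            return agent["agent_id"]
    return None
```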
Get Agent
GET /v1/convai/agents/{agent_id}
Update Agent
PATCH /v1/convai/agents/{agent_id}
Content-Type: application/json
{ "name": "Updated Name", "conversation_config": { ... } }
Delete Agent
DELETE /v1/convai/agents/{agent_id}
6. Turn Modes
Server VAD (Default / Recommended)
- ElevenLabs server detects when user stops speaking
- Client streams audio continuously
- Server handles all turn-taking automatically
Client Turn Mode
- Client explicitly signals turn boundaries
- Send user_activity to indicate the user is speaking
- Use when you have your own VAD or push-to-talk UI
7. Audio Pipeline (UE5 Implementation Notes)
Microphone (FAudioCapture)
→ float32 samples at device rate (e.g. 44100 Hz stereo)
→ Resample to 16000 Hz mono
→ Convert float32 → int16 little-endian
→ Base64-encode
→ Send as {"user_audio_chunk": "BASE64"}
Server → {"type":"audio","audio_event":{"audio_base_64":"BASE64"}}
→ Base64-decode
→ Raw PCM bytes
→ Push to USoundWaveProcedural
→ UAudioComponent plays back
Float32 → Int16 Conversion (C++)
static TArray<uint8> FloatPCMToInt16Bytes(const TArray<float>& FloatSamples)
{
TArray<uint8> Bytes;
Bytes.SetNumUninitialized(FloatSamples.Num() * 2);
for (int32 i = 0; i < FloatSamples.Num(); i++)
{
float Clamped = FMath::Clamp(FloatSamples[i], -1.f, 1.f);
int16 Sample = (int16)(Clamped * 32767.f);
Bytes[i * 2] = (uint8)(Sample & 0xFF); // Low byte
Bytes[i * 2 + 1] = (uint8)((Sample >> 8) & 0xFF); // High byte
}
return Bytes;
}
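For prototyping the uplink path outside the engine, the same downmix → resample → int16 → Base64 chain can be sketched in Python. This uses a naive linear-interpolation resampler for illustration only; a production build should use the engine's (or a proper DSP library's) resampler:

```python
import base64
import struct

def mic_to_chunk(samples, src_rate=44100, channels=2) -> str:
    """Interleaved float32 mic samples -> mono 16 kHz int16 LE PCM -> Base64."""
    # Downmix: average the channels of each interleaved frame.
    mono = [sum(samples[i:i + channels]) / channels
            for i in range(0, len(samples) - channels + 1, channels)]
    # Naive linear-interpolation resample to 16000 Hz.
    ratio = src_rate / 16000.0
    out_len = int(len(mono) / ratio)
    resampled = []
    for n in range(out_len):
        pos = n * ratio
        i = int(pos)
        frac = pos - i
        nxt = mono[i + 1] if i + 1 < len(mono) else mono[i]
        resampled.append(mono[i] * (1 - frac) + nxt * frac)
    # float [-1, 1] -> int16 little-endian bytes -> Base64
    ints = [max(-32768, min(32767, int(s * 32767))) for s in resampled]
    return base64.b64encode(struct.pack("<%dh" % len(ints), *ints)).decode("ascii")
```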
8. Quick Integration Checklist (UE5 Plugin)
- Set AgentID in UElevenLabsSettings (Project Settings → ElevenLabs AI Agent)
  - Or override per-component via UElevenLabsConversationalAgentComponent::AgentID
- Set API_Key in settings (or leave empty for public agents)
- Add UElevenLabsConversationalAgentComponent to your NPC actor
- Set TurnMode (default: Server — recommended)
- Bind to events: OnAgentConnected, OnAgentTranscript, OnAgentTextResponse, OnAgentStartedSpeaking, OnAgentStoppedSpeaking
- Call StartConversation() to begin
- Call EndConversation() when done
9. Key API URLs Reference
| Purpose | URL |
|---|---|
| Dashboard | https://elevenlabs.io/app/conversational-ai |
| API Keys | https://elevenlabs.io/app/settings/api-keys |
| WebSocket endpoint | wss://api.elevenlabs.io/v1/convai/conversation |
| Agents list | GET https://api.elevenlabs.io/v1/convai/agents |
| Agent by ID | GET https://api.elevenlabs.io/v1/convai/agents/{agent_id} |
| Create agent | POST https://api.elevenlabs.io/v1/convai/agents/create |
| Signed URL | GET https://api.elevenlabs.io/v1/convai/conversation/get-signed-url |
| WS protocol docs | https://elevenlabs.io/docs/eleven-agents/api-reference/eleven-agents/websocket |
| Quickstart | https://elevenlabs.io/docs/eleven-agents/quickstart |