# ElevenLabs Conversational AI – API Reference

> Saved for Claude Code sessions. Auto-loaded via `.claude/` directory.

> Last updated: 2026-02-19

---

## 1. Agent ID — Where to Find It

### In the Dashboard (UI)

1. Go to **https://elevenlabs.io/app/conversational-ai**
2. Click on your agent to open it
3. The **Agent ID** is shown on the agent settings page — typically in the URL and in the agent's "General" settings tab
   - URL pattern: `https://elevenlabs.io/app/conversational-ai/agents/<AGENT_ID>`
   - Also visible in the "API" or "Overview" tab of the agent editor (copy button available)

### Via API

```http
GET https://api.elevenlabs.io/v1/convai/agents
xi-api-key: YOUR_API_KEY
```

Returns a list of all agents with their `agent_id` strings.

### Via API (single agent)

```http
GET https://api.elevenlabs.io/v1/convai/agents/{agent_id}
xi-api-key: YOUR_API_KEY
```

### Agent ID Format

- Type: `string`
- Returned on agent creation via `POST /v1/convai/agents/create`
- Used as a URL path param and WebSocket query param throughout the API

---

## 2. WebSocket Conversational AI

### Connection URL

```
wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<AGENT_ID>
```

Regional alternatives:

| Region | URL |
|--------|-----|
| Default (Global) | `wss://api.elevenlabs.io/` |
| US | `wss://api.us.elevenlabs.io/` |
| EU | `wss://api.eu.residency.elevenlabs.io/` |
| India | `wss://api.in.residency.elevenlabs.io/` |

### Authentication

- **Public agents**: No key required, just `agent_id` query param
- **Private agents**: Use a **Signed URL** (see Section 4) instead of direct `agent_id`
- **Server-side** (backend): Pass `xi-api-key` as an HTTP upgrade header

```
Headers:
xi-api-key: YOUR_API_KEY
```

> ⚠️ Never expose your API key client-side. For browser/mobile apps, use Signed URLs.

---

## 3. WebSocket Protocol — Message Reference

### Audio Format

- **Input (mic → server)**: PCM 16-bit signed, **16000 Hz**, mono, little-endian, Base64-encoded
- **Output (server → client)**: Base64-encoded audio (format specified in `conversation_initiation_metadata`)

---

### Messages FROM Server (Subscribe / Receive)

#### `conversation_initiation_metadata`

Sent immediately after connection. Contains conversation ID and audio format specs.

```json
{
  "type": "conversation_initiation_metadata",
  "conversation_initiation_metadata_event": {
    "conversation_id": "string",
    "agent_output_audio_format": "pcm_16000 | mp3_44100 | ...",
    "user_input_audio_format": "pcm_16000"
  }
}
```

#### `audio`

Agent speech audio chunk.

```json
{
  "type": "audio",
  "audio_event": {
    "audio_base_64": "BASE64_PCM_BYTES",
    "event_id": 42
  }
}
```

#### `user_transcript`

Transcribed text of what the user said.

```json
{
  "type": "user_transcript",
  "user_transcription_event": {
    "user_transcript": "Hello, how are you?"
  }
}
```

#### `agent_response`

The text the agent is saying (arrives in parallel with audio).

```json
{
  "type": "agent_response",
  "agent_response_event": {
    "agent_response": "I'm doing great, thanks!"
  }
}
```

#### `agent_response_correction`

Sent after an interruption — shows what was truncated.

```json
{
  "type": "agent_response_correction",
  "agent_response_correction_event": {
    "original_agent_response": "string",
    "corrected_agent_response": "string"
  }
}
```

#### `interruption`

Signals that a specific audio event was interrupted.

```json
{
  "type": "interruption",
  "interruption_event": {
    "event_id": 42
  }
}
```

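On `interruption`, a client typically stops current playback and discards any buffered `audio` chunks from the interrupted event onward. A minimal standard-C++ sketch of one plausible policy; the queue type and names (`FPendingAudio`, `OnInterruption`) are illustrative, not the plugin's actual API:

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// One decoded agent audio chunk, tagged with its protocol event_id.
struct FPendingAudio
{
    int32_t EventId;
    std::vector<uint8_t> Pcm; // raw 16-bit PCM bytes
};

// Drop every queued chunk whose event_id is at or after the interrupted
// one, so playback never resumes speech the user already cut off.
static void OnInterruption(std::deque<FPendingAudio>& Queue, int32_t InterruptedEventId)
{
    for (auto It = Queue.begin(); It != Queue.end(); )
    {
        if (It->EventId >= InterruptedEventId)
            It = Queue.erase(It);
        else
            ++It;
    }
}
```

Whether the chunk with exactly the interrupted `event_id` should also be dropped is a policy choice; here it is.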
#### `ping`

Keepalive ping from server. Client must reply with `pong`.

```json
{
  "type": "ping",
  "ping_event": {
    "event_id": 1,
    "ping_ms": 150
  }
}
```

#### `client_tool_call`

Requests that the client execute a tool (custom tools integration).

```json
{
  "type": "client_tool_call",
  "client_tool_call": {
    "tool_name": "string",
    "tool_call_id": "string",
    "parameters": {}
  }
}
```

#### `contextual_update`

Text context added to conversation state (non-interrupting).

```json
{
  "type": "contextual_update",
  "contextual_update_event": {
    "text": "string"
  }
}
```

#### `vad_score`

Voice Activity Detection confidence score (0.0–1.0).

```json
{
  "type": "vad_score",
  "vad_score_event": {
    "vad_score": 0.85
  }
}
```

#### `internal_tentative_agent_response`

Preliminary agent text during LLM generation (not final).

```json
{
  "type": "internal_tentative_agent_response",
  "tentative_agent_response_internal_event": {
    "tentative_agent_response": "string"
  }
}
```

---

### Messages TO Server (Publish / Send)

#### `user_audio_chunk`

Microphone audio data. Send continuously during user speech.

```json
{
  "user_audio_chunk": "BASE64_PCM_16BIT_16KHZ_MONO"
}
```

Audio must be: **PCM 16-bit signed, 16000 Hz, mono, little-endian**, then Base64-encoded.

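The full capture-side transform (float samples to int16 little-endian to Base64) can be sketched in standard C++. This is a self-contained illustration, assuming float input in the -1..1 range; `EncodeUserAudioChunk` is an assumed name, not a plugin API. The returned string is the value for the `user_audio_chunk` field:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Convert float samples (-1..1) to 16-bit little-endian PCM bytes,
// then Base64-encode them for the user_audio_chunk payload.
static std::string EncodeUserAudioChunk(const std::vector<float>& Samples)
{
    // Step 1: float32 -> int16 LE bytes
    std::vector<uint8_t> Bytes;
    Bytes.reserve(Samples.size() * 2);
    for (float S : Samples)
    {
        float Clamped = S < -1.0f ? -1.0f : (S > 1.0f ? 1.0f : S);
        int16_t V = static_cast<int16_t>(Clamped * 32767.0f);
        Bytes.push_back(static_cast<uint8_t>(V & 0xFF));        // low byte first
        Bytes.push_back(static_cast<uint8_t>((V >> 8) & 0xFF)); // high byte
    }

    // Step 2: Base64-encode (RFC 4648 alphabet, '=' padding)
    static const char* Table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    size_t i = 0;
    for (; i + 2 < Bytes.size(); i += 3)
    {
        uint32_t N = (Bytes[i] << 16) | (Bytes[i + 1] << 8) | Bytes[i + 2];
        Out += Table[(N >> 18) & 63]; Out += Table[(N >> 12) & 63];
        Out += Table[(N >> 6) & 63];  Out += Table[N & 63];
    }
    if (i + 1 == Bytes.size()) // one byte left over
    {
        uint32_t N = Bytes[i] << 16;
        Out += Table[(N >> 18) & 63]; Out += Table[(N >> 12) & 63];
        Out += "==";
    }
    else if (i + 2 == Bytes.size()) // two bytes left over
    {
        uint32_t N = (Bytes[i] << 16) | (Bytes[i + 1] << 8);
        Out += Table[(N >> 18) & 63]; Out += Table[(N >> 12) & 63];
        Out += Table[(N >> 6) & 63];  Out += '=';
    }
    return Out;
}
```

Input must already be resampled to 16 kHz mono before this step.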
#### `pong`

Reply to server `ping` to keep connection alive.

```json
{
  "type": "pong",
  "event_id": 1
}
```

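The pong reply only needs to echo the `event_id` from the received `ping_event`; everything else is fixed. A tiny sketch (function name illustrative):

```cpp
#include <string>

// Build the pong reply for a received ping, echoing its event_id.
static std::string BuildPongMessage(int EventId)
{
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(EventId) + "}";
}
```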
#### `conversation_initiation_client_data`

Override agent configuration at connection time. Send as the first message after the connection opens.

```json
{
  "type": "conversation_initiation_client_data",
  "conversation_config_override": {
    "agent": {
      "prompt": { "prompt": "Custom system prompt override" },
      "first_message": "Hello! How can I help?",
      "language": "en"
    },
    "tts": {
      "voice_id": "string",
      "speed": 1.0,
      "stability": 0.5,
      "similarity_boost": 0.75
    }
  },
  "dynamic_variables": {
    "user_name": "Alice",
    "session_id": 12345
  }
}
```

Config override ranges:

- `tts.speed`: 0.7 – 1.2
- `tts.stability`: 0.0 – 1.0
- `tts.similarity_boost`: 0.0 – 1.0

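Given the documented ranges, it is worth clamping override values before sending them. A hypothetical helper (the struct and field names are illustrative, not plugin types):

```cpp
// TTS override values with the API's documented defaults.
struct FTtsOverride
{
    float Speed = 1.0f;
    float Stability = 0.5f;
    float SimilarityBoost = 0.75f;
};

// Clamp to the documented ranges: speed 0.7-1.2,
// stability and similarity_boost 0.0-1.0.
static FTtsOverride ClampTtsOverride(FTtsOverride In)
{
    In.Speed = In.Speed < 0.7f ? 0.7f : (In.Speed > 1.2f ? 1.2f : In.Speed);
    In.Stability = In.Stability < 0.0f ? 0.0f : (In.Stability > 1.0f ? 1.0f : In.Stability);
    In.SimilarityBoost =
        In.SimilarityBoost < 0.0f ? 0.0f : (In.SimilarityBoost > 1.0f ? 1.0f : In.SimilarityBoost);
    return In;
}
```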
#### `client_tool_result`

Response to a `client_tool_call` from the server.

```json
{
  "type": "client_tool_result",
  "tool_call_id": "string",
  "result": "tool output string",
  "is_error": false
}
```

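A minimal sketch of forming this reply after running the requested tool. It escapes only quotes and backslashes (a real client should serialize with a proper JSON library); function names are illustrative:

```cpp
#include <string>

// Escape the characters that would break a JSON string literal.
// (Control characters are not handled here; use a JSON library for those.)
static std::string EscapeJson(const std::string& In)
{
    std::string Out;
    for (char C : In)
    {
        if (C == '"' || C == '\\') Out += '\\';
        Out += C;
    }
    return Out;
}

// Build the client_tool_result reply for a given client_tool_call,
// echoing its tool_call_id.
static std::string BuildToolResult(const std::string& ToolCallId,
                                   const std::string& Result,
                                   bool bIsError)
{
    return std::string("{\"type\":\"client_tool_result\",")
        + "\"tool_call_id\":\"" + EscapeJson(ToolCallId) + "\","
        + "\"result\":\"" + EscapeJson(Result) + "\","
        + "\"is_error\":" + (bIsError ? "true" : "false") + "}";
}
```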
#### `contextual_update`

Inject context without interrupting the conversation.

```json
{
  "type": "contextual_update",
  "text": "User just entered room 4B"
}
```

#### `user_message`

Send a text message (no mic audio needed).

```json
{
  "type": "user_message",
  "text": "What is the weather like?"
}
```

#### `user_activity`

Signal that the user is active (for turn detection in client mode).

```json
{
  "type": "user_activity"
}
```

---

## 4. Signed URL (Private Agents)

Used for browser/mobile clients to authenticate without exposing the API key.

### Flow

1. **Backend** calls ElevenLabs API to get a temporary signed URL
2. Backend returns signed URL to client
3. **Client** opens WebSocket to the signed URL (no API key needed)

### Get Signed URL

```http
GET https://api.elevenlabs.io/v1/convai/conversation/get-signed-url?agent_id=<AGENT_ID>
xi-api-key: YOUR_API_KEY
```

Optional query params:

- `include_conversation_id=true` — generates unique conversation ID, prevents URL reuse
- `branch_id` — specific agent branch

Response:

```json
{
  "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..."
}
```

Client connects to `signed_url` directly — no headers needed.

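Once the backend has fetched the response body, the client only needs the `signed_url` string out of it. A naive extraction sketch, fine for this one-key response but not a general JSON parser (function name illustrative):

```cpp
#include <string>

// Pull the signed_url value out of the one-key JSON response.
// Returns an empty string if the key or value is missing.
static std::string ExtractSignedUrl(const std::string& ResponseJson)
{
    const std::string Key = "\"signed_url\"";
    size_t KeyPos = ResponseJson.find(Key);
    if (KeyPos == std::string::npos) return "";
    // Opening quote of the value, after the colon that follows the key.
    size_t Start = ResponseJson.find('"', ResponseJson.find(':', KeyPos + Key.size()));
    if (Start == std::string::npos) return "";
    size_t End = ResponseJson.find('"', Start + 1);
    if (End == std::string::npos) return "";
    return ResponseJson.substr(Start + 1, End - Start - 1);
}
```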
---

## 5. Agents REST API

Base URL: `https://api.elevenlabs.io`
Auth header: `xi-api-key: YOUR_API_KEY`

### Create Agent

```http
POST /v1/convai/agents/create
Content-Type: application/json

{
  "name": "My NPC Agent",
  "conversation_config": {
    "agent": {
      "first_message": "Hello adventurer!",
      "prompt": { "prompt": "You are a wise tavern keeper in a fantasy world." },
      "language": "en"
    }
  }
}
```

Response includes `agent_id`.

### List Agents

```http
GET /v1/convai/agents?page_size=30&search=&sort_by=created_at&sort_direction=desc
```

Response:

```json
{
  "agents": [
    {
      "agent_id": "abc123xyz",
      "name": "My NPC Agent",
      "created_at_unix_secs": 1708300000,
      "last_call_time_unix_secs": null,
      "archived": false,
      "tags": []
    }
  ],
  "has_more": false,
  "next_cursor": null
}
```

### Get Agent

```http
GET /v1/convai/agents/{agent_id}
```

### Update Agent

```http
PATCH /v1/convai/agents/{agent_id}
Content-Type: application/json

{ "name": "Updated Name", "conversation_config": { ... } }
```

### Delete Agent

```http
DELETE /v1/convai/agents/{agent_id}
```

---

## 6. Turn Modes

### Server VAD (Default / Recommended)

- ElevenLabs server detects when user stops speaking
- Client streams audio continuously
- Server handles all turn-taking automatically

### Client Turn Mode

- Client explicitly signals turn boundaries
- Send `user_activity` to indicate user is speaking
- Use when you have your own VAD or push-to-talk UI

---

## 7. Audio Pipeline (UE5 Implementation Notes)

```
Microphone (FAudioCapture)
  → float32 samples at device rate (e.g. 44100 Hz stereo)
  → Resample to 16000 Hz mono
  → Convert float32 → int16 little-endian
  → Base64-encode
  → Send as {"user_audio_chunk": "BASE64"}

Server → {"type":"audio","audio_event":{"audio_base_64":"BASE64"}}
  → Base64-decode
  → Raw PCM bytes
  → Push to USoundWaveProcedural
  → UAudioComponent plays back
```

### Float32 → Int16 Conversion (C++)

```cpp
static TArray<uint8> FloatPCMToInt16Bytes(const TArray<float>& FloatSamples)
{
    TArray<uint8> Bytes;
    Bytes.SetNumUninitialized(FloatSamples.Num() * 2);
    for (int32 i = 0; i < FloatSamples.Num(); i++)
    {
        float Clamped = FMath::Clamp(FloatSamples[i], -1.f, 1.f);
        int16 Sample = (int16)(Clamped * 32767.f);
        Bytes[i * 2] = (uint8)(Sample & 0xFF);            // Low byte
        Bytes[i * 2 + 1] = (uint8)((Sample >> 8) & 0xFF); // High byte
    }
    return Bytes;
}
```

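The pipeline above also needs the device-rate to 16 kHz resample step. A naive linear-interpolation sketch in standard C++, assuming mono input (downmix stereo first); the plugin may well use a higher-quality resampler, and the name is illustrative:

```cpp
#include <cstddef>
#include <vector>

// Naive linear-interpolation resampler for mono float audio.
// Adequate for speech capture; not a substitute for a windowed-sinc
// resampler where audio quality matters.
static std::vector<float> ResampleLinear(const std::vector<float>& In,
                                         int InRate, int OutRate)
{
    if (In.empty() || InRate <= 0 || OutRate <= 0) return {};
    const size_t OutLen =
        static_cast<size_t>(static_cast<double>(In.size()) * OutRate / InRate);
    std::vector<float> Out(OutLen);
    const double Step = static_cast<double>(InRate) / OutRate;
    for (size_t i = 0; i < OutLen; i++)
    {
        double Pos = i * Step;                    // fractional source index
        size_t Idx = static_cast<size_t>(Pos);
        double Frac = Pos - Idx;
        float A = In[Idx];
        float B = (Idx + 1 < In.size()) ? In[Idx + 1] : A; // clamp at the end
        Out[i] = static_cast<float>(A + (B - A) * Frac);
    }
    return Out;
}
```

For 44100 → 16000 Hz this produces 160 output samples per 441 input samples.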
---

## 8. Quick Integration Checklist (UE5 Plugin)

- [ ] Set `AgentID` in `UElevenLabsSettings` (Project Settings → ElevenLabs AI Agent)
  - Or override per-component via `UElevenLabsConversationalAgentComponent::AgentID`
- [ ] Set `API_Key` in settings (or leave empty for public agents)
- [ ] Add `UElevenLabsConversationalAgentComponent` to your NPC actor
- [ ] Set `TurnMode` (default: `Server` — recommended)
- [ ] Bind to events: `OnAgentConnected`, `OnAgentTranscript`, `OnAgentTextResponse`, `OnAgentStartedSpeaking`, `OnAgentStoppedSpeaking`
- [ ] Call `StartConversation()` to begin
- [ ] Call `EndConversation()` when done

---

## 9. Key API URLs Reference

| Purpose | URL |
|---------|-----|
| Dashboard | https://elevenlabs.io/app/conversational-ai |
| API Keys | https://elevenlabs.io/app/settings/api-keys |
| WebSocket endpoint | wss://api.elevenlabs.io/v1/convai/conversation |
| Agents list | GET https://api.elevenlabs.io/v1/convai/agents |
| Agent by ID | GET https://api.elevenlabs.io/v1/convai/agents/{agent_id} |
| Create agent | POST https://api.elevenlabs.io/v1/convai/agents/create |
| Signed URL | GET https://api.elevenlabs.io/v1/convai/conversation/get-signed-url |
| WS protocol docs | https://elevenlabs.io/docs/eleven-agents/api-reference/eleven-agents/websocket |
| Quickstart | https://elevenlabs.io/docs/eleven-agents/quickstart |