# PS_AI_Agent_ElevenLabs — Plugin Documentation

**Engine**: Unreal Engine 5.5
**Plugin version**: 1.1.0
**Status**: Beta — tested on UE 5.5 Win64, verified connection and audio pipeline
**API**: [ElevenLabs Conversational AI](https://elevenlabs.io/docs/eleven-agents/quickstart)

---

## Table of Contents

1. [Overview](#1-overview)
2. [Installation](#2-installation)
3. [Project Settings](#3-project-settings)
4. [Quick Start (Blueprint)](#4-quick-start-blueprint)
5. [Quick Start (C++)](#5-quick-start-c)
6. [Components Reference](#6-components-reference)
   - [UElevenLabsConversationalAgentComponent](#uelevenlabsconversationalagentcomponent)
   - [UElevenLabsMicrophoneCaptureComponent](#uelevenlabsmicrophonecapturecomponent)
   - [UElevenLabsWebSocketProxy](#uelevenlabswebsocketproxy)
7. [Data Types Reference](#7-data-types-reference)
8. [Turn Modes](#8-turn-modes)
9. [Security — Signed URL Mode](#9-security--signed-url-mode)
10. [Audio Pipeline](#10-audio-pipeline)
11. [Common Patterns](#11-common-patterns)
12. [Troubleshooting](#12-troubleshooting)
13. [Changelog](#13-changelog)

---

## 1. Overview

This plugin integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor).
### How it works

```
Player microphone
        │
        ▼
UElevenLabsMicrophoneCaptureComponent
  • Captures from the default audio device
  • Resamples to 16 kHz mono float32
        │
        ▼
UElevenLabsConversationalAgentComponent
  • Converts float32 → int16 PCM bytes
  • Base64-encodes and sends via WebSocket
        │  (wss://api.elevenlabs.io/v1/convai/conversation)
        ▼
ElevenLabs Conversational AI Agent
  • Transcribes speech
  • Runs LLM
  • Synthesizes voice (ElevenLabs TTS)
        │
        ▼
UElevenLabsConversationalAgentComponent
  • Receives raw binary PCM audio frames
  • Feeds USoundWaveProcedural → UAudioComponent
        │
        ▼
Agent voice plays from the Actor's position in the world
```

### Key properties

- No gRPC, no third-party libraries — uses UE's built-in `WebSockets` and `AudioCapture` modules
- Blueprint-first: all events and controls are exposed to Blueprint
- Real-time bidirectional: audio streams in both directions simultaneously
- Server VAD (default) or push-to-talk
- Text input supported (no microphone needed for testing)

### Wire protocol notes

ElevenLabs sends **all WebSocket frames as binary** (not text frames). The plugin handles two binary frame types automatically:

- **JSON control frames** (start with `{`) — conversation init, transcripts, agent responses, ping/pong
- **Raw PCM audio frames** (binary) — agent speech audio, played directly via `USoundWaveProcedural`

---

## 2. Installation

The plugin lives inside the project, not the engine, so no separate install is needed.

### Verify it is enabled

Open `Unreal/PS_AI_Agent/PS_AI_Agent.uproject` and confirm:

```json
{
  "Name": "PS_AI_Agent_ElevenLabs",
  "Enabled": true
}
```

### First compile

Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click **Yes**. Alternatively, compile from the command line:

```
"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat" PS_AI_AgentEditor Win64 Development "/Unreal/PS_AI_Agent/PS_AI_Agent.uproject" -WaitMutex
```

---

## 3. Project Settings

Go to **Edit → Project Settings → Plugins → ElevenLabs AI Agent**.

| Setting | Description | Required |
|---|---|---|
| **API Key** | Your ElevenLabs API key. Find it at [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys) | Yes (unless using Signed URL Mode or a public agent) |
| **Agent ID** | Default agent ID. Find it in the URL when editing an agent: `elevenlabs.io/app/conversational-ai/agents/` | Yes (unless set per-component) |
| **Signed URL Mode** | Fetch the WS URL from your own backend (keeps the key off the client). See [Section 9](#9-security--signed-url-mode) | No |
| **Signed URL Endpoint** | Your backend URL returning `{ "signed_url": "wss://..." }` | Only if Signed URL Mode = true |
| **Custom WebSocket URL** | Override the default `wss://api.elevenlabs.io/...` endpoint (debug only) | No |
| **Verbose Logging** | Log every WebSocket frame type and its first bytes to the Output Log | No |

> **Security note**: The API key set in Project Settings is saved to `DefaultEngine.ini`. **Never commit this file with the key in it** — strip the ElevenLabs settings section before committing. Use Signed URL Mode for production builds.

> **Finding your Agent ID**: Go to [elevenlabs.io/app/conversational-ai](https://elevenlabs.io/app/conversational-ai), click your agent, and copy the ID from the URL bar or the agent's Overview/API tab.

---

## 4. Quick Start (Blueprint)

### Step 1 — Add the component to an NPC

1. Open your NPC Blueprint (or any Actor Blueprint).
2. In the **Components** panel, click **Add** → search for **ElevenLabs Conversational Agent**.
3. Select the component. In the **Details** panel you can optionally set a specific **Agent ID** (overrides the project default).

### Step 2 — Set Turn Mode

In the component's **Details** panel:

- **Server VAD** (default): ElevenLabs automatically detects when the player stops speaking. The microphone streams continuously once connected.
- **Client Controlled**: You call `Start Listening` / `Stop Listening` manually (push-to-talk).

### Step 3 — Wire up events in the Event Graph

```
Event BeginPlay
 └─► [ElevenLabs Agent] Start Conversation

[ElevenLabs Agent] On Agent Connected
 └─► Print String "Connected! ConvID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
 └─► Set Text (UI widget) ← Response Text

[ElevenLabs Agent] On Agent Transcript
 └─► (optional) display live subtitles ← Segment → Text

[ElevenLabs Agent] On Agent Started Speaking
 └─► Play talking animation on NPC

[ElevenLabs Agent] On Agent Stopped Speaking
 └─► Return to idle animation

[ElevenLabs Agent] On Agent Error
 └─► Print String "Error: " + Error Message

Event EndPlay
 └─► [ElevenLabs Agent] End Conversation
```

### Step 4 — Push-to-talk (Client Controlled mode only)

```
Input Action "Talk" (Pressed)
 └─► [ElevenLabs Agent] Start Listening

Input Action "Talk" (Released)
 └─► [ElevenLabs Agent] Stop Listening
```

### Step 5 — Testing without a microphone

Once connected, use **Send Text Message** instead of speaking:

```
[ElevenLabs Agent] On Agent Connected
 └─► [ElevenLabs Agent] Send Text Message ← "Hello, who are you?"
```

The agent will reply with audio and text exactly as if it had heard you speak.

---

## 5. Quick Start (C++)

### 1. Add the plugin to your module's Build.cs

```csharp
PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");
```

### 2. Include and use

```cpp
#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"

// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;

// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
    TEXT("ElevenLabsAgent"));

// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;

// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
    this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
    this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
    this, &AMyNPC::PlayTalkingAnimation);

// Start the conversation:
ElevenLabsAgent->StartConversation();

// Send a text message (useful for testing without a mic):
ElevenLabsAgent->SendTextMessage(TEXT("Hello, who are you?"));

// Later, to end:
ElevenLabsAgent->EndConversation();
```

### 3. Callback signatures

```cpp
UFUNCTION()
void HandleAgentConnected(const FElevenLabsConversationInfo& Info)
{
    UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID);
}

UFUNCTION()
void HandleAgentResponse(const FString& ResponseText)
{
    // Display in UI, drive subtitles, etc.
}

UFUNCTION()
void PlayTalkingAnimation()
{
    // Switch to talking anim montage
}
```

---

## 6. Components Reference

### UElevenLabsConversationalAgentComponent

The **main component** — attach this to any Actor that should be able to speak.

**Category**: ElevenLabs
**Inherits from**: `UActorComponent`

#### Properties

| Property | Type | Default | Description |
|---|---|---|---|
| `AgentID` | `FString` | `""` | Agent ID for this actor. Overrides the project-level default when non-empty. |
| `TurnMode` | `EElevenLabsTurnMode` | `Server` | How speaker turns are detected. See [Section 8](#8-turn-modes). |
| `bAutoStartListening` | `bool` | `true` | If true, starts mic capture automatically once the WebSocket is connected and ready. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartConversation()` | Callable | Opens the WebSocket connection. If `bAutoStartListening` is true, mic capture starts once `OnAgentConnected` fires. |
| `EndConversation()` | Callable | Closes the WebSocket, stops the mic, and stops audio playback. |
| `StartListening()` | Callable | Starts microphone capture and streams to ElevenLabs. In Client mode, also sends `user_activity`. |
| `StopListening()` | Callable | Stops microphone capture. In Client mode, stops sending `user_activity`. |
| `SendTextMessage(Text)` | Callable | Sends a text message to the agent without using the microphone. The agent replies with full audio + text. Useful for testing. |
| `InterruptAgent()` | Callable | Stops the agent's current utterance immediately and clears the audio queue. |
| `IsConnected()` | Pure | Returns true if the WebSocket is open and the conversation is active. |
| `IsListening()` | Pure | Returns true if the microphone is currently capturing. |
| `IsAgentSpeaking()` | Pure | Returns true if agent audio is currently playing. |
| `GetConversationInfo()` | Pure | Returns `FElevenLabsConversationInfo` (ConversationID, AgentID). |
| `GetWebSocketProxy()` | Pure | Returns the underlying `UElevenLabsWebSocketProxy` for advanced use. |

#### Events

| Event | Parameters | Fired when |
|---|---|---|
| `OnAgentConnected` | `FElevenLabsConversationInfo` | WebSocket handshake + agent initiation metadata received. Safe to call `SendTextMessage` here. |
| `OnAgentDisconnected` | `int32 StatusCode`, `FString Reason` | WebSocket closed (graceful or remote). |
| `OnAgentError` | `FString ErrorMessage` | Connection or protocol error. |
| `OnAgentTranscript` | `FElevenLabsTranscriptSegment` | User speech-to-text transcript received (speaker is always `"user"`). |
| `OnAgentTextResponse` | `FString ResponseText` | Final text response from the agent (mirrors the audio). |
| `OnAgentStartedSpeaking` | — | First audio chunk received from the agent (audio playback begins). |
| `OnAgentStoppedSpeaking` | — | Audio queue empty for ~0.5 s (heuristic — agent done speaking). |
| `OnAgentInterrupted` | — | Agent speech was interrupted (by the user or by `InterruptAgent()`). |

---

### UElevenLabsMicrophoneCaptureComponent

A lightweight microphone capture component. Managed automatically by `UElevenLabsConversationalAgentComponent` — use it directly only for advanced scenarios (e.g. custom audio routing).

**Category**: ElevenLabs
**Inherits from**: `UActorComponent`

#### Properties

| Property | Type | Default | Description |
|---|---|---|---|
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples before resampling. Range: 0.0 – 4.0. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartCapture()` | Callable | Opens the default audio input device and begins streaming. |
| `StopCapture()` | Callable | Stops streaming and closes the device. |
| `IsCapturing()` | Pure | True while actively capturing. |

#### Delegate

`OnAudioCaptured` — fires on the **game thread** with a `TArray<float>` of PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.

---

### UElevenLabsWebSocketProxy

Low-level WebSocket session manager, used internally by `UElevenLabsConversationalAgentComponent`. Use it directly only if you need fine-grained protocol control.

**Inherits from**: `UObject`
**Instantiate via**: `NewObject<UElevenLabsWebSocketProxy>(Outer)`

#### Key functions

| Function | Description |
|---|---|
| `Connect(AgentID, APIKey)` | Open the WS connection. Parameters override project settings when non-empty. |
| `Disconnect()` | Send a close frame and tear down the connection. |
| `SendAudioChunk(PCMData)` | Send raw int16 LE PCM bytes as a Base64 JSON frame. Called automatically by the agent component. |
| `SendTextMessage(Text)` | Send `{"type":"user_message","text":"..."}`. The agent replies as if it had heard speech. |
| `SendUserTurnStart()` | Client turn mode: sends `{"type":"user_activity"}` to signal the user is speaking. |
| `SendUserTurnEnd()` | Client turn mode: stops sending `user_activity` (no explicit message — the server detects silence). |
| `SendInterrupt()` | Ask the agent to stop speaking: sends `{"type":"interrupt"}`. |
| `GetConnectionState()` | Returns `EElevenLabsConnectionState`. |
| `GetConversationInfo()` | Returns `FElevenLabsConversationInfo`. |

---

## 7. Data Types Reference

### EElevenLabsConnectionState

```
Disconnected — No active connection
Connecting   — WebSocket handshake in progress / awaiting conversation_initiation_metadata
Connected    — Conversation active and ready (fires OnAgentConnected)
Error        — Connection or protocol failure
```

> Note: The state remains `Connecting` until the server sends `conversation_initiation_metadata`. `OnAgentConnected` fires on the transition to `Connected`.

### EElevenLabsTurnMode

```
Server — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
```

### FElevenLabsConversationInfo

```
ConversationID  FString — Unique session ID assigned by ElevenLabs
AgentID         FString — The agent ID for this session
```

### FElevenLabsTranscriptSegment

```
Text      FString — Transcribed text
Speaker   FString — "user" (agent text comes via OnAgentTextResponse, not transcript)
bIsFinal  bool    — Always true for user transcripts (ElevenLabs sends final only)
```

---

## 8. Turn Modes

### Server VAD (default)

ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.

**When to use**: Casual conversation, hands-free interaction, natural dialogue.
```
StartConversation() → mic streams continuously (if bAutoStartListening = true)
ElevenLabs detects speech / silence automatically
Agent replies when it detects end-of-speech
```

### Client Controlled (push-to-talk)

Your code explicitly signals turn boundaries with `StartListening()` / `StopListening()`. The plugin sends `{"type":"user_activity"}` while the user is speaking; stopping it signals the end of the turn.

**When to use**: Noisy environments, precise control, walkie-talkie-style UI.

```
Input Pressed  → StartListening() → streams audio + sends user_activity
Input Released → StopListening()  → stops audio (no explicit end message)
Server detects silence and hands the turn to the agent
```

---

## 9. Security — Signed URL Mode

By default, the API key is stored in Project Settings (`DefaultEngine.ini`). This is fine for development but **should not be shipped in packaged builds**, as the key could be extracted.

### Production setup

1. Enable **Signed URL Mode** in Project Settings.
2. Set **Signed URL Endpoint** to a URL on your own backend (e.g. `https://your-server.com/api/elevenlabs-token`).
3. Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning:

   ```json
   {
     "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..."
   }
   ```

4. The plugin fetches this URL before connecting — the API key never leaves your server.

### Development workflow (API key in project settings)

- Set the key in **Project Settings → Plugins → ElevenLabs AI Agent**
- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
- **Strip this section from `DefaultEngine.ini` before every git commit**
- Each developer sets the key locally — it does not go in version control

---

## 10. Audio Pipeline

### Input (player → agent)

```
Device (any sample rate, any channels)
  ↓ FAudioCapture — UE built-in (UE 5.3+ API: OpenAudioCaptureStream)
  ↓ Callback: const void* → cast to float32 interleaved frames
  ↓ Downmix to mono (average all channels)
  ↓ Resample to 16000 Hz (linear interpolation)
  ↓ Apply VolumeMultiplier
  ↓ Dispatch to Game Thread (AsyncTask)
  ↓ Convert float32 → int16 signed, little-endian bytes
  ↓ Base64 encode
  ↓ Send as binary WebSocket frame: { "user_audio_chunk": "<base64>" }
```

### Output (agent → player)

```
Binary WebSocket frame arrives
  ↓ Peek first byte:
      • '{'   → UTF-8 JSON: parse type field, dispatch to handler
      • other → raw PCM audio bytes
  ↓ [Audio path] Raw int16 LE PCM bytes at 16000 Hz mono
  ↓ Enqueue in thread-safe AudioQueue (FCriticalSection)
  ↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
  ↓ UAudioComponent plays from the Actor's world position (3D spatialized)
```

**Audio format** (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.

### Silence detection heuristic

`OnAgentStoppedSpeaking` fires when the `AudioQueue` has been empty for **30 consecutive ticks** (~0.5 s at 60 fps). If the agent has natural pauses, increase `SilenceThresholdTicks` in the header:

```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s
```

---

## 11. Common Patterns

### Test the connection without a microphone

```
BeginPlay → StartConversation()
OnAgentConnected → SendTextMessage("Hello, introduce yourself")
OnAgentTextResponse → Print String (confirms the text pipeline works)
OnAgentStartedSpeaking → (confirms the audio pipeline works)
```

### Show subtitles in UI

```
OnAgentTranscript:   Segment → Text → show in the player subtitle widget (speaker always "user")
OnAgentTextResponse: ResponseText → show in the NPC speech bubble
```

### Interrupt the agent when the player starts speaking

In Server VAD mode ElevenLabs handles this automatically.
For manual control:

```
OnAgentStartedSpeaking → set "agent is speaking" flag
Input Action (any) → if agent is speaking → InterruptAgent()
```

### Multiple NPCs with different agents

Each NPC Blueprint has its own `UElevenLabsConversationalAgentComponent`. Set a different `AgentID` on each component. WebSocket connections are fully independent.

### Only start the conversation when the player is nearby

```
On Begin Overlap (trigger volume around NPC)
 └─► [ElevenLabs Agent] Start Conversation

On End Overlap
 └─► [ElevenLabs Agent] End Conversation
```

### Adjust microphone volume

Get the `UElevenLabsMicrophoneCaptureComponent` from the owner and set `VolumeMultiplier`:

```cpp
UElevenLabsMicrophoneCaptureComponent* Mic =
    GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic)
{
    Mic->VolumeMultiplier = 2.0f;
}
```

---

## 12. Troubleshooting

### Plugin doesn't appear in Project Settings

Ensure the plugin is enabled in the `.uproject` and that the project was recompiled after adding it.

### WebSocket connection fails immediately

- Check that the **API Key** is set correctly in Project Settings.
- Check that the **Agent ID** exists in your ElevenLabs account (find it in the dashboard URL or via `GET /v1/convai/agents`).
- Enable **Verbose Logging** in Project Settings and check the Output Log for the exact WS URL and error.
- Ensure port 443 (WSS) is not blocked by your firewall.

### `OnAgentConnected` never fires

- The connection was made but `conversation_initiation_metadata` has not been received yet — check Verbose Logging.
- If you see `"Binary audio frame"` logs but no `"Conversation initiated"`, the initiation JSON frame may be arriving as a non-`{` binary frame. Check the hex prefix logged at Verbose level.

### No audio from the microphone

- Windows may require microphone permission. Check **Settings → Privacy → Microphone**.
- Try setting `VolumeMultiplier` to `2.0` on the `MicrophoneCaptureComponent`.
- Check the Output Log for `"Failed to open default audio capture stream"`.
### Agent audio is choppy or silent

- The `USoundWaveProcedural` queue may be underflowing due to network jitter. Check latency.
- Verify the audio format matches: the plugin expects raw PCM 16-bit 16 kHz mono from the server. If ElevenLabs sends a different format (e.g. mp3_44100), audio will sound garbled — check `agent_output_audio_format` in the `conversation_initiation_metadata` via Verbose Logging.
- Ensure no other component is using the same `UAudioComponent`.

### `OnAgentStoppedSpeaking` fires too early

Increase `SilenceThresholdTicks` in `ElevenLabsConversationalAgentComponent.h`:

```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s at 60fps
```

### Build error: "Plugin AudioCapture not found"

Make sure the `AudioCapture` plugin is enabled. It should be auto-enabled via the `.uplugin` dependency, but you can add it manually to the `.uproject`:

```json
{
  "Name": "AudioCapture",
  "Enabled": true
}
```

### `"Received unexpected binary WebSocket frame"` in the log

This warning no longer appears in v1.1.0+. If you see it, you are running an older build — recompile the plugin.

---

## 13. Changelog

### v1.1.0 — 2026-02-19

**Bug fixes:**

- **Binary WebSocket frames**: ElevenLabs sends all frames as binary (not text). All frames were previously discarded. Now correctly handled — JSON control frames are decoded as UTF-8, and raw PCM audio frames are routed directly to the audio queue.
- **Transcript message**: Wrong message type (`"transcript"` → `"user_transcript"`), wrong event key (`"transcript_event"` → `"user_transcription_event"`), wrong text field (`"message"` → `"user_transcript"`).
- **Pong format**: `event_id` was nested inside a `pong_event` object; corrected to a top-level field per the API spec.
- **Client turn mode**: `user_turn_start`/`user_turn_end` are not valid API messages; replaced with `user_activity` (start) and implicit silence (end).
**New features:**

- `SendTextMessage(Text)` on both `UElevenLabsConversationalAgentComponent` and `UElevenLabsWebSocketProxy` — send text to the agent without a microphone. Useful for testing.
- Verbose logging shows a binary-frame hex preview and a JSON-frame content prefix.
- The JSON parse error log now shows the first 80 characters of the failing message.

### v1.0.0 — 2026-02-19

Initial implementation. Plugin compiles cleanly on UE 5.5 Win64.

---

*Documentation updated 2026-02-19 — Plugin v1.1.0 — UE 5.5*