# PS_AI_Agent_ElevenLabs — Plugin Documentation

**Engine**: Unreal Engine 5.5
**Plugin version**: 1.0.0
**Status**: Beta
**API**: [ElevenLabs Conversational AI](https://elevenlabs.io/docs/conversational-ai)

---

## Table of Contents

1. [Overview](#1-overview)
2. [Installation](#2-installation)
3. [Project Settings](#3-project-settings)
4. [Quick Start (Blueprint)](#4-quick-start-blueprint)
5. [Quick Start (C++)](#5-quick-start-c)
6. [Components Reference](#6-components-reference)
   - [UElevenLabsConversationalAgentComponent](#uelevenlabsconversationalagentcomponent)
   - [UElevenLabsMicrophoneCaptureComponent](#uelevenlabsmicrophonecapturecomponent)
   - [UElevenLabsWebSocketProxy](#uelevenlabswebsocketproxy)
7. [Data Types Reference](#7-data-types-reference)
8. [Turn Modes](#8-turn-modes)
9. [Security — Signed URL Mode](#9-security--signed-url-mode)
10. [Audio Pipeline](#10-audio-pipeline)
11. [Common Patterns](#11-common-patterns)
12. [Troubleshooting](#12-troubleshooting)

---

## 1. Overview

This plugin integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor).
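At its core the plugin shuttles 16 kHz mono PCM audio over a WebSocket, converting captured float32 samples to int16 little-endian bytes on the way out. A minimal standalone sketch of that conversion — illustrative function names, not the plugin's actual code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Convert one float sample in [-1, 1] to a signed 16-bit PCM sample,
// clamping out-of-range input (a hot mic plus gain can exceed +/-1).
inline int16_t FloatToPcm16(float Sample)
{
    const float Clamped = std::clamp(Sample, -1.0f, 1.0f);
    return static_cast<int16_t>(Clamped * 32767.0f);
}

// Convert a float buffer to the little-endian int16 byte stream that is
// Base64-encoded into the outgoing WebSocket frame.
std::vector<uint8_t> FloatBufferToPcm16Bytes(const std::vector<float>& In)
{
    std::vector<uint8_t> Out;
    Out.reserve(In.size() * 2);
    for (float Sample : In)
    {
        const int16_t Pcm = FloatToPcm16(Sample);
        Out.push_back(static_cast<uint8_t>(Pcm & 0xFF));        // low byte first
        Out.push_back(static_cast<uint8_t>((Pcm >> 8) & 0xFF)); // then high byte
    }
    return Out;
}
```

The incoming direction is the mirror image: Base64-decoded int16 LE bytes are fed to the procedural sound wave without further conversion.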
### How it works

```
Player microphone
      │
      ▼
UElevenLabsMicrophoneCaptureComponent
  • Captures from default audio device
  • Resamples to 16 kHz mono float32
      │
      ▼
UElevenLabsConversationalAgentComponent
  • Converts float32 → int16 PCM bytes
  • Sends via WebSocket to ElevenLabs
      │   (wss://api.elevenlabs.io/v1/convai/conversation)
      ▼
ElevenLabs Conversational AI Agent
  • Transcribes speech
  • Runs LLM
  • Synthesizes voice (ElevenLabs TTS)
      │
      ▼
UElevenLabsConversationalAgentComponent
  • Receives Base64 PCM audio chunks
  • Feeds USoundWaveProcedural → UAudioComponent
      │
      ▼
Agent voice plays from the Actor's position in the world
```

### Key properties

- No gRPC, no third-party libraries — uses UE's built-in `WebSockets` and `AudioCapture` modules
- Blueprint-first: all events and controls are exposed to Blueprint
- Real-time bidirectional: audio streams in both directions simultaneously
- Server VAD (default) or push-to-talk

---

## 2. Installation

The plugin lives inside the project, not the engine, so no separate install is needed.

### Verify it is enabled

Open `Unreal/PS_AI_Agent/PS_AI_Agent.uproject` and confirm:

```json
{
  "Name": "PS_AI_Agent_ElevenLabs",
  "Enabled": true
}
```

### First compile

Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click **Yes**.

Alternatively, compile from the command line:

```
"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat" PS_AI_AgentEditor Win64 Development "/Unreal/PS_AI_Agent/PS_AI_Agent.uproject" -WaitMutex
```

---

## 3. Project Settings

Go to **Edit → Project Settings → Plugins → ElevenLabs AI Agent**.

| Setting | Description | Required |
|---|---|---|
| **API Key** | Your ElevenLabs API key from [elevenlabs.io](https://elevenlabs.io) | Yes (unless using Signed URL Mode) |
| **Agent ID** | Default agent ID. Create agents at [elevenlabs.io/app/conversational-ai](https://elevenlabs.io/app/conversational-ai) | Yes (unless set per-component) |
| **Signed URL Mode** | Fetch the WS URL from your own backend (keeps the key off the client). See [Section 9](#9-security--signed-url-mode) | No |
| **Signed URL Endpoint** | Your backend URL returning `{ "signed_url": "wss://..." }` | Only if Signed URL Mode = true |
| **Custom WebSocket URL** | Override the default `wss://api.elevenlabs.io/...` endpoint (debug only) | No |
| **Verbose Logging** | Log every WebSocket JSON frame to the Output Log | No |

> **Security note**: Never ship with the API key hard-coded in a packaged build. Use Signed URL Mode for production, or load the key at runtime from a secure backend.

---

## 4. Quick Start (Blueprint)

### Step 1 — Add the component to an NPC

1. Open your NPC Blueprint (or any Actor Blueprint).
2. In the **Components** panel, click **Add** → search for **ElevenLabs Conversational Agent**.
3. Select the component. In the **Details** panel you can optionally set a specific **Agent ID** (overrides the project default).

### Step 2 — Set Turn Mode

In the component's **Details** panel:

- **Server VAD** (default): ElevenLabs automatically detects when the player stops speaking. The microphone streams continuously once connected.
- **Client Controlled**: You call `Start Listening` / `Stop Listening` manually (push-to-talk).

### Step 3 — Wire up events in the Event Graph

```
Event BeginPlay
 └─► [ElevenLabs Agent] Start Conversation

[ElevenLabs Agent] On Agent Connected
 └─► Print String "Connected!
       ID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
 └─► Set Text (UI widget) ← Response Text

[ElevenLabs Agent] On Agent Transcript
 └─► (optional) display live subtitles ← Segment → Text

[ElevenLabs Agent] On Agent Started Speaking
 └─► Play talking animation on NPC

[ElevenLabs Agent] On Agent Stopped Speaking
 └─► Return to idle animation

[ElevenLabs Agent] On Agent Error
 └─► Print String "Error: " + Error Message

Event EndPlay
 └─► [ElevenLabs Agent] End Conversation
```

### Step 4 — Push-to-talk (Client Controlled mode only)

```
Input Action "Talk" (Pressed)
 └─► [ElevenLabs Agent] Start Listening

Input Action "Talk" (Released)
 └─► [ElevenLabs Agent] Stop Listening
```

---

## 5. Quick Start (C++)

### 1. Add the plugin to your module's Build.cs

```csharp
PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");
```

### 2. Include and use

```cpp
#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"

// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;

// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
    TEXT("ElevenLabsAgent"));

// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;

// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
    this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
    this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
    this, &AMyNPC::PlayTalkingAnimation);

// Start the conversation:
ElevenLabsAgent->StartConversation();

// Later, to end it:
ElevenLabsAgent->EndConversation();
```

### 3. Callback signatures

```cpp
UFUNCTION()
void HandleAgentConnected(const FElevenLabsConversationInfo& Info)
{
    UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID);
}

UFUNCTION()
void HandleAgentResponse(const FString& ResponseText)
{
    // Display in UI, drive subtitles, etc.
}

UFUNCTION()
void PlayTalkingAnimation()
{
    // Switch to talking anim montage
}
```

---

## 6. Components Reference

### UElevenLabsConversationalAgentComponent

The **main component** — attach this to any Actor that should be able to speak.

**Category**: ElevenLabs
**Inherits from**: `UActorComponent`

#### Properties

| Property | Type | Default | Description |
|---|---|---|---|
| `AgentID` | `FString` | `""` | Agent ID for this actor. Overrides the project-level default when non-empty. |
| `TurnMode` | `EElevenLabsTurnMode` | `Server` | How speaker turns are detected. See [Section 8](#8-turn-modes). |
| `bAutoStartListening` | `bool` | `true` | If true, starts mic capture automatically once the WebSocket is ready. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartConversation()` | Callable | Opens the WebSocket connection. If `bAutoStartListening` is true, mic capture starts once connected. |
| `EndConversation()` | Callable | Closes the WebSocket, stops the mic, stops audio playback. |
| `StartListening()` | Callable | Starts microphone capture. In Client mode, also sends `user_turn_start` to ElevenLabs. |
| `StopListening()` | Callable | Stops microphone capture. In Client mode, also sends `user_turn_end`. |
| `InterruptAgent()` | Callable | Stops the agent's current utterance immediately. |
| `IsConnected()` | Pure | Returns true if the WebSocket is open and the conversation is active. |
| `IsListening()` | Pure | Returns true if the microphone is currently capturing. |
| `IsAgentSpeaking()` | Pure | Returns true if agent audio is currently playing. |
| `GetConversationInfo()` | Pure | Returns `FElevenLabsConversationInfo` (ConversationID, AgentID). |
| `GetWebSocketProxy()` | Pure | Returns the underlying `UElevenLabsWebSocketProxy` for advanced use. |

#### Events

| Event | Parameters | Fired when |
|---|---|---|
| `OnAgentConnected` | `FElevenLabsConversationInfo` | WebSocket handshake + agent initiation complete. |
| `OnAgentDisconnected` | `int32 StatusCode`, `FString Reason` | WebSocket closed (graceful or remote). |
| `OnAgentError` | `FString ErrorMessage` | Connection or protocol error. |
| `OnAgentTranscript` | `FElevenLabsTranscriptSegment` | Any transcript arrives (user or agent, tentative or final). |
| `OnAgentTextResponse` | `FString ResponseText` | Final text response from the agent (complements the audio). |
| `OnAgentStartedSpeaking` | — | First audio chunk received from the agent. |
| `OnAgentStoppedSpeaking` | — | Audio queue empty for ~0.5 s (agent done speaking). |
| `OnAgentInterrupted` | — | Agent speech was interrupted (by the user or by `InterruptAgent()`). |

---

### UElevenLabsMicrophoneCaptureComponent

A lightweight microphone capture component. Managed automatically by `UElevenLabsConversationalAgentComponent` — you only need to use this directly for advanced scenarios (e.g. custom audio routing).

**Category**: ElevenLabs
**Inherits from**: `UActorComponent`

#### Properties

| Property | Type | Default | Description |
|---|---|---|---|
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples. Range: 0.0 – 4.0. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartCapture()` | Callable | Opens the default audio input device and starts streaming. |
| `StopCapture()` | Callable | Stops streaming and closes the device. |
| `IsCapturing()` | Pure | True while actively capturing. |

#### Delegate

`OnAudioCaptured` — fires on the game thread with `TArray<float>` PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.

---

### UElevenLabsWebSocketProxy

Low-level WebSocket session manager.
Used internally by `UElevenLabsConversationalAgentComponent`. Use this directly only if you need fine-grained protocol control.

**Inherits from**: `UObject`
**Instantiate via**: `NewObject<UElevenLabsWebSocketProxy>(Outer)`

#### Key functions

| Function | Description |
|---|---|
| `Connect(AgentID, APIKey)` | Open the WS connection. Parameters override project settings when non-empty. |
| `Disconnect()` | Send a close frame and tear down the connection. |
| `SendAudioChunk(PCMData)` | Send raw int16 LE PCM bytes. Called automatically by the agent component. |
| `SendUserTurnStart()` | Signal the start of user speech (Client turn mode only). |
| `SendUserTurnEnd()` | Signal the end of user speech (Client turn mode only). |
| `SendInterrupt()` | Ask the agent to stop speaking. |
| `GetConnectionState()` | Returns `EElevenLabsConnectionState`. |
| `GetConversationInfo()` | Returns `FElevenLabsConversationInfo`. |

---

## 7. Data Types Reference

### EElevenLabsConnectionState

```
Disconnected — No active connection
Connecting   — WebSocket handshake in progress
Connected    — Conversation active and ready
Error        — Connection or protocol failure
```

### EElevenLabsTurnMode

```
Server — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
```

### FElevenLabsConversationInfo

```
ConversationID  FString — Unique session ID assigned by ElevenLabs
AgentID         FString — The agent that responded
```

### FElevenLabsTranscriptSegment

```
Text      FString — Transcribed text
Speaker   FString — "user" or "agent"
bIsFinal  bool    — false while still speaking, true when the turn is complete
```

---

## 8. Turn Modes

### Server VAD (default)

ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.

**When to use**: Casual conversation, hands-free interaction.
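The two turn modes reduce to a simple rule for when microphone audio should be streaming: in Server mode, always while connected; in Client mode, only between `StartListening()` and `StopListening()`. A standalone sketch with simplified stand-ins for the plugin's enums (`ShouldStreamMic` is illustrative, not part of the API):

```cpp
// Simplified stand-ins for EElevenLabsConnectionState / EElevenLabsTurnMode.
enum class EConnectionState { Disconnected, Connecting, Connected, Error };
enum class ETurnMode { Server, Client };

// Should microphone audio be streaming right now?
// - Server VAD: stream continuously while the conversation is connected.
// - Client mode: stream only while the user holds the push-to-talk input
//   (i.e. between StartListening() and StopListening()).
bool ShouldStreamMic(EConnectionState State, ETurnMode Mode, bool bClientListening)
{
    if (State != EConnectionState::Connected)
    {
        return false; // never stream before the handshake completes
    }
    return Mode == ETurnMode::Server || bClientListening;
}
```

This is why `bAutoStartListening` only matters once the WebSocket reports Connected — audio sent earlier would be dropped.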
```
StartConversation() → mic streams continuously
ElevenLabs detects speech / silence automatically
Agent replies when it detects end-of-speech
```

### Client Controlled (push-to-talk)

Your code explicitly signals turn boundaries with `StartListening()` / `StopListening()`.

**When to use**: Noisy environments, precise control, walkie-talkie style.

```
Input Pressed  → StartListening() → sends user_turn_start + begins audio
Input Released → StopListening()  → stops audio + sends user_turn_end
Agent replies after user_turn_end
```

---

## 9. Security — Signed URL Mode

By default, the API key is stored in Project Settings (Engine.ini). This is fine for development but **should not be shipped in packaged builds**, as the key could be extracted.

### Production setup

1. Enable **Signed URL Mode** in Project Settings.
2. Set **Signed URL Endpoint** to a URL on your own backend (e.g. `https://your-server.com/api/elevenlabs-token`).
3. Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning:

   ```json
   {
     "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..."
   }
   ```

4. The plugin fetches this URL before connecting — the API key never leaves your server.

---

## 10. Audio Pipeline

### Input (player → agent)

```
Device (any sample rate, any channels)
  ↓ FAudioCapture (UE built-in)
  ↓ Callback: float32 interleaved frames
  ↓ Downmix to mono (average channels)
  ↓ Resample to 16000 Hz (linear interpolation)
  ↓ Apply VolumeMultiplier
  ↓ Dispatch to Game Thread
  ↓ Convert float32 → int16 LE bytes
  ↓ Base64 encode
  ↓ WebSocket JSON frame: { "user_audio_chunk": "<base64>" }
```

### Output (agent → player)

```
WebSocket JSON frame:
  { "type": "audio", "audio_event": { "audio_base_64": "..."
  } }
  ↓ Base64 decode → int16 LE PCM bytes
  ↓ Enqueue in thread-safe AudioQueue
  ↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from the queue
  ↓ UAudioComponent plays from the Actor's world position (3D spatialized)
```

**Audio format** (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.

---

## 11. Common Patterns

### Show subtitles in UI

```
OnAgentTranscript event:
 ├─ Segment → Speaker == "user"  → show in player subtitle widget
 ├─ Segment → Speaker == "agent" → show in NPC speech bubble
 └─ Segment → bIsFinal == false  → show as "..." (in-progress)
```

### Interrupt the agent when the player starts speaking

In Server VAD mode ElevenLabs handles this automatically. For manual control:

```
OnAgentStartedSpeaking → store "agent is speaking" flag
Input Action (any)     → if agent is speaking → InterruptAgent()
```

### Multiple NPCs with different agents

Each NPC Blueprint has its own `UElevenLabsConversationalAgentComponent`. Set a different `AgentID` on each component. Connections are fully independent.

### Only start the conversation when the player is nearby

```
On Begin Overlap (trigger volume around NPC)
 └─► [ElevenLabs Agent] Start Conversation

On End Overlap
 └─► [ElevenLabs Agent] End Conversation
```

### Adjusting microphone volume

Get the `UElevenLabsMicrophoneCaptureComponent` from the owner and set `VolumeMultiplier`:

```cpp
UElevenLabsMicrophoneCaptureComponent* Mic =
    GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic)
{
    Mic->VolumeMultiplier = 2.0f;
}
```

---

## 12. Troubleshooting

### Plugin doesn't appear in Project Settings

Ensure the plugin is enabled in the `.uproject` and the project was recompiled after adding it.

### WebSocket connection fails immediately

- Check that the **API Key** is set correctly in Project Settings.
- Check that the **Agent ID** exists in your ElevenLabs account.
- Enable **Verbose Logging** in Project Settings and check the Output Log for the exact WebSocket URL and error.
- Make sure your machine has internet access and that port 443 (WSS) is not blocked.

### No audio from the microphone

- Windows may require microphone permission. Check **Settings → Privacy → Microphone**.
- Try setting `VolumeMultiplier` to `2.0` to rule out a volume issue.
- Check the Output Log for `"Failed to open default audio capture stream"`.

### Agent audio is choppy or silent

- The `USoundWaveProcedural` queue may be underflowing. This can happen if audio chunks arrive with long gaps. Check network latency.
- Ensure no other component is consuming the same `UAudioComponent`.

### `OnAgentStoppedSpeaking` fires too early

The silence detection threshold is 30 ticks (~0.5 s at 60 fps). If the agent has natural pauses in speech, increase `SilenceThresholdTicks` in `ElevenLabsConversationalAgentComponent.h`:

```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0 s
```

### Build error: "Plugin AudioCapture not found"

Make sure the `AudioCapture` plugin is enabled in your project. It should be auto-enabled via the `.uplugin` dependency, but you can also add it manually to the `.uproject`:

```json
{
  "Name": "AudioCapture",
  "Enabled": true
}
```

---

*Documentation generated 2026-02-19 — Plugin v1.0.0 — UE 5.5*