PS_AI_Agent_ElevenLabs — Plugin Documentation
Engine: Unreal Engine 5.5
Plugin version: 1.0.0
Status: Beta
API: ElevenLabs Conversational AI
Table of Contents
- Overview
- Installation
- Project Settings
- Quick Start (Blueprint)
- Quick Start (C++)
- Components Reference
- Data Types Reference
- Turn Modes
- Security — Signed URL Mode
- Audio Pipeline
- Common Patterns
- Troubleshooting
1. Overview
This plugin integrates the ElevenLabs Conversational AI Agent API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor).
How it works
Player microphone
│
▼
UElevenLabsMicrophoneCaptureComponent
• Captures from default audio device
• Resamples to 16 kHz mono float32
│
▼
UElevenLabsConversationalAgentComponent
• Converts float32 → int16 PCM bytes
• Sends via WebSocket to ElevenLabs
│ (wss://api.elevenlabs.io/v1/convai/conversation)
▼
ElevenLabs Conversational AI Agent
• Transcribes speech
• Runs LLM
• Synthesizes voice (ElevenLabs TTS)
│
▼
UElevenLabsConversationalAgentComponent
• Receives Base64 PCM audio chunks
• Feeds USoundWaveProcedural → UAudioComponent
│
▼
Agent voice plays from the Actor's position in the world
Key properties
- No gRPC, no third-party libraries — uses UE's built-in WebSockets and AudioCapture modules
- Blueprint-first: all events and controls are exposed to Blueprint
- Real-time bidirectional: audio streams in both directions simultaneously
- Server VAD (default) or push-to-talk
2. Installation
The plugin lives inside the project, not the engine, so no separate install is needed.
Verify it is enabled
Open Unreal/PS_AI_Agent/PS_AI_Agent.uproject and confirm:
{
"Name": "PS_AI_Agent_ElevenLabs",
"Enabled": true
}
First compile
Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click Yes. Alternatively, compile from the command line:
"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat"
PS_AI_AgentEditor Win64 Development
"<repo>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject"
-WaitMutex
3. Project Settings
Go to Edit → Project Settings → Plugins → ElevenLabs AI Agent.
| Setting | Description | Required |
|---|---|---|
| API Key | Your ElevenLabs API key from elevenlabs.io | Yes (unless using Signed URL Mode) |
| Agent ID | Default agent ID. Create agents at elevenlabs.io/app/conversational-ai | Yes (unless set per-component) |
| Signed URL Mode | Fetch the WS URL from your own backend (keeps key off client). See Section 9 | No |
| Signed URL Endpoint | Your backend URL returning { "signed_url": "wss://..." } | Only if Signed URL Mode = true |
| Custom WebSocket URL | Override the default wss://api.elevenlabs.io/... endpoint (debug only) | No |
| Verbose Logging | Log every WebSocket JSON frame to Output Log | No |
Security note: Never ship with the API key hard-coded in a packaged build. Use Signed URL Mode for production, or load the key at runtime from a secure backend.
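If you load the key at runtime instead of storing it in Project Settings, a minimal sketch using UE's built-in HTTP module might look like the following. The backend endpoint, the plain-text response body, and handing the key to the low-level proxy's Connect() (Section 6) are all assumptions about your own setup, not part of the plugin's documented flow; Signed URL Mode remains the recommended production path.
// Sketch only. Requires the "HTTP" module in your Build.cs.
#include "HttpModule.h"
#include "Interfaces/IHttpRequest.h"
#include "Interfaces/IHttpResponse.h"

void AMyNPC::FetchApiKeyAndConnect()
{
    TSharedRef<IHttpRequest, ESPMode::ThreadSafe> Request = FHttpModule::Get().CreateRequest();
    Request->SetURL(TEXT("https://your-server.com/api/elevenlabs-key")); // hypothetical endpoint
    Request->SetVerb(TEXT("GET"));
    Request->OnProcessRequestComplete().BindLambda(
        [this](FHttpRequestPtr /*Req*/, FHttpResponsePtr Response, bool bSucceeded)
        {
            if (!bSucceeded || !Response.IsValid())
            {
                return;
            }
            // Assumption: the backend returns the key as a plain-text body.
            const FString ApiKey = Response->GetContentAsString();
            // One option: pass it to the low-level proxy, whose Connect(AgentID, APIKey)
            // overrides project settings when non-empty (see Section 6).
            ElevenLabsAgent->GetWebSocketProxy()->Connect(ElevenLabsAgent->AgentID, ApiKey);
        });
    Request->ProcessRequest();
}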
4. Quick Start (Blueprint)
Step 1 — Add the component to an NPC
- Open your NPC Blueprint (or any Actor Blueprint).
- In the Components panel, click Add → search for ElevenLabs Conversational Agent.
- Select the component. In the Details panel you can optionally set a specific Agent ID (overrides the project default).
Step 2 — Set Turn Mode
In the component's Details panel:
- Server VAD (default): ElevenLabs automatically detects when the player stops speaking. Microphone streams continuously once connected.
- Client Controlled: You call Start Listening / Stop Listening manually (push-to-talk).
Step 3 — Wire up events in the Event Graph
Event BeginPlay
└─► [ElevenLabs Agent] Start Conversation
[ElevenLabs Agent] On Agent Connected
└─► Print String "Connected! ID: " + Conversation Info → Conversation ID
[ElevenLabs Agent] On Agent Text Response
└─► Set Text (UI widget) ← Response Text
[ElevenLabs Agent] On Agent Transcript
└─► (optional) display live subtitles ← Segment → Text
[ElevenLabs Agent] On Agent Started Speaking
└─► Play talking animation on NPC
[ElevenLabs Agent] On Agent Stopped Speaking
└─► Return to idle animation
[ElevenLabs Agent] On Agent Error
└─► Print String "Error: " + Error Message
Event EndPlay
└─► [ElevenLabs Agent] End Conversation
Step 4 — Push-to-talk (Client Controlled mode only)
Input Action "Talk" (Pressed)
└─► [ElevenLabs Agent] Start Listening
Input Action "Talk" (Released)
└─► [ElevenLabs Agent] Stop Listening
5. Quick Start (C++)
1. Add the plugin to your module's Build.cs
PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");
2. Include and use
#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"
// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;
// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
TEXT("ElevenLabsAgent"));
// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;
// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
this, &AMyNPC::PlayTalkingAnimation);
// Start the conversation:
ElevenLabsAgent->StartConversation();
// Later, to end it:
ElevenLabsAgent->EndConversation();
3. Callback signatures
UFUNCTION()
void HandleAgentConnected(const FElevenLabsConversationInfo& Info)
{
UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID);
}
UFUNCTION()
void HandleAgentResponse(const FString& ResponseText)
{
// Display in UI, drive subtitles, etc.
}
UFUNCTION()
void PlayTalkingAnimation()
{
// Switch to talking anim montage
}
6. Components Reference
UElevenLabsConversationalAgentComponent
The main component — attach this to any Actor that should be able to speak.
Category: ElevenLabs
Inherits from: UActorComponent
Properties
| Property | Type | Default | Description |
|---|---|---|---|
| AgentID | FString | "" | Agent ID for this actor. Overrides the project-level default when non-empty. |
| TurnMode | EElevenLabsTurnMode | Server | How speaker turns are detected. See Section 8. |
| bAutoStartListening | bool | true | If true, starts mic capture automatically once the WebSocket is ready. |
Functions
| Function | Blueprint | Description |
|---|---|---|
| StartConversation() | Callable | Opens the WebSocket connection. If bAutoStartListening is true, mic capture starts once connected. |
| EndConversation() | Callable | Closes the WebSocket, stops mic, stops audio playback. |
| StartListening() | Callable | Starts microphone capture. In Client mode, also sends user_turn_start to ElevenLabs. |
| StopListening() | Callable | Stops microphone capture. In Client mode, also sends user_turn_end. |
| InterruptAgent() | Callable | Stops the agent's current utterance immediately. |
| IsConnected() | Pure | Returns true if the WebSocket is open and the conversation is active. |
| IsListening() | Pure | Returns true if the microphone is currently capturing. |
| IsAgentSpeaking() | Pure | Returns true if agent audio is currently playing. |
| GetConversationInfo() | Pure | Returns FElevenLabsConversationInfo (ConversationID, AgentID). |
| GetWebSocketProxy() | Pure | Returns the underlying UElevenLabsWebSocketProxy for advanced use. |
Events
| Event | Parameters | Fired when |
|---|---|---|
| OnAgentConnected | FElevenLabsConversationInfo | WebSocket handshake + agent initiation complete. |
| OnAgentDisconnected | int32 StatusCode, FString Reason | WebSocket closed (graceful or remote). |
| OnAgentError | FString ErrorMessage | Connection or protocol error. |
| OnAgentTranscript | FElevenLabsTranscriptSegment | Any transcript arrives (user or agent, tentative or final). |
| OnAgentTextResponse | FString ResponseText | Final text response from the agent (complements the audio). |
| OnAgentStartedSpeaking | — | First audio chunk received from the agent. |
| OnAgentStoppedSpeaking | — | Audio queue empty for ~0.5 s (agent done speaking). |
| OnAgentInterrupted | — | Agent speech was interrupted (by user or by InterruptAgent()). |
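The C++ quick start (Section 5) shows bindings for the first few delegates; as a sketch, handlers for some of the remaining events follow the parameter lists above. The exact UFUNCTION signatures (for example by-value vs const-reference FString) should be verified against ElevenLabsConversationalAgentComponent.h.
// In BeginPlay (or after creating the component):
ElevenLabsAgent->OnAgentDisconnected.AddDynamic(this, &AMyNPC::HandleAgentDisconnected);
ElevenLabsAgent->OnAgentStoppedSpeaking.AddDynamic(this, &AMyNPC::ReturnToIdleAnimation);
ElevenLabsAgent->OnAgentInterrupted.AddDynamic(this, &AMyNPC::ReturnToIdleAnimation);

// Handlers (declared UFUNCTION in the header):
UFUNCTION()
void HandleAgentDisconnected(int32 StatusCode, const FString& Reason)
{
    UE_LOG(LogTemp, Warning, TEXT("Agent disconnected (%d): %s"), StatusCode, *Reason);
}

UFUNCTION()
void ReturnToIdleAnimation()
{
    // Blend back to the idle montage, clear any "speaking" UI state, etc.
}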
UElevenLabsMicrophoneCaptureComponent
A lightweight microphone capture component. Managed automatically by UElevenLabsConversationalAgentComponent — you only need to use this directly for advanced scenarios (e.g. custom audio routing).
Category: ElevenLabs
Inherits from: UActorComponent
Properties
| Property | Type | Default | Description |
|---|---|---|---|
| VolumeMultiplier | float | 1.0 | Gain applied to captured samples. Range: 0.0 – 4.0. |
Functions
| Function | Blueprint | Description |
|---|---|---|
| StartCapture() | Callable | Opens the default audio input device and starts streaming. |
| StopCapture() | Callable | Stops streaming and closes the device. |
| IsCapturing() | Pure | True while actively capturing. |
Delegate
OnAudioCaptured — fires on the game thread with TArray<float> PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.
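A sketch of binding to this delegate, assuming it is a dynamic multicast delegate whose handler receives the sample buffer by const reference (verify the exact signature in the component header):
// Handler: 16 kHz mono float32 samples, already downmixed and resampled.
UFUNCTION()
void HandleMicAudio(const TArray<float>& Samples)
{
    // Example: drive a simple "mic level" UI indicator.
    // PeakMicLevel is a hypothetical member variable on this actor.
    float Peak = 0.0f;
    for (float Sample : Samples)
    {
        Peak = FMath::Max(Peak, FMath::Abs(Sample));
    }
    PeakMicLevel = Peak;
}

// Binding, e.g. in BeginPlay (MicComponent is the
// UElevenLabsMicrophoneCaptureComponent on this actor):
MicComponent->OnAudioCaptured.AddDynamic(this, &AMyNPC::HandleMicAudio);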
UElevenLabsWebSocketProxy
Low-level WebSocket session manager. Used internally by UElevenLabsConversationalAgentComponent. Use this directly only if you need fine-grained protocol control.
Inherits from: UObject
Instantiate via: NewObject<UElevenLabsWebSocketProxy>(Outer)
Key functions
| Function | Description |
|---|---|
| Connect(AgentID, APIKey) | Open the WS connection. Parameters override project settings when non-empty. |
| Disconnect() | Send close frame and tear down the connection. |
| SendAudioChunk(PCMData) | Send raw int16 LE PCM bytes. Called automatically by the agent component. |
| SendUserTurnStart() | Signal start of user speech (Client turn mode only). |
| SendUserTurnEnd() | Signal end of user speech (Client turn mode only). |
| SendInterrupt() | Ask the agent to stop speaking. |
| GetConnectionState() | Returns EElevenLabsConnectionState. |
| GetConversationInfo() | Returns FElevenLabsConversationInfo. |
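A minimal sketch of driving the proxy directly, assuming Connect takes FStrings and SendAudioChunk takes a byte array (check the header for the exact parameter types); the agent component normally manages all of this for you:
// Create and connect (an empty APIKey falls back to the project setting).
UElevenLabsWebSocketProxy* Proxy = NewObject<UElevenLabsWebSocketProxy>(this);
Proxy->Connect(TEXT("your_agent_id_here"), TEXT(""));

// Forward a captured buffer yourself: raw int16 little-endian PCM, 16 kHz mono
// (see the conversion sketch in Section 10).
TArray<uint8> PCMData;
Proxy->SendAudioChunk(PCMData);

// Tear down when done.
Proxy->Disconnect();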
7. Data Types Reference
EElevenLabsConnectionState
Disconnected — No active connection
Connecting — WebSocket handshake in progress
Connected — Conversation active and ready
Error — Connection or protocol failure
EElevenLabsTurnMode
Server — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
FElevenLabsConversationInfo
ConversationID FString — Unique session ID assigned by ElevenLabs
AgentID FString — The agent that responded
FElevenLabsTranscriptSegment
Text FString — Transcribed text
Speaker FString — "user" or "agent"
bIsFinal bool — false while still speaking, true when the turn is complete
8. Turn Modes
Server VAD (default)
ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.
When to use: Casual conversation, hands-free interaction.
StartConversation() → mic streams continuously
ElevenLabs detects speech / silence automatically
Agent replies when it detects end-of-speech
Client Controlled (push-to-talk)
Your code explicitly signals turn boundaries with StartListening() / StopListening().
When to use: Noisy environments, precise control, walkie-talkie style.
Input Pressed → StartListening() → sends user_turn_start + begins audio
Input Released → StopListening() → stops audio + sends user_turn_end
Agent replies after user_turn_end
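For C++ projects, a push-to-talk sketch using Enhanced Input might look like this. TalkAction is a hypothetical UInputAction you create and add to your own input mapping context, and ElevenLabsAgent is assumed to be reachable from the player pawn (for example, a pointer to the nearby NPC's component).
#include "EnhancedInputComponent.h"

void AMyPlayerCharacter::SetupPlayerInputComponent(UInputComponent* PlayerInputComponent)
{
    Super::SetupPlayerInputComponent(PlayerInputComponent);

    if (UEnhancedInputComponent* Input = Cast<UEnhancedInputComponent>(PlayerInputComponent))
    {
        Input->BindAction(TalkAction, ETriggerEvent::Started,   this, &AMyPlayerCharacter::OnTalkPressed);
        Input->BindAction(TalkAction, ETriggerEvent::Completed, this, &AMyPlayerCharacter::OnTalkReleased);
    }
}

void AMyPlayerCharacter::OnTalkPressed()  { ElevenLabsAgent->StartListening(); } // user_turn_start + mic on
void AMyPlayerCharacter::OnTalkReleased() { ElevenLabsAgent->StopListening();  } // mic off + user_turn_end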
9. Security — Signed URL Mode
By default, the API key is stored in Project Settings (Engine.ini). This is fine for development but should not be shipped in packaged builds as the key could be extracted.
Production setup
- Enable Signed URL Mode in Project Settings.
- Set Signed URL Endpoint to a URL on your own backend (e.g. https://your-server.com/api/elevenlabs-token).
- Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning: { "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..." }
- The plugin fetches this URL before connecting — the API key never leaves your server.
10. Audio Pipeline
Input (player → agent)
Device (any sample rate, any channels)
↓ FAudioCapture (UE built-in)
↓ Callback: float32 interleaved frames
↓ Downmix to mono (average channels)
↓ Resample to 16000 Hz (linear interpolation)
↓ Apply VolumeMultiplier
↓ Dispatch to Game Thread
↓ Convert float32 → int16 LE bytes
↓ Base64 encode
↓ WebSocket JSON frame: { "user_audio_chunk": "<base64>" }
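The float32 → int16 → Base64 step is performed inside the plugin; purely to clarify the wire format, a sketch of the conversion could look like this:
#include "Misc/Base64.h"

// Build the JSON frame for one captured buffer (16 kHz mono float32 samples).
FString MakeUserAudioFrame(const TArray<float>& Samples)
{
    TArray<uint8> Bytes;
    Bytes.Reserve(Samples.Num() * 2);
    for (float Sample : Samples)
    {
        const int16 Pcm = (int16)(FMath::Clamp(Sample, -1.0f, 1.0f) * 32767.0f);
        Bytes.Add((uint8)(Pcm & 0xFF));        // little-endian: low byte first
        Bytes.Add((uint8)((Pcm >> 8) & 0xFF)); // then high byte
    }
    return FString::Printf(TEXT("{\"user_audio_chunk\":\"%s\"}"), *FBase64::Encode(Bytes));
}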
Output (agent → player)
WebSocket JSON frame: { "type": "audio", "audio_event": { "audio_base_64": "..." } }
↓ Base64 decode → int16 LE PCM bytes
↓ Enqueue in thread-safe AudioQueue
↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
↓ UAudioComponent plays from the Actor's world position (3D spatialized)
Audio format (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.
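The playback side is likewise handled internally; a sketch of the same mechanism follows, where Base64Chunk and AudioComponent are placeholders for the received audio field and the component's own UAudioComponent:
#include "Sound/SoundWaveProcedural.h"
#include "Misc/Base64.h"

// One-time setup: a procedural wave matching the 16 kHz mono int16 stream.
USoundWaveProcedural* Wave = NewObject<USoundWaveProcedural>(this);
Wave->SetSampleRate(16000);
Wave->NumChannels = 1;
Wave->Duration = INDEFINITELY_LOOPING_DURATION;
AudioComponent->SetSound(Wave);
AudioComponent->Play();

// Per received frame: decode and enqueue; the wave pulls from its internal
// queue as the audio component plays.
TArray<uint8> Pcm;
if (FBase64::Decode(Base64Chunk, Pcm))
{
    Wave->QueueAudio(Pcm.GetData(), Pcm.Num());
}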
11. Common Patterns
Show subtitles in UI
OnAgentTranscript event:
├─ Segment → Speaker == "user" → show in player subtitle widget
├─ Segment → Speaker == "agent" → show in NPC speech bubble
└─ Segment → bIsFinal == false → show as "..." (in-progress)
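The same pattern in C++ (a sketch; ShowPlayerSubtitle and ShowNpcSubtitle are hypothetical UI helpers you would implement yourself):
UFUNCTION()
void HandleTranscript(const FElevenLabsTranscriptSegment& Segment)
{
    // Mark in-progress segments so the UI can style them differently.
    const FString Display = Segment.bIsFinal ? Segment.Text : Segment.Text + TEXT(" ...");

    if (Segment.Speaker == TEXT("user"))
    {
        ShowPlayerSubtitle(Display);
    }
    else // "agent"
    {
        ShowNpcSubtitle(Display);
    }
}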
Interrupt the agent when the player starts speaking
In Server VAD mode ElevenLabs handles this automatically. For manual control:
OnAgentStartedSpeaking → store "agent is speaking" flag
Input Action (any) → if agent is speaking → InterruptAgent()
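In C++ you can skip the flag and use the documented IsAgentSpeaking() query directly, for example in the handler for any "speak" input action:
if (ElevenLabsAgent->IsAgentSpeaking())
{
    ElevenLabsAgent->InterruptAgent();
}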
Multiple NPCs with different agents
Each NPC Blueprint has its own UElevenLabsConversationalAgentComponent. Set a different AgentID on each component. Connections are fully independent.
Only start the conversation when the player is nearby
On Begin Overlap (trigger volume around NPC)
└─► [ElevenLabs Agent] Start Conversation
On End Overlap
└─► [ElevenLabs Agent] End Conversation
Adjusting microphone volume
Get the UElevenLabsMicrophoneCaptureComponent from the owner and set VolumeMultiplier:
UElevenLabsMicrophoneCaptureComponent* Mic =
GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic) Mic->VolumeMultiplier = 2.0f;
12. Troubleshooting
Plugin doesn't appear in Project Settings
Ensure the plugin is enabled in .uproject and the project was recompiled after adding it.
WebSocket connection fails immediately
- Check the API Key is set correctly in Project Settings.
- Check the Agent ID exists in your ElevenLabs account.
- Enable Verbose Logging in Project Settings and check the Output Log for the exact WebSocket URL and error.
- Make sure your machine has internet access and port 443 (WSS) is not blocked.
No audio from the microphone
- Windows may require microphone permission. Check Settings → Privacy → Microphone.
- Try setting VolumeMultiplier to 2.0 to rule out a volume issue.
- Check the Output Log for "Failed to open default audio capture stream".
Agent audio is choppy or silent
- The USoundWaveProcedural queue may be underflowing. This can happen if audio chunks arrive with long gaps. Check network latency.
- Ensure no other component is consuming the same UAudioComponent.
OnAgentStoppedSpeaking fires too early
The silence detection threshold is 30 ticks (~0.5 s at 60 fps). If the agent has natural pauses in speech, increase SilenceThresholdTicks in ElevenLabsConversationalAgentComponent.h:
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s
Build error: "Plugin AudioCapture not found"
Make sure the AudioCapture plugin is enabled in your project. It should be auto-enabled via the .uplugin dependency, but you can also add it manually to .uproject:
{ "Name": "AudioCapture", "Enabled": true }
Documentation generated 2026-02-19 — Plugin v1.0.0 — UE 5.5