

PS_AI_Agent_ElevenLabs — Plugin Documentation

Engine: Unreal Engine 5.5 | Plugin version: 1.0.0 | Status: Beta | API: ElevenLabs Conversational AI


Table of Contents

  1. Overview
  2. Installation
  3. Project Settings
  4. Quick Start (Blueprint)
  5. Quick Start (C++)
  6. Components Reference
  7. Data Types Reference
  8. Turn Modes
  9. Security — Signed URL Mode
  10. Audio Pipeline
  11. Common Patterns
  12. Troubleshooting

1. Overview

This plugin integrates the ElevenLabs Conversational AI Agent API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor).

How it works

Player microphone
      │
      ▼
UElevenLabsMicrophoneCaptureComponent
  • Captures from default audio device
  • Resamples to 16 kHz mono float32
      │
      ▼
UElevenLabsConversationalAgentComponent
  • Converts float32 → int16 PCM bytes
  • Sends via WebSocket to ElevenLabs
      │  (wss://api.elevenlabs.io/v1/convai/conversation)
      ▼
ElevenLabs Conversational AI Agent
  • Transcribes speech
  • Runs LLM
  • Synthesizes voice (ElevenLabs TTS)
      │
      ▼
UElevenLabsConversationalAgentComponent
  • Receives Base64 PCM audio chunks
  • Feeds USoundWaveProcedural → UAudioComponent
      │
      ▼
Agent voice plays from the Actor's position in the world

Key properties

  • No gRPC, no third-party libraries — uses UE's built-in WebSockets and AudioCapture modules
  • Blueprint-first: all events and controls are exposed to Blueprint
  • Real-time bidirectional: audio streams in both directions simultaneously
  • Server VAD (default) or push-to-talk

2. Installation

The plugin lives inside the project, not the engine, so no separate install is needed.

Verify it is enabled

Open Unreal/PS_AI_Agent/PS_AI_Agent.uproject and confirm:

{
  "Name": "PS_AI_Agent_ElevenLabs",
  "Enabled": true
}

First compile

Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click Yes. Alternatively, compile from the command line:

"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat"
    PS_AI_AgentEditor Win64 Development
    "<repo>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject"
    -WaitMutex

3. Project Settings

Go to Edit → Project Settings → Plugins → ElevenLabs AI Agent.

| Setting | Description | Required |
|---|---|---|
| API Key | Your ElevenLabs API key from elevenlabs.io | Yes (unless using Signed URL Mode) |
| Agent ID | Default agent ID. Create agents at elevenlabs.io/app/conversational-ai | Yes (unless set per-component) |
| Signed URL Mode | Fetch the WS URL from your own backend (keeps the key off the client). See Section 9 | No |
| Signed URL Endpoint | Your backend URL returning { "signed_url": "wss://..." } | Only if Signed URL Mode = true |
| Custom WebSocket URL | Override the default wss://api.elevenlabs.io/... endpoint (debug only) | No |
| Verbose Logging | Log every WebSocket JSON frame to the Output Log | No |

Security note: Never ship with the API key hard-coded in a packaged build. Use Signed URL Mode for production, or load the key at runtime from a secure backend.


4. Quick Start (Blueprint)

Step 1 — Add the component to an NPC

  1. Open your NPC Blueprint (or any Actor Blueprint).
  2. In the Components panel, click Add → search for ElevenLabs Conversational Agent.
  3. Select the component. In the Details panel you can optionally set a specific Agent ID (overrides the project default).

Step 2 — Set Turn Mode

In the component's Details panel:

  • Server VAD (default): ElevenLabs automatically detects when the player stops speaking. Microphone streams continuously once connected.
  • Client Controlled: You call Start Listening / Stop Listening manually (push-to-talk).

Step 3 — Wire up events in the Event Graph

Event BeginPlay
    └─► [ElevenLabs Agent] Start Conversation

[ElevenLabs Agent] On Agent Connected
    └─► Print String "Connected! ID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
    └─► Set Text (UI widget) ← Response Text

[ElevenLabs Agent] On Agent Transcript
    └─► (optional) display live subtitles ← Segment → Text

[ElevenLabs Agent] On Agent Started Speaking
    └─► Play talking animation on NPC

[ElevenLabs Agent] On Agent Stopped Speaking
    └─► Return to idle animation

[ElevenLabs Agent] On Agent Error
    └─► Print String "Error: " + Error Message

Event EndPlay
    └─► [ElevenLabs Agent] End Conversation

Step 4 — Push-to-talk (Client Controlled mode only)

Input Action "Talk" (Pressed)
    └─► [ElevenLabs Agent] Start Listening

Input Action "Talk" (Released)
    └─► [ElevenLabs Agent] Stop Listening

5. Quick Start (C++)

1. Add the plugin to your module's Build.cs

PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");

2. Include and use

#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"

// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;

// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
    TEXT("ElevenLabsAgent"));

// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;

// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
    this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
    this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
    this, &AMyNPC::PlayTalkingAnimation);

// Start the conversation:
ElevenLabsAgent->StartConversation();

// Later, to end it:
ElevenLabsAgent->EndConversation();

3. Callback signatures

UFUNCTION()
void HandleAgentConnected(const FElevenLabsConversationInfo& Info)
{
    UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID);
}

UFUNCTION()
void HandleAgentResponse(const FString& ResponseText)
{
    // Display in UI, drive subtitles, etc.
}

UFUNCTION()
void PlayTalkingAnimation()
{
    // Switch to talking anim montage
}

6. Components Reference

UElevenLabsConversationalAgentComponent

The main component — attach this to any Actor that should be able to speak.

Category: ElevenLabs | Inherits from: UActorComponent

Properties

| Property | Type | Default | Description |
|---|---|---|---|
| AgentID | FString | "" | Agent ID for this actor. Overrides the project-level default when non-empty. |
| TurnMode | EElevenLabsTurnMode | Server | How speaker turns are detected. See Section 8. |
| bAutoStartListening | bool | true | If true, starts mic capture automatically once the WebSocket is ready. |

Functions

| Function | Blueprint | Description |
|---|---|---|
| StartConversation() | Callable | Opens the WebSocket connection. If bAutoStartListening is true, mic capture starts once connected. |
| EndConversation() | Callable | Closes the WebSocket, stops the mic, stops audio playback. |
| StartListening() | Callable | Starts microphone capture. In Client mode, also sends user_turn_start to ElevenLabs. |
| StopListening() | Callable | Stops microphone capture. In Client mode, also sends user_turn_end. |
| InterruptAgent() | Callable | Stops the agent's current utterance immediately. |
| IsConnected() | Pure | Returns true if the WebSocket is open and the conversation is active. |
| IsListening() | Pure | Returns true if the microphone is currently capturing. |
| IsAgentSpeaking() | Pure | Returns true if agent audio is currently playing. |
| GetConversationInfo() | Pure | Returns FElevenLabsConversationInfo (ConversationID, AgentID). |
| GetWebSocketProxy() | Pure | Returns the underlying UElevenLabsWebSocketProxy for advanced use. |

Events

| Event | Parameters | Fired when |
|---|---|---|
| OnAgentConnected | FElevenLabsConversationInfo | WebSocket handshake + agent initiation complete. |
| OnAgentDisconnected | int32 StatusCode, FString Reason | WebSocket closed (graceful or remote). |
| OnAgentError | FString ErrorMessage | Connection or protocol error. |
| OnAgentTranscript | FElevenLabsTranscriptSegment | Any transcript arrives (user or agent, tentative or final). |
| OnAgentTextResponse | FString ResponseText | Final text response from the agent (complements the audio). |
| OnAgentStartedSpeaking | (none) | First audio chunk received from the agent. |
| OnAgentStoppedSpeaking | (none) | Audio queue empty for ~0.5 s (agent done speaking). |
| OnAgentInterrupted | (none) | Agent speech was interrupted (by the user or by InterruptAgent()). |

UElevenLabsMicrophoneCaptureComponent

A lightweight microphone capture component. Managed automatically by UElevenLabsConversationalAgentComponent — you only need to use this directly for advanced scenarios (e.g. custom audio routing).

Category: ElevenLabs | Inherits from: UActorComponent

Properties

| Property | Type | Default | Description |
|---|---|---|---|
| VolumeMultiplier | float | 1.0 | Gain applied to captured samples. Range: 0.0 to 4.0. |

Functions

| Function | Blueprint | Description |
|---|---|---|
| StartCapture() | Callable | Opens the default audio input device and starts streaming. |
| StopCapture() | Callable | Stops streaming and closes the device. |
| IsCapturing() | Pure | True while actively capturing. |

Delegate

OnAudioCaptured — fires on the game thread with TArray<float> PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.
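If you bind OnAudioCaptured and process buffers manually, the gain stage that VolumeMultiplier performs can be sketched as follows. This is a standalone plain-C++ sketch, not the plugin's actual implementation: the gain clamp mirrors the documented 0.0 to 4.0 property range, while clamping samples to [-1, 1] is an assumption made here to avoid clipping downstream.

```cpp
#include <algorithm>
#include <vector>

// Apply a gain factor to a buffer of normalized float samples.
// The gain is clamped to the documented 0.0..4.0 range; the result
// is clamped to [-1, 1] (an assumption of this sketch).
void ApplyGain(std::vector<float>& Samples, float VolumeMultiplier)
{
    const float Gain = std::clamp(VolumeMultiplier, 0.0f, 4.0f);
    for (float& S : Samples)
        S = std::clamp(S * Gain, -1.0f, 1.0f);
}
```

A buffer processed this way can then be forwarded to SendAudioChunk after int16 conversion (see Section 10).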


UElevenLabsWebSocketProxy

Low-level WebSocket session manager. Used internally by UElevenLabsConversationalAgentComponent. Use this directly only if you need fine-grained protocol control.

Inherits from: UObject | Instantiate via: NewObject<UElevenLabsWebSocketProxy>(Outer)

Key functions

| Function | Description |
|---|---|
| Connect(AgentID, APIKey) | Open the WS connection. Parameters override project settings when non-empty. |
| Disconnect() | Send a close frame and tear down the connection. |
| SendAudioChunk(PCMData) | Send raw int16 LE PCM bytes. Called automatically by the agent component. |
| SendUserTurnStart() | Signal start of user speech (Client turn mode only). |
| SendUserTurnEnd() | Signal end of user speech (Client turn mode only). |
| SendInterrupt() | Ask the agent to stop speaking. |
| GetConnectionState() | Returns EElevenLabsConnectionState. |
| GetConversationInfo() | Returns FElevenLabsConversationInfo. |

7. Data Types Reference

EElevenLabsConnectionState

Disconnected  — No active connection
Connecting    — WebSocket handshake in progress
Connected     — Conversation active and ready
Error         — Connection or protocol failure

EElevenLabsTurnMode

Server  — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client  — Your code calls StartListening/StopListening to define turns (push-to-talk)

FElevenLabsConversationInfo

ConversationID  FString  — Unique session ID assigned by ElevenLabs
AgentID         FString  — The agent that responded

FElevenLabsTranscriptSegment

Text      FString  — Transcribed text
Speaker   FString  — "user" or "agent"
bIsFinal  bool     — false while still speaking, true when the turn is complete

8. Turn Modes

Server VAD (default)

ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.

When to use: Casual conversation, hands-free interaction.

StartConversation()  →  mic streams continuously
                        ElevenLabs detects speech / silence automatically
                        Agent replies when it detects end-of-speech

Client Controlled (push-to-talk)

Your code explicitly signals turn boundaries with StartListening() / StopListening().

When to use: Noisy environments, precise control, walkie-talkie style.

Input Pressed   →  StartListening()   →  sends user_turn_start + begins audio
Input Released  →  StopListening()    →  stops audio + sends user_turn_end
                                         Agent replies after user_turn_end

9. Security — Signed URL Mode

By default, the API key is stored in Project Settings (Engine.ini). This is fine for development but should not be shipped in packaged builds as the key could be extracted.

Production setup

  1. Enable Signed URL Mode in Project Settings.
  2. Set Signed URL Endpoint to a URL on your own backend (e.g. https://your-server.com/api/elevenlabs-token).
  3. Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning:
    { "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..." }
    
  4. The plugin fetches this URL before connecting — the API key never leaves your server.
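To illustrate step 3's response handling, here is a minimal sketch of extracting the signed URL from the documented { "signed_url": "..." } payload. This is plain C++ with hand-rolled string scanning for the sake of a self-contained example; inside UE you would normally parse with FJsonSerializer instead. It assumes exactly the documented single-field shape and that the URL contains no escaped quotes.

```cpp
#include <string>

// Extract the "signed_url" value from the documented backend response.
// Returns an empty string if the field is missing.
std::string ExtractSignedUrl(const std::string& Json)
{
    const std::string Key = "\"signed_url\"";
    size_t Pos = Json.find(Key);
    if (Pos == std::string::npos) return "";
    Pos = Json.find('"', Json.find(':', Pos + Key.size()));
    if (Pos == std::string::npos) return "";
    const size_t End = Json.find('"', Pos + 1);
    if (End == std::string::npos) return "";
    return Json.substr(Pos + 1, End - Pos - 1);
}
```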

10. Audio Pipeline

Input (player → agent)

Device (any sample rate, any channels)
  ↓  FAudioCapture (UE built-in)
  ↓  Callback: float32 interleaved frames
  ↓  Downmix to mono (average channels)
  ↓  Resample to 16000 Hz (linear interpolation)
  ↓  Apply VolumeMultiplier
  ↓  Dispatch to Game Thread
  ↓  Convert float32 → int16 LE bytes
  ↓  Base64 encode
  ↓  WebSocket JSON frame: { "user_audio_chunk": "<base64>" }
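The downmix and resample steps above can be sketched in standalone C++ (outside UE, no engine types). This is illustrative only, but it follows the pipeline as described: channel averaging for the downmix and linear interpolation for the rate conversion.

```cpp
#include <cstddef>
#include <vector>

// Average interleaved channels into a mono buffer (one value per frame).
std::vector<float> DownmixToMono(const std::vector<float>& Interleaved, int NumChannels)
{
    std::vector<float> Mono;
    Mono.reserve(Interleaved.size() / NumChannels);
    for (std::size_t Frame = 0; Frame + NumChannels <= Interleaved.size(); Frame += NumChannels)
    {
        float Sum = 0.0f;
        for (int C = 0; C < NumChannels; ++C)
            Sum += Interleaved[Frame + C];
        Mono.push_back(Sum / NumChannels);
    }
    return Mono;
}

// Linear-interpolation resample from SrcRate to DstRate (e.g. 48000 -> 16000).
std::vector<float> ResampleLinear(const std::vector<float>& In, int SrcRate, int DstRate)
{
    if (In.empty() || SrcRate == DstRate) return In;
    const std::size_t OutLen =
        static_cast<std::size_t>(In.size() * static_cast<double>(DstRate) / SrcRate);
    std::vector<float> Out(OutLen);
    for (std::size_t i = 0; i < OutLen; ++i)
    {
        const double SrcPos = i * static_cast<double>(SrcRate) / DstRate;
        const std::size_t I0 = static_cast<std::size_t>(SrcPos);
        const std::size_t I1 = (I0 + 1 < In.size()) ? I0 + 1 : I0;
        const double Frac = SrcPos - I0;
        Out[i] = static_cast<float>(In[I0] * (1.0 - Frac) + In[I1] * Frac);
    }
    return Out;
}
```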

Output (agent → player)

WebSocket JSON frame: { "type": "audio", "audio_event": { "audio_base_64": "..." } }
  ↓  Base64 decode → int16 LE PCM bytes
  ↓  Enqueue in thread-safe AudioQueue
  ↓  USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
  ↓  UAudioComponent plays from the Actor's world position (3D spatialized)

Audio format (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.
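The float32 to int16 conversion used on both sides of the pipeline can be sketched like this (standalone plain C++, not the plugin's actual code; the clamping behavior is an assumption of this sketch):

```cpp
#include <cstdint>
#include <vector>

// Normalized float samples [-1, 1] -> 16-bit signed PCM (capture side).
// Out-of-range input is clamped before scaling.
std::vector<int16_t> FloatToInt16(const std::vector<float>& In)
{
    std::vector<int16_t> Out;
    Out.reserve(In.size());
    for (float S : In)
    {
        if (S > 1.0f)  S = 1.0f;
        if (S < -1.0f) S = -1.0f;
        Out.push_back(static_cast<int16_t>(S * 32767.0f));
    }
    return Out;
}

// Inverse: 16-bit PCM -> normalized float (playback side).
std::vector<float> Int16ToFloat(const std::vector<int16_t>& In)
{
    std::vector<float> Out;
    Out.reserve(In.size());
    for (int16_t S : In)
        Out.push_back(S / 32768.0f);
    return Out;
}
```

Byte order matters on the wire: the int16 samples are serialized little-endian, as stated above.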


11. Common Patterns

Show subtitles in UI

OnAgentTranscript event:
  ├─ Segment → Speaker == "user"   → show in player subtitle widget
  ├─ Segment → Speaker == "agent"  → show in NPC speech bubble
  └─ Segment → bIsFinal == false   → show as "..." (in-progress)
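The routing decision above can be expressed in plain C++. The struct fields mirror FElevenLabsTranscriptSegment from Section 7; the widget names ("PlayerSubtitle", "NpcSpeechBubble") are purely illustrative, not plugin API.

```cpp
#include <string>

// Mirrors the fields of FElevenLabsTranscriptSegment (Section 7).
struct TranscriptSegment
{
    std::string Text;
    std::string Speaker;   // "user" or "agent"
    bool bIsFinal = false; // false while the turn is still in progress
};

// Decide which UI element a segment belongs to; in-progress segments
// get a trailing ellipsis. Widget names here are illustrative only.
std::string RouteSegment(const TranscriptSegment& Seg)
{
    const std::string Target =
        (Seg.Speaker == "user") ? "PlayerSubtitle" : "NpcSpeechBubble";
    return Target + ": " + Seg.Text + (Seg.bIsFinal ? "" : "...");
}
```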

Interrupt the agent when the player starts speaking

In Server VAD mode ElevenLabs handles this automatically. For manual control:

OnAgentStartedSpeaking  →  store "agent is speaking" flag
Input Action (any)      →  if agent is speaking → InterruptAgent()

Multiple NPCs with different agents

Each NPC Blueprint has its own UElevenLabsConversationalAgentComponent. Set a different AgentID on each component. Connections are fully independent.

Only start the conversation when the player is nearby

On Begin Overlap (trigger volume around NPC)
  └─► [ElevenLabs Agent] Start Conversation

On End Overlap
  └─► [ElevenLabs Agent] End Conversation

Adjusting microphone volume

Get the UElevenLabsMicrophoneCaptureComponent from the owner and set VolumeMultiplier:

UElevenLabsMicrophoneCaptureComponent* Mic =
    GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic) Mic->VolumeMultiplier = 2.0f;

12. Troubleshooting

Plugin doesn't appear in Project Settings

Ensure the plugin is enabled in .uproject and the project was recompiled after adding it.

WebSocket connection fails immediately

  • Check the API Key is set correctly in Project Settings.
  • Check the Agent ID exists in your ElevenLabs account.
  • Enable Verbose Logging in Project Settings and check the Output Log for the exact WebSocket URL and error.
  • Make sure your machine has internet access and port 443 (WSS) is not blocked.

No audio from the microphone

  • Windows may require microphone permission. Check Settings → Privacy → Microphone.
  • Try setting VolumeMultiplier to 2.0 to rule out a volume issue.
  • Check the Output Log for "Failed to open default audio capture stream".

Agent audio is choppy or silent

  • The USoundWaveProcedural queue may be underflowing. This can happen if audio chunks arrive with long gaps. Check network latency.
  • Ensure no other component is consuming the same UAudioComponent.

OnAgentStoppedSpeaking fires too early

The silence detection threshold is 30 ticks (~0.5 s at 60 fps). If the agent has natural pauses in speech, increase SilenceThresholdTicks in ElevenLabsConversationalAgentComponent.h:

static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s
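The relationship is simply ticks ≈ seconds × frame rate, since the component checks the queue once per tick. A tiny helper for picking a threshold (plain C++; treating the tick rate as equal to the frame rate is an assumption that holds for a default-ticking component):

```cpp
#include <cmath>

// Convert a desired silence duration into a tick count, given the
// game's frame rate (one queue check per component tick).
int TicksForSilence(double Seconds, double FramesPerSecond)
{
    return static_cast<int>(std::lround(Seconds * FramesPerSecond));
}
```

At 60 fps this reproduces the defaults quoted above: 0.5 s is 30 ticks, 1.0 s is 60 ticks.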

Build error: "Plugin AudioCapture not found"

Make sure the AudioCapture plugin is enabled in your project. It should be auto-enabled via the .uplugin dependency, but you can also add it manually to .uproject:

{ "Name": "AudioCapture", "Enabled": true }

Documentation generated 2026-02-19 — Plugin v1.0.0 — UE 5.5