

PS_AI_Agent_ElevenLabs — Plugin Documentation

Engine: Unreal Engine 5.5 | Plugin version: 1.0.0 | Status: Beta | API: ElevenLabs Conversational AI


Table of Contents

  1. Overview
  2. Installation
  3. Project Settings
  4. Quick Start (Blueprint)
  5. Quick Start (C++)
  6. Components Reference
  7. Data Types Reference
  8. Turn Modes
  9. Security — Signed URL Mode
  10. Audio Pipeline
  11. Common Patterns
  12. Troubleshooting

1. Overview

This plugin integrates the ElevenLabs Conversational AI Agent API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor).

How it works

Player microphone
      │
      ▼
UElevenLabsMicrophoneCaptureComponent
  • Captures from default audio device
  • Resamples to 16 kHz mono float32
      │
      ▼
UElevenLabsConversationalAgentComponent
  • Converts float32 → int16 PCM bytes
  • Sends via WebSocket to ElevenLabs
      │  (wss://api.elevenlabs.io/v1/convai/conversation)
      ▼
ElevenLabs Conversational AI Agent
  • Transcribes speech
  • Runs LLM
  • Synthesizes voice (ElevenLabs TTS)
      │
      ▼
UElevenLabsConversationalAgentComponent
  • Receives Base64 PCM audio chunks
  • Feeds USoundWaveProcedural → UAudioComponent
      │
      ▼
Agent voice plays from the Actor's position in the world

Key properties

  • No gRPC, no third-party libraries — uses UE's built-in WebSockets and AudioCapture modules
  • Blueprint-first: all events and controls are exposed to Blueprint
  • Real-time bidirectional: audio streams in both directions simultaneously
  • Server VAD (default) or push-to-talk

2. Installation

The plugin lives inside the project, not the engine, so no separate install is needed.

Verify it is enabled

Open Unreal/PS_AI_Agent/PS_AI_Agent.uproject and confirm:

{
  "Name": "PS_AI_Agent_ElevenLabs",
  "Enabled": true
}

First compile

Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click Yes. Alternatively, compile from the command line:

"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat"
    PS_AI_AgentEditor Win64 Development
    "<repo>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject"
    -WaitMutex

3. Project Settings

Go to Edit → Project Settings → Plugins → ElevenLabs AI Agent.

| Setting | Description | Required |
|---|---|---|
| API Key | Your ElevenLabs API key from elevenlabs.io | Yes (unless using Signed URL Mode) |
| Agent ID | Default agent ID. Create agents at elevenlabs.io/app/conversational-ai | Yes (unless set per-component) |
| Signed URL Mode | Fetch the WS URL from your own backend (keeps the key off the client). See Section 9 | No |
| Signed URL Endpoint | Your backend URL returning { "signed_url": "wss://..." } | Only if Signed URL Mode = true |
| Custom WebSocket URL | Override the default wss://api.elevenlabs.io/... endpoint (debug only) | No |
| Verbose Logging | Log every WebSocket JSON frame to the Output Log | No |

Security note: Never ship with the API key hard-coded in a packaged build. Use Signed URL Mode for production, or load the key at runtime from a secure backend.


4. Quick Start (Blueprint)

Step 1 — Add the component to an NPC

  1. Open your NPC Blueprint (or any Actor Blueprint).
  2. In the Components panel, click Add → search for ElevenLabs Conversational Agent.
  3. Select the component. In the Details panel you can optionally set a specific Agent ID (overrides the project default).

Step 2 — Set Turn Mode

In the component's Details panel:

  • Server VAD (default): ElevenLabs automatically detects when the player stops speaking. Microphone streams continuously once connected.
  • Client Controlled: You call Start Listening / Stop Listening manually (push-to-talk).

Step 3 — Wire up events in the Event Graph

Event BeginPlay
    └─► [ElevenLabs Agent] Start Conversation

[ElevenLabs Agent] On Agent Connected
    └─► Print String "Connected! ID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
    └─► Set Text (UI widget) ← Response Text

[ElevenLabs Agent] On Agent Transcript
    └─► (optional) display live subtitles ← Segment → Text

[ElevenLabs Agent] On Agent Started Speaking
    └─► Play talking animation on NPC

[ElevenLabs Agent] On Agent Stopped Speaking
    └─► Return to idle animation

[ElevenLabs Agent] On Agent Error
    └─► Print String "Error: " + Error Message

Event EndPlay
    └─► [ElevenLabs Agent] End Conversation

Step 4 — Push-to-talk (Client Controlled mode only)

Input Action "Talk" (Pressed)
    └─► [ElevenLabs Agent] Start Listening

Input Action "Talk" (Released)
    └─► [ElevenLabs Agent] Stop Listening

5. Quick Start (C++)

1. Add the plugin to your module's Build.cs

PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");

2. Include and use

#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"

// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;

// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
    TEXT("ElevenLabsAgent"));

// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;

// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
    this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
    this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
    this, &AMyNPC::PlayTalkingAnimation);

// Start the conversation:
ElevenLabsAgent->StartConversation();

// Later, to end it:
ElevenLabsAgent->EndConversation();

3. Callback signatures

UFUNCTION()
void HandleAgentConnected(const FElevenLabsConversationInfo& Info)
{
    UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID);
}

UFUNCTION()
void HandleAgentResponse(const FString& ResponseText)
{
    // Display in UI, drive subtitles, etc.
}

UFUNCTION()
void PlayTalkingAnimation()
{
    // Switch to talking anim montage
}

6. Components Reference

UElevenLabsConversationalAgentComponent

The main component — attach this to any Actor that should be able to speak.

Category: ElevenLabs | Inherits from: UActorComponent

Properties

| Property | Type | Default | Description |
|---|---|---|---|
| AgentID | FString | "" | Agent ID for this actor. Overrides the project-level default when non-empty. |
| TurnMode | EElevenLabsTurnMode | Server | How speaker turns are detected. See Section 8. |
| bAutoStartListening | bool | true | If true, starts mic capture automatically once the WebSocket is ready. |

Functions

| Function | Blueprint | Description |
|---|---|---|
| StartConversation() | Callable | Opens the WebSocket connection. If bAutoStartListening is true, mic capture starts once connected. |
| EndConversation() | Callable | Closes the WebSocket, stops the mic, stops audio playback. |
| StartListening() | Callable | Starts microphone capture. In Client mode, also sends user_turn_start to ElevenLabs. |
| StopListening() | Callable | Stops microphone capture. In Client mode, also sends user_turn_end. |
| InterruptAgent() | Callable | Stops the agent's current utterance immediately. |
| IsConnected() | Pure | Returns true if the WebSocket is open and the conversation is active. |
| IsListening() | Pure | Returns true if the microphone is currently capturing. |
| IsAgentSpeaking() | Pure | Returns true if agent audio is currently playing. |
| GetConversationInfo() | Pure | Returns FElevenLabsConversationInfo (ConversationID, AgentID). |
| GetWebSocketProxy() | Pure | Returns the underlying UElevenLabsWebSocketProxy for advanced use. |

Events

| Event | Parameters | Fired when |
|---|---|---|
| OnAgentConnected | FElevenLabsConversationInfo | WebSocket handshake + agent initiation complete. |
| OnAgentDisconnected | int32 StatusCode, FString Reason | WebSocket closed (graceful or remote). |
| OnAgentError | FString ErrorMessage | Connection or protocol error. |
| OnAgentTranscript | FElevenLabsTranscriptSegment | Any transcript arrives (user or agent, tentative or final). |
| OnAgentTextResponse | FString ResponseText | Final text response from the agent (complements the audio). |
| OnAgentStartedSpeaking | (none) | First audio chunk received from the agent. |
| OnAgentStoppedSpeaking | (none) | Audio queue empty for ~0.5 s (agent done speaking). |
| OnAgentInterrupted | (none) | Agent speech was interrupted (by the user or by InterruptAgent()). |

UElevenLabsMicrophoneCaptureComponent

A lightweight microphone capture component. Managed automatically by UElevenLabsConversationalAgentComponent — you only need to use this directly for advanced scenarios (e.g. custom audio routing).

Category: ElevenLabs | Inherits from: UActorComponent

Properties

| Property | Type | Default | Description |
|---|---|---|---|
| VolumeMultiplier | float | 1.0 | Gain applied to captured samples. Range: 0.0 to 4.0. |

Functions

| Function | Blueprint | Description |
|---|---|---|
| StartCapture() | Callable | Opens the default audio input device and starts streaming. |
| StopCapture() | Callable | Stops streaming and closes the device. |
| IsCapturing() | Pure | True while actively capturing. |

Delegate

OnAudioCaptured — fires on the game thread with TArray<float> PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.
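If you bind OnAudioCaptured and process buffers manually, the gain stage that VolumeMultiplier performs can be sketched as follows. This is a standalone plain-C++ sketch, not the plugin's actual implementation: the gain clamp mirrors the documented 0.0 to 4.0 property range, while clamping samples to [-1, 1] is an assumption made here to avoid clipping downstream.

```cpp
#include <algorithm>
#include <vector>

// Apply a gain factor to a buffer of normalized float samples.
// The gain is clamped to the documented 0.0..4.0 range; the result
// is clamped to [-1, 1] (an assumption of this sketch).
void ApplyGain(std::vector<float>& Samples, float VolumeMultiplier)
{
    const float Gain = std::clamp(VolumeMultiplier, 0.0f, 4.0f);
    for (float& S : Samples)
        S = std::clamp(S * Gain, -1.0f, 1.0f);
}
```

A buffer processed this way can then be forwarded to SendAudioChunk after int16 conversion (see Section 10).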


UElevenLabsWebSocketProxy

Low-level WebSocket session manager. Used internally by UElevenLabsConversationalAgentComponent. Use this directly only if you need fine-grained protocol control.

Inherits from: UObject | Instantiate via: NewObject<UElevenLabsWebSocketProxy>(Outer)

Key functions

| Function | Description |
|---|---|
| Connect(AgentID, APIKey) | Open the WS connection. Parameters override project settings when non-empty. |
| Disconnect() | Send a close frame and tear down the connection. |
| SendAudioChunk(PCMData) | Send raw int16 LE PCM bytes. Called automatically by the agent component. |
| SendUserTurnStart() | Signal start of user speech (Client turn mode only). |
| SendUserTurnEnd() | Signal end of user speech (Client turn mode only). |
| SendInterrupt() | Ask the agent to stop speaking. |
| GetConnectionState() | Returns EElevenLabsConnectionState. |
| GetConversationInfo() | Returns FElevenLabsConversationInfo. |

7. Data Types Reference

EElevenLabsConnectionState

Disconnected  — No active connection
Connecting    — WebSocket handshake in progress
Connected     — Conversation active and ready
Error         — Connection or protocol failure

EElevenLabsTurnMode

Server  — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client  — Your code calls StartListening/StopListening to define turns (push-to-talk)

FElevenLabsConversationInfo

ConversationID  FString  — Unique session ID assigned by ElevenLabs
AgentID         FString  — The agent that responded

FElevenLabsTranscriptSegment

Text      FString  — Transcribed text
Speaker   FString  — "user" or "agent"
bIsFinal  bool     — false while still speaking, true when the turn is complete

8. Turn Modes

Server VAD (default)

ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.

When to use: Casual conversation, hands-free interaction.

StartConversation()  →  mic streams continuously
                        ElevenLabs detects speech / silence automatically
                        Agent replies when it detects end-of-speech

Client Controlled (push-to-talk)

Your code explicitly signals turn boundaries with StartListening() / StopListening().

When to use: Noisy environments, precise control, walkie-talkie style.

Input Pressed   →  StartListening()   →  sends user_turn_start + begins audio
Input Released  →  StopListening()    →  stops audio + sends user_turn_end
                                         Agent replies after user_turn_end

9. Security — Signed URL Mode

By default, the API key is stored in Project Settings (Engine.ini). This is fine for development but should not be shipped in packaged builds as the key could be extracted.

Production setup

  1. Enable Signed URL Mode in Project Settings.
  2. Set Signed URL Endpoint to a URL on your own backend (e.g. https://your-server.com/api/elevenlabs-token).
  3. Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning:
    { "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..." }
    
  4. The plugin fetches this URL before connecting — the API key never leaves your server.
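To illustrate step 3's response handling, here is a minimal sketch of extracting the signed URL from the documented { "signed_url": "..." } payload. This is plain C++ with hand-rolled string scanning for the sake of a self-contained example; inside UE you would normally parse with FJsonSerializer instead. It assumes exactly the documented single-field shape and that the URL contains no escaped quotes.

```cpp
#include <string>

// Extract the "signed_url" value from the documented backend response.
// Returns an empty string if the field is missing.
std::string ExtractSignedUrl(const std::string& Json)
{
    const std::string Key = "\"signed_url\"";
    size_t Pos = Json.find(Key);
    if (Pos == std::string::npos) return "";
    Pos = Json.find('"', Json.find(':', Pos + Key.size()));
    if (Pos == std::string::npos) return "";
    const size_t End = Json.find('"', Pos + 1);
    if (End == std::string::npos) return "";
    return Json.substr(Pos + 1, End - Pos - 1);
}
```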

10. Audio Pipeline

Input (player → agent)

Device (any sample rate, any channels)
  ↓  FAudioCapture (UE built-in)
  ↓  Callback: float32 interleaved frames
  ↓  Downmix to mono (average channels)
  ↓  Resample to 16000 Hz (linear interpolation)
  ↓  Apply VolumeMultiplier
  ↓  Dispatch to Game Thread
  ↓  Convert float32 → int16 LE bytes
  ↓  Base64 encode
  ↓  WebSocket JSON frame: { "user_audio_chunk": "<base64>" }
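The downmix and resample steps above can be sketched in standalone C++ (outside UE, no engine types). This is illustrative only, but it follows the pipeline as described: channel averaging for the downmix and linear interpolation for the rate conversion.

```cpp
#include <cstddef>
#include <vector>

// Average interleaved channels into a mono buffer (one value per frame).
std::vector<float> DownmixToMono(const std::vector<float>& Interleaved, int NumChannels)
{
    std::vector<float> Mono;
    Mono.reserve(Interleaved.size() / NumChannels);
    for (std::size_t Frame = 0; Frame + NumChannels <= Interleaved.size(); Frame += NumChannels)
    {
        float Sum = 0.0f;
        for (int C = 0; C < NumChannels; ++C)
            Sum += Interleaved[Frame + C];
        Mono.push_back(Sum / NumChannels);
    }
    return Mono;
}

// Linear-interpolation resample from SrcRate to DstRate (e.g. 48000 -> 16000).
std::vector<float> ResampleLinear(const std::vector<float>& In, int SrcRate, int DstRate)
{
    if (In.empty() || SrcRate == DstRate) return In;
    const std::size_t OutLen =
        static_cast<std::size_t>(In.size() * static_cast<double>(DstRate) / SrcRate);
    std::vector<float> Out(OutLen);
    for (std::size_t i = 0; i < OutLen; ++i)
    {
        const double SrcPos = i * static_cast<double>(SrcRate) / DstRate;
        const std::size_t I0 = static_cast<std::size_t>(SrcPos);
        const std::size_t I1 = (I0 + 1 < In.size()) ? I0 + 1 : I0;
        const double Frac = SrcPos - I0;
        Out[i] = static_cast<float>(In[I0] * (1.0 - Frac) + In[I1] * Frac);
    }
    return Out;
}
```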

Output (agent → player)

WebSocket JSON frame: { "type": "audio", "audio_event": { "audio_base_64": "..." } }
  ↓  Base64 decode → int16 LE PCM bytes
  ↓  Enqueue in thread-safe AudioQueue
  ↓  USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
  ↓  UAudioComponent plays from the Actor's world position (3D spatialized)

Audio format (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.
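The float32 to int16 conversion used on both sides of the pipeline can be sketched like this (standalone plain C++, not the plugin's actual code; the clamping behavior is an assumption of this sketch):

```cpp
#include <cstdint>
#include <vector>

// Normalized float samples [-1, 1] -> 16-bit signed PCM (capture side).
// Out-of-range input is clamped before scaling.
std::vector<int16_t> FloatToInt16(const std::vector<float>& In)
{
    std::vector<int16_t> Out;
    Out.reserve(In.size());
    for (float S : In)
    {
        if (S > 1.0f)  S = 1.0f;
        if (S < -1.0f) S = -1.0f;
        Out.push_back(static_cast<int16_t>(S * 32767.0f));
    }
    return Out;
}

// Inverse: 16-bit PCM -> normalized float (playback side).
std::vector<float> Int16ToFloat(const std::vector<int16_t>& In)
{
    std::vector<float> Out;
    Out.reserve(In.size());
    for (int16_t S : In)
        Out.push_back(S / 32768.0f);
    return Out;
}
```

Byte order matters on the wire: the int16 samples are serialized little-endian, as stated above.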


11. Common Patterns

Show subtitles in UI

OnAgentTranscript event:
  ├─ Segment → Speaker == "user"   → show in player subtitle widget
  ├─ Segment → Speaker == "agent"  → show in NPC speech bubble
  └─ Segment → bIsFinal == false   → show as "..." (in-progress)
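The routing decision above can be expressed in plain C++. The struct fields mirror FElevenLabsTranscriptSegment from Section 7; the widget names ("PlayerSubtitle", "NpcSpeechBubble") are purely illustrative, not plugin API.

```cpp
#include <string>

// Mirrors the fields of FElevenLabsTranscriptSegment (Section 7).
struct TranscriptSegment
{
    std::string Text;
    std::string Speaker;   // "user" or "agent"
    bool bIsFinal = false; // false while the turn is still in progress
};

// Decide which UI element a segment belongs to; in-progress segments
// get a trailing ellipsis. Widget names here are illustrative only.
std::string RouteSegment(const TranscriptSegment& Seg)
{
    const std::string Target =
        (Seg.Speaker == "user") ? "PlayerSubtitle" : "NpcSpeechBubble";
    return Target + ": " + Seg.Text + (Seg.bIsFinal ? "" : "...");
}
```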

Interrupt the agent when the player starts speaking

In Server VAD mode ElevenLabs handles this automatically. For manual control:

OnAgentStartedSpeaking  →  store "agent is speaking" flag
Input Action (any)      →  if agent is speaking → InterruptAgent()

Multiple NPCs with different agents

Each NPC Blueprint has its own UElevenLabsConversationalAgentComponent. Set a different AgentID on each component. Connections are fully independent.

Only start the conversation when the player is nearby

On Begin Overlap (trigger volume around NPC)
  └─► [ElevenLabs Agent] Start Conversation

On End Overlap
  └─► [ElevenLabs Agent] End Conversation

Adjusting microphone volume

Get the UElevenLabsMicrophoneCaptureComponent from the owner and set VolumeMultiplier:

UElevenLabsMicrophoneCaptureComponent* Mic =
    GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic) Mic->VolumeMultiplier = 2.0f;

12. Troubleshooting

Plugin doesn't appear in Project Settings

Ensure the plugin is enabled in .uproject and the project was recompiled after adding it.

WebSocket connection fails immediately

  • Check the API Key is set correctly in Project Settings.
  • Check the Agent ID exists in your ElevenLabs account.
  • Enable Verbose Logging in Project Settings and check the Output Log for the exact WebSocket URL and error.
  • Make sure your machine has internet access and port 443 (WSS) is not blocked.

No audio from the microphone

  • Windows may require microphone permission. Check Settings → Privacy → Microphone.
  • Try setting VolumeMultiplier to 2.0 to rule out a volume issue.
  • Check the Output Log for "Failed to open default audio capture stream".

Agent audio is choppy or silent

  • The USoundWaveProcedural queue may be underflowing. This can happen if audio chunks arrive with long gaps. Check network latency.
  • Ensure no other component is consuming the same UAudioComponent.

OnAgentStoppedSpeaking fires too early

The silence detection threshold is 30 ticks (~0.5 s at 60 fps). If the agent has natural pauses in speech, increase SilenceThresholdTicks in ElevenLabsConversationalAgentComponent.h:

static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s
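The relationship is simply ticks ≈ seconds × frame rate, since the component checks the queue once per tick. A tiny helper for picking a threshold (plain C++; treating the tick rate as equal to the frame rate is an assumption that holds for a default-ticking component):

```cpp
#include <cmath>

// Convert a desired silence duration into a tick count, given the
// game's frame rate (one queue check per component tick).
int TicksForSilence(double Seconds, double FramesPerSecond)
{
    return static_cast<int>(std::lround(Seconds * FramesPerSecond));
}
```

At 60 fps this reproduces the defaults quoted above: 0.5 s is 30 ticks, 1.0 s is 60 ticks.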

Build error: "Plugin AudioCapture not found"

Make sure the AudioCapture plugin is enabled in your project. It should be auto-enabled via the .uplugin dependency, but you can also add it manually to .uproject:

{ "Name": "AudioCapture", "Enabled": true }

Documentation generated 2026-02-19 — Plugin v1.0.0 — UE 5.5