Add plugin documentation for PS_AI_Agent_ElevenLabs

Covers: installation, project settings, quick start (Blueprint + C++),
full component/API reference, turn modes, security/signed URL mode,
audio pipeline diagram, common patterns, and troubleshooting guide.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
j.foucher 2026-02-19 13:07:49 +01:00
parent 3b98edcf92
commit c833ccd66d

# PS_AI_Agent_ElevenLabs — Plugin Documentation
**Engine**: Unreal Engine 5.5
**Plugin version**: 1.0.0
**Status**: Beta
**API**: [ElevenLabs Conversational AI](https://elevenlabs.io/docs/conversational-ai)
---
## Table of Contents
1. [Overview](#1-overview)
2. [Installation](#2-installation)
3. [Project Settings](#3-project-settings)
4. [Quick Start (Blueprint)](#4-quick-start-blueprint)
5. [Quick Start (C++)](#5-quick-start-c)
6. [Components Reference](#6-components-reference)
- [UElevenLabsConversationalAgentComponent](#uelevenlabsconversationalagentcomponent)
- [UElevenLabsMicrophoneCaptureComponent](#uelevenlabsmicrophonecapturecomponent)
- [UElevenLabsWebSocketProxy](#uelevenlabswebsocketproxy)
7. [Data Types Reference](#7-data-types-reference)
8. [Turn Modes](#8-turn-modes)
9. [Security — Signed URL Mode](#9-security--signed-url-mode)
10. [Audio Pipeline](#10-audio-pipeline)
11. [Common Patterns](#11-common-patterns)
12. [Troubleshooting](#12-troubleshooting)
---
## 1. Overview
This plugin integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor).
### How it works
```
Player microphone
    ↓
UElevenLabsMicrophoneCaptureComponent
    • Captures from default audio device
    • Resamples to 16 kHz mono float32
    ↓
UElevenLabsConversationalAgentComponent
    • Converts float32 → int16 PCM bytes
    • Sends via WebSocket to ElevenLabs
      (wss://api.elevenlabs.io/v1/convai/conversation)
    ↓
ElevenLabs Conversational AI Agent
    • Transcribes speech
    • Runs LLM
    • Synthesizes voice (ElevenLabs TTS)
    ↓
UElevenLabsConversationalAgentComponent
    • Receives Base64 PCM audio chunks
    • Feeds USoundWaveProcedural → UAudioComponent
    ↓
Agent voice plays from the Actor's position in the world
```
### Key properties
- No gRPC, no third-party libraries — uses UE's built-in `WebSockets` and `AudioCapture` modules
- Blueprint-first: all events and controls are exposed to Blueprint
- Real-time bidirectional: audio streams in both directions simultaneously
- Server VAD (default) or push-to-talk
---
## 2. Installation
The plugin lives inside the project, not the engine, so no separate install is needed.
### Verify it is enabled
Open `Unreal/PS_AI_Agent/PS_AI_Agent.uproject` and confirm the `Plugins` array contains:
```json
{
  "Name": "PS_AI_Agent_ElevenLabs",
  "Enabled": true
}
```
### First compile
Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click **Yes**. Alternatively, compile from the command line:
```
"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat"
PS_AI_AgentEditor Win64 Development
"<repo>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject"
-WaitMutex
```
---
## 3. Project Settings
Go to **Edit → Project Settings → Plugins → ElevenLabs AI Agent**.
| Setting | Description | Required |
|---|---|---|
| **API Key** | Your ElevenLabs API key from [elevenlabs.io](https://elevenlabs.io) | Yes (unless using Signed URL Mode) |
| **Agent ID** | Default agent ID. Create agents at [elevenlabs.io/app/conversational-ai](https://elevenlabs.io/app/conversational-ai) | Yes (unless set per-component) |
| **Signed URL Mode** | Fetch the WS URL from your own backend (keeps key off client). See [Section 9](#9-security--signed-url-mode) | No |
| **Signed URL Endpoint** | Your backend URL returning `{ "signed_url": "wss://..." }` | Only if Signed URL Mode = true |
| **Custom WebSocket URL** | Override the default `wss://api.elevenlabs.io/...` endpoint (debug only) | No |
| **Verbose Logging** | Log every WebSocket JSON frame to Output Log | No |
> **Security note**: Never ship with the API key hard-coded in a packaged build. Use Signed URL Mode for production, or load the key at runtime from a secure backend.
---
## 4. Quick Start (Blueprint)
### Step 1 — Add the component to an NPC
1. Open your NPC Blueprint (or any Actor Blueprint).
2. In the **Components** panel, click **Add** → search for **ElevenLabs Conversational Agent**.
3. Select the component. In the **Details** panel you can optionally set a specific **Agent ID** (overrides the project default).
### Step 2 — Set Turn Mode
In the component's **Details** panel:
- **Server VAD** (default): ElevenLabs automatically detects when the player stops speaking. Microphone streams continuously once connected.
- **Client Controlled**: You call `Start Listening` / `Stop Listening` manually (push-to-talk).
### Step 3 — Wire up events in the Event Graph
```
Event BeginPlay
└─► [ElevenLabs Agent] Start Conversation

[ElevenLabs Agent] On Agent Connected
└─► Print String "Connected! ID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
└─► Set Text (UI widget) ← Response Text

[ElevenLabs Agent] On Agent Transcript
└─► (optional) display live subtitles ← Segment → Text

[ElevenLabs Agent] On Agent Started Speaking
└─► Play talking animation on NPC

[ElevenLabs Agent] On Agent Stopped Speaking
└─► Return to idle animation

[ElevenLabs Agent] On Agent Error
└─► Print String "Error: " + Error Message

Event EndPlay
└─► [ElevenLabs Agent] End Conversation
```
### Step 4 — Push-to-talk (Client Controlled mode only)
```
Input Action "Talk" (Pressed)
└─► [ElevenLabs Agent] Start Listening

Input Action "Talk" (Released)
└─► [ElevenLabs Agent] Stop Listening
```
---
## 5. Quick Start (C++)
### 1. Add the plugin to your module's Build.cs
```csharp
PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");
```
### 2. Include and use
```cpp
#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"

// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;

// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
    TEXT("ElevenLabsAgent"));

// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;

// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
    this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
    this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
    this, &AMyNPC::PlayTalkingAnimation);

// Start the conversation:
ElevenLabsAgent->StartConversation();

// Later, to end it:
ElevenLabsAgent->EndConversation();
```
### 3. Callback signatures
```cpp
UFUNCTION()
void HandleAgentConnected(const FElevenLabsConversationInfo& Info)
{
    UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID);
}

UFUNCTION()
void HandleAgentResponse(const FString& ResponseText)
{
    // Display in UI, drive subtitles, etc.
}

UFUNCTION()
void PlayTalkingAnimation()
{
    // Switch to talking anim montage
}
```
---
## 6. Components Reference
### UElevenLabsConversationalAgentComponent
The **main component** — attach this to any Actor that should be able to speak.
**Category**: ElevenLabs
**Inherits from**: `UActorComponent`
#### Properties
| Property | Type | Default | Description |
|---|---|---|---|
| `AgentID` | `FString` | `""` | Agent ID for this actor. Overrides the project-level default when non-empty. |
| `TurnMode` | `EElevenLabsTurnMode` | `Server` | How speaker turns are detected. See [Section 8](#8-turn-modes). |
| `bAutoStartListening` | `bool` | `true` | If true, starts mic capture automatically once the WebSocket is ready. |
#### Functions
| Function | Blueprint | Description |
|---|---|---|
| `StartConversation()` | Callable | Opens the WebSocket connection. If `bAutoStartListening` is true, mic capture starts once connected. |
| `EndConversation()` | Callable | Closes the WebSocket, stops mic, stops audio playback. |
| `StartListening()` | Callable | Starts microphone capture. In Client mode, also sends `user_turn_start` to ElevenLabs. |
| `StopListening()` | Callable | Stops microphone capture. In Client mode, also sends `user_turn_end`. |
| `InterruptAgent()` | Callable | Stops the agent's current utterance immediately. |
| `IsConnected()` | Pure | Returns true if the WebSocket is open and the conversation is active. |
| `IsListening()` | Pure | Returns true if the microphone is currently capturing. |
| `IsAgentSpeaking()` | Pure | Returns true if agent audio is currently playing. |
| `GetConversationInfo()` | Pure | Returns `FElevenLabsConversationInfo` (ConversationID, AgentID). |
| `GetWebSocketProxy()` | Pure | Returns the underlying `UElevenLabsWebSocketProxy` for advanced use. |
#### Events
| Event | Parameters | Fired when |
|---|---|---|
| `OnAgentConnected` | `FElevenLabsConversationInfo` | WebSocket handshake + agent initiation complete. |
| `OnAgentDisconnected` | `int32 StatusCode`, `FString Reason` | WebSocket closed (graceful or remote). |
| `OnAgentError` | `FString ErrorMessage` | Connection or protocol error. |
| `OnAgentTranscript` | `FElevenLabsTranscriptSegment` | Any transcript arrives (user or agent, tentative or final). |
| `OnAgentTextResponse` | `FString ResponseText` | Final text response from the agent (complements the audio). |
| `OnAgentStartedSpeaking` | — | First audio chunk received from the agent. |
| `OnAgentStoppedSpeaking` | — | Audio queue empty for ~0.5 s (agent done speaking). |
| `OnAgentInterrupted` | — | Agent speech was interrupted (by user or by `InterruptAgent()`). |
---
### UElevenLabsMicrophoneCaptureComponent
A lightweight microphone capture component. Managed automatically by `UElevenLabsConversationalAgentComponent` — you only need to use this directly for advanced scenarios (e.g. custom audio routing).
**Category**: ElevenLabs
**Inherits from**: `UActorComponent`
#### Properties
| Property | Type | Default | Description |
|---|---|---|---|
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples. Range: 0.0–4.0. |
#### Functions
| Function | Blueprint | Description |
|---|---|---|
| `StartCapture()` | Callable | Opens the default audio input device and starts streaming. |
| `StopCapture()` | Callable | Stops streaming and closes the device. |
| `IsCapturing()` | Pure | True while actively capturing. |
#### Delegate
`OnAudioCaptured` — fires on the game thread with `TArray<float>` PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.
---
### UElevenLabsWebSocketProxy
Low-level WebSocket session manager. Used internally by `UElevenLabsConversationalAgentComponent`. Use this directly only if you need fine-grained protocol control.
**Inherits from**: `UObject`
**Instantiate via**: `NewObject<UElevenLabsWebSocketProxy>(Outer)`
#### Key functions
| Function | Description |
|---|---|
| `Connect(AgentID, APIKey)` | Open the WS connection. Parameters override project settings when non-empty. |
| `Disconnect()` | Send close frame and tear down the connection. |
| `SendAudioChunk(PCMData)` | Send raw int16 LE PCM bytes. Called automatically by the agent component. |
| `SendUserTurnStart()` | Signal start of user speech (Client turn mode only). |
| `SendUserTurnEnd()` | Signal end of user speech (Client turn mode only). |
| `SendInterrupt()` | Ask the agent to stop speaking. |
| `GetConnectionState()` | Returns `EElevenLabsConnectionState`. |
| `GetConversationInfo()` | Returns `FElevenLabsConversationInfo`. |
---
## 7. Data Types Reference
### EElevenLabsConnectionState
```
Disconnected — No active connection
Connecting — WebSocket handshake in progress
Connected — Conversation active and ready
Error — Connection or protocol failure
```
### EElevenLabsTurnMode
```
Server — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
```
### FElevenLabsConversationInfo
```
ConversationID FString — Unique session ID assigned by ElevenLabs
AgentID FString — The agent that responded
```
### FElevenLabsTranscriptSegment
```
Text FString — Transcribed text
Speaker FString — "user" or "agent"
bIsFinal bool — false while still speaking, true when the turn is complete
```
---
## 8. Turn Modes
### Server VAD (default)
ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.
**When to use**: Casual conversation, hands-free interaction.
```
StartConversation() → mic streams continuously
ElevenLabs detects speech / silence automatically
Agent replies when it detects end-of-speech
```
### Client Controlled (push-to-talk)
Your code explicitly signals turn boundaries with `StartListening()` / `StopListening()`.
**When to use**: Noisy environments, precise control, walkie-talkie style.
```
Input Pressed → StartListening() → sends user_turn_start + begins audio
Input Released → StopListening() → stops audio + sends user_turn_end
Agent replies after user_turn_end
```
---
## 9. Security — Signed URL Mode
By default, the API key is stored in Project Settings (Engine.ini). This is fine for development but **should not be shipped in packaged builds** as the key could be extracted.
### Production setup
1. Enable **Signed URL Mode** in Project Settings.
2. Set **Signed URL Endpoint** to a URL on your own backend (e.g. `https://your-server.com/api/elevenlabs-token`).
3. Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning:
```json
{ "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..." }
```
4. The plugin fetches this URL before connecting — the API key never leaves your server.
---
## 10. Audio Pipeline
### Input (player → agent)
```
Device (any sample rate, any channels)
↓ FAudioCapture (UE built-in)
↓ Callback: float32 interleaved frames
↓ Downmix to mono (average channels)
↓ Resample to 16000 Hz (linear interpolation)
↓ Apply VolumeMultiplier
↓ Dispatch to Game Thread
↓ Convert float32 → int16 LE bytes
↓ Base64 encode
↓ WebSocket JSON frame: { "user_audio_chunk": "<base64>" }
```
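The downmix, resample, and int16 conversion stages above can be sketched in plain, standalone C++. This is an illustrative model, not the plugin's actual code: function names and signatures here are assumptions, and the real implementation runs inside the UE audio capture callback.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Downmix interleaved float32 frames to mono by averaging channels.
std::vector<float> DownmixToMono(const std::vector<float>& Interleaved, int NumChannels)
{
    std::vector<float> Mono;
    Mono.reserve(Interleaved.size() / NumChannels);
    for (size_t i = 0; i + NumChannels <= Interleaved.size(); i += NumChannels)
    {
        float Sum = 0.0f;
        for (int c = 0; c < NumChannels; ++c) Sum += Interleaved[i + c];
        Mono.push_back(Sum / NumChannels);
    }
    return Mono;
}

// Resample by linear interpolation from SrcRate to DstRate (e.g. 48000 -> 16000).
std::vector<float> ResampleLinear(const std::vector<float>& In, int SrcRate, int DstRate)
{
    if (In.empty() || SrcRate == DstRate) return In;
    const size_t OutCount = static_cast<size_t>(In.size() * (double)DstRate / SrcRate);
    std::vector<float> Out(OutCount);
    const double Step = (double)SrcRate / DstRate;
    for (size_t i = 0; i < OutCount; ++i)
    {
        const double Pos = i * Step;
        const size_t i0 = (size_t)Pos;
        const size_t i1 = std::min(i0 + 1, In.size() - 1);
        const double Frac = Pos - i0;
        Out[i] = (float)(In[i0] * (1.0 - Frac) + In[i1] * Frac);
    }
    return Out;
}

// Apply gain, clamp to [-1, 1], convert to int16 little-endian bytes.
std::vector<uint8_t> FloatToInt16LE(const std::vector<float>& In, float Gain)
{
    std::vector<uint8_t> Bytes;
    Bytes.reserve(In.size() * 2);
    for (float s : In)
    {
        const float Clamped = std::clamp(s * Gain, -1.0f, 1.0f);
        const int16_t v = (int16_t)std::lround(Clamped * 32767.0f);
        Bytes.push_back((uint8_t)(v & 0xFF));         // low byte first: little-endian
        Bytes.push_back((uint8_t)((v >> 8) & 0xFF));
    }
    return Bytes;
}
```

The resulting byte buffer is what gets Base64-encoded into the `user_audio_chunk` JSON frame.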
### Output (agent → player)
```
WebSocket JSON frame: { "type": "audio", "audio_event": { "audio_base_64": "..." } }
↓ Base64 decode → int16 LE PCM bytes
↓ Enqueue in thread-safe AudioQueue
↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
↓ UAudioComponent plays from the Actor's world position (3D spatialized)
```
**Audio format** (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.
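The decode-and-enqueue stage of the output path can likewise be modeled in standalone C++. This is a minimal sketch under stated assumptions: `AudioSampleQueue` is an illustrative name, and in the plugin the pull side is the `USoundWaveProcedural` underflow callback rather than a plain method call.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// A minimal thread-safe sample queue modeling the agent-audio path:
// the WebSocket thread enqueues decoded int16 LE PCM, the audio render
// side pulls float samples when the procedural wave underflows.
class AudioSampleQueue
{
public:
    // Decode int16 little-endian bytes to float samples in [-1, 1] and enqueue.
    void EnqueuePCM16LE(const std::vector<uint8_t>& Bytes)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        for (size_t i = 0; i + 1 < Bytes.size(); i += 2)
        {
            const int16_t v = (int16_t)(Bytes[i] | (Bytes[i + 1] << 8));
            Samples.push_back(v / 32768.0f);
        }
    }

    // Pull up to MaxCount samples; returns fewer on underflow.
    std::vector<float> Pull(size_t MaxCount)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        const size_t Count = std::min(MaxCount, Samples.size());
        std::vector<float> Out(Samples.begin(), Samples.begin() + Count);
        Samples.erase(Samples.begin(), Samples.begin() + Count);
        return Out;
    }

private:
    std::mutex Mutex;
    std::deque<float> Samples;
};
```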
---
## 11. Common Patterns
### Show subtitles in UI
```
OnAgentTranscript event:
├─ Segment → Speaker == "user" → show in player subtitle widget
├─ Segment → Speaker == "agent" → show in NPC speech bubble
└─ Segment → bIsFinal == false → show as "..." (in-progress)
```
### Interrupt the agent when the player starts speaking
In Server VAD mode ElevenLabs handles this automatically. For manual control:
```
OnAgentStartedSpeaking → store "agent is speaking" flag
Input Action (any) → if agent is speaking → InterruptAgent()
```
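The manual pattern amounts to one piece of state plus one check. A standalone sketch, assuming a callback stands in for `InterruptAgent()` (class and member names here are illustrative; in the plugin you would bind the two handlers to `OnAgentStartedSpeaking` / `OnAgentStoppedSpeaking`):

```cpp
#include <functional>
#include <utility>

// Tracks whether the agent is mid-utterance and interrupts on player input.
// InterruptFn stands in for a call to InterruptAgent().
class AgentInterruptGuard
{
public:
    explicit AgentInterruptGuard(std::function<void()> InterruptFn)
        : Interrupt(std::move(InterruptFn)) {}

    // Bind these to OnAgentStartedSpeaking / OnAgentStoppedSpeaking.
    void HandleAgentStartedSpeaking() { bAgentSpeaking = true; }
    void HandleAgentStoppedSpeaking() { bAgentSpeaking = false; }

    // Call from any input action; interrupts only while the agent is talking.
    void HandlePlayerInput()
    {
        if (bAgentSpeaking)
        {
            Interrupt();
            bAgentSpeaking = false;
        }
    }

private:
    std::function<void()> Interrupt;
    bool bAgentSpeaking = false;
};
```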
### Multiple NPCs with different agents
Each NPC Blueprint has its own `UElevenLabsConversationalAgentComponent`. Set a different `AgentID` on each component. Connections are fully independent.
### Only start the conversation when the player is nearby
```
On Begin Overlap (trigger volume around NPC)
└─► [ElevenLabs Agent] Start Conversation
On End Overlap
└─► [ElevenLabs Agent] End Conversation
```
### Adjusting microphone volume
Get the `UElevenLabsMicrophoneCaptureComponent` from the owner and set `VolumeMultiplier`:
```cpp
UElevenLabsMicrophoneCaptureComponent* Mic =
    GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic)
{
    Mic->VolumeMultiplier = 2.0f;
}
```
---
## 12. Troubleshooting
### Plugin doesn't appear in Project Settings
Ensure the plugin is enabled in `.uproject` and the project was recompiled after adding it.
### WebSocket connection fails immediately
- Check the **API Key** is set correctly in Project Settings.
- Check the **Agent ID** exists in your ElevenLabs account.
- Enable **Verbose Logging** in Project Settings and check the Output Log for the exact WebSocket URL and error.
- Make sure your machine has internet access and port 443 (WSS) is not blocked.
### No audio from the microphone
- Windows may require microphone permission. Check **Settings → Privacy → Microphone**.
- Try setting `VolumeMultiplier` to `2.0` to rule out a volume issue.
- Check the Output Log for `"Failed to open default audio capture stream"`.
### Agent audio is choppy or silent
- The `USoundWaveProcedural` queue may be underflowing. This can happen if audio chunks arrive with long gaps. Check network latency.
- Ensure no other component is consuming the same `UAudioComponent`.
### `OnAgentStoppedSpeaking` fires too early
The silence detection threshold is 30 ticks (~0.5 s at 60 fps). If the agent has natural pauses in speech, increase `SilenceThresholdTicks` in `ElevenLabsConversationalAgentComponent.h`:
```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s
```
### Build error: "Plugin AudioCapture not found"
Make sure the `AudioCapture` plugin is enabled in your project. It should be auto-enabled via the `.uplugin` dependency, but you can also add it manually to `.uproject`:
```json
{ "Name": "AudioCapture", "Enabled": true }
```
---
*Documentation generated 2026-02-19 — Plugin v1.0.0 — UE 5.5*