Add plugin documentation for PS_AI_Agent_ElevenLabs

Covers: installation, project settings, quick start (Blueprint + C++),
full component/API reference, turn modes, security/signed URL mode,
audio pipeline diagram, common patterns, and troubleshooting guide.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
j.foucher 2026-02-19 13:07:49 +01:00
parent 3b98edcf92
commit c833ccd66d

# PS_AI_Agent_ElevenLabs — Plugin Documentation
**Engine**: Unreal Engine 5.5
**Plugin version**: 1.0.0
**Status**: Beta
**API**: [ElevenLabs Conversational AI](https://elevenlabs.io/docs/conversational-ai)
---
## Table of Contents
1. [Overview](#1-overview)
2. [Installation](#2-installation)
3. [Project Settings](#3-project-settings)
4. [Quick Start (Blueprint)](#4-quick-start-blueprint)
5. [Quick Start (C++)](#5-quick-start-c)
6. [Components Reference](#6-components-reference)
- [UElevenLabsConversationalAgentComponent](#uelevenlabsconversationalagentcomponent)
- [UElevenLabsMicrophoneCaptureComponent](#uelevenlabsmicrophonecapturecomponent)
- [UElevenLabsWebSocketProxy](#uelevenlabswebsocketproxy)
7. [Data Types Reference](#7-data-types-reference)
8. [Turn Modes](#8-turn-modes)
9. [Security — Signed URL Mode](#9-security--signed-url-mode)
10. [Audio Pipeline](#10-audio-pipeline)
11. [Common Patterns](#11-common-patterns)
12. [Troubleshooting](#12-troubleshooting)
---
## 1. Overview
This plugin integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor).
### How it works
```
Player microphone
    ↓
UElevenLabsMicrophoneCaptureComponent
    • Captures from default audio device
    • Resamples to 16 kHz mono float32
    ↓
UElevenLabsConversationalAgentComponent
    • Converts float32 → int16 PCM bytes
    • Sends via WebSocket to ElevenLabs
      (wss://api.elevenlabs.io/v1/convai/conversation)
    ↓
ElevenLabs Conversational AI Agent
    • Transcribes speech
    • Runs LLM
    • Synthesizes voice (ElevenLabs TTS)
    ↓
UElevenLabsConversationalAgentComponent
    • Receives Base64 PCM audio chunks
    • Feeds USoundWaveProcedural → UAudioComponent
    ↓
Agent voice plays from the Actor's position in the world
```
### Key properties
- No gRPC, no third-party libraries — uses UE's built-in `WebSockets` and `AudioCapture` modules
- Blueprint-first: all events and controls are exposed to Blueprint
- Real-time bidirectional: audio streams in both directions simultaneously
- Server VAD (default) or push-to-talk
---
## 2. Installation
The plugin lives inside the project, not the engine, so no separate install is needed.
### Verify it is enabled
Open `Unreal/PS_AI_Agent/PS_AI_Agent.uproject` and confirm the `Plugins` array contains:
```json
{
  "Name": "PS_AI_Agent_ElevenLabs",
  "Enabled": true
}
```
### First compile
Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click **Yes**. Alternatively, compile from the command line:
```
"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat"
PS_AI_AgentEditor Win64 Development
"<repo>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject"
-WaitMutex
```
---
## 3. Project Settings
Go to **Edit → Project Settings → Plugins → ElevenLabs AI Agent**.
| Setting | Description | Required |
|---|---|---|
| **API Key** | Your ElevenLabs API key from [elevenlabs.io](https://elevenlabs.io) | Yes (unless using Signed URL Mode) |
| **Agent ID** | Default agent ID. Create agents at [elevenlabs.io/app/conversational-ai](https://elevenlabs.io/app/conversational-ai) | Yes (unless set per-component) |
| **Signed URL Mode** | Fetch the WS URL from your own backend (keeps key off client). See [Section 9](#9-security--signed-url-mode) | No |
| **Signed URL Endpoint** | Your backend URL returning `{ "signed_url": "wss://..." }` | Only if Signed URL Mode = true |
| **Custom WebSocket URL** | Override the default `wss://api.elevenlabs.io/...` endpoint (debug only) | No |
| **Verbose Logging** | Log every WebSocket JSON frame to Output Log | No |
> **Security note**: Never ship with the API key hard-coded in a packaged build. Use Signed URL Mode for production, or load the key at runtime from a secure backend.
---
## 4. Quick Start (Blueprint)
### Step 1 — Add the component to an NPC
1. Open your NPC Blueprint (or any Actor Blueprint).
2. In the **Components** panel, click **Add** → search for **ElevenLabs Conversational Agent**.
3. Select the component. In the **Details** panel you can optionally set a specific **Agent ID** (overrides the project default).
### Step 2 — Set Turn Mode
In the component's **Details** panel:
- **Server VAD** (default): ElevenLabs automatically detects when the player stops speaking. Microphone streams continuously once connected.
- **Client Controlled**: You call `Start Listening` / `Stop Listening` manually (push-to-talk).
### Step 3 — Wire up events in the Event Graph
```
Event BeginPlay
└─► [ElevenLabs Agent] Start Conversation

[ElevenLabs Agent] On Agent Connected
└─► Print String "Connected! ID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
└─► Set Text (UI widget) ← Response Text

[ElevenLabs Agent] On Agent Transcript
└─► (optional) display live subtitles ← Segment → Text

[ElevenLabs Agent] On Agent Started Speaking
└─► Play talking animation on NPC

[ElevenLabs Agent] On Agent Stopped Speaking
└─► Return to idle animation

[ElevenLabs Agent] On Agent Error
└─► Print String "Error: " + Error Message

Event EndPlay
└─► [ElevenLabs Agent] End Conversation
```
### Step 4 — Push-to-talk (Client Controlled mode only)
```
Input Action "Talk" (Pressed)
└─► [ElevenLabs Agent] Start Listening

Input Action "Talk" (Released)
└─► [ElevenLabs Agent] Stop Listening
```
---
## 5. Quick Start (C++)
### 1. Add the plugin to your module's Build.cs
```csharp
PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");
```
### 2. Include and use
```cpp
#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"

// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;

// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
    TEXT("ElevenLabsAgent"));

// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;

// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
    this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
    this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
    this, &AMyNPC::PlayTalkingAnimation);

// Start the conversation:
ElevenLabsAgent->StartConversation();

// Later, to end it:
ElevenLabsAgent->EndConversation();
```
### 3. Callback signatures
```cpp
UFUNCTION()
void HandleAgentConnected(const FElevenLabsConversationInfo& Info)
{
    UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID);
}

UFUNCTION()
void HandleAgentResponse(const FString& ResponseText)
{
    // Display in UI, drive subtitles, etc.
}

UFUNCTION()
void PlayTalkingAnimation()
{
    // Switch to talking anim montage
}
```
---
## 6. Components Reference
### UElevenLabsConversationalAgentComponent
The **main component** — attach this to any Actor that should be able to speak.
**Category**: ElevenLabs
**Inherits from**: `UActorComponent`
#### Properties
| Property | Type | Default | Description |
|---|---|---|---|
| `AgentID` | `FString` | `""` | Agent ID for this actor. Overrides the project-level default when non-empty. |
| `TurnMode` | `EElevenLabsTurnMode` | `Server` | How speaker turns are detected. See [Section 8](#8-turn-modes). |
| `bAutoStartListening` | `bool` | `true` | If true, starts mic capture automatically once the WebSocket is ready. |
#### Functions
| Function | Blueprint | Description |
|---|---|---|
| `StartConversation()` | Callable | Opens the WebSocket connection. If `bAutoStartListening` is true, mic capture starts once connected. |
| `EndConversation()` | Callable | Closes the WebSocket, stops mic, stops audio playback. |
| `StartListening()` | Callable | Starts microphone capture. In Client mode, also sends `user_turn_start` to ElevenLabs. |
| `StopListening()` | Callable | Stops microphone capture. In Client mode, also sends `user_turn_end`. |
| `InterruptAgent()` | Callable | Stops the agent's current utterance immediately. |
| `IsConnected()` | Pure | Returns true if the WebSocket is open and the conversation is active. |
| `IsListening()` | Pure | Returns true if the microphone is currently capturing. |
| `IsAgentSpeaking()` | Pure | Returns true if agent audio is currently playing. |
| `GetConversationInfo()` | Pure | Returns `FElevenLabsConversationInfo` (ConversationID, AgentID). |
| `GetWebSocketProxy()` | Pure | Returns the underlying `UElevenLabsWebSocketProxy` for advanced use. |
#### Events
| Event | Parameters | Fired when |
|---|---|---|
| `OnAgentConnected` | `FElevenLabsConversationInfo` | WebSocket handshake + agent initiation complete. |
| `OnAgentDisconnected` | `int32 StatusCode`, `FString Reason` | WebSocket closed (graceful or remote). |
| `OnAgentError` | `FString ErrorMessage` | Connection or protocol error. |
| `OnAgentTranscript` | `FElevenLabsTranscriptSegment` | Any transcript arrives (user or agent, tentative or final). |
| `OnAgentTextResponse` | `FString ResponseText` | Final text response from the agent (complements the audio). |
| `OnAgentStartedSpeaking` | — | First audio chunk received from the agent. |
| `OnAgentStoppedSpeaking` | — | Audio queue empty for ~0.5 s (agent done speaking). |
| `OnAgentInterrupted` | — | Agent speech was interrupted (by user or by `InterruptAgent()`). |
---
### UElevenLabsMicrophoneCaptureComponent
A lightweight microphone capture component. Managed automatically by `UElevenLabsConversationalAgentComponent` — you only need to use this directly for advanced scenarios (e.g. custom audio routing).
**Category**: ElevenLabs
**Inherits from**: `UActorComponent`
#### Properties
| Property | Type | Default | Description |
|---|---|---|---|
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples. Range: 0.0–4.0. |
#### Functions
| Function | Blueprint | Description |
|---|---|---|
| `StartCapture()` | Callable | Opens the default audio input device and starts streaming. |
| `StopCapture()` | Callable | Stops streaming and closes the device. |
| `IsCapturing()` | Pure | True while actively capturing. |
#### Delegate
`OnAudioCaptured` — fires on the game thread with `TArray<float>` PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.
---
### UElevenLabsWebSocketProxy
Low-level WebSocket session manager. Used internally by `UElevenLabsConversationalAgentComponent`. Use this directly only if you need fine-grained protocol control.
**Inherits from**: `UObject`
**Instantiate via**: `NewObject<UElevenLabsWebSocketProxy>(Outer)`
#### Key functions
| Function | Description |
|---|---|
| `Connect(AgentID, APIKey)` | Open the WS connection. Parameters override project settings when non-empty. |
| `Disconnect()` | Send close frame and tear down the connection. |
| `SendAudioChunk(PCMData)` | Send raw int16 LE PCM bytes. Called automatically by the agent component. |
| `SendUserTurnStart()` | Signal start of user speech (Client turn mode only). |
| `SendUserTurnEnd()` | Signal end of user speech (Client turn mode only). |
| `SendInterrupt()` | Ask the agent to stop speaking. |
| `GetConnectionState()` | Returns `EElevenLabsConnectionState`. |
| `GetConversationInfo()` | Returns `FElevenLabsConversationInfo`. |
---
## 7. Data Types Reference
### EElevenLabsConnectionState
```
Disconnected — No active connection
Connecting — WebSocket handshake in progress
Connected — Conversation active and ready
Error — Connection or protocol failure
```
### EElevenLabsTurnMode
```
Server — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
```
### FElevenLabsConversationInfo
```
ConversationID FString — Unique session ID assigned by ElevenLabs
AgentID FString — The agent that responded
```
### FElevenLabsTranscriptSegment
```
Text FString — Transcribed text
Speaker FString — "user" or "agent"
bIsFinal bool — false while still speaking, true when the turn is complete
```
---
## 8. Turn Modes
### Server VAD (default)
ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.
**When to use**: Casual conversation, hands-free interaction.
```
StartConversation() → mic streams continuously
ElevenLabs detects speech / silence automatically
Agent replies when it detects end-of-speech
```
### Client Controlled (push-to-talk)
Your code explicitly signals turn boundaries with `StartListening()` / `StopListening()`.
**When to use**: Noisy environments, precise control, walkie-talkie style.
```
Input Pressed → StartListening() → sends user_turn_start + begins audio
Input Released → StopListening() → stops audio + sends user_turn_end
Agent replies after user_turn_end
```
---
## 9. Security — Signed URL Mode
By default, the API key is stored in Project Settings (Engine.ini). This is fine for development but **should not be shipped in packaged builds** as the key could be extracted.
### Production setup
1. Enable **Signed URL Mode** in Project Settings.
2. Set **Signed URL Endpoint** to a URL on your own backend (e.g. `https://your-server.com/api/elevenlabs-token`).
3. Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning:
```json
{ "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..." }
```
4. The plugin fetches this URL before connecting — the API key never leaves your server.
---
## 10. Audio Pipeline
### Input (player → agent)
```
Device (any sample rate, any channels)
↓ FAudioCapture (UE built-in)
↓ Callback: float32 interleaved frames
↓ Downmix to mono (average channels)
↓ Resample to 16000 Hz (linear interpolation)
↓ Apply VolumeMultiplier
↓ Dispatch to Game Thread
↓ Convert float32 → int16 LE bytes
↓ Base64 encode
↓ WebSocket JSON frame: { "user_audio_chunk": "<base64>" }
```
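The downmix, resample, and int16 conversion stages above can be sketched in plain, standalone C++. This is an illustrative model, not the plugin's actual code: function names and signatures here are assumptions, and the real implementation runs inside the UE audio capture callback.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Downmix interleaved float32 frames to mono by averaging channels.
std::vector<float> DownmixToMono(const std::vector<float>& Interleaved, int NumChannels)
{
    std::vector<float> Mono;
    Mono.reserve(Interleaved.size() / NumChannels);
    for (size_t i = 0; i + NumChannels <= Interleaved.size(); i += NumChannels)
    {
        float Sum = 0.0f;
        for (int c = 0; c < NumChannels; ++c) Sum += Interleaved[i + c];
        Mono.push_back(Sum / NumChannels);
    }
    return Mono;
}

// Resample by linear interpolation from SrcRate to DstRate (e.g. 48000 -> 16000).
std::vector<float> ResampleLinear(const std::vector<float>& In, int SrcRate, int DstRate)
{
    if (In.empty() || SrcRate == DstRate) return In;
    const size_t OutCount = static_cast<size_t>(In.size() * (double)DstRate / SrcRate);
    std::vector<float> Out(OutCount);
    const double Step = (double)SrcRate / DstRate;
    for (size_t i = 0; i < OutCount; ++i)
    {
        const double Pos = i * Step;
        const size_t i0 = (size_t)Pos;
        const size_t i1 = std::min(i0 + 1, In.size() - 1);
        const double Frac = Pos - i0;
        Out[i] = (float)(In[i0] * (1.0 - Frac) + In[i1] * Frac);
    }
    return Out;
}

// Apply gain, clamp to [-1, 1], convert to int16 little-endian bytes.
std::vector<uint8_t> FloatToInt16LE(const std::vector<float>& In, float Gain)
{
    std::vector<uint8_t> Bytes;
    Bytes.reserve(In.size() * 2);
    for (float s : In)
    {
        const float Clamped = std::clamp(s * Gain, -1.0f, 1.0f);
        const int16_t v = (int16_t)std::lround(Clamped * 32767.0f);
        Bytes.push_back((uint8_t)(v & 0xFF));         // low byte first: little-endian
        Bytes.push_back((uint8_t)((v >> 8) & 0xFF));
    }
    return Bytes;
}
```

The resulting byte buffer is what gets Base64-encoded into the `user_audio_chunk` JSON frame.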
### Output (agent → player)
```
WebSocket JSON frame: { "type": "audio", "audio_event": { "audio_base_64": "..." } }
↓ Base64 decode → int16 LE PCM bytes
↓ Enqueue in thread-safe AudioQueue
↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
↓ UAudioComponent plays from the Actor's world position (3D spatialized)
```
**Audio format** (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.
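The decode-and-enqueue stage of the output path can likewise be modeled in standalone C++. This is a minimal sketch under stated assumptions: `AudioSampleQueue` is an illustrative name, and in the plugin the pull side is the `USoundWaveProcedural` underflow callback rather than a plain method call.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// A minimal thread-safe sample queue modeling the agent-audio path:
// the WebSocket thread enqueues decoded int16 LE PCM, the audio render
// side pulls float samples when the procedural wave underflows.
class AudioSampleQueue
{
public:
    // Decode int16 little-endian bytes to float samples in [-1, 1] and enqueue.
    void EnqueuePCM16LE(const std::vector<uint8_t>& Bytes)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        for (size_t i = 0; i + 1 < Bytes.size(); i += 2)
        {
            const int16_t v = (int16_t)(Bytes[i] | (Bytes[i + 1] << 8));
            Samples.push_back(v / 32768.0f);
        }
    }

    // Pull up to MaxCount samples; returns fewer on underflow.
    std::vector<float> Pull(size_t MaxCount)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        const size_t Count = std::min(MaxCount, Samples.size());
        std::vector<float> Out(Samples.begin(), Samples.begin() + Count);
        Samples.erase(Samples.begin(), Samples.begin() + Count);
        return Out;
    }

private:
    std::mutex Mutex;
    std::deque<float> Samples;
};
```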
---
## 11. Common Patterns
### Show subtitles in UI
```
OnAgentTranscript event:
├─ Segment → Speaker == "user" → show in player subtitle widget
├─ Segment → Speaker == "agent" → show in NPC speech bubble
└─ Segment → bIsFinal == false → show as "..." (in-progress)
```
### Interrupt the agent when the player starts speaking
In Server VAD mode ElevenLabs handles this automatically. For manual control:
```
OnAgentStartedSpeaking → store "agent is speaking" flag
Input Action (any) → if agent is speaking → InterruptAgent()
```
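The manual pattern amounts to one piece of state plus one check. A standalone sketch, assuming a callback stands in for `InterruptAgent()` (class and member names here are illustrative; in the plugin you would bind the two handlers to `OnAgentStartedSpeaking` / `OnAgentStoppedSpeaking`):

```cpp
#include <functional>
#include <utility>

// Tracks whether the agent is mid-utterance and interrupts on player input.
// InterruptFn stands in for a call to InterruptAgent().
class AgentInterruptGuard
{
public:
    explicit AgentInterruptGuard(std::function<void()> InterruptFn)
        : Interrupt(std::move(InterruptFn)) {}

    // Bind these to OnAgentStartedSpeaking / OnAgentStoppedSpeaking.
    void HandleAgentStartedSpeaking() { bAgentSpeaking = true; }
    void HandleAgentStoppedSpeaking() { bAgentSpeaking = false; }

    // Call from any input action; interrupts only while the agent is talking.
    void HandlePlayerInput()
    {
        if (bAgentSpeaking)
        {
            Interrupt();
            bAgentSpeaking = false;
        }
    }

private:
    std::function<void()> Interrupt;
    bool bAgentSpeaking = false;
};
```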
### Multiple NPCs with different agents
Each NPC Blueprint has its own `UElevenLabsConversationalAgentComponent`. Set a different `AgentID` on each component. Connections are fully independent.
### Only start the conversation when the player is nearby
```
On Begin Overlap (trigger volume around NPC)
└─► [ElevenLabs Agent] Start Conversation
On End Overlap
└─► [ElevenLabs Agent] End Conversation
```
### Adjusting microphone volume
Get the `UElevenLabsMicrophoneCaptureComponent` from the owner and set `VolumeMultiplier`:
```cpp
UElevenLabsMicrophoneCaptureComponent* Mic =
    GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic)
{
    Mic->VolumeMultiplier = 2.0f;
}
```
---
## 12. Troubleshooting
### Plugin doesn't appear in Project Settings
Ensure the plugin is enabled in `.uproject` and the project was recompiled after adding it.
### WebSocket connection fails immediately
- Check the **API Key** is set correctly in Project Settings.
- Check the **Agent ID** exists in your ElevenLabs account.
- Enable **Verbose Logging** in Project Settings and check the Output Log for the exact WebSocket URL and error.
- Make sure your machine has internet access and port 443 (WSS) is not blocked.
### No audio from the microphone
- Windows may require microphone permission. Check **Settings → Privacy → Microphone**.
- Try setting `VolumeMultiplier` to `2.0` to rule out a volume issue.
- Check the Output Log for `"Failed to open default audio capture stream"`.
### Agent audio is choppy or silent
- The `USoundWaveProcedural` queue may be underflowing. This can happen if audio chunks arrive with long gaps. Check network latency.
- Ensure no other component is consuming the same `UAudioComponent`.
### `OnAgentStoppedSpeaking` fires too early
The silence detection threshold is 30 ticks (~0.5 s at 60 fps). If the agent has natural pauses in speech, increase `SilenceThresholdTicks` in `ElevenLabsConversationalAgentComponent.h`:
```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s
```
### Build error: "Plugin AudioCapture not found"
Make sure the `AudioCapture` plugin is enabled in your project. It should be auto-enabled via the `.uplugin` dependency, but you can also add it manually to `.uproject`:
```json
{ "Name": "AudioCapture", "Enabled": true }
```
---
*Documentation generated 2026-02-19 — Plugin v1.0.0 — UE 5.5*