diff --git a/.claude/PS_AI_Agent_ElevenLabs_Documentation.md b/.claude/PS_AI_Agent_ElevenLabs_Documentation.md new file mode 100644 index 0000000..1ba2038 --- /dev/null +++ b/.claude/PS_AI_Agent_ElevenLabs_Documentation.md @@ -0,0 +1,531 @@ +# PS_AI_Agent_ElevenLabs — Plugin Documentation + +**Engine**: Unreal Engine 5.5 +**Plugin version**: 1.0.0 +**Status**: Beta +**API**: [ElevenLabs Conversational AI](https://elevenlabs.io/docs/conversational-ai) + +--- + +## Table of Contents + +1. [Overview](#1-overview) +2. [Installation](#2-installation) +3. [Project Settings](#3-project-settings) +4. [Quick Start (Blueprint)](#4-quick-start-blueprint) +5. [Quick Start (C++)](#5-quick-start-c) +6. [Components Reference](#6-components-reference) + - [UElevenLabsConversationalAgentComponent](#uelevenlabsconversationalagentcomponent) + - [UElevenLabsMicrophoneCaptureComponent](#uelevenlabsmicrophonecapturecomponent) + - [UElevenLabsWebSocketProxy](#uelevenlabswebsocketproxy) +7. [Data Types Reference](#7-data-types-reference) +8. [Turn Modes](#8-turn-modes) +9. [Security — Signed URL Mode](#9-security--signed-url-mode) +10. [Audio Pipeline](#10-audio-pipeline) +11. [Common Patterns](#11-common-patterns) +12. [Troubleshooting](#12-troubleshooting) + +--- + +## 1. Overview + +This plugin integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor). 
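Under the hood, outgoing microphone audio ends up as int16 little-endian PCM before it is Base64-encoded onto the WebSocket. As a standalone sketch of just that conversion step (an illustration only — the function name and shape here are ours, not the plugin's API):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Convert float32 samples in [-1, 1] to int16 little-endian PCM bytes.
// Standalone illustration of the conversion stage; the plugin performs an
// equivalent step before Base64-encoding each outgoing WebSocket chunk.
std::vector<uint8_t> FloatToInt16PCM(const std::vector<float>& Samples)
{
    std::vector<uint8_t> Bytes;
    Bytes.reserve(Samples.size() * 2);
    for (float Sample : Samples)
    {
        const float Clamped = std::clamp(Sample, -1.0f, 1.0f);
        const auto Value = static_cast<int16_t>(Clamped * 32767.0f);
        Bytes.push_back(static_cast<uint8_t>(Value & 0xFF));        // low byte first
        Bytes.push_back(static_cast<uint8_t>((Value >> 8) & 0xFF)); // then high byte
    }
    return Bytes;
}
```

Full silence maps to bytes `00 00` and full-scale positive to `FF 7F`; clamping first avoids integer wraparound on over-range input.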
+ +### How it works + +``` +Player microphone + │ + ▼ +UElevenLabsMicrophoneCaptureComponent + • Captures from default audio device + • Resamples to 16 kHz mono float32 + │ + ▼ +UElevenLabsConversationalAgentComponent + • Converts float32 → int16 PCM bytes + • Sends via WebSocket to ElevenLabs + │ (wss://api.elevenlabs.io/v1/convai/conversation) + ▼ +ElevenLabs Conversational AI Agent + • Transcribes speech + • Runs LLM + • Synthesizes voice (ElevenLabs TTS) + │ + ▼ +UElevenLabsConversationalAgentComponent + • Receives Base64 PCM audio chunks + • Feeds USoundWaveProcedural → UAudioComponent + │ + ▼ +Agent voice plays from the Actor's position in the world +``` + +### Key properties +- No gRPC, no third-party libraries — uses UE's built-in `WebSockets` and `AudioCapture` modules +- Blueprint-first: all events and controls are exposed to Blueprint +- Real-time bidirectional: audio streams in both directions simultaneously +- Server VAD (default) or push-to-talk + +--- + +## 2. Installation + +The plugin lives inside the project, not the engine, so no separate install is needed. + +### Verify it is enabled + +Open `Unreal/PS_AI_Agent/PS_AI_Agent.uproject` and confirm: + +```json +{ + "Name": "PS_AI_Agent_ElevenLabs", + "Enabled": true +} +``` + +### First compile + +Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click **Yes**. Alternatively, compile from the command line: + +``` +"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat" + PS_AI_AgentEditor Win64 Development + "/Unreal/PS_AI_Agent/PS_AI_Agent.uproject" + -WaitMutex +``` + +--- + +## 3. Project Settings + +Go to **Edit → Project Settings → Plugins → ElevenLabs AI Agent**. + +| Setting | Description | Required | +|---|---|---| +| **API Key** | Your ElevenLabs API key from [elevenlabs.io](https://elevenlabs.io) | Yes (unless using Signed URL Mode) | +| **Agent ID** | Default agent ID. 
Create agents at [elevenlabs.io/app/conversational-ai](https://elevenlabs.io/app/conversational-ai) | Yes (unless set per-component) | +| **Signed URL Mode** | Fetch the WS URL from your own backend (keeps key off client). See [Section 9](#9-security--signed-url-mode) | No | +| **Signed URL Endpoint** | Your backend URL returning `{ "signed_url": "wss://..." }` | Only if Signed URL Mode = true | +| **Custom WebSocket URL** | Override the default `wss://api.elevenlabs.io/...` endpoint (debug only) | No | +| **Verbose Logging** | Log every WebSocket JSON frame to Output Log | No | + +> **Security note**: Never ship with the API key hard-coded in a packaged build. Use Signed URL Mode for production, or load the key at runtime from a secure backend. + +--- + +## 4. Quick Start (Blueprint) + +### Step 1 — Add the component to an NPC + +1. Open your NPC Blueprint (or any Actor Blueprint). +2. In the **Components** panel, click **Add** → search for **ElevenLabs Conversational Agent**. +3. Select the component. In the **Details** panel you can optionally set a specific **Agent ID** (overrides the project default). + +### Step 2 — Set Turn Mode + +In the component's **Details** panel: +- **Server VAD** (default): ElevenLabs automatically detects when the player stops speaking. Microphone streams continuously once connected. +- **Client Controlled**: You call `Start Listening` / `Stop Listening` manually (push-to-talk). + +### Step 3 — Wire up events in the Event Graph + +``` +Event BeginPlay + └─► [ElevenLabs Agent] Start Conversation + +[ElevenLabs Agent] On Agent Connected + └─► Print String "Connected! 
ID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
  └─► Set Text (UI widget) ← Response Text

[ElevenLabs Agent] On Agent Transcript
  └─► (optional) display live subtitles ← Segment → Text

[ElevenLabs Agent] On Agent Started Speaking
  └─► Play talking animation on NPC

[ElevenLabs Agent] On Agent Stopped Speaking
  └─► Return to idle animation

[ElevenLabs Agent] On Agent Error
  └─► Print String "Error: " + Error Message

Event EndPlay
  └─► [ElevenLabs Agent] End Conversation
```

### Step 4 — Push-to-talk (Client Controlled mode only)

```
Input Action "Talk" (Pressed)
  └─► [ElevenLabs Agent] Start Listening

Input Action "Talk" (Released)
  └─► [ElevenLabs Agent] Stop Listening
```

---

## 5. Quick Start (C++)

### 1. Add the plugin to your module's Build.cs

```csharp
PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");
```

### 2. Include and use

```cpp
#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"

// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;

// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
    TEXT("ElevenLabsAgent"));

// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;

// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
    this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
    this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
    this, &AMyNPC::PlayTalkingAnimation);

// Start the conversation:
ElevenLabsAgent->StartConversation();

// Later, to end it:
ElevenLabsAgent->EndConversation();
```

### 3. 
Callback signatures + +```cpp +UFUNCTION() +void HandleAgentConnected(const FElevenLabsConversationInfo& Info) +{ + UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID); +} + +UFUNCTION() +void HandleAgentResponse(const FString& ResponseText) +{ + // Display in UI, drive subtitles, etc. +} + +UFUNCTION() +void PlayTalkingAnimation() +{ + // Switch to talking anim montage +} +``` + +--- + +## 6. Components Reference + +### UElevenLabsConversationalAgentComponent + +The **main component** — attach this to any Actor that should be able to speak. + +**Category**: ElevenLabs +**Inherits from**: `UActorComponent` + +#### Properties + +| Property | Type | Default | Description | +|---|---|---|---| +| `AgentID` | `FString` | `""` | Agent ID for this actor. Overrides the project-level default when non-empty. | +| `TurnMode` | `EElevenLabsTurnMode` | `Server` | How speaker turns are detected. See [Section 8](#8-turn-modes). | +| `bAutoStartListening` | `bool` | `true` | If true, starts mic capture automatically once the WebSocket is ready. | + +#### Functions + +| Function | Blueprint | Description | +|---|---|---| +| `StartConversation()` | Callable | Opens the WebSocket connection. If `bAutoStartListening` is true, mic capture starts once connected. | +| `EndConversation()` | Callable | Closes the WebSocket, stops mic, stops audio playback. | +| `StartListening()` | Callable | Starts microphone capture. In Client mode, also sends `user_turn_start` to ElevenLabs. | +| `StopListening()` | Callable | Stops microphone capture. In Client mode, also sends `user_turn_end`. | +| `InterruptAgent()` | Callable | Stops the agent's current utterance immediately. | +| `IsConnected()` | Pure | Returns true if the WebSocket is open and the conversation is active. | +| `IsListening()` | Pure | Returns true if the microphone is currently capturing. | +| `IsAgentSpeaking()` | Pure | Returns true if agent audio is currently playing. 
| `GetConversationInfo()` | Pure | Returns `FElevenLabsConversationInfo` (ConversationID, AgentID). |
| `GetWebSocketProxy()` | Pure | Returns the underlying `UElevenLabsWebSocketProxy` for advanced use. |

#### Events

| Event | Parameters | Fired when |
|---|---|---|
| `OnAgentConnected` | `FElevenLabsConversationInfo` | WebSocket handshake + agent initiation complete. |
| `OnAgentDisconnected` | `int32 StatusCode`, `FString Reason` | WebSocket closed (graceful or remote). |
| `OnAgentError` | `FString ErrorMessage` | Connection or protocol error. |
| `OnAgentTranscript` | `FElevenLabsTranscriptSegment` | Any transcript arrives (user or agent, tentative or final). |
| `OnAgentTextResponse` | `FString ResponseText` | Final text response from the agent (complements the audio). |
| `OnAgentStartedSpeaking` | — | First audio chunk received from the agent. |
| `OnAgentStoppedSpeaking` | — | Audio queue empty for ~0.5 s (agent done speaking). |
| `OnAgentInterrupted` | — | Agent speech was interrupted (by user or by `InterruptAgent()`). |

---

### UElevenLabsMicrophoneCaptureComponent

A lightweight microphone capture component. Managed automatically by `UElevenLabsConversationalAgentComponent` — you only need to use this directly for advanced scenarios (e.g. custom audio routing).

**Category**: ElevenLabs
**Inherits from**: `UActorComponent`

#### Properties

| Property | Type | Default | Description |
|---|---|---|---|
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples. Range: 0.0 – 4.0. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartCapture()` | Callable | Opens the default audio input device and starts streaming. |
| `StopCapture()` | Callable | Stops streaming and closes the device. |
| `IsCapturing()` | Pure | True while actively capturing. |

#### Delegate

`OnAudioCaptured` — fires on the game thread with `TArray<float>` PCM samples at 16 kHz mono. 
Bind to this if you want to process or forward audio manually.

---

### UElevenLabsWebSocketProxy

Low-level WebSocket session manager. Used internally by `UElevenLabsConversationalAgentComponent`. Use this directly only if you need fine-grained protocol control.

**Inherits from**: `UObject`
**Instantiate via**: `NewObject<UElevenLabsWebSocketProxy>(Outer)`

#### Key functions

| Function | Description |
|---|---|
| `Connect(AgentID, APIKey)` | Open the WS connection. Parameters override project settings when non-empty. |
| `Disconnect()` | Send close frame and tear down the connection. |
| `SendAudioChunk(PCMData)` | Send raw int16 LE PCM bytes. Called automatically by the agent component. |
| `SendUserTurnStart()` | Signal start of user speech (Client turn mode only). |
| `SendUserTurnEnd()` | Signal end of user speech (Client turn mode only). |
| `SendInterrupt()` | Ask the agent to stop speaking. |
| `GetConnectionState()` | Returns `EElevenLabsConnectionState`. |
| `GetConversationInfo()` | Returns `FElevenLabsConversationInfo`. |

---

## 7. Data Types Reference

### EElevenLabsConnectionState

```
Disconnected — No active connection
Connecting   — WebSocket handshake in progress
Connected    — Conversation active and ready
Error        — Connection or protocol failure
```

### EElevenLabsTurnMode

```
Server — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
```

### FElevenLabsConversationInfo

```
ConversationID  FString — Unique session ID assigned by ElevenLabs
AgentID         FString — The agent that responded
```

### FElevenLabsTranscriptSegment

```
Text      FString — Transcribed text
Speaker   FString — "user" or "agent"
bIsFinal  bool    — false while still speaking, true when the turn is complete
```

---

## 8. Turn Modes

### Server VAD (default)

ElevenLabs runs Voice Activity Detection on the server. 
The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking. + +**When to use**: Casual conversation, hands-free interaction. + +``` +StartConversation() → mic streams continuously + ElevenLabs detects speech / silence automatically + Agent replies when it detects end-of-speech +``` + +### Client Controlled (push-to-talk) + +Your code explicitly signals turn boundaries with `StartListening()` / `StopListening()`. + +**When to use**: Noisy environments, precise control, walkie-talkie style. + +``` +Input Pressed → StartListening() → sends user_turn_start + begins audio +Input Released → StopListening() → stops audio + sends user_turn_end + Agent replies after user_turn_end +``` + +--- + +## 9. Security — Signed URL Mode + +By default, the API key is stored in Project Settings (Engine.ini). This is fine for development but **should not be shipped in packaged builds** as the key could be extracted. + +### Production setup + +1. Enable **Signed URL Mode** in Project Settings. +2. Set **Signed URL Endpoint** to a URL on your own backend (e.g. `https://your-server.com/api/elevenlabs-token`). +3. Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning: + ```json + { "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..." } + ``` +4. The plugin fetches this URL before connecting — the API key never leaves your server. + +--- + +## 10. 
Audio Pipeline

### Input (player → agent)

```
Device (any sample rate, any channels)
  ↓ FAudioCapture (UE built-in)
  ↓ Callback: float32 interleaved frames
  ↓ Downmix to mono (average channels)
  ↓ Resample to 16000 Hz (linear interpolation)
  ↓ Apply VolumeMultiplier
  ↓ Dispatch to Game Thread
  ↓ Convert float32 → int16 LE bytes
  ↓ Base64 encode
  ↓ WebSocket JSON frame: { "user_audio_chunk": "<base64>" }
```

### Output (agent → player)

```
WebSocket JSON frame: { "type": "audio", "audio_event": { "audio_base_64": "..." } }
  ↓ Base64 decode → int16 LE PCM bytes
  ↓ Enqueue in thread-safe AudioQueue
  ↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
  ↓ UAudioComponent plays from the Actor's world position (3D spatialized)
```

**Audio format** (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.

---

## 11. Common Patterns

### Show subtitles in UI

```
OnAgentTranscript event:
  ├─ Segment → Speaker == "user"  → show in player subtitle widget
  ├─ Segment → Speaker == "agent" → show in NPC speech bubble
  └─ Segment → bIsFinal == false  → show as "..." (in-progress)
```

### Interrupt the agent when the player starts speaking

In Server VAD mode ElevenLabs handles this automatically. For manual control:

```
OnAgentStartedSpeaking → store "agent is speaking" flag
Input Action (any)     → if agent is speaking → InterruptAgent()
```

### Multiple NPCs with different agents

Each NPC Blueprint has its own `UElevenLabsConversationalAgentComponent`. Set a different `AgentID` on each component. Connections are fully independent. 
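Since each of those independent connections runs the Section 10 input pipeline on its own mic feed, the "Resample to 16000 Hz (linear interpolation)" stage is worth seeing in isolation. A simplified standalone sketch under our own assumptions (mono float input, no anti-aliasing filter) — not the plugin's actual implementation:

```cpp
#include <cstddef>
#include <vector>

// Resample mono float32 audio from InRate to OutRate by linear interpolation.
// Simplified sketch of the "Resample to 16000 Hz" stage in Section 10; a
// production resampler would typically low-pass filter before downsampling.
std::vector<float> ResampleLinear(const std::vector<float>& In, int InRate, int OutRate)
{
    if (In.empty() || InRate <= 0 || OutRate <= 0) return {};
    const std::size_t OutCount =
        In.size() * static_cast<std::size_t>(OutRate) / static_cast<std::size_t>(InRate);
    const double Step = static_cast<double>(InRate) / OutRate;
    std::vector<float> Out(OutCount);
    for (std::size_t i = 0; i < OutCount; ++i)
    {
        const double Pos = i * Step;                     // fractional source index
        const std::size_t Index = static_cast<std::size_t>(Pos);
        const double Frac = Pos - static_cast<double>(Index);
        const float A = In[Index];
        const float B = (Index + 1 < In.size()) ? In[Index + 1] : A; // clamp at the end
        Out[i] = static_cast<float>(A + (B - A) * Frac);
    }
    return Out;
}
```

With a typical 48 kHz capture device, `ResampleLinear(Buffer, 48000, 16000)` yields one output sample per three input samples.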
### Only start the conversation when the player is nearby

```
On Begin Overlap (trigger volume around NPC)
  └─► [ElevenLabs Agent] Start Conversation

On End Overlap
  └─► [ElevenLabs Agent] End Conversation
```

### Adjusting microphone volume

Get the `UElevenLabsMicrophoneCaptureComponent` from the owner and set `VolumeMultiplier`:

```cpp
UElevenLabsMicrophoneCaptureComponent* Mic =
    GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic) Mic->VolumeMultiplier = 2.0f;
```

---

## 12. Troubleshooting

### Plugin doesn't appear in Project Settings

Ensure the plugin is enabled in `.uproject` and the project was recompiled after adding it.

### WebSocket connection fails immediately

- Check the **API Key** is set correctly in Project Settings.
- Check the **Agent ID** exists in your ElevenLabs account.
- Enable **Verbose Logging** in Project Settings and check the Output Log for the exact WebSocket URL and error.
- Make sure your machine has internet access and port 443 (WSS) is not blocked.

### No audio from the microphone

- Windows may require microphone permission. Check **Settings → Privacy → Microphone**.
- Try setting `VolumeMultiplier` to `2.0` to rule out a volume issue.
- Check the Output Log for `"Failed to open default audio capture stream"`.

### Agent audio is choppy or silent

- The `USoundWaveProcedural` queue may be underflowing. This can happen if audio chunks arrive with long gaps. Check network latency.
- Ensure no other component is consuming the same `UAudioComponent`.

### `OnAgentStoppedSpeaking` fires too early

The silence detection threshold is 30 ticks (~0.5 s at 60 fps). If the agent has natural pauses in speech, increase `SilenceThresholdTicks` in `ElevenLabsConversationalAgentComponent.h`:

```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s
```

### Build error: "Plugin AudioCapture not found"

Make sure the `AudioCapture` plugin is enabled in your project. 
It should be auto-enabled via the `.uplugin` dependency, but you can also add it manually to `.uproject`: + +```json +{ "Name": "AudioCapture", "Enabled": true } +``` + +--- + +*Documentation generated 2026-02-19 — Plugin v1.0.0 — UE 5.5*