# PS_AI_Agent_ElevenLabs — Plugin Documentation

**Engine**: Unreal Engine 5.5
**Plugin version**: 1.0.0
**Status**: Beta
**API**: [ElevenLabs Conversational AI](https://elevenlabs.io/docs/conversational-ai)

---

## Table of Contents

1. [Overview](#1-overview)
2. [Installation](#2-installation)
3. [Project Settings](#3-project-settings)
4. [Quick Start (Blueprint)](#4-quick-start-blueprint)
5. [Quick Start (C++)](#5-quick-start-c)
6. [Components Reference](#6-components-reference)
   - [UElevenLabsConversationalAgentComponent](#uelevenlabsconversationalagentcomponent)
   - [UElevenLabsMicrophoneCaptureComponent](#uelevenlabsmicrophonecapturecomponent)
   - [UElevenLabsWebSocketProxy](#uelevenlabswebsocketproxy)
7. [Data Types Reference](#7-data-types-reference)
8. [Turn Modes](#8-turn-modes)
9. [Security — Signed URL Mode](#9-security--signed-url-mode)
10. [Audio Pipeline](#10-audio-pipeline)
11. [Common Patterns](#11-common-patterns)
12. [Troubleshooting](#12-troubleshooting)

---

## 1. Overview

This plugin integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor).
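At its core the plugin shuttles 16 kHz mono PCM audio over a WebSocket, converting captured float32 samples to int16 little-endian bytes on the way out. A minimal standalone sketch of that conversion — illustrative function names, not the plugin's actual code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Convert one float sample in [-1, 1] to a signed 16-bit PCM sample,
// clamping out-of-range input (a hot mic plus gain can exceed +/-1).
inline int16_t FloatToPcm16(float Sample)
{
    const float Clamped = std::clamp(Sample, -1.0f, 1.0f);
    return static_cast<int16_t>(Clamped * 32767.0f);
}

// Convert a float buffer to the little-endian int16 byte stream that is
// Base64-encoded into the outgoing WebSocket frame.
std::vector<uint8_t> FloatBufferToPcm16Bytes(const std::vector<float>& In)
{
    std::vector<uint8_t> Out;
    Out.reserve(In.size() * 2);
    for (float Sample : In)
    {
        const int16_t Pcm = FloatToPcm16(Sample);
        Out.push_back(static_cast<uint8_t>(Pcm & 0xFF));        // low byte first
        Out.push_back(static_cast<uint8_t>((Pcm >> 8) & 0xFF)); // then high byte
    }
    return Out;
}
```

The incoming direction is the mirror image: Base64-decoded int16 LE bytes are fed to the procedural sound wave without further conversion.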
### How it works

```
Player microphone
      │
      ▼
UElevenLabsMicrophoneCaptureComponent
  • Captures from default audio device
  • Resamples to 16 kHz mono float32
      │
      ▼
UElevenLabsConversationalAgentComponent
  • Converts float32 → int16 PCM bytes
  • Sends via WebSocket to ElevenLabs
      │   (wss://api.elevenlabs.io/v1/convai/conversation)
      ▼
ElevenLabs Conversational AI Agent
  • Transcribes speech
  • Runs LLM
  • Synthesizes voice (ElevenLabs TTS)
      │
      ▼
UElevenLabsConversationalAgentComponent
  • Receives Base64 PCM audio chunks
  • Feeds USoundWaveProcedural → UAudioComponent
      │
      ▼
Agent voice plays from the Actor's position in the world
```

### Key properties

- No gRPC, no third-party libraries — uses UE's built-in `WebSockets` and `AudioCapture` modules
- Blueprint-first: all events and controls are exposed to Blueprint
- Real-time bidirectional: audio streams in both directions simultaneously
- Server VAD (default) or push-to-talk

---

## 2. Installation

The plugin lives inside the project, not the engine, so no separate install is needed.

### Verify it is enabled

Open `Unreal/PS_AI_Agent/PS_AI_Agent.uproject` and confirm:

```json
{
  "Name": "PS_AI_Agent_ElevenLabs",
  "Enabled": true
}
```

### First compile

Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click **Yes**.

Alternatively, compile from the command line:

```
"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat" PS_AI_AgentEditor Win64 Development "/Unreal/PS_AI_Agent/PS_AI_Agent.uproject" -WaitMutex
```

---

## 3. Project Settings

Go to **Edit → Project Settings → Plugins → ElevenLabs AI Agent**.

| Setting | Description | Required |
|---|---|---|
| **API Key** | Your ElevenLabs API key from [elevenlabs.io](https://elevenlabs.io) | Yes (unless using Signed URL Mode) |
| **Agent ID** | Default agent ID. Create agents at [elevenlabs.io/app/conversational-ai](https://elevenlabs.io/app/conversational-ai) | Yes (unless set per-component) |
| **Signed URL Mode** | Fetch the WS URL from your own backend (keeps the key off the client). See [Section 9](#9-security--signed-url-mode) | No |
| **Signed URL Endpoint** | Your backend URL returning `{ "signed_url": "wss://..." }` | Only if Signed URL Mode = true |
| **Custom WebSocket URL** | Override the default `wss://api.elevenlabs.io/...` endpoint (debug only) | No |
| **Verbose Logging** | Log every WebSocket JSON frame to the Output Log | No |

> **Security note**: Never ship with the API key hard-coded in a packaged build. Use Signed URL Mode for production, or load the key at runtime from a secure backend.

---

## 4. Quick Start (Blueprint)

### Step 1 — Add the component to an NPC

1. Open your NPC Blueprint (or any Actor Blueprint).
2. In the **Components** panel, click **Add** → search for **ElevenLabs Conversational Agent**.
3. Select the component. In the **Details** panel you can optionally set a specific **Agent ID** (overrides the project default).

### Step 2 — Set Turn Mode

In the component's **Details** panel:

- **Server VAD** (default): ElevenLabs automatically detects when the player stops speaking. The microphone streams continuously once connected.
- **Client Controlled**: You call `Start Listening` / `Stop Listening` manually (push-to-talk).

### Step 3 — Wire up events in the Event Graph

```
Event BeginPlay
 └─► [ElevenLabs Agent] Start Conversation

[ElevenLabs Agent] On Agent Connected
 └─► Print String "Connected!
       ID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
 └─► Set Text (UI widget) ← Response Text

[ElevenLabs Agent] On Agent Transcript
 └─► (optional) display live subtitles ← Segment → Text

[ElevenLabs Agent] On Agent Started Speaking
 └─► Play talking animation on NPC

[ElevenLabs Agent] On Agent Stopped Speaking
 └─► Return to idle animation

[ElevenLabs Agent] On Agent Error
 └─► Print String "Error: " + Error Message

Event EndPlay
 └─► [ElevenLabs Agent] End Conversation
```

### Step 4 — Push-to-talk (Client Controlled mode only)

```
Input Action "Talk" (Pressed)
 └─► [ElevenLabs Agent] Start Listening

Input Action "Talk" (Released)
 └─► [ElevenLabs Agent] Stop Listening
```

---

## 5. Quick Start (C++)

### 1. Add the plugin to your module's Build.cs

```csharp
PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");
```

### 2. Include and use

```cpp
#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"

// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;

// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
    TEXT("ElevenLabsAgent"));

// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;

// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
    this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
    this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
    this, &AMyNPC::PlayTalkingAnimation);

// Start the conversation:
ElevenLabsAgent->StartConversation();

// Later, to end it:
ElevenLabsAgent->EndConversation();
```

### 3. Callback signatures

```cpp
UFUNCTION()
void HandleAgentConnected(const FElevenLabsConversationInfo& Info)
{
    UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID);
}

UFUNCTION()
void HandleAgentResponse(const FString& ResponseText)
{
    // Display in UI, drive subtitles, etc.
}

UFUNCTION()
void PlayTalkingAnimation()
{
    // Switch to talking anim montage
}
```

---

## 6. Components Reference

### UElevenLabsConversationalAgentComponent

The **main component** — attach this to any Actor that should be able to speak.

**Category**: ElevenLabs
**Inherits from**: `UActorComponent`

#### Properties

| Property | Type | Default | Description |
|---|---|---|---|
| `AgentID` | `FString` | `""` | Agent ID for this actor. Overrides the project-level default when non-empty. |
| `TurnMode` | `EElevenLabsTurnMode` | `Server` | How speaker turns are detected. See [Section 8](#8-turn-modes). |
| `bAutoStartListening` | `bool` | `true` | If true, starts mic capture automatically once the WebSocket is ready. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartConversation()` | Callable | Opens the WebSocket connection. If `bAutoStartListening` is true, mic capture starts once connected. |
| `EndConversation()` | Callable | Closes the WebSocket, stops the mic, stops audio playback. |
| `StartListening()` | Callable | Starts microphone capture. In Client mode, also sends `user_turn_start` to ElevenLabs. |
| `StopListening()` | Callable | Stops microphone capture. In Client mode, also sends `user_turn_end`. |
| `InterruptAgent()` | Callable | Stops the agent's current utterance immediately. |
| `IsConnected()` | Pure | Returns true if the WebSocket is open and the conversation is active. |
| `IsListening()` | Pure | Returns true if the microphone is currently capturing. |
| `IsAgentSpeaking()` | Pure | Returns true if agent audio is currently playing. |
| `GetConversationInfo()` | Pure | Returns `FElevenLabsConversationInfo` (ConversationID, AgentID). |
| `GetWebSocketProxy()` | Pure | Returns the underlying `UElevenLabsWebSocketProxy` for advanced use. |

#### Events

| Event | Parameters | Fired when |
|---|---|---|
| `OnAgentConnected` | `FElevenLabsConversationInfo` | WebSocket handshake + agent initiation complete. |
| `OnAgentDisconnected` | `int32 StatusCode`, `FString Reason` | WebSocket closed (graceful or remote). |
| `OnAgentError` | `FString ErrorMessage` | Connection or protocol error. |
| `OnAgentTranscript` | `FElevenLabsTranscriptSegment` | Any transcript arrives (user or agent, tentative or final). |
| `OnAgentTextResponse` | `FString ResponseText` | Final text response from the agent (complements the audio). |
| `OnAgentStartedSpeaking` | — | First audio chunk received from the agent. |
| `OnAgentStoppedSpeaking` | — | Audio queue empty for ~0.5 s (agent done speaking). |
| `OnAgentInterrupted` | — | Agent speech was interrupted (by the user or by `InterruptAgent()`). |

---

### UElevenLabsMicrophoneCaptureComponent

A lightweight microphone capture component. Managed automatically by `UElevenLabsConversationalAgentComponent` — you only need to use this directly for advanced scenarios (e.g. custom audio routing).

**Category**: ElevenLabs
**Inherits from**: `UActorComponent`

#### Properties

| Property | Type | Default | Description |
|---|---|---|---|
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples. Range: 0.0 – 4.0. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartCapture()` | Callable | Opens the default audio input device and starts streaming. |
| `StopCapture()` | Callable | Stops streaming and closes the device. |
| `IsCapturing()` | Pure | True while actively capturing. |

#### Delegate

`OnAudioCaptured` — fires on the game thread with `TArray<float>` PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.

---

### UElevenLabsWebSocketProxy

Low-level WebSocket session manager.
Used internally by `UElevenLabsConversationalAgentComponent`. Use this directly only if you need fine-grained protocol control.

**Inherits from**: `UObject`
**Instantiate via**: `NewObject<UElevenLabsWebSocketProxy>(Outer)`

#### Key functions

| Function | Description |
|---|---|
| `Connect(AgentID, APIKey)` | Open the WS connection. Parameters override project settings when non-empty. |
| `Disconnect()` | Send a close frame and tear down the connection. |
| `SendAudioChunk(PCMData)` | Send raw int16 LE PCM bytes. Called automatically by the agent component. |
| `SendUserTurnStart()` | Signal the start of user speech (Client turn mode only). |
| `SendUserTurnEnd()` | Signal the end of user speech (Client turn mode only). |
| `SendInterrupt()` | Ask the agent to stop speaking. |
| `GetConnectionState()` | Returns `EElevenLabsConnectionState`. |
| `GetConversationInfo()` | Returns `FElevenLabsConversationInfo`. |

---

## 7. Data Types Reference

### EElevenLabsConnectionState

```
Disconnected — No active connection
Connecting   — WebSocket handshake in progress
Connected    — Conversation active and ready
Error        — Connection or protocol failure
```

### EElevenLabsTurnMode

```
Server — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
```

### FElevenLabsConversationInfo

```
ConversationID  FString — Unique session ID assigned by ElevenLabs
AgentID         FString — The agent that responded
```

### FElevenLabsTranscriptSegment

```
Text      FString — Transcribed text
Speaker   FString — "user" or "agent"
bIsFinal  bool    — false while still speaking, true when the turn is complete
```

---

## 8. Turn Modes

### Server VAD (default)

ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.

**When to use**: Casual conversation, hands-free interaction.
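The two turn modes reduce to a simple rule for when microphone audio should be streaming: in Server mode, always while connected; in Client mode, only between `StartListening()` and `StopListening()`. A standalone sketch with simplified stand-ins for the plugin's enums (`ShouldStreamMic` is illustrative, not part of the API):

```cpp
// Simplified stand-ins for EElevenLabsConnectionState / EElevenLabsTurnMode.
enum class EConnectionState { Disconnected, Connecting, Connected, Error };
enum class ETurnMode { Server, Client };

// Should microphone audio be streaming right now?
// - Server VAD: stream continuously while the conversation is connected.
// - Client mode: stream only while the user holds the push-to-talk input
//   (i.e. between StartListening() and StopListening()).
bool ShouldStreamMic(EConnectionState State, ETurnMode Mode, bool bClientListening)
{
    if (State != EConnectionState::Connected)
    {
        return false; // never stream before the handshake completes
    }
    return Mode == ETurnMode::Server || bClientListening;
}
```

This is why `bAutoStartListening` only matters once the WebSocket reports Connected — audio sent earlier would be dropped.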
```
StartConversation() → mic streams continuously
ElevenLabs detects speech / silence automatically
Agent replies when it detects end-of-speech
```

### Client Controlled (push-to-talk)

Your code explicitly signals turn boundaries with `StartListening()` / `StopListening()`.

**When to use**: Noisy environments, precise control, walkie-talkie style.

```
Input Pressed  → StartListening() → sends user_turn_start + begins audio
Input Released → StopListening()  → stops audio + sends user_turn_end
Agent replies after user_turn_end
```

---

## 9. Security — Signed URL Mode

By default, the API key is stored in Project Settings (Engine.ini). This is fine for development but **should not be shipped in packaged builds**, as the key could be extracted.

### Production setup

1. Enable **Signed URL Mode** in Project Settings.
2. Set **Signed URL Endpoint** to a URL on your own backend (e.g. `https://your-server.com/api/elevenlabs-token`).
3. Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning:

   ```json
   {
     "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..."
   }
   ```

4. The plugin fetches this URL before connecting — the API key never leaves your server.

---

## 10. Audio Pipeline

### Input (player → agent)

```
Device (any sample rate, any channels)
  ↓ FAudioCapture (UE built-in)
  ↓ Callback: float32 interleaved frames
  ↓ Downmix to mono (average channels)
  ↓ Resample to 16000 Hz (linear interpolation)
  ↓ Apply VolumeMultiplier
  ↓ Dispatch to Game Thread
  ↓ Convert float32 → int16 LE bytes
  ↓ Base64 encode
  ↓ WebSocket JSON frame: { "user_audio_chunk": "<base64>" }
```

### Output (agent → player)

```
WebSocket JSON frame:
  { "type": "audio", "audio_event": { "audio_base_64": "..."
  } }
  ↓ Base64 decode → int16 LE PCM bytes
  ↓ Enqueue in thread-safe AudioQueue
  ↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from the queue
  ↓ UAudioComponent plays from the Actor's world position (3D spatialized)
```

**Audio format** (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.

---

## 11. Common Patterns

### Show subtitles in UI

```
OnAgentTranscript event:
 ├─ Segment → Speaker == "user"  → show in player subtitle widget
 ├─ Segment → Speaker == "agent" → show in NPC speech bubble
 └─ Segment → bIsFinal == false  → show as "..." (in-progress)
```

### Interrupt the agent when the player starts speaking

In Server VAD mode ElevenLabs handles this automatically. For manual control:

```
OnAgentStartedSpeaking → store "agent is speaking" flag
Input Action (any)     → if agent is speaking → InterruptAgent()
```

### Multiple NPCs with different agents

Each NPC Blueprint has its own `UElevenLabsConversationalAgentComponent`. Set a different `AgentID` on each component. Connections are fully independent.

### Only start the conversation when the player is nearby

```
On Begin Overlap (trigger volume around NPC)
 └─► [ElevenLabs Agent] Start Conversation

On End Overlap
 └─► [ElevenLabs Agent] End Conversation
```

### Adjusting microphone volume

Get the `UElevenLabsMicrophoneCaptureComponent` from the owner and set `VolumeMultiplier`:

```cpp
UElevenLabsMicrophoneCaptureComponent* Mic =
    GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic)
{
    Mic->VolumeMultiplier = 2.0f;
}
```

---

## 12. Troubleshooting

### Plugin doesn't appear in Project Settings

Ensure the plugin is enabled in the `.uproject` and the project was recompiled after adding it.

### WebSocket connection fails immediately

- Check that the **API Key** is set correctly in Project Settings.
- Check that the **Agent ID** exists in your ElevenLabs account.
- Enable **Verbose Logging** in Project Settings and check the Output Log for the exact WebSocket URL and error.
- Make sure your machine has internet access and that port 443 (WSS) is not blocked.

### No audio from the microphone

- Windows may require microphone permission. Check **Settings → Privacy → Microphone**.
- Try setting `VolumeMultiplier` to `2.0` to rule out a volume issue.
- Check the Output Log for `"Failed to open default audio capture stream"`.

### Agent audio is choppy or silent

- The `USoundWaveProcedural` queue may be underflowing. This can happen if audio chunks arrive with long gaps. Check network latency.
- Ensure no other component is consuming the same `UAudioComponent`.

### `OnAgentStoppedSpeaking` fires too early

The silence detection threshold is 30 ticks (~0.5 s at 60 fps). If the agent has natural pauses in speech, increase `SilenceThresholdTicks` in `ElevenLabsConversationalAgentComponent.h`:

```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0 s
```

### Build error: "Plugin AudioCapture not found"

Make sure the `AudioCapture` plugin is enabled in your project. It should be auto-enabled via the `.uplugin` dependency, but you can also add it manually to the `.uproject`:

```json
{
  "Name": "AudioCapture",
  "Enabled": true
}
```

---

*Documentation generated 2026-02-19 — Plugin v1.0.0 — UE 5.5*