Compare commits

..

No commits in common. "302337b5736adf48e3f5437283b14eaf492c60fe" and "61710c9fde6c464c0193d94452fdb4f004f3bc06" have entirely different histories.

23 changed files with 1 addition and 3314 deletions


@@ -1,75 +0,0 @@
# Project Memory PS_AI_Agent
> This file is committed to the repository so it is available on any machine.
> Claude Code reads it automatically at session start (via the auto-memory system)
> when the working directory is inside this repo.
> **Keep it under ~180 lines**; lines beyond 200 are truncated by the system.
---
## Project Location
- Repo root: `<repo_root>/` (wherever this is cloned)
- UE5 project: `<repo_root>/Unreal/PS_AI_Agent/`
- `.uproject`: `<repo_root>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject`
- Engine: **Unreal Engine 5.5** — Win64 primary target
- Default test map: `/Game/TestMap.TestMap`
## Plugins
| Plugin | Path | Purpose |
|--------|------|---------|
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Has ElevenLabs voice type enum in `ConvaiDefinitions.h`. Used as architectural reference. |
| **PS_AI_Agent_ElevenLabs** | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See `.claude/elevenlabs_plugin.md` for full details. |
## User Preferences
- Plugin naming: `PS_AI_Agent_<Service>` (e.g. `PS_AI_Agent_ElevenLabs`)
- Save memory frequently during long sessions
- Goal: ElevenLabs Conversational AI integration — simpler than Convai, no gRPC
- Full original ask + intent: see `.claude/project_context.md`
- Git remote is a **private server** — no public exposure risk
## Key UE5 Plugin Patterns
- Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule`
- Module startup: `NewObject<USettings>(..., RF_Standalone)` + `AddToRoot()`
- WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)`
- `WebSockets` is a **module** (Build.cs only) — NOT a plugin, don't put it in `.uplugin`
- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+, replaces deprecated `OpenCaptureStream`)
- `AudioCapture` IS a plugin — declare it in `.uplugin` Plugins array
- Callback type: `FOnAudioCaptureFunction` = `TFunction<void(const void*, int32, int32, int32, double, bool)>`
- Cast `const void*` to `const float*` inside — device sends float32 interleaved
- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow` delegate
- Audio capture callbacks arrive on a **background thread** — always marshal to game thread with `AsyncTask(ENamedThreads::GameThread, ...)`
- Resample mic audio to **16000 Hz mono** before sending to ElevenLabs
- `TArray::RemoveAt(idx, count, EAllowShrinking::No)` — bool overload deprecated in UE 5.5
## Plugin Status
- **PS_AI_Agent_ElevenLabs**: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
- v1.1.0 — all 3 protocol bugs fixed (transcript fields, pong format, client turn mode)
- Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
- First-byte discrimination: `{` = JSON control message, else = raw PCM audio
- `SendTextMessage()` added to both WebSocketProxy and ConversationalAgentComponent
- Connection confirmed working end-to-end; audio receive path functional
## ElevenLabs WebSocket Protocol Notes
- **ALL frames are binary**: `OnRawMessage` handles everything; `OnMessage` (text) never fires
- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
- Fragment reassembly: accumulate into `BinaryFrameBuffer` until `BytesRemaining == 0`
- Pong: `{"type":"pong","event_id":N}`; `event_id` is **top-level**, NOT nested
- Transcript: type=`user_transcript`, key=`user_transcription_event`, field=`user_transcript`
- Client turn mode: `{"type":"user_activity"}` to signal speaking; no explicit end message
- Text input: `{"type":"user_message","text":"..."}` — agent replies with audio + text
## API Keys / Secrets
- ElevenLabs API key is set in **Project Settings → Plugins → ElevenLabs AI Agent** in the Editor
- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
- **The key is stripped from `DefaultEngine.ini` before every commit** — do not commit it
- Each developer sets the key locally; it does not go in git
## Claude Memory Files in This Repo
| File | Contents |
|------|----------|
| `.claude/MEMORY.md` | This file — project structure, patterns, status |
| `.claude/elevenlabs_plugin.md` | Plugin file map, ElevenLabs WS protocol, design decisions |
| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL, Agent ID location) |
| `.claude/project_context.md` | Original ask, intent, short/long-term goals |
| `.claude/session_log_2026-02-19.md` | Full session record: steps, commits, technical decisions, next steps |
| `.claude/PS_AI_Agent_ElevenLabs_Documentation.md` | User-facing Markdown reference doc |


@@ -1,619 +0,0 @@
# PS_AI_Agent_ElevenLabs — Plugin Documentation
**Engine**: Unreal Engine 5.5
**Plugin version**: 1.1.0
**Status**: Beta — tested on UE 5.5 Win64, verified connection and audio pipeline
**API**: [ElevenLabs Conversational AI](https://elevenlabs.io/docs/eleven-agents/quickstart)
---
## Table of Contents
1. [Overview](#1-overview)
2. [Installation](#2-installation)
3. [Project Settings](#3-project-settings)
4. [Quick Start (Blueprint)](#4-quick-start-blueprint)
5. [Quick Start (C++)](#5-quick-start-c)
6. [Components Reference](#6-components-reference)
- [UElevenLabsConversationalAgentComponent](#uelevenlabsconversationalagentcomponent)
- [UElevenLabsMicrophoneCaptureComponent](#uelevenlabsmicrophonecapturecomponent)
- [UElevenLabsWebSocketProxy](#uelevenlabswebsocketproxy)
7. [Data Types Reference](#7-data-types-reference)
8. [Turn Modes](#8-turn-modes)
9. [Security — Signed URL Mode](#9-security--signed-url-mode)
10. [Audio Pipeline](#10-audio-pipeline)
11. [Common Patterns](#11-common-patterns)
12. [Troubleshooting](#12-troubleshooting)
13. [Changelog](#13-changelog)
---
## 1. Overview
This plugin integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, enabling real-time voice conversations between a player and an NPC (or any Actor).
### How it works
```
Player microphone
        ↓
UElevenLabsMicrophoneCaptureComponent
  • Captures from default audio device
  • Resamples to 16 kHz mono float32
        ↓
UElevenLabsConversationalAgentComponent
  • Converts float32 → int16 PCM bytes
  • Base64-encodes and sends via WebSocket
    (wss://api.elevenlabs.io/v1/convai/conversation)
        ↓
ElevenLabs Conversational AI Agent
  • Transcribes speech
  • Runs LLM
  • Synthesizes voice (ElevenLabs TTS)
        ↓
UElevenLabsConversationalAgentComponent
  • Receives raw binary PCM audio frames
  • Feeds USoundWaveProcedural → UAudioComponent
        ↓
Agent voice plays from the Actor's position in the world
```
### Key properties
- No gRPC, no third-party libraries — uses UE's built-in `WebSockets` and `AudioCapture` modules
- Blueprint-first: all events and controls are exposed to Blueprint
- Real-time bidirectional: audio streams in both directions simultaneously
- Server VAD (default) or push-to-talk
- Text input supported (no microphone needed for testing)
### Wire frame protocol notes
ElevenLabs sends **all WebSocket frames as binary** (not text frames). The plugin handles two binary frame types automatically:
- **JSON control frames** (start with `{`) — conversation init, transcripts, agent responses, ping/pong
- **Raw PCM audio frames** (binary) — agent speech audio, played directly via `USoundWaveProcedural`
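The discrimination rule can be expressed as a small pure function. This is an illustrative sketch (the enum and function names are hypothetical, not the plugin's actual API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class EFrameKind { Json, PcmAudio, Empty };

// Classify a received binary WebSocket frame by peeking at its first byte:
// '{' (0x7B) means a UTF-8 JSON control message, anything else is raw PCM.
EFrameKind ClassifyBinaryFrame(const uint8_t* Data, size_t Size)
{
    if (Size == 0)      return EFrameKind::Empty;
    if (Data[0] == '{') return EFrameKind::Json;
    return EFrameKind::PcmAudio;
}
```

Note the rule is safe because valid little-endian int16 PCM rarely starts with the byte 0x7B, and the plugin logs a hex prefix at Verbose level for the cases where classification looks wrong.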
---
## 2. Installation
The plugin lives inside the project, not the engine, so no separate install is needed.
### Verify it is enabled
Open `Unreal/PS_AI_Agent/PS_AI_Agent.uproject` and confirm:
```json
{
"Name": "PS_AI_Agent_ElevenLabs",
"Enabled": true
}
```
### First compile
Open the project in the UE 5.5 Editor. It will detect the new plugin and ask to recompile — click **Yes**. Alternatively, compile from the command line:
```
"C:\Program Files\Epic Games\UE_5.5\Engine\Build\BatchFiles\Build.bat"
PS_AI_AgentEditor Win64 Development
"<repo>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject"
-WaitMutex
```
---
## 3. Project Settings
Go to **Edit → Project Settings → Plugins → ElevenLabs AI Agent**.
| Setting | Description | Required |
|---|---|---|
| **API Key** | Your ElevenLabs API key. Find it at [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys) | Yes (unless using Signed URL Mode or a public agent) |
| **Agent ID** | Default agent ID. Find it in the URL when editing an agent: `elevenlabs.io/app/conversational-ai/agents/<AGENT_ID>` | Yes (unless set per-component) |
| **Signed URL Mode** | Fetch the WS URL from your own backend (keeps key off client). See [Section 9](#9-security--signed-url-mode) | No |
| **Signed URL Endpoint** | Your backend URL returning `{ "signed_url": "wss://..." }` | Only if Signed URL Mode = true |
| **Custom WebSocket URL** | Override the default `wss://api.elevenlabs.io/...` endpoint (debug only) | No |
| **Verbose Logging** | Log every WebSocket frame type and first bytes to Output Log | No |
> **Security note**: The API key set in Project Settings is saved to `DefaultEngine.ini`. **Never commit this file with the key in it** — strip the `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]` section before committing. Use Signed URL Mode for production builds.
> **Finding your Agent ID**: Go to [elevenlabs.io/app/conversational-ai](https://elevenlabs.io/app/conversational-ai), click your agent, and copy the ID from the URL bar or the agent's Overview/API tab.
---
## 4. Quick Start (Blueprint)
### Step 1 — Add the component to an NPC
1. Open your NPC Blueprint (or any Actor Blueprint).
2. In the **Components** panel, click **Add** → search for **ElevenLabs Conversational Agent**.
3. Select the component. In the **Details** panel you can optionally set a specific **Agent ID** (overrides the project default).
### Step 2 — Set Turn Mode
In the component's **Details** panel:
- **Server VAD** (default): ElevenLabs automatically detects when the player stops speaking. Microphone streams continuously once connected.
- **Client Controlled**: You call `Start Listening` / `Stop Listening` manually (push-to-talk).
### Step 3 — Wire up events in the Event Graph
```
Event BeginPlay
└─► [ElevenLabs Agent] Start Conversation
[ElevenLabs Agent] On Agent Connected
└─► Print String "Connected! ConvID: " + Conversation Info → Conversation ID
[ElevenLabs Agent] On Agent Text Response
└─► Set Text (UI widget) ← Response Text
[ElevenLabs Agent] On Agent Transcript
└─► (optional) display live subtitles ← Segment → Text
[ElevenLabs Agent] On Agent Started Speaking
└─► Play talking animation on NPC
[ElevenLabs Agent] On Agent Stopped Speaking
└─► Return to idle animation
[ElevenLabs Agent] On Agent Error
└─► Print String "Error: " + Error Message
Event EndPlay
└─► [ElevenLabs Agent] End Conversation
```
### Step 4 — Push-to-talk (Client Controlled mode only)
```
Input Action "Talk" (Pressed)
└─► [ElevenLabs Agent] Start Listening
Input Action "Talk" (Released)
└─► [ElevenLabs Agent] Stop Listening
```
### Step 5 — Testing without a microphone
Once connected, use **Send Text Message** instead of speaking:
```
[ElevenLabs Agent] On Agent Connected
└─► [ElevenLabs Agent] Send Text Message ← "Hello, who are you?"
```
The agent will reply with audio and text exactly as if it heard you speak.
---
## 5. Quick Start (C++)
### 1. Add the plugin to your module's Build.cs
```csharp
PrivateDependencyModuleNames.Add("PS_AI_Agent_ElevenLabs");
```
### 2. Include and use
```cpp
#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsDefinitions.h"
// In your Actor's header:
UPROPERTY(VisibleAnywhere)
UElevenLabsConversationalAgentComponent* ElevenLabsAgent;
// In the constructor:
ElevenLabsAgent = CreateDefaultSubobject<UElevenLabsConversationalAgentComponent>(
TEXT("ElevenLabsAgent"));
// Override Agent ID at runtime (optional):
ElevenLabsAgent->AgentID = TEXT("your_agent_id_here");
ElevenLabsAgent->TurnMode = EElevenLabsTurnMode::Server;
ElevenLabsAgent->bAutoStartListening = true;
// Bind events:
ElevenLabsAgent->OnAgentConnected.AddDynamic(
this, &AMyNPC::HandleAgentConnected);
ElevenLabsAgent->OnAgentTextResponse.AddDynamic(
this, &AMyNPC::HandleAgentResponse);
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(
this, &AMyNPC::PlayTalkingAnimation);
// Start the conversation:
ElevenLabsAgent->StartConversation();
// Send a text message (useful for testing without mic):
ElevenLabsAgent->SendTextMessage(TEXT("Hello, who are you?"));
// Later, to end:
ElevenLabsAgent->EndConversation();
```
### 3. Callback signatures
```cpp
UFUNCTION()
void HandleAgentConnected(const FElevenLabsConversationInfo& Info)
{
UE_LOG(LogTemp, Log, TEXT("Connected, ConvID=%s"), *Info.ConversationID);
}
UFUNCTION()
void HandleAgentResponse(const FString& ResponseText)
{
// Display in UI, drive subtitles, etc.
}
UFUNCTION()
void PlayTalkingAnimation()
{
// Switch to talking anim montage
}
```
---
## 6. Components Reference
### UElevenLabsConversationalAgentComponent
The **main component** — attach this to any Actor that should be able to speak.
**Category**: ElevenLabs
**Inherits from**: `UActorComponent`
#### Properties
| Property | Type | Default | Description |
|---|---|---|---|
| `AgentID` | `FString` | `""` | Agent ID for this actor. Overrides the project-level default when non-empty. |
| `TurnMode` | `EElevenLabsTurnMode` | `Server` | How speaker turns are detected. See [Section 8](#8-turn-modes). |
| `bAutoStartListening` | `bool` | `true` | If true, starts mic capture automatically once the WebSocket is connected and ready. |
#### Functions
| Function | Blueprint | Description |
|---|---|---|
| `StartConversation()` | Callable | Opens the WebSocket connection. If `bAutoStartListening` is true, mic capture starts once `OnAgentConnected` fires. |
| `EndConversation()` | Callable | Closes the WebSocket, stops mic, stops audio playback. |
| `StartListening()` | Callable | Starts microphone capture and streams to ElevenLabs. In Client mode, also sends `user_activity`. |
| `StopListening()` | Callable | Stops microphone capture. In Client mode, stops sending `user_activity`. |
| `SendTextMessage(Text)` | Callable | Sends a text message to the agent without using the microphone. Agent replies with full audio + text. Useful for testing. |
| `InterruptAgent()` | Callable | Stops the agent's current utterance immediately and clears the audio queue. |
| `IsConnected()` | Pure | Returns true if the WebSocket is open and the conversation is active. |
| `IsListening()` | Pure | Returns true if the microphone is currently capturing. |
| `IsAgentSpeaking()` | Pure | Returns true if agent audio is currently playing. |
| `GetConversationInfo()` | Pure | Returns `FElevenLabsConversationInfo` (ConversationID, AgentID). |
| `GetWebSocketProxy()` | Pure | Returns the underlying `UElevenLabsWebSocketProxy` for advanced use. |
#### Events
| Event | Parameters | Fired when |
|---|---|---|
| `OnAgentConnected` | `FElevenLabsConversationInfo` | WebSocket handshake + agent initiation metadata received. Safe to call `SendTextMessage` here. |
| `OnAgentDisconnected` | `int32 StatusCode`, `FString Reason` | WebSocket closed (graceful or remote). |
| `OnAgentError` | `FString ErrorMessage` | Connection or protocol error. |
| `OnAgentTranscript` | `FElevenLabsTranscriptSegment` | User speech-to-text transcript received (speaker is always `"user"`). |
| `OnAgentTextResponse` | `FString ResponseText` | Final text response from the agent (mirrors the audio). |
| `OnAgentStartedSpeaking` | — | First audio chunk received from the agent (audio playback begins). |
| `OnAgentStoppedSpeaking` | — | Audio queue empty for ~0.5 s (heuristic — agent done speaking). |
| `OnAgentInterrupted` | — | Agent speech was interrupted (by user or by `InterruptAgent()`). |
---
### UElevenLabsMicrophoneCaptureComponent
A lightweight microphone capture component. Managed automatically by `UElevenLabsConversationalAgentComponent` — you only need to use this directly for advanced scenarios (e.g. custom audio routing).
**Category**: ElevenLabs
**Inherits from**: `UActorComponent`
#### Properties
| Property | Type | Default | Description |
|---|---|---|---|
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples before resampling. Range: 0.0–4.0. |
#### Functions
| Function | Blueprint | Description |
|---|---|---|
| `StartCapture()` | Callable | Opens the default audio input device and begins streaming. |
| `StopCapture()` | Callable | Stops streaming and closes the device. |
| `IsCapturing()` | Pure | True while actively capturing. |
#### Delegate
`OnAudioCaptured` — fires on the **game thread** with `TArray<float>` PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.
---
### UElevenLabsWebSocketProxy
Low-level WebSocket session manager. Used internally by `UElevenLabsConversationalAgentComponent`. Use this directly only if you need fine-grained protocol control.
**Inherits from**: `UObject`
**Instantiate via**: `NewObject<UElevenLabsWebSocketProxy>(Outer)`
#### Key functions
| Function | Description |
|---|---|
| `Connect(AgentID, APIKey)` | Open the WS connection. Parameters override project settings when non-empty. |
| `Disconnect()` | Send close frame and tear down the connection. |
| `SendAudioChunk(PCMData)` | Send raw int16 LE PCM bytes as a Base64 JSON frame. Called automatically by the agent component. |
| `SendTextMessage(Text)` | Send `{"type":"user_message","text":"..."}`. Agent replies as if it heard speech. |
| `SendUserTurnStart()` | Client turn mode: sends `{"type":"user_activity"}` to signal user is speaking. |
| `SendUserTurnEnd()` | Client turn mode: stops sending `user_activity` (no explicit message — server detects silence). |
| `SendInterrupt()` | Ask the agent to stop speaking: sends `{"type":"interrupt"}`. |
| `GetConnectionState()` | Returns `EElevenLabsConnectionState`. |
| `GetConversationInfo()` | Returns `FElevenLabsConversationInfo`. |
---
## 7. Data Types Reference
### EElevenLabsConnectionState
```
Disconnected — No active connection
Connecting — WebSocket handshake in progress / awaiting conversation_initiation_metadata
Connected — Conversation active and ready (fires OnAgentConnected)
Error — Connection or protocol failure
```
> Note: State remains `Connecting` until the server sends `conversation_initiation_metadata`. `OnAgentConnected` fires on transition to `Connected`.
### EElevenLabsTurnMode
```
Server — ElevenLabs Voice Activity Detection decides when the user stops speaking (recommended)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
```
### FElevenLabsConversationInfo
```
ConversationID FString — Unique session ID assigned by ElevenLabs
AgentID FString — The agent ID for this session
```
### FElevenLabsTranscriptSegment
```
Text FString — Transcribed text
Speaker FString — "user" (agent text comes via OnAgentTextResponse, not transcript)
bIsFinal bool — Always true for user transcripts (ElevenLabs sends final only)
```
---
## 8. Turn Modes
### Server VAD (default)
ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.
**When to use**: Casual conversation, hands-free interaction, natural dialogue.
```
StartConversation() → mic streams continuously (if bAutoStartListening = true)
ElevenLabs detects speech / silence automatically
Agent replies when it detects end-of-speech
```
### Client Controlled (push-to-talk)
Your code explicitly signals turn boundaries with `StartListening()` / `StopListening()`. The plugin sends `{"type":"user_activity"}` while the user is speaking; stopping it signals end of turn.
**When to use**: Noisy environments, precise control, walkie-talkie style UI.
```
Input Pressed → StartListening() → streams audio + sends user_activity
Input Released → StopListening() → stops audio (no explicit end message)
Server detects silence and hands turn to agent
```
---
## 9. Security — Signed URL Mode
By default, the API key is stored in Project Settings (`DefaultEngine.ini`). This is fine for development but **should not be shipped in packaged builds** as the key could be extracted.
### Production setup
1. Enable **Signed URL Mode** in Project Settings.
2. Set **Signed URL Endpoint** to a URL on your own backend (e.g. `https://your-server.com/api/elevenlabs-token`).
3. Your backend authenticates the player and calls the ElevenLabs API to generate a signed WebSocket URL, returning:
```json
{ "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..." }
```
4. The plugin fetches this URL before connecting — the API key never leaves your server.
### Development workflow (API key in project settings)
- Set the key in **Project Settings → Plugins → ElevenLabs AI Agent**
- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
- **Strip this section from `DefaultEngine.ini` before every git commit**
- Each developer sets the key locally — it does not go in version control
---
## 10. Audio Pipeline
### Input (player → agent)
```
Device (any sample rate, any channels)
↓ FAudioCapture — UE built-in (UE 5.3+ API: OpenAudioCaptureStream)
↓ Callback: const void* → cast to float32 interleaved frames
↓ Downmix to mono (average all channels)
↓ Resample to 16000 Hz (linear interpolation)
↓ Apply VolumeMultiplier
↓ Dispatch to Game Thread (AsyncTask)
↓ Convert float32 → int16 signed, little-endian bytes
↓ Base64 encode
↓ Send as binary WebSocket frame: { "user_audio_chunk": "<base64>" }
```
### Output (agent → player)
```
Binary WebSocket frame arrives
↓ Peek first byte:
• '{' → UTF-8 JSON: parse type field, dispatch to handler
• other → raw PCM audio bytes
↓ [Audio path] Raw int16 LE PCM bytes at 16000 Hz mono
↓ Enqueue in thread-safe AudioQueue (FCriticalSection)
↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
↓ UAudioComponent plays from the Actor's world position (3D spatialized)
```
**Audio format** (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.
### Silence detection heuristic
`OnAgentStoppedSpeaking` fires when the `AudioQueue` has been empty for **30 consecutive ticks** (~0.5 s at 60 fps). If the agent has natural pauses, increase `SilenceThresholdTicks` in the header:
```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s
```
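A minimal standalone sketch of this tick-counting heuristic (hypothetical struct; the actual component tracks this state internally):

```cpp
#include <cassert>
#include <cstdint>

struct FSilenceDetector
{
    int32_t SilenceThresholdTicks = 30; // ~0.5 s at 60 fps
    int32_t EmptyTicks = 0;
    bool    bSpeaking  = false;

    // Call once per tick. Returns true on the single tick where
    // OnAgentStoppedSpeaking should fire.
    bool Tick(bool bQueueHasAudio)
    {
        if (bQueueHasAudio)
        {
            bSpeaking  = true;  // audio flowing: agent is speaking
            EmptyTicks = 0;
            return false;
        }
        if (bSpeaking && ++EmptyTicks >= SilenceThresholdTicks)
        {
            bSpeaking  = false; // queue stayed empty long enough
            EmptyTicks = 0;
            return true;        // fire once, then stay quiet
        }
        return false;
    }
};
```

The `bSpeaking` latch is what prevents the event from re-firing every tick while the queue stays empty between utterances.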
---
## 11. Common Patterns
### Test the connection without a microphone
```
BeginPlay → StartConversation()
OnAgentConnected → SendTextMessage("Hello, introduce yourself")
OnAgentTextResponse → Print string (confirms text pipeline works)
OnAgentStartedSpeaking → (confirms audio pipeline works)
```
### Show subtitles in UI
```
OnAgentTranscript:
Segment → Text → show in player subtitle widget (speaker always "user")
OnAgentTextResponse:
ResponseText → show in NPC speech bubble
```
### Interrupt the agent when the player starts speaking
In Server VAD mode ElevenLabs handles this automatically. For manual control:
```
OnAgentStartedSpeaking → set "agent is speaking" flag
Input Action (any) → if agent is speaking → InterruptAgent()
```
### Multiple NPCs with different agents
Each NPC Blueprint has its own `UElevenLabsConversationalAgentComponent`. Set a different `AgentID` on each component. WebSocket connections are fully independent.
### Only start the conversation when the player is nearby
```
On Begin Overlap (trigger volume around NPC)
└─► [ElevenLabs Agent] Start Conversation
On End Overlap
└─► [ElevenLabs Agent] End Conversation
```
### Adjust microphone volume
Get the `UElevenLabsMicrophoneCaptureComponent` from the owner and set `VolumeMultiplier`:
```cpp
UElevenLabsMicrophoneCaptureComponent* Mic =
GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (Mic) Mic->VolumeMultiplier = 2.0f;
```
---
## 12. Troubleshooting
### Plugin doesn't appear in Project Settings
Ensure the plugin is enabled in `.uproject` and the project was recompiled after adding it.
### WebSocket connection fails immediately
- Check the **API Key** is set correctly in Project Settings.
- Check the **Agent ID** exists in your ElevenLabs account (find it in the dashboard URL or via `GET /v1/convai/agents`).
- Enable **Verbose Logging** in Project Settings and check Output Log for the exact WS URL and error.
- Ensure port 443 (WSS) is not blocked by your firewall.
### `OnAgentConnected` never fires
- Connection was made but `conversation_initiation_metadata` not received yet — check Verbose Logging.
- If you see `"Binary audio frame"` logs but no `"Conversation initiated"` — the initiation JSON frame may be arriving as a non-`{` binary frame. Check the hex prefix logged at Verbose level.
### No audio from the microphone
- Windows may require microphone permission. Check **Settings → Privacy → Microphone**.
- Try setting `VolumeMultiplier` to `2.0` on the `MicrophoneCaptureComponent`.
- Check Output Log for `"Failed to open default audio capture stream"`.
### Agent audio is choppy or silent
- The `USoundWaveProcedural` queue may be underflowing due to network jitter. Check latency.
- Verify the audio format matches: plugin expects raw PCM 16-bit 16 kHz mono from the server. If ElevenLabs sends a different format (e.g. mp3_44100), audio will sound garbled — check `agent_output_audio_format` in the `conversation_initiation_metadata` via Verbose Logging.
- Ensure no other component is using the same `UAudioComponent`.
### `OnAgentStoppedSpeaking` fires too early
Increase `SilenceThresholdTicks` in `ElevenLabsConversationalAgentComponent.h`:
```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s at 60fps
```
### Build error: "Plugin AudioCapture not found"
Make sure the `AudioCapture` plugin is enabled. It should be auto-enabled via the `.uplugin` dependency, but you can add it manually to `.uproject`:
```json
{ "Name": "AudioCapture", "Enabled": true }
```
### `"Received unexpected binary WebSocket frame"` in the log
This warning no longer appears in v1.1.0+. If you see it, you are running an older build — recompile the plugin.
---
## 13. Changelog
### v1.1.0 — 2026-02-19
**Bug fixes:**
- **Binary WebSocket frames**: ElevenLabs sends all frames as binary (not text). All frames were previously discarded. Now correctly handled — JSON control frames decoded as UTF-8, raw PCM audio frames routed directly to the audio queue.
- **Transcript message**: Wrong message type (`"transcript"` → `"user_transcript"`), wrong event key (`"transcript_event"` → `"user_transcription_event"`), wrong text field (`"message"` → `"user_transcript"`).
- **Pong format**: `event_id` was nested inside a `pong_event` object; corrected to top-level field per API spec.
- **Client turn mode**: `user_turn_start`/`user_turn_end` are not valid API messages; replaced with `user_activity` (start) and implicit silence (end).
**New features:**
- `SendTextMessage(Text)` on both `UElevenLabsConversationalAgentComponent` and `UElevenLabsWebSocketProxy` — send text to the agent without a microphone. Useful for testing.
- Verbose logging shows binary frame hex preview and JSON frame content prefix.
- Improved JSON parse error log now shows the first 80 characters of the failing message.
### v1.0.0 — 2026-02-19
Initial implementation. Plugin compiles cleanly on UE 5.5 Win64.
---
*Documentation updated 2026-02-19 — Plugin v1.1.0 — UE 5.5*


@@ -1,463 +0,0 @@
# ElevenLabs Conversational AI API Reference
> Saved for Claude Code sessions. Auto-loaded via `.claude/` directory.
> Last updated: 2026-02-19
---
## 1. Agent ID — Where to Find It
### In the Dashboard (UI)
1. Go to **https://elevenlabs.io/app/conversational-ai**
2. Click on your agent to open it
3. The **Agent ID** is shown in the agent settings page — typically in the URL bar and/or in the agent's "General" settings tab
- URL pattern: `https://elevenlabs.io/app/conversational-ai/agents/<AGENT_ID>`
- Also visible in the "API" or "Overview" tab of the agent editor (copy button available)
### Via API
```http
GET https://api.elevenlabs.io/v1/convai/agents
xi-api-key: YOUR_API_KEY
```
Returns a list of all agents with their `agent_id` strings.
### Via API (single agent)
```http
GET https://api.elevenlabs.io/v1/convai/agents/{agent_id}
xi-api-key: YOUR_API_KEY
```
### Agent ID Format
- Type: `string`
- Returned on agent creation via `POST /v1/convai/agents/create`
- Used as URL path param and WebSocket query param throughout the API
---
## 2. WebSocket Conversational AI
### Connection URL
```
wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<AGENT_ID>
```
Regional alternatives:
| Region | URL |
|--------|-----|
| Default (Global) | `wss://api.elevenlabs.io/` |
| US | `wss://api.us.elevenlabs.io/` |
| EU | `wss://api.eu.residency.elevenlabs.io/` |
| India | `wss://api.in.residency.elevenlabs.io/` |
### Authentication
- **Public agents**: No key required, just `agent_id` query param
- **Private agents**: Use a **Signed URL** (see Section 4) instead of direct `agent_id`
- **Server-side** (backend): Pass `xi-api-key` as an HTTP upgrade header
```
Headers:
xi-api-key: YOUR_API_KEY
```
> ⚠️ Never expose your API key client-side. For browser/mobile apps, use Signed URLs.
---
## 3. WebSocket Protocol — Message Reference
### Audio Format
- **Input (mic → server)**: PCM 16-bit signed, **16000 Hz**, mono, little-endian, Base64-encoded
- **Output (server → client)**: Base64-encoded audio (format specified in `conversation_initiation_metadata`)
---
### Messages FROM Server (Subscribe / Receive)
#### `conversation_initiation_metadata`
Sent immediately after connection. Contains conversation ID and audio format specs.
```json
{
"type": "conversation_initiation_metadata",
"conversation_initiation_metadata_event": {
"conversation_id": "string",
"agent_output_audio_format": "pcm_16000 | mp3_44100 | ...",
"user_input_audio_format": "pcm_16000"
}
}
```
#### `audio`
Agent speech audio chunk.
```json
{
"type": "audio",
"audio_event": {
"audio_base_64": "BASE64_PCM_BYTES",
"event_id": 42
}
}
```
#### `user_transcript`
Transcribed text of what the user said.
```json
{
"type": "user_transcript",
"user_transcription_event": {
"user_transcript": "Hello, how are you?"
}
}
```
#### `agent_response`
The text the agent is saying (arrives in parallel with audio).
```json
{
"type": "agent_response",
"agent_response_event": {
"agent_response": "I'm doing great, thanks!"
}
}
```
#### `agent_response_correction`
Sent after an interruption — shows what was truncated.
```json
{
"type": "agent_response_correction",
"agent_response_correction_event": {
"original_agent_response": "string",
"corrected_agent_response": "string"
}
}
```
#### `interruption`
Signals that a specific audio event was interrupted.
```json
{
"type": "interruption",
"interruption_event": {
"event_id": 42
}
}
```
#### `ping`
Keepalive ping from server. Client must reply with `pong`.
```json
{
"type": "ping",
"ping_event": {
"event_id": 1,
"ping_ms": 150
}
}
```
#### `client_tool_call`
Requests the client execute a tool (custom tools integration).
```json
{
"type": "client_tool_call",
"client_tool_call": {
"tool_name": "string",
"tool_call_id": "string",
"parameters": {}
}
}
```
#### `contextual_update`
Text context added to conversation state (non-interrupting).
```json
{
"type": "contextual_update",
"contextual_update_event": {
"text": "string"
}
}
```
#### `vad_score`
Voice Activity Detection confidence score (0.0–1.0).
```json
{
"type": "vad_score",
"vad_score_event": {
"vad_score": 0.85
}
}
```
#### `internal_tentative_agent_response`
Preliminary agent text during LLM generation (not final).
```json
{
"type": "internal_tentative_agent_response",
"tentative_agent_response_internal_event": {
"tentative_agent_response": "string"
}
}
```
---
### Messages TO Server (Publish / Send)
#### `user_audio_chunk`
Microphone audio data. Send continuously during user speech.
```json
{
"user_audio_chunk": "BASE64_PCM_16BIT_16KHZ_MONO"
}
```
Audio must be: **PCM 16-bit signed, 16000 Hz, mono, little-endian**, then Base64-encoded.
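That encode-and-wrap step can be sketched in standalone C++ (inside the UE plugin this would go through `FBase64::Encode` and the Json module; the helper names below are illustrative, not the plugin's actual API):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Standard Base64 encoding of raw little-endian int16 PCM bytes.
static std::string Base64Encode(const std::vector<uint8_t>& Bytes)
{
    static const char* Table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    Out.reserve(((Bytes.size() + 2) / 3) * 4);
    for (size_t i = 0; i < Bytes.size(); i += 3)
    {
        uint32_t Chunk = uint32_t(Bytes[i]) << 16;
        if (i + 1 < Bytes.size()) Chunk |= uint32_t(Bytes[i + 1]) << 8;
        if (i + 2 < Bytes.size()) Chunk |= uint32_t(Bytes[i + 2]);
        Out += Table[(Chunk >> 18) & 0x3F];
        Out += Table[(Chunk >> 12) & 0x3F];
        Out += (i + 1 < Bytes.size()) ? Table[(Chunk >> 6) & 0x3F] : '=';
        Out += (i + 2 < Bytes.size()) ? Table[Chunk & 0x3F] : '=';
    }
    return Out;
}

// Wrap one encoded PCM chunk in the user_audio_chunk message.
static std::string MakeUserAudioChunk(const std::vector<uint8_t>& PCMBytes)
{
    return "{\"user_audio_chunk\":\"" + Base64Encode(PCMBytes) + "\"}";
}
```

Note the message has no `type` field — the `user_audio_chunk` key itself identifies it.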
#### `pong`
Reply to server `ping` to keep connection alive.
```json
{
"type": "pong",
"event_id": 1
}
```
#### `conversation_initiation_client_data`
Override agent configuration at connection time. Send before or just after connecting.
```json
{
"type": "conversation_initiation_client_data",
"conversation_config_override": {
"agent": {
"prompt": { "prompt": "Custom system prompt override" },
"first_message": "Hello! How can I help?",
"language": "en"
},
"tts": {
"voice_id": "string",
"speed": 1.0,
"stability": 0.5,
"similarity_boost": 0.75
}
},
"dynamic_variables": {
"user_name": "Alice",
"session_id": 12345
}
}
```
Config override ranges:
- `tts.speed`: 0.7–1.2
- `tts.stability`: 0.0–1.0
- `tts.similarity_boost`: 0.0–1.0
#### `client_tool_result`
Response to a `client_tool_call` from the server.
```json
{
"type": "client_tool_result",
"tool_call_id": "string",
"result": "tool output string",
"is_error": false
}
```
#### `contextual_update`
Inject context without interrupting the conversation.
```json
{
"type": "contextual_update",
"text": "User just entered room 4B"
}
```
#### `user_message`
Send a text message (no mic audio needed).
```json
{
"type": "user_message",
"text": "What is the weather like?"
}
```
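When hand-building text frames like this one, the user text must be JSON-escaped (in UE the plugin would use `FJsonObject`/`TJsonWriter`; the minimal standalone helper below is a sketch with illustrative names):

```cpp
#include <cassert>
#include <string>

// Minimal JSON string escaping for the fields we send.
// Covers quotes, backslashes, and the common control characters only.
static std::string EscapeJson(const std::string& In)
{
    std::string Out;
    for (char C : In)
    {
        switch (C)
        {
        case '"':  Out += "\\\""; break;
        case '\\': Out += "\\\\"; break;
        case '\n': Out += "\\n";  break;
        case '\r': Out += "\\r";  break;
        case '\t': Out += "\\t";  break;
        default:   Out += C;      break;
        }
    }
    return Out;
}

// Build the user_message frame from free-form player input.
static std::string MakeUserMessage(const std::string& Text)
{
    return "{\"type\":\"user_message\",\"text\":\"" + EscapeJson(Text) + "\"}";
}
```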
#### `user_activity`
Signal that user is active (for turn detection in client mode).
```json
{
"type": "user_activity"
}
```
---
## 4. Signed URL (Private Agents)
Used for browser/mobile clients to authenticate without exposing the API key.
### Flow
1. **Backend** calls ElevenLabs API to get a temporary signed URL
2. Backend returns signed URL to client
3. **Client** opens WebSocket to the signed URL (no API key needed)
### Get Signed URL
```http
GET https://api.elevenlabs.io/v1/convai/conversation/get-signed-url?agent_id=<AGENT_ID>
xi-api-key: YOUR_API_KEY
```
Optional query params:
- `include_conversation_id=true` — generates unique conversation ID, prevents URL reuse
- `branch_id` — specific agent branch
Response:
```json
{
"signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=...&token=..."
}
```
Client connects to `signed_url` directly — no headers needed.
---
## 5. Agents REST API
Base URL: `https://api.elevenlabs.io`
Auth header: `xi-api-key: YOUR_API_KEY`
### Create Agent
```http
POST /v1/convai/agents/create
Content-Type: application/json
{
"name": "My NPC Agent",
"conversation_config": {
"agent": {
"first_message": "Hello adventurer!",
"prompt": { "prompt": "You are a wise tavern keeper in a fantasy world." },
"language": "en"
}
}
}
```
Response includes `agent_id`.
### List Agents
```http
GET /v1/convai/agents?page_size=30&search=&sort_by=created_at&sort_direction=desc
```
Response:
```json
{
"agents": [
{
"agent_id": "abc123xyz",
"name": "My NPC Agent",
"created_at_unix_secs": 1708300000,
"last_call_time_unix_secs": null,
"archived": false,
"tags": []
}
],
"has_more": false,
"next_cursor": null
}
```
### Get Agent
```http
GET /v1/convai/agents/{agent_id}
```
### Update Agent
```http
PATCH /v1/convai/agents/{agent_id}
Content-Type: application/json
{ "name": "Updated Name", "conversation_config": { ... } }
```
### Delete Agent
```http
DELETE /v1/convai/agents/{agent_id}
```
---
## 6. Turn Modes
### Server VAD (Default / Recommended)
- ElevenLabs server detects when user stops speaking
- Client streams audio continuously
- Server handles all turn-taking automatically
### Client Turn Mode
- Client explicitly signals turn boundaries
- Send `user_activity` to indicate user is speaking
- Use when you have your own VAD or push-to-talk UI
---
## 7. Audio Pipeline (UE5 Implementation Notes)
```
Microphone (FAudioCapture)
→ float32 samples at device rate (e.g. 44100 Hz stereo)
→ Resample to 16000 Hz mono
→ Convert float32 → int16 little-endian
→ Base64-encode
→ Send as {"user_audio_chunk": "BASE64"}
Server → {"type":"audio","audio_event":{"audio_base_64":"BASE64"}}
→ Base64-decode
→ Raw PCM bytes
→ Push to USoundWaveProcedural
→ UAudioComponent plays back
```
### Float32 → Int16 Conversion (C++)
```cpp
static TArray<uint8> FloatPCMToInt16Bytes(const TArray<float>& FloatSamples)
{
TArray<uint8> Bytes;
Bytes.SetNumUninitialized(FloatSamples.Num() * 2);
for (int32 i = 0; i < FloatSamples.Num(); i++)
{
float Clamped = FMath::Clamp(FloatSamples[i], -1.f, 1.f);
int16 Sample = (int16)(Clamped * 32767.f);
Bytes[i * 2] = (uint8)(Sample & 0xFF); // Low byte
Bytes[i * 2 + 1] = (uint8)((Sample >> 8) & 0xFF); // High byte
}
return Bytes;
}
```
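The upstream resample step (device rate → 16000 Hz) is not shown above. A minimal linear-interpolation sketch, assuming the input is already mixed down to mono floats (production code would band-limit before downsampling):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Linear-interpolation resampler: mono float samples at InRate -> OutRate.
// A sketch only; no anti-aliasing filter is applied.
static std::vector<float> ResampleLinear(const std::vector<float>& In,
                                         int InRate, int OutRate)
{
    if (In.empty() || InRate <= 0 || OutRate <= 0) return {};
    const double Ratio = double(InRate) / double(OutRate);
    const size_t OutCount = size_t(double(In.size()) / Ratio);
    std::vector<float> Out(OutCount);
    for (size_t i = 0; i < OutCount; i++)
    {
        const double SrcPos = double(i) * Ratio;       // position in source
        const size_t Idx = size_t(SrcPos);
        const double Frac = SrcPos - double(Idx);      // fractional part
        const float A = In[Idx];
        const float B = (Idx + 1 < In.size()) ? In[Idx + 1] : A;
        Out[i] = float(A + (B - A) * Frac);            // lerp between samples
    }
    return Out;
}
```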
---
## 8. Quick Integration Checklist (UE5 Plugin)
- [ ] Set `AgentID` in `UElevenLabsSettings` (Project Settings → ElevenLabs AI Agent)
- Or override per-component via `UElevenLabsConversationalAgentComponent::AgentID`
- [ ] Set `API_Key` in settings (or leave empty for public agents)
- [ ] Add `UElevenLabsConversationalAgentComponent` to your NPC actor
- [ ] Set `TurnMode` (default: `Server` — recommended)
- [ ] Bind to events: `OnAgentConnected`, `OnAgentTranscript`, `OnAgentTextResponse`, `OnAgentStartedSpeaking`, `OnAgentStoppedSpeaking`
- [ ] Call `StartConversation()` to begin
- [ ] Call `EndConversation()` when done
---
## 9. Key API URLs Reference
| Purpose | URL |
|---------|-----|
| Dashboard | https://elevenlabs.io/app/conversational-ai |
| API Keys | https://elevenlabs.io/app/settings/api-keys |
| WebSocket endpoint | wss://api.elevenlabs.io/v1/convai/conversation |
| Agents list | GET https://api.elevenlabs.io/v1/convai/agents |
| Agent by ID | GET https://api.elevenlabs.io/v1/convai/agents/{agent_id} |
| Create agent | POST https://api.elevenlabs.io/v1/convai/agents/create |
| Signed URL | GET https://api.elevenlabs.io/v1/convai/conversation/get-signed-url |
| WS protocol docs | https://elevenlabs.io/docs/eleven-agents/api-reference/eleven-agents/websocket |
| Quickstart | https://elevenlabs.io/docs/eleven-agents/quickstart |

@ -1,61 +0,0 @@
# PS_AI_Agent_ElevenLabs Plugin
## Location
`Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/`
## File Map
```
PS_AI_Agent_ElevenLabs.uplugin
Source/PS_AI_Agent_ElevenLabs/
PS_AI_Agent_ElevenLabs.Build.cs
Public/
PS_AI_Agent_ElevenLabs.h FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings
ElevenLabsDefinitions.h Enums, structs, ElevenLabsMessageType/Audio constants
ElevenLabsWebSocketProxy.h/.cpp UObject managing one WS session
ElevenLabsConversationalAgentComponent.h/.cpp Main ActorComponent (attach to NPC)
ElevenLabsMicrophoneCaptureComponent.h/.cpp Mic capture, resample, dispatch to game thread
Private/
(implementations of the above)
```
## ElevenLabs Conversational AI Protocol
- **WebSocket URL**: `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>`
- **Auth**: HTTP upgrade header `xi-api-key: <key>` (set in Project Settings)
- **All frames**: JSON text (no binary frames used by the API)
- **Audio format**: PCM 16-bit signed, 16000 Hz, mono, little-endian — Base64-encoded in JSON
### Client → Server messages
| Type field value | Payload |
|---|---|
| `user_audio_chunk` *(no `type` field; the key itself is the type)* | `{ "user_audio_chunk": "<base64 PCM>" }` |
| `user_turn_start` | `{ "type": "user_turn_start" }` |
| `user_turn_end` | `{ "type": "user_turn_end" }` |
| `interrupt` | `{ "type": "interrupt" }` |
| `pong` | `{ "type": "pong", "pong_event": { "event_id": N } }` |
### Server → Client messages (field: `type`)
| type value | Key nested object | Notes |
|---|---|---|
| `conversation_initiation_metadata` | `conversation_initiation_metadata_event.conversation_id` | Marks WS ready |
| `audio` | `audio_event.audio_base_64` | Base64 PCM from agent |
| `transcript` | `transcript_event.{speaker, message, is_final}` | User or agent speech |
| `agent_response` | `agent_response_event.agent_response` | Final agent text |
| `interruption` | — | Agent stopped mid-sentence |
| `ping` | `ping_event.event_id` | Must reply with pong |
## Key Design Decisions
- **No gRPC / no ThirdParty libs** — pure UE WebSockets + HTTP, builds out of the box
- Audio resampled in-plugin: device rate → 16000 Hz mono (linear interpolation)
- `USoundWaveProcedural` for real-time agent audio playback (queue-driven)
- Silence heuristic: 30 game-thread ticks (~0.5 s at 60 fps) with no new audio → agent done speaking
- `bSignedURLMode` setting: fetch a signed WS URL from your own backend (keeps API key off client)
- Two turn modes: `Server VAD` (ElevenLabs detects speech end) and `Client Controlled` (push-to-talk)
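The silence heuristic above can be isolated as a small state machine (a standalone sketch mirroring the component's tick logic; the struct and member names are hypothetical):

```cpp
#include <cassert>

// Tracks agent speaking state; reports "stopped speaking" after N
// consecutive game-thread ticks with an empty playback queue
// (30 ticks is roughly 0.5 s at 60 fps).
struct FSilenceDetector
{
    int  SilentTicks = 0;
    int  ThresholdTicks = 30;
    bool bSpeaking = false;

    // Call once per tick with the current audio-queue size in bytes.
    // Returns true on the tick where the agent is deemed done speaking.
    bool Tick(int QueuedBytes)
    {
        if (QueuedBytes > 0)
        {
            bSpeaking = true;       // fresh audio: (re)enter speaking state
            SilentTicks = 0;
            return false;
        }
        if (!bSpeaking) return false;
        if (++SilentTicks >= ThresholdTicks)
        {
            bSpeaking = false;      // sustained silence: speech has ended
            SilentTicks = 0;
            return true;
        }
        return false;
    }
};
```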
## Build Dependencies (Build.cs)
Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP,
AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing
## Status
- **Session 1** (2026-02-19): All source files written, registered in .uproject. Not yet compiled.
- **TODO**: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID.
- **Watch out**: Verify `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature vs UE 5.5 API.

@ -1,79 +0,0 @@
# Project Context & Original Ask
## What the user wants to build
A **UE5 plugin** that integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5,
allowing an in-game NPC (or any Actor) to hold a real-time voice conversation with a player.
### The original request (paraphrased)
> "I want to create a plugin to use ElevenLabs Conversational Agent in Unreal Engine 5.5.
> I previously used the Convai plugin which does what I want, but I prefer ElevenLabs quality.
> The goal is to create a plugin in the existing Unreal Project to make a first step for integration.
> Convai AI plugin may be too big in terms of functionality for the new project, but it is the final goal.
> You can use the Convai source code to find the right way to make the ElevenLabs version —
> it should be very similar."
### Plugin name
`PS_AI_Agent_ElevenLabs`
---
## User's mental model / intent
1. **Short-term**: A working first-step plugin — minimal but functional — that can:
- Connect to ElevenLabs Conversational AI via WebSocket
- Capture microphone audio from the player
- Stream it to ElevenLabs in real time
- Play back the agent's voice response
- Surface key events (transcript, agent text, speaking state) to Blueprint
2. **Long-term**: Match the full feature set of Convai — character IDs, session memory,
actions/environment context, lip-sync, etc. — but powered by ElevenLabs instead.
3. **Key preference**: Simpler than Convai. No gRPC, no protobuf, no ThirdParty precompiled
libraries. ElevenLabs' Conversational AI API uses plain WebSocket + JSON, which maps
naturally to UE's built-in `WebSockets` module.
---
## How we used Convai as a reference
We studied the Convai plugin source (`ConvAI/Convai/`) to understand:
- **Module structure**: `UConvaiSettings` + `IModuleInterface` + `ISettingsModule` registration
- **Audio capture pattern**: `Audio::FAudioCapture`, ring buffers, thread-safe dispatch to game thread
- **Audio playback pattern**: `USoundWaveProcedural` fed from a queue
- **Component architecture**: `UConvaiChatbotComponent` (NPC side) + `UConvaiPlayerComponent` (player side)
- **HTTP proxy pattern**: `UConvaiAPIBaseProxy` base class for async REST calls
- **Voice type enum**: Convai already had `EVoiceType::ElevenLabsVoices` — confirming ElevenLabs
is a natural fit
We then replaced gRPC/protobuf with **WebSocket + JSON** to match the ElevenLabs API, and
simplified the architecture to the minimum needed for a first working version.
---
## What was built (Session 1 — 2026-02-19)
All source files created and registered. See `.claude/elevenlabs_plugin.md` for full file map and protocol details.
### Components created
| Class | Role |
|---|---|
| `UElevenLabsSettings` | Project Settings UI — API key, Agent ID, security options |
| `UElevenLabsWebSocketProxy` | Manages one WS session: connect, send audio, handle all server message types |
| `UElevenLabsConversationalAgentComponent` | ActorComponent to attach to any NPC — orchestrates mic + WS + playback |
| `UElevenLabsMicrophoneCaptureComponent` | Wraps `Audio::FAudioCapture`, resamples to 16 kHz mono |
### Not yet done (next sessions)
- Compile & test in UE 5.5 Editor
- Verify `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature for UE 5.5
- Add lip-sync support (future)
- Add session memory / conversation history (future)
- Add environment/action context support (future, matching Convai's full feature set)
---
## Notes on the ElevenLabs API
- Docs: https://elevenlabs.io/docs/conversational-ai
- Create agents at: https://elevenlabs.io/app/conversational-ai
- API keys at: https://elevenlabs.io (dashboard)

@ -1,200 +0,0 @@
# Session Log — 2026-02-19
**Project**: PS_AI_Agent (Unreal Engine 5.5)
**Machine**: Desktop PC (j_foucher)
**Working directory**: `E:\ASTERION\GIT\PS_AI_Agent`
---
## Conversation Summary
### 1. Initial Request
User asked to create a plugin to use the ElevenLabs Conversational AI Agent in UE5.5.
Reference: existing Convai plugin (gRPC-based, more complex). Goal: simpler version using ElevenLabs.
Plugin name requested: `PS_AI_Agent_ElevenLabs`.
### 2. Codebase Exploration
Explored the Convai plugin source at `ConvAI/Convai/` to understand:
- Module/settings structure
- AudioCapture patterns
- HTTP proxy pattern
- gRPC streaming architecture (to know what to replace with WebSocket)
- Convai already had `EVoiceType::ElevenLabsVoices` — confirming the direction
### 3. Plugin Created
All source files written from scratch under:
`Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/`
Files created:
- `PS_AI_Agent_ElevenLabs.uplugin`
- `PS_AI_Agent_ElevenLabs.Build.cs`
- `Public/PS_AI_Agent_ElevenLabs.h` — Module + `UElevenLabsSettings`
- `Public/ElevenLabsDefinitions.h` — Enums, structs, protocol constants
- `Public/ElevenLabsWebSocketProxy.h` + `.cpp` — WS session manager
- `Public/ElevenLabsConversationalAgentComponent.h` + `.cpp` — Main NPC component
- `Public/ElevenLabsMicrophoneCaptureComponent.h` + `.cpp` — Mic capture
- `PS_AI_Agent.uproject` — Plugin registered
Commit: `f0055e8`
### 4. Memory Files Created
To allow context recovery on any machine (including laptop):
- `.claude/MEMORY.md` — project structure + patterns (auto-loaded by Claude Code)
- `.claude/elevenlabs_plugin.md` — plugin file map + API protocol details
- `.claude/project_context.md` — original ask, intent, short/long-term goals
- Local copy also at `C:\Users\j_foucher\.claude\projects\...\memory\`
Commit: `f0055e8` (with plugin), updated in `4d6ae10`
### 5. .gitignore Updated
Added to existing ignores:
- `Unreal/PS_AI_Agent/Plugins/*/Binaries/`
- `Unreal/PS_AI_Agent/Plugins/*/Intermediate/`
- `Unreal/PS_AI_Agent/*.sln` / `*.suo`
- `.claude/settings.local.json`
- `generate_pptx.py`
Commit: `4d6ae10`, `b114ab0`
### 6. Compile — First Attempt (Errors Found)
Ran `Build.bat PS_AI_AgentEditor Win64 Development`. Errors:
- `WebSockets` listed in `.uplugin` — it's a module not a plugin → removed
- `OpenDefaultCaptureStream` doesn't exist in UE 5.5 → use `OpenAudioCaptureStream`
- `FOnAudioCaptureFunction` callback uses `const void*` not `const float*` → fixed cast
- `TArray::RemoveAt(0, N, false)` deprecated → use `EAllowShrinking::No`
- `AudioCapture` is a plugin and must be in `.uplugin` Plugins array → added
Commit: `bb1a857`
### 7. Compile — Success
Clean build, no warnings, no errors.
Output: `Plugins/PS_AI_Agent_ElevenLabs/Binaries/Win64/UnrealEditor-PS_AI_Agent_ElevenLabs.dll`
Memory updated with confirmed UE 5.5 API patterns. Commit: `3b98edc`
### 8. Documentation — Markdown
Full reference doc written to `.claude/PS_AI_Agent_ElevenLabs_Documentation.md`:
- Installation, Project Settings, Quick Start (BP + C++), Components Reference,
Data Types, Turn Modes, Security/Signed URL, Audio Pipeline, Common Patterns, Troubleshooting.
Commit: `c833ccd`
### 9. Documentation — PowerPoint
20-slide dark-themed PowerPoint generated via Python (python-pptx 1.0.2):
- File: `PS_AI_Agent_ElevenLabs_Documentation.pptx` in repo root
- Covers all sections with visual layout, code blocks, flow diagrams, colour-coded elements
- Generator script `generate_pptx.py` excluded from git via .gitignore
Commit: `1b72026`
---
## Session 2 — 2026-02-19 (continued context)
### 10. API vs Implementation Cross-Check (3 bugs found and fixed)
Cross-referenced `elevenlabs_api_reference.md` against plugin source. Found 3 protocol bugs:
**Bug 1 — Transcript fields wrong:**
- Type: `"transcript"` → `"user_transcript"`
- Event key: `"transcript_event"` → `"user_transcription_event"`
- Field: `"message"` → `"user_transcript"`
**Bug 2 — Pong format wrong:**
- `event_id` was nested in `pong_event{}` → must be top-level
**Bug 3 — Client turn mode messages don't exist:**
- `"user_turn_start"` / `"user_turn_end"` are not valid API types
- Replaced: start → `"user_activity"`, end → no-op (server detects silence)
Commit: `ae2c9b9`
### 11. SendTextMessage Added
User asked for text input to agent for testing (without mic).
Added `SendTextMessage(FString)` to `UElevenLabsWebSocketProxy` and `UElevenLabsConversationalAgentComponent`.
Sends `{"type":"user_message","text":"..."}` — agent replies with audio + text.
Commit: `b489d11`
### 12. Binary WebSocket Frame Fix
User reported: `"Received unexpected binary WebSocket frame"` warnings.
Root cause: ElevenLabs sends **ALL WebSocket frames as binary**, never text.
`OnMessage` (text handler) never fires. `OnRawMessage` must handle everything.
Fix: Implemented `OnWsBinaryMessage` with fragment reassembly (`BinaryFrameBuffer`).
Commit: `669c503`
### 13. JSON vs PCM Discrimination Fix
After binary fix: `"Failed to parse WebSocket message as JSON"` errors.
Root cause: Binary frames contain BOTH JSON control messages AND raw PCM audio.
Fix: Peek at byte[0] of assembled buffer:
- `'{'` (0x7B) → UTF-8 JSON → route to `OnWsMessage()`
- anything else → raw PCM audio → broadcast to `OnAudioReceived`
Commit: `4834567`
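The discrimination logic from this fix reduces to a one-byte peek on the reassembled frame (standalone sketch; the function name is illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ElevenLabs sends both JSON control messages and raw PCM audio as
// binary WebSocket frames. A reassembled frame whose first byte is
// '{' (0x7B) is UTF-8 JSON; anything else is treated as raw PCM.
static bool IsJsonFrame(const std::vector<uint8_t>& Frame)
{
    return !Frame.empty() && Frame[0] == 0x7B;
}
```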
### 14. Documentation Updated to v1.1.0
Full rewrite of `.claude/PS_AI_Agent_ElevenLabs_Documentation.md`:
- Added Changelog section (v1.0.0 / v1.1.0)
- Updated audio pipeline (binary PCM path, not Base64 JSON)
- Added `SendTextMessage` to all function tables and examples
- Corrected turn mode docs, transcript docs, `OnAgentConnected` timing
- New troubleshooting entries
Commit: `e464cfe`
### 15. Test Blueprint Asset Updated
`test_AI_Actor.uasset` updated in UE Editor.
Commit: `99017f4`
---
## Git History (this session)
| Hash | Message |
|------|---------|
| `f0055e8` | Add PS_AI_Agent_ElevenLabs plugin (initial implementation) |
| `4d6ae10` | Update .gitignore: exclude plugin build artifacts and local Claude settings |
| `b114ab0` | Broaden .gitignore: use glob for all plugin Binaries/Intermediate |
| `bb1a857` | Fix compile errors in PS_AI_Agent_ElevenLabs plugin |
| `3b98edc` | Update memory: document confirmed UE 5.5 API patterns and plugin compile status |
| `c833ccd` | Add plugin documentation for PS_AI_Agent_ElevenLabs |
| `1b72026` | Add PowerPoint documentation and update .gitignore |
| `bbeb429` | ElevenLabs API reference doc |
| `dbd6161` | TestMap, test actor, DefaultEngine.ini, memory update |
| `ae2c9b9` | Fix 3 WebSocket protocol bugs |
| `b489d11` | Add SendTextMessage |
| `669c503` | Fix binary WebSocket frames |
| `4834567` | Fix JSON vs binary frame discrimination |
| `e464cfe` | Update documentation to v1.1.0 |
| `99017f4` | Update test_AI_Actor blueprint asset |
---
## Key Technical Decisions Made This Session
| Decision | Reason |
|----------|--------|
| WebSocket instead of gRPC | ElevenLabs Conversational AI uses WS/JSON; no ThirdParty libs needed |
| `AudioCapture` in `.uplugin` Plugins array | It's an engine plugin, not a module — UBT requires it declared |
| `WebSockets` in Build.cs only | It's a module (no `.uplugin` file), declaring it in `.uplugin` causes build error |
| `FOnAudioCaptureFunction` uses `const void*` | UE 5.3+ API change — must cast to `float*` inside callback |
| `EAllowShrinking::No` | Bool overload of `RemoveAt` deprecated in UE 5.5 |
| `USoundWaveProcedural` for playback | Allows pushing raw PCM bytes at runtime without file I/O |
| Silence threshold = 30 ticks | ~0.5s at 60fps heuristic to detect agent finished speaking |
| Binary frame handling | ElevenLabs sends ALL WS frames as binary; peek byte[0] to discriminate JSON vs PCM |
| `user_activity` for client turn | `user_turn_start`/`user_turn_end` don't exist in ElevenLabs API |
---
## Next Steps (not done yet)
- [ ] Verify mic audio actually reaches ElevenLabs (enable Verbose Logging, test in Editor)
- [ ] Test `USoundWaveProcedural` underflow behaviour in practice (check for audio glitches)
- [ ] Test `SendTextMessage` end-to-end in Blueprint
- [ ] Add lip-sync support (future)
- [ ] Add session memory / conversation history (future, matching Convai)
- [ ] Add environment/action context support (future)
- [ ] Consider Signed URL Mode backend implementation

.gitignore
@@ -4,17 +4,3 @@ Unreal/PS_AI_Agent/Binaries/
 Unreal/PS_AI_Agent/Intermediate/
 Unreal/PS_AI_Agent/Saved/
 ConvAI/Convai/Binaries/
-# All plugin build artifacts (Binaries + Intermediate for any plugin)
-Unreal/PS_AI_Agent/Plugins/*/Binaries/
-Unreal/PS_AI_Agent/Plugins/*/Intermediate/
-# UE5 generated solution files
-Unreal/PS_AI_Agent/*.sln
-Unreal/PS_AI_Agent/*.suo
-# Claude Code local session settings (machine-specific, memory files in .claude/ are kept)
-.claude/settings.local.json
-# Documentation generator script (dev tool, output .pptx is committed instead)
-generate_pptx.py

@ -1,8 +1,7 @@
 [/Script/EngineSettings.GameMapsSettings]
-GameDefaultMap=/Game/TestMap.TestMap
-EditorStartupMap=/Game/TestMap.TestMap
+GameDefaultMap=/Engine/Maps/Templates/OpenWorld
 [/Script/Engine.RendererSettings]
 r.AllowStaticLighting=False
@@ -91,4 +90,3 @@ ConnectionType=USBOnly
 bUseManualIPAddress=False
 ManualIPAddress=

@ -17,10 +17,6 @@
       "TargetAllowList": [
         "Editor"
       ]
-    },
-    {
-      "Name": "PS_AI_Agent_ElevenLabs",
-      "Enabled": true
     }
   ]
 }

@ -1,35 +0,0 @@
{
"FileVersion": 3,
"Version": 1,
"VersionName": "1.0.0",
"FriendlyName": "PS AI Agent - ElevenLabs",
"Description": "Integrates ElevenLabs Conversational AI Agent into Unreal Engine 5.5. Supports real-time voice conversation via WebSocket, microphone capture, and audio playback.",
"Category": "AI",
"CreatedBy": "ASTERION",
"CreatedByURL": "",
"DocsURL": "https://elevenlabs.io/docs/conversational-ai",
"MarketplaceURL": "",
"SupportURL": "",
"CanContainContent": false,
"IsBetaVersion": true,
"IsExperimentalVersion": false,
"Installed": false,
"Modules": [
{
"Name": "PS_AI_Agent_ElevenLabs",
"Type": "Runtime",
"LoadingPhase": "PreDefault",
"PlatformAllowList": [
"Win64",
"Mac",
"Linux"
]
}
],
"Plugins": [
{
"Name": "AudioCapture",
"Enabled": true
}
]
}

@ -1,40 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
using UnrealBuildTool;
public class PS_AI_Agent_ElevenLabs : ModuleRules
{
public PS_AI_Agent_ElevenLabs(ReadOnlyTargetRules Target) : base(Target)
{
DefaultBuildSettings = BuildSettingsVersion.Latest;
PCHUsage = PCHUsageMode.UseExplicitOrSharedPCHs;
PublicDependencyModuleNames.AddRange(new string[]
{
"Core",
"CoreUObject",
"Engine",
"InputCore",
// JSON serialization for WebSocket message payloads
"Json",
"JsonUtilities",
// WebSocket for ElevenLabs Conversational AI real-time API
"WebSockets",
// HTTP for REST calls (agent metadata, auth, etc.)
"HTTP",
// Audio capture (microphone input)
"AudioMixer",
"AudioCaptureCore",
"AudioCapture",
"Voice",
"SignalProcessing",
});
PrivateDependencyModuleNames.AddRange(new string[]
{
"Projects",
// For ISettingsModule (Project Settings integration)
"Settings",
});
}
}

@ -1,345 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsMicrophoneCaptureComponent.h"
#include "PS_AI_Agent_ElevenLabs.h"
#include "Components/AudioComponent.h"
#include "Sound/SoundWaveProcedural.h"
#include "GameFramework/Actor.h"
#include "Engine/World.h"
DEFINE_LOG_CATEGORY_STATIC(LogElevenLabsAgent, Log, All);
// ─────────────────────────────────────────────────────────────────────────────
// Constructor
// ─────────────────────────────────────────────────────────────────────────────
UElevenLabsConversationalAgentComponent::UElevenLabsConversationalAgentComponent()
{
PrimaryComponentTick.bCanEverTick = true;
// Tick is used only to detect silence (agent stopped speaking).
// Disable if not needed for perf.
PrimaryComponentTick.TickInterval = 1.0f / 60.0f;
}
// ─────────────────────────────────────────────────────────────────────────────
// Lifecycle
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::BeginPlay()
{
Super::BeginPlay();
InitAudioPlayback();
}
void UElevenLabsConversationalAgentComponent::EndPlay(const EEndPlayReason::Type EndPlayReason)
{
EndConversation();
Super::EndPlay(EndPlayReason);
}
void UElevenLabsConversationalAgentComponent::TickComponent(float DeltaTime, ELevelTick TickType,
FActorComponentTickFunction* ThisTickFunction)
{
Super::TickComponent(DeltaTime, TickType, ThisTickFunction);
if (bAgentSpeaking)
{
FScopeLock Lock(&AudioQueueLock);
if (AudioQueue.Num() == 0)
{
SilentTickCount++;
if (SilentTickCount >= SilenceThresholdTicks)
{
bAgentSpeaking = false;
SilentTickCount = 0;
OnAgentStoppedSpeaking.Broadcast();
}
}
else
{
SilentTickCount = 0;
}
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Control
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::StartConversation()
{
if (!WebSocketProxy)
{
WebSocketProxy = NewObject<UElevenLabsWebSocketProxy>(this);
WebSocketProxy->OnConnected.AddDynamic(this,
&UElevenLabsConversationalAgentComponent::HandleConnected);
WebSocketProxy->OnDisconnected.AddDynamic(this,
&UElevenLabsConversationalAgentComponent::HandleDisconnected);
WebSocketProxy->OnError.AddDynamic(this,
&UElevenLabsConversationalAgentComponent::HandleError);
WebSocketProxy->OnAudioReceived.AddDynamic(this,
&UElevenLabsConversationalAgentComponent::HandleAudioReceived);
WebSocketProxy->OnTranscript.AddDynamic(this,
&UElevenLabsConversationalAgentComponent::HandleTranscript);
WebSocketProxy->OnAgentResponse.AddDynamic(this,
&UElevenLabsConversationalAgentComponent::HandleAgentResponse);
WebSocketProxy->OnInterrupted.AddDynamic(this,
&UElevenLabsConversationalAgentComponent::HandleInterrupted);
}
WebSocketProxy->Connect(AgentID);
}
void UElevenLabsConversationalAgentComponent::EndConversation()
{
StopListening();
StopAgentAudio();
if (WebSocketProxy)
{
WebSocketProxy->Disconnect();
WebSocketProxy = nullptr;
}
}
void UElevenLabsConversationalAgentComponent::StartListening()
{
if (!IsConnected())
{
UE_LOG(LogElevenLabsAgent, Warning, TEXT("StartListening: not connected."));
return;
}
if (bIsListening) return;
bIsListening = true;
if (TurnMode == EElevenLabsTurnMode::Client)
{
WebSocketProxy->SendUserTurnStart();
}
// Find the microphone component on our owner actor, or create one.
UElevenLabsMicrophoneCaptureComponent* Mic =
GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();
if (!Mic)
{
Mic = NewObject<UElevenLabsMicrophoneCaptureComponent>(GetOwner(),
TEXT("ElevenLabsMicrophone"));
Mic->RegisterComponent();
}
Mic->OnAudioCaptured.AddUObject(this,
&UElevenLabsConversationalAgentComponent::OnMicrophoneDataCaptured);
Mic->StartCapture();
UE_LOG(LogElevenLabsAgent, Log, TEXT("Microphone capture started."));
}
void UElevenLabsConversationalAgentComponent::StopListening()
{
if (!bIsListening) return;
bIsListening = false;
if (UElevenLabsMicrophoneCaptureComponent* Mic =
GetOwner() ? GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>() : nullptr)
{
Mic->StopCapture();
Mic->OnAudioCaptured.RemoveAll(this);
}
if (WebSocketProxy && TurnMode == EElevenLabsTurnMode::Client)
{
WebSocketProxy->SendUserTurnEnd();
}
UE_LOG(LogElevenLabsAgent, Log, TEXT("Microphone capture stopped."));
}
void UElevenLabsConversationalAgentComponent::SendTextMessage(const FString& Text)
{
if (!IsConnected())
{
UE_LOG(LogElevenLabsAgent, Warning, TEXT("SendTextMessage: not connected. Call StartConversation() first."));
return;
}
WebSocketProxy->SendTextMessage(Text);
}
void UElevenLabsConversationalAgentComponent::InterruptAgent()
{
if (WebSocketProxy) WebSocketProxy->SendInterrupt();
StopAgentAudio();
}
// ─────────────────────────────────────────────────────────────────────────────
// State queries
// ─────────────────────────────────────────────────────────────────────────────
bool UElevenLabsConversationalAgentComponent::IsConnected() const
{
return WebSocketProxy && WebSocketProxy->IsConnected();
}
const FElevenLabsConversationInfo& UElevenLabsConversationalAgentComponent::GetConversationInfo() const
{
static FElevenLabsConversationInfo Empty;
return WebSocketProxy ? WebSocketProxy->GetConversationInfo() : Empty;
}
// ─────────────────────────────────────────────────────────────────────────────
// WebSocket event handlers
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::HandleConnected(const FElevenLabsConversationInfo& Info)
{
UE_LOG(LogElevenLabsAgent, Log, TEXT("Agent connected. ConversationID=%s"), *Info.ConversationID);
OnAgentConnected.Broadcast(Info);
if (bAutoStartListening)
{
StartListening();
}
}
void UElevenLabsConversationalAgentComponent::HandleDisconnected(int32 StatusCode, const FString& Reason)
{
UE_LOG(LogElevenLabsAgent, Log, TEXT("Agent disconnected. Code=%d Reason=%s"), StatusCode, *Reason);
bIsListening = false;
bAgentSpeaking = false;
OnAgentDisconnected.Broadcast(StatusCode, Reason);
}
void UElevenLabsConversationalAgentComponent::HandleError(const FString& ErrorMessage)
{
UE_LOG(LogElevenLabsAgent, Error, TEXT("Agent error: %s"), *ErrorMessage);
OnAgentError.Broadcast(ErrorMessage);
}
void UElevenLabsConversationalAgentComponent::HandleAudioReceived(const TArray<uint8>& PCMData)
{
EnqueueAgentAudio(PCMData);
}
void UElevenLabsConversationalAgentComponent::HandleTranscript(const FElevenLabsTranscriptSegment& Segment)
{
OnAgentTranscript.Broadcast(Segment);
}
void UElevenLabsConversationalAgentComponent::HandleAgentResponse(const FString& ResponseText)
{
OnAgentTextResponse.Broadcast(ResponseText);
}
void UElevenLabsConversationalAgentComponent::HandleInterrupted()
{
StopAgentAudio();
OnAgentInterrupted.Broadcast();
}
// ─────────────────────────────────────────────────────────────────────────────
// Audio playback
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::InitAudioPlayback()
{
AActor* Owner = GetOwner();
if (!Owner) return;
// USoundWaveProcedural lets us push raw PCM data at runtime.
ProceduralSoundWave = NewObject<USoundWaveProcedural>(this);
ProceduralSoundWave->SetSampleRate(ElevenLabsAudio::SampleRate);
ProceduralSoundWave->NumChannels = ElevenLabsAudio::Channels;
ProceduralSoundWave->Duration = INDEFINITELY_LOOPING_DURATION;
ProceduralSoundWave->SoundGroup = SOUNDGROUP_Voice;
ProceduralSoundWave->bLooping = false;
// Create the audio component attached to the owner.
AudioPlaybackComponent = NewObject<UAudioComponent>(Owner, TEXT("ElevenLabsAudioPlayback"));
AudioPlaybackComponent->RegisterComponent();
AudioPlaybackComponent->bAutoActivate = false;
AudioPlaybackComponent->SetSound(ProceduralSoundWave);
// When the procedural sound wave needs more audio data, pull from our queue.
ProceduralSoundWave->OnSoundWaveProceduralUnderflow =
FOnSoundWaveProceduralUnderflow::CreateUObject(
this, &UElevenLabsConversationalAgentComponent::OnProceduralUnderflow);
}
void UElevenLabsConversationalAgentComponent::OnProceduralUnderflow(
USoundWaveProcedural* InProceduralWave, const int32 SamplesRequired)
{
FScopeLock Lock(&AudioQueueLock);
if (AudioQueue.Num() == 0) return;
const int32 BytesRequired = SamplesRequired * sizeof(int16);
// Only push whole int16 samples — an odd byte count would desync the PCM stream.
const int32 BytesToPush = FMath::Min(AudioQueue.Num(), BytesRequired) & ~1;
if (BytesToPush <= 0) return;
InProceduralWave->QueueAudio(AudioQueue.GetData(), BytesToPush);
AudioQueue.RemoveAt(0, BytesToPush, EAllowShrinking::No);
}
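The playback path above is a pull model: `QueueAudio` is fed only inside the underflow callback, so the mixer drains a shared byte queue that the network side fills. A minimal standalone sketch of that queue (illustrative only — `std::mutex` and `std::vector` stand in for `FCriticalSection`/`FScopeLock` and `TArray`):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Minimal pull-model byte FIFO: a producer appends PCM bytes, and the audio
// callback pops at most the number of bytes the mixer asked for.
class PcmByteQueue {
public:
    void Enqueue(const std::vector<uint8_t>& Bytes) {
        std::lock_guard<std::mutex> Lock(Mutex);
        Buffer.insert(Buffer.end(), Bytes.begin(), Bytes.end());
    }

    // Returns the bytes actually popped (may be fewer than requested).
    std::vector<uint8_t> PopUpTo(size_t BytesRequested) {
        std::lock_guard<std::mutex> Lock(Mutex);
        const size_t N = std::min(Buffer.size(), BytesRequested);
        std::vector<uint8_t> Out(Buffer.begin(), Buffer.begin() + N);
        Buffer.erase(Buffer.begin(), Buffer.begin() + N);
        return Out;
    }

private:
    std::mutex Mutex;
    std::vector<uint8_t> Buffer;
};
```

The key property, mirrored by `OnProceduralUnderflow` above, is that an under-filled queue yields a short read rather than blocking the audio thread.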
void UElevenLabsConversationalAgentComponent::EnqueueAgentAudio(const TArray<uint8>& PCMData)
{
{
FScopeLock Lock(&AudioQueueLock);
AudioQueue.Append(PCMData);
}
// Start playback if not already playing.
if (!bAgentSpeaking)
{
bAgentSpeaking = true;
SilentTickCount = 0;
OnAgentStartedSpeaking.Broadcast();
if (AudioPlaybackComponent && !AudioPlaybackComponent->IsPlaying())
{
AudioPlaybackComponent->Play();
}
}
}
void UElevenLabsConversationalAgentComponent::StopAgentAudio()
{
if (AudioPlaybackComponent && AudioPlaybackComponent->IsPlaying())
{
AudioPlaybackComponent->Stop();
}
FScopeLock Lock(&AudioQueueLock);
AudioQueue.Empty();
if (bAgentSpeaking)
{
bAgentSpeaking = false;
SilentTickCount = 0;
OnAgentStoppedSpeaking.Broadcast();
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Microphone → WebSocket
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::OnMicrophoneDataCaptured(const TArray<float>& FloatPCM)
{
if (!IsConnected() || !bIsListening) return;
TArray<uint8> PCMBytes = FloatPCMToInt16Bytes(FloatPCM);
WebSocketProxy->SendAudioChunk(PCMBytes);
}
TArray<uint8> UElevenLabsConversationalAgentComponent::FloatPCMToInt16Bytes(const TArray<float>& FloatPCM)
{
TArray<uint8> Out;
Out.Reserve(FloatPCM.Num() * 2);
for (float Sample : FloatPCM)
{
// Clamp to [-1,1] then scale to int16 range
const float Clamped = FMath::Clamp(Sample, -1.0f, 1.0f);
const int16 Int16Sample = static_cast<int16>(Clamped * 32767.0f);
// Little-endian
Out.Add(static_cast<uint8>(Int16Sample & 0xFF));
Out.Add(static_cast<uint8>((Int16Sample >> 8) & 0xFF));
}
return Out;
}
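`FloatPCMToInt16Bytes` is simple but easy to get subtly wrong (clamping, scaling, byte order). A standalone sketch of the same clamp-scale-split logic outside the UE type system, useful for checking the little-endian layout in isolation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Convert float samples in [-1, 1] to little-endian int16 bytes,
// mirroring the component's FloatPCMToInt16Bytes.
std::vector<uint8_t> FloatToInt16LE(const std::vector<float>& In) {
    std::vector<uint8_t> Out;
    Out.reserve(In.size() * 2);
    for (float Sample : In) {
        const float Clamped = std::max(-1.0f, std::min(1.0f, Sample));
        const int16_t S = static_cast<int16_t>(Clamped * 32767.0f);
        Out.push_back(static_cast<uint8_t>(S & 0xFF));         // low byte first
        Out.push_back(static_cast<uint8_t>((S >> 8) & 0xFF));  // then high byte
    }
    return Out;
}
```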


@@ -1,171 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
#include "ElevenLabsMicrophoneCaptureComponent.h"
#include "ElevenLabsDefinitions.h"
#include "AudioCaptureCore.h"
#include "Async/Async.h"
DEFINE_LOG_CATEGORY_STATIC(LogElevenLabsMic, Log, All);
// ─────────────────────────────────────────────────────────────────────────────
// Constructor
// ─────────────────────────────────────────────────────────────────────────────
UElevenLabsMicrophoneCaptureComponent::UElevenLabsMicrophoneCaptureComponent()
{
PrimaryComponentTick.bCanEverTick = false;
}
// ─────────────────────────────────────────────────────────────────────────────
// Lifecycle
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsMicrophoneCaptureComponent::EndPlay(const EEndPlayReason::Type EndPlayReason)
{
StopCapture();
Super::EndPlay(EndPlayReason);
}
// ─────────────────────────────────────────────────────────────────────────────
// Capture control
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsMicrophoneCaptureComponent::StartCapture()
{
if (bCapturing)
{
UE_LOG(LogElevenLabsMic, Warning, TEXT("StartCapture called while already capturing."));
return;
}
// Open the default audio capture stream.
// FOnAudioCaptureFunction uses const void* per UE 5.3+ API (cast to float* inside).
Audio::FOnAudioCaptureFunction CaptureCallback =
[this](const void* InAudio, int32 NumFrames, int32 InNumChannels,
int32 InSampleRate, double StreamTime, bool bOverflow)
{
OnAudioGenerate(InAudio, NumFrames, InNumChannels, InSampleRate, StreamTime, bOverflow);
};
if (!AudioCapture.OpenAudioCaptureStream(DeviceParams, MoveTemp(CaptureCallback), 1024))
{
UE_LOG(LogElevenLabsMic, Error, TEXT("Failed to open default audio capture stream."));
return;
}
// Retrieve the actual device parameters after opening the stream.
Audio::FCaptureDeviceInfo DeviceInfo;
if (AudioCapture.GetCaptureDeviceInfo(DeviceInfo))
{
DeviceSampleRate = DeviceInfo.PreferredSampleRate;
DeviceChannels = DeviceInfo.InputChannels;
UE_LOG(LogElevenLabsMic, Log, TEXT("Capture device: %s | Rate=%d | Channels=%d"),
*DeviceInfo.DeviceName, DeviceSampleRate, DeviceChannels);
}
AudioCapture.StartStream();
bCapturing = true;
UE_LOG(LogElevenLabsMic, Log, TEXT("Audio capture started."));
}
void UElevenLabsMicrophoneCaptureComponent::StopCapture()
{
if (!bCapturing) return;
AudioCapture.StopStream();
AudioCapture.CloseStream();
bCapturing = false;
UE_LOG(LogElevenLabsMic, Log, TEXT("Audio capture stopped."));
}
// ─────────────────────────────────────────────────────────────────────────────
// Audio callback (background thread)
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsMicrophoneCaptureComponent::OnAudioGenerate(
const void* InAudio, int32 NumFrames,
int32 InNumChannels, int32 InSampleRate,
double StreamTime, bool bOverflow)
{
if (bOverflow)
{
UE_LOG(LogElevenLabsMic, Verbose, TEXT("Audio capture buffer overflow."));
}
// Device sends float32 interleaved samples; cast from the void* API.
const float* FloatAudio = static_cast<const float*>(InAudio);
// Resample + downmix to 16000 Hz mono.
TArray<float> Resampled = ResampleTo16000(FloatAudio, NumFrames, InNumChannels, InSampleRate);
// Apply volume multiplier.
if (!FMath::IsNearlyEqual(VolumeMultiplier, 1.0f))
{
for (float& S : Resampled)
{
S *= VolumeMultiplier;
}
}
// Fire the delegate on the game thread so subscribers don't need to be
// thread-safe (WebSocket Send is not thread-safe in UE's implementation).
// Capture a weak pointer: the component may be destroyed before the task runs.
TWeakObjectPtr<UElevenLabsMicrophoneCaptureComponent> WeakThis(this);
AsyncTask(ENamedThreads::GameThread, [WeakThis, Data = MoveTemp(Resampled)]()
{
if (WeakThis.IsValid() && WeakThis->bCapturing)
{
WeakThis->OnAudioCaptured.Broadcast(Data);
}
});
}
// ─────────────────────────────────────────────────────────────────────────────
// Resampling
// ─────────────────────────────────────────────────────────────────────────────
TArray<float> UElevenLabsMicrophoneCaptureComponent::ResampleTo16000(
const float* InAudio, int32 NumSamples,
int32 InChannels, int32 InSampleRate)
{
const int32 TargetRate = ElevenLabsAudio::SampleRate; // 16000
// --- Step 1: Downmix to mono ---
TArray<float> Mono;
if (InChannels == 1)
{
Mono = TArray<float>(InAudio, NumSamples);
}
else
{
const int32 NumFrames = NumSamples / InChannels;
Mono.Reserve(NumFrames);
for (int32 i = 0; i < NumFrames; i++)
{
float Sum = 0.0f;
for (int32 c = 0; c < InChannels; c++)
{
Sum += InAudio[i * InChannels + c];
}
Mono.Add(Sum / static_cast<float>(InChannels));
}
}
// --- Step 2: Resample via linear interpolation ---
if (InSampleRate == TargetRate)
{
return Mono;
}
const float Ratio = static_cast<float>(InSampleRate) / static_cast<float>(TargetRate);
const int32 OutSamples = FMath::FloorToInt(static_cast<float>(Mono.Num()) / Ratio);
TArray<float> Out;
Out.Reserve(OutSamples);
for (int32 i = 0; i < OutSamples; i++)
{
const float SrcIndex = static_cast<float>(i) * Ratio;
const int32 SrcLow = FMath::FloorToInt(SrcIndex);
const int32 SrcHigh = FMath::Min(SrcLow + 1, Mono.Num() - 1);
const float Alpha = SrcIndex - static_cast<float>(SrcLow);
Out.Add(FMath::Lerp(Mono[SrcLow], Mono[SrcHigh], Alpha));
}
return Out;
}
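The downmix-then-lerp approach above is cheap but performs no low-pass filtering, so downsampling (e.g. 48 kHz → 16 kHz) introduces some aliasing; that is usually acceptable for speech fed to an ASR endpoint. A standalone sketch of the same two steps, with std containers standing in for `TArray`:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Downmix interleaved multi-channel float audio to mono, then resample to
// TargetRate with linear interpolation (no anti-alias filter; fine for speech).
std::vector<float> DownmixAndResample(const std::vector<float>& Interleaved,
                                      int Channels, int SrcRate, int TargetRate) {
    // Step 1: average the channels of each frame.
    const size_t Frames = Interleaved.size() / Channels;
    std::vector<float> Mono(Frames);
    for (size_t i = 0; i < Frames; ++i) {
        float Sum = 0.0f;
        for (int c = 0; c < Channels; ++c) Sum += Interleaved[i * Channels + c];
        Mono[i] = Sum / Channels;
    }
    if (SrcRate == TargetRate || Mono.empty()) return Mono;
    // Step 2: linear interpolation between neighbouring source samples.
    const float Ratio = static_cast<float>(SrcRate) / TargetRate;
    const size_t OutCount = static_cast<size_t>(Mono.size() / Ratio);
    std::vector<float> Out(OutCount);
    for (size_t i = 0; i < OutCount; ++i) {
        const float Src = i * Ratio;
        const size_t Lo = static_cast<size_t>(Src);
        const size_t Hi = std::min(Lo + 1, Mono.size() - 1);
        const float A = Src - Lo;
        Out[i] = Mono[Lo] * (1.0f - A) + Mono[Hi] * A;
    }
    return Out;
}
```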


@@ -1,455 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
#include "ElevenLabsWebSocketProxy.h"
#include "PS_AI_Agent_ElevenLabs.h"
#include "WebSocketsModule.h"
#include "IWebSocket.h"
#include "Json.h"
#include "JsonUtilities.h"
#include "Misc/Base64.h"
DEFINE_LOG_CATEGORY_STATIC(LogElevenLabsWS, Log, All);
// ─────────────────────────────────────────────────────────────────────────────
// Helpers
// ─────────────────────────────────────────────────────────────────────────────
static void EL_LOG(bool bVerbose, const TCHAR* Format, ...)
{
if (!bVerbose) return;
va_list Args;
va_start(Args, Format);
// Forward to UE_LOG at Verbose level
TCHAR Buffer[2048];
FCString::GetVarArgs(Buffer, UE_ARRAY_COUNT(Buffer), Format, Args);
va_end(Args);
UE_LOG(LogElevenLabsWS, Verbose, TEXT("%s"), Buffer);
}
// ─────────────────────────────────────────────────────────────────────────────
// Connect / Disconnect
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::Connect(const FString& AgentIDOverride, const FString& APIKeyOverride)
{
if (ConnectionState == EElevenLabsConnectionState::Connected ||
ConnectionState == EElevenLabsConnectionState::Connecting)
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("Connect called but already connecting/connected. Ignoring."));
return;
}
if (!FModuleManager::Get().IsModuleLoaded("WebSockets"))
{
FModuleManager::LoadModuleChecked<FWebSocketsModule>("WebSockets");
}
const FString URL = BuildWebSocketURL(AgentIDOverride, APIKeyOverride);
if (URL.IsEmpty())
{
const FString Msg = TEXT("Cannot connect: no Agent ID configured. Set it in Project Settings or pass it to Connect().");
UE_LOG(LogElevenLabsWS, Error, TEXT("%s"), *Msg);
OnError.Broadcast(Msg);
ConnectionState = EElevenLabsConnectionState::Error;
return;
}
UE_LOG(LogElevenLabsWS, Log, TEXT("Connecting to ElevenLabs: %s"), *URL);
ConnectionState = EElevenLabsConnectionState::Connecting;
// Headers: the ElevenLabs Conversational AI WS endpoint accepts the
// xi-api-key header on the initial HTTP upgrade request.
TMap<FString, FString> UpgradeHeaders;
const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
const FString ResolvedKey = APIKeyOverride.IsEmpty() ? Settings->API_Key : APIKeyOverride;
if (!ResolvedKey.IsEmpty())
{
UpgradeHeaders.Add(TEXT("xi-api-key"), ResolvedKey);
}
WebSocket = FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), UpgradeHeaders);
WebSocket->OnConnected().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsConnected);
WebSocket->OnConnectionError().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsConnectionError);
WebSocket->OnClosed().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsClosed);
WebSocket->OnMessage().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsMessage);
WebSocket->OnRawMessage().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsBinaryMessage);
WebSocket->Connect();
}
void UElevenLabsWebSocketProxy::Disconnect()
{
if (WebSocket.IsValid() && WebSocket->IsConnected())
{
WebSocket->Close(1000, TEXT("Client disconnected"));
}
ConnectionState = EElevenLabsConnectionState::Disconnected;
}
// ─────────────────────────────────────────────────────────────────────────────
// Audio & turn control
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::SendAudioChunk(const TArray<uint8>& PCMData)
{
if (!IsConnected())
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("SendAudioChunk: not connected."));
return;
}
if (PCMData.Num() == 0) return;
// ElevenLabs expects: { "user_audio_chunk": "<base64 PCM>" }
const FString Base64Audio = FBase64::Encode(PCMData.GetData(), PCMData.Num());
TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
Msg->SetStringField(ElevenLabsMessageType::AudioChunk, Base64Audio);
SendJsonMessage(Msg);
}
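The `user_audio_chunk` envelope built above is the simplest client → server message: raw int16 PCM bytes, base64-encoded, as the value of a single top-level key. A standalone sketch of that wire shape (illustrative only — the plugin uses `FBase64` and `FJsonSerializer`; this hand-rolled encoder just makes the format concrete):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Minimal base64 encoder plus envelope builder mirroring SendAudioChunk's
// { "user_audio_chunk": "<base64 PCM>" } message shape.
std::string Base64Encode(const std::vector<uint8_t>& In) {
    static const char* Tbl =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    size_t i = 0;
    while (i + 2 < In.size()) {               // full 3-byte groups
        const uint32_t V = (In[i] << 16) | (In[i + 1] << 8) | In[i + 2];
        Out += Tbl[(V >> 18) & 63]; Out += Tbl[(V >> 12) & 63];
        Out += Tbl[(V >> 6) & 63];  Out += Tbl[V & 63];
        i += 3;
    }
    if (i + 1 == In.size()) {                 // 1 byte left -> "=="
        const uint32_t V = In[i] << 16;
        Out += Tbl[(V >> 18) & 63]; Out += Tbl[(V >> 12) & 63]; Out += "==";
    } else if (i + 2 == In.size()) {          // 2 bytes left -> "="
        const uint32_t V = (In[i] << 16) | (In[i + 1] << 8);
        Out += Tbl[(V >> 18) & 63]; Out += Tbl[(V >> 12) & 63];
        Out += Tbl[(V >> 6) & 63];  Out += '=';
    }
    return Out;
}

std::string BuildAudioChunkMessage(const std::vector<uint8_t>& Pcm) {
    return "{\"user_audio_chunk\":\"" + Base64Encode(Pcm) + "\"}";
}
```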
void UElevenLabsWebSocketProxy::SendUserTurnStart()
{
// In client turn mode, signal that the user is active/speaking.
// API message: { "type": "user_activity" }
if (!IsConnected()) return;
TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::UserActivity);
SendJsonMessage(Msg);
}
void UElevenLabsWebSocketProxy::SendUserTurnEnd()
{
// In client turn mode, stopping user_activity signals end of user turn.
// The API uses user_activity for ongoing speech; simply stop sending it.
// No explicit end message is required — silence is detected server-side.
// We still log for debug visibility.
UE_LOG(LogElevenLabsWS, Log, TEXT("User turn ended (client mode) — stopped sending user_activity."));
}
void UElevenLabsWebSocketProxy::SendTextMessage(const FString& Text)
{
if (!IsConnected())
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("SendTextMessage: not connected."));
return;
}
if (Text.IsEmpty()) return;
// API: { "type": "user_message", "text": "Hello agent" }
TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::UserMessage);
Msg->SetStringField(TEXT("text"), Text);
SendJsonMessage(Msg);
}
void UElevenLabsWebSocketProxy::SendInterrupt()
{
if (!IsConnected()) return;
TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::Interrupt);
SendJsonMessage(Msg);
}
// ─────────────────────────────────────────────────────────────────────────────
// WebSocket callbacks
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::OnWsConnected()
{
UE_LOG(LogElevenLabsWS, Log, TEXT("WebSocket connected. Waiting for conversation_initiation_metadata..."));
// State stays Connecting until we receive the initiation metadata from the server.
}
void UElevenLabsWebSocketProxy::OnWsConnectionError(const FString& Error)
{
UE_LOG(LogElevenLabsWS, Error, TEXT("WebSocket connection error: %s"), *Error);
ConnectionState = EElevenLabsConnectionState::Error;
OnError.Broadcast(Error);
}
void UElevenLabsWebSocketProxy::OnWsClosed(int32 StatusCode, const FString& Reason, bool bWasClean)
{
UE_LOG(LogElevenLabsWS, Log, TEXT("WebSocket closed. Code=%d Reason=%s Clean=%d"), StatusCode, *Reason, bWasClean);
ConnectionState = EElevenLabsConnectionState::Disconnected;
WebSocket.Reset();
OnDisconnected.Broadcast(StatusCode, Reason);
}
void UElevenLabsWebSocketProxy::OnWsMessage(const FString& Message)
{
const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
if (Settings->bVerboseLogging)
{
UE_LOG(LogElevenLabsWS, Verbose, TEXT(">> %s"), *Message);
}
TSharedPtr<FJsonObject> Root;
TSharedRef<TJsonReader<>> Reader = TJsonReaderFactory<>::Create(Message);
if (!FJsonSerializer::Deserialize(Reader, Root) || !Root.IsValid())
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("Failed to parse WebSocket message as JSON (first 80 chars): %.80s"), *Message);
return;
}
FString MsgType;
// ElevenLabs wraps the type in a "type" field
if (!Root->TryGetStringField(TEXT("type"), MsgType))
{
// Fallback: some messages use the top-level key as the type
// e.g. { "user_audio_chunk": "..." } from ourselves (shouldn't arrive)
UE_LOG(LogElevenLabsWS, Verbose, TEXT("Message has no 'type' field, ignoring."));
return;
}
if (MsgType == ElevenLabsMessageType::ConversationInitiation)
{
HandleConversationInitiation(Root);
}
else if (MsgType == ElevenLabsMessageType::AudioResponse)
{
HandleAudioResponse(Root);
}
else if (MsgType == ElevenLabsMessageType::UserTranscript)
{
HandleTranscript(Root);
}
else if (MsgType == ElevenLabsMessageType::AgentResponse)
{
HandleAgentResponse(Root);
}
else if (MsgType == ElevenLabsMessageType::AgentResponseCorrection)
{
// Silently ignore for now — corrected text after interruption.
UE_LOG(LogElevenLabsWS, Verbose, TEXT("agent_response_correction received (ignored)."));
}
else if (MsgType == ElevenLabsMessageType::InterruptionEvent)
{
HandleInterruption(Root);
}
else if (MsgType == ElevenLabsMessageType::PingEvent)
{
HandlePing(Root);
}
else
{
UE_LOG(LogElevenLabsWS, Verbose, TEXT("Unhandled message type: %s"), *MsgType);
}
}
void UElevenLabsWebSocketProxy::OnWsBinaryMessage(const void* Data, SIZE_T Size, SIZE_T BytesRemaining)
{
// Accumulate fragments until BytesRemaining == 0.
const uint8* Bytes = static_cast<const uint8*>(Data);
BinaryFrameBuffer.Append(Bytes, Size);
if (BytesRemaining > 0)
{
// More fragments coming — wait for the rest
return;
}
const int32 TotalSize = BinaryFrameBuffer.Num();
// Peek at first byte to distinguish JSON (starts with '{') from raw binary audio.
const bool bLooksLikeJson = (TotalSize > 0 && BinaryFrameBuffer[0] == '{');
if (bLooksLikeJson)
{
// Null-terminate safely then decode as UTF-8 JSON
BinaryFrameBuffer.Add(0);
const FString JsonString = FString(UTF8_TO_TCHAR(
reinterpret_cast<const char*>(BinaryFrameBuffer.GetData())));
BinaryFrameBuffer.Reset();
const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
if (Settings->bVerboseLogging)
{
UE_LOG(LogElevenLabsWS, Verbose, TEXT("Binary JSON frame (%d bytes): %.120s"), TotalSize, *JsonString);
}
OnWsMessage(JsonString);
}
else
{
// Raw binary audio frame — PCM bytes sent directly without Base64/JSON wrapper.
// Log first few bytes as hex to help diagnose the format.
const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
if (Settings->bVerboseLogging)
{
FString HexPreview;
const int32 PreviewBytes = FMath::Min(TotalSize, 8);
for (int32 i = 0; i < PreviewBytes; i++)
{
HexPreview += FString::Printf(TEXT("%02X "), BinaryFrameBuffer[i]);
}
UE_LOG(LogElevenLabsWS, Verbose, TEXT("Binary audio frame: %d bytes | first bytes: %s"), TotalSize, *HexPreview);
}
// Broadcast raw PCM bytes directly to the audio queue.
// MoveTemp leaves BinaryFrameBuffer empty, so no explicit Reset is needed.
TArray<uint8> PCMData = MoveTemp(BinaryFrameBuffer);
OnAudioReceived.Broadcast(PCMData);
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Message handlers
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::HandleConversationInitiation(const TSharedPtr<FJsonObject>& Root)
{
// Expected structure:
// { "type": "conversation_initiation_metadata",
// "conversation_initiation_metadata_event": {
// "conversation_id": "...",
// "agent_output_audio_format": "pcm_16000"
// }
// }
const TSharedPtr<FJsonObject>* MetaObj = nullptr;
if (Root->TryGetObjectField(TEXT("conversation_initiation_metadata_event"), MetaObj) && MetaObj)
{
(*MetaObj)->TryGetStringField(TEXT("conversation_id"), ConversationInfo.ConversationID);
}
UE_LOG(LogElevenLabsWS, Log, TEXT("Conversation initiated. ID=%s"), *ConversationInfo.ConversationID);
ConnectionState = EElevenLabsConnectionState::Connected;
OnConnected.Broadcast(ConversationInfo);
}
void UElevenLabsWebSocketProxy::HandleAudioResponse(const TSharedPtr<FJsonObject>& Root)
{
// Expected structure:
// { "type": "audio",
// "audio_event": { "audio_base_64": "<base64 PCM>", "event_id": 1 }
// }
const TSharedPtr<FJsonObject>* AudioEvent = nullptr;
if (!Root->TryGetObjectField(TEXT("audio_event"), AudioEvent) || !AudioEvent)
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("audio message missing 'audio_event' field."));
return;
}
FString Base64Audio;
if (!(*AudioEvent)->TryGetStringField(TEXT("audio_base_64"), Base64Audio))
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("audio_event missing 'audio_base_64' field."));
return;
}
TArray<uint8> PCMData;
if (!FBase64::Decode(Base64Audio, PCMData))
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("Failed to Base64-decode audio data."));
return;
}
OnAudioReceived.Broadcast(PCMData);
}
void UElevenLabsWebSocketProxy::HandleTranscript(const TSharedPtr<FJsonObject>& Root)
{
// API structure:
// { "type": "user_transcript",
// "user_transcription_event": { "user_transcript": "Hello" }
// }
// This message only carries the user's speech-to-text — speaker is always "user".
const TSharedPtr<FJsonObject>* TranscriptEvent = nullptr;
if (!Root->TryGetObjectField(TEXT("user_transcription_event"), TranscriptEvent) || !TranscriptEvent)
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("user_transcript message missing 'user_transcription_event' field."));
return;
}
FElevenLabsTranscriptSegment Segment;
Segment.Speaker = TEXT("user");
(*TranscriptEvent)->TryGetStringField(TEXT("user_transcript"), Segment.Text);
// user_transcript messages are always final (interim results are not sent for user speech)
Segment.bIsFinal = true;
OnTranscript.Broadcast(Segment);
}
void UElevenLabsWebSocketProxy::HandleAgentResponse(const TSharedPtr<FJsonObject>& Root)
{
// { "type": "agent_response",
// "agent_response_event": { "agent_response": "..." }
// }
const TSharedPtr<FJsonObject>* ResponseEvent = nullptr;
if (!Root->TryGetObjectField(TEXT("agent_response_event"), ResponseEvent) || !ResponseEvent)
{
return;
}
FString ResponseText;
(*ResponseEvent)->TryGetStringField(TEXT("agent_response"), ResponseText);
OnAgentResponse.Broadcast(ResponseText);
}
void UElevenLabsWebSocketProxy::HandleInterruption(const TSharedPtr<FJsonObject>& Root)
{
UE_LOG(LogElevenLabsWS, Log, TEXT("Agent interrupted."));
OnInterrupted.Broadcast();
}
void UElevenLabsWebSocketProxy::HandlePing(const TSharedPtr<FJsonObject>& Root)
{
// Reply with a pong to keep the connection alive.
// Incoming: { "type": "ping", "ping_event": { "event_id": 1, "ping_ms": 150 } }
// Reply: { "type": "pong", "event_id": 1 } ← event_id is top-level, no wrapper object
int32 EventID = 0;
const TSharedPtr<FJsonObject>* PingEvent = nullptr;
if (Root->TryGetObjectField(TEXT("ping_event"), PingEvent) && PingEvent)
{
(*PingEvent)->TryGetNumberField(TEXT("event_id"), EventID);
}
TSharedPtr<FJsonObject> Pong = MakeShareable(new FJsonObject());
Pong->SetStringField(TEXT("type"), TEXT("pong"));
Pong->SetNumberField(TEXT("event_id"), EventID); // top-level, not nested
SendJsonMessage(Pong);
}
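The asymmetry handled in `HandlePing` is easy to miss: the incoming ping nests `event_id` under `ping_event`, but the pong reply carries it at the top level with no wrapper object. A minimal sketch that builds the reply string (real code should use a JSON library, as the proxy does with `FJsonSerializer`):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build the pong reply for a given ping's event_id. Note the asymmetry with
// the incoming message: the server nests event_id under "ping_event", but the
// reply carries it at the top level with no wrapper object.
std::string BuildPongReply(int EventID) {
    char Buf[64];
    std::snprintf(Buf, sizeof(Buf), "{\"type\":\"pong\",\"event_id\":%d}", EventID);
    return std::string(Buf);
}
```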
// ─────────────────────────────────────────────────────────────────────────────
// Helpers
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::SendJsonMessage(const TSharedPtr<FJsonObject>& JsonObj)
{
if (!WebSocket.IsValid() || !WebSocket->IsConnected())
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("SendJsonMessage: WebSocket not connected."));
return;
}
FString Out;
TSharedRef<TJsonWriter<>> Writer = TJsonWriterFactory<>::Create(&Out);
FJsonSerializer::Serialize(JsonObj.ToSharedRef(), Writer);
const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
if (Settings->bVerboseLogging)
{
UE_LOG(LogElevenLabsWS, Verbose, TEXT("<< %s"), *Out);
}
WebSocket->Send(Out);
}
FString UElevenLabsWebSocketProxy::BuildWebSocketURL(const FString& AgentIDOverride, const FString& APIKeyOverride) const
{
const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
// Custom URL override takes full precedence
if (!Settings->CustomWebSocketURL.IsEmpty())
{
return Settings->CustomWebSocketURL;
}
const FString ResolvedAgentID = AgentIDOverride.IsEmpty() ? Settings->AgentID : AgentIDOverride;
if (ResolvedAgentID.IsEmpty())
{
return FString();
}
// Official ElevenLabs Conversational AI WebSocket endpoint
// wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>
return FString::Printf(
TEXT("wss://api.elevenlabs.io/v1/convai/conversation?agent_id=%s"),
*ResolvedAgentID);
}


@@ -1,50 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
#include "PS_AI_Agent_ElevenLabs.h"
#include "Developer/Settings/Public/ISettingsModule.h"
#include "UObject/UObjectGlobals.h"
#include "UObject/Package.h"
IMPLEMENT_MODULE(FPS_AI_Agent_ElevenLabsModule, PS_AI_Agent_ElevenLabs)
#define LOCTEXT_NAMESPACE "PS_AI_Agent_ElevenLabs"
void FPS_AI_Agent_ElevenLabsModule::StartupModule()
{
Settings = NewObject<UElevenLabsSettings>(GetTransientPackage(), "ElevenLabsSettings", RF_Standalone);
Settings->AddToRoot();
if (ISettingsModule* SettingsModule = FModuleManager::GetModulePtr<ISettingsModule>("Settings"))
{
SettingsModule->RegisterSettings(
"Project", "Plugins", "ElevenLabsAIAgent",
LOCTEXT("SettingsName", "ElevenLabs AI Agent"),
LOCTEXT("SettingsDescription", "Configure the ElevenLabs Conversational AI Agent plugin"),
Settings);
}
}
void FPS_AI_Agent_ElevenLabsModule::ShutdownModule()
{
if (ISettingsModule* SettingsModule = FModuleManager::GetModulePtr<ISettingsModule>("Settings"))
{
SettingsModule->UnregisterSettings("Project", "Plugins", "ElevenLabsAIAgent");
}
if (Settings && !GExitPurge)
{
Settings->RemoveFromRoot();
}
Settings = nullptr;
}
UElevenLabsSettings* FPS_AI_Agent_ElevenLabsModule::GetSettings() const
{
check(Settings);
return Settings;
}
#undef LOCTEXT_NAMESPACE


@@ -1,233 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
#pragma once
#include "CoreMinimal.h"
#include "Components/ActorComponent.h"
#include "ElevenLabsDefinitions.h"
#include "ElevenLabsWebSocketProxy.h"
#include "Sound/SoundWaveProcedural.h"
#include "ElevenLabsConversationalAgentComponent.generated.h"
class UAudioComponent;
class UElevenLabsMicrophoneCaptureComponent;
// ─────────────────────────────────────────────────────────────────────────────
// Delegates exposed to Blueprint
// ─────────────────────────────────────────────────────────────────────────────
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentConnected,
const FElevenLabsConversationInfo&, ConversationInfo);
DECLARE_DYNAMIC_MULTICAST_DELEGATE_TwoParams(FOnAgentDisconnected,
int32, StatusCode, const FString&, Reason);
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentError,
const FString&, ErrorMessage);
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentTranscript,
const FElevenLabsTranscriptSegment&, Segment);
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentTextResponse,
const FString&, ResponseText);
DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnAgentStartedSpeaking);
DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnAgentStoppedSpeaking);
DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnAgentInterrupted);
// ─────────────────────────────────────────────────────────────────────────────
// UElevenLabsConversationalAgentComponent
//
// Attach this to any Actor (e.g. a character NPC) to give it a voice powered by
// the ElevenLabs Conversational AI API.
//
// Workflow:
// 1. Set AgentID (or rely on project default).
// 2. Call StartConversation() to open the WebSocket.
// 3. Call StartListening() / StopListening() to control microphone capture.
// 4. React to events (OnAgentTranscript, OnAgentTextResponse, etc.) in Blueprint.
// 5. Call EndConversation() when done.
// ─────────────────────────────────────────────────────────────────────────────
UCLASS(ClassGroup = "ElevenLabs", meta = (BlueprintSpawnableComponent),
DisplayName = "ElevenLabs Conversational Agent")
class PS_AI_AGENT_ELEVENLABS_API UElevenLabsConversationalAgentComponent : public UActorComponent
{
GENERATED_BODY()
public:
UElevenLabsConversationalAgentComponent();
// ── Configuration ─────────────────────────────────────────────────────────
/**
* ElevenLabs Agent ID. Overrides the project-level default in Project Settings.
* Leave empty to use the project default.
*/
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
FString AgentID;
/**
* Turn mode:
* - Server VAD: ElevenLabs detects end-of-speech automatically (recommended).
* - Client Controlled: you call StartListening/StopListening manually (push-to-talk).
*/
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
EElevenLabsTurnMode TurnMode = EElevenLabsTurnMode::Server;
/**
* Automatically start listening (microphone capture) once the WebSocket is
* connected and the conversation is initiated.
*/
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
bool bAutoStartListening = true;
// ── Events ────────────────────────────────────────────────────────────────
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnAgentConnected OnAgentConnected;
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnAgentDisconnected OnAgentDisconnected;
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnAgentError OnAgentError;
/** Fired for each finalized user speech transcript segment (the API sends only final user transcripts). */
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnAgentTranscript OnAgentTranscript;
/** Final text response produced by the agent (mirrors the audio). */
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnAgentTextResponse OnAgentTextResponse;
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnAgentStartedSpeaking OnAgentStartedSpeaking;
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnAgentStoppedSpeaking OnAgentStoppedSpeaking;
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnAgentInterrupted OnAgentInterrupted;
// ── Control ───────────────────────────────────────────────────────────────
/**
* Open the WebSocket connection and start the conversation.
* If bAutoStartListening is true, microphone capture also starts once connected.
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void StartConversation();
/** Close the WebSocket and stop all audio. */
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void EndConversation();
/**
* Start capturing microphone audio and streaming it to ElevenLabs.
* In Client turn mode, also sends a UserTurnStart signal.
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void StartListening();
/**
* Stop capturing microphone audio.
* In Client turn mode, also sends a UserTurnEnd signal.
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void StopListening();
/**
* Send a plain text message to the agent without using the microphone.
* The agent will respond with audio and text just as if it heard you speak.
* Useful for testing in the Editor or for text-based interaction.
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void SendTextMessage(const FString& Text);
/** Interrupt the agent's current utterance. */
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void InterruptAgent();
// ── State queries ─────────────────────────────────────────────────────────
UFUNCTION(BlueprintPure, Category = "ElevenLabs")
bool IsConnected() const;
UFUNCTION(BlueprintPure, Category = "ElevenLabs")
bool IsListening() const { return bIsListening; }
UFUNCTION(BlueprintPure, Category = "ElevenLabs")
bool IsAgentSpeaking() const { return bAgentSpeaking; }
UFUNCTION(BlueprintPure, Category = "ElevenLabs")
FElevenLabsConversationInfo GetConversationInfo() const;
/** Access the underlying WebSocket proxy (advanced use). */
UFUNCTION(BlueprintPure, Category = "ElevenLabs")
UElevenLabsWebSocketProxy* GetWebSocketProxy() const { return WebSocketProxy; }
// ─────────────────────────────────────────────────────────────────────────
// UActorComponent overrides
// ─────────────────────────────────────────────────────────────────────────
virtual void BeginPlay() override;
virtual void EndPlay(const EEndPlayReason::Type EndPlayReason) override;
virtual void TickComponent(float DeltaTime, ELevelTick TickType,
FActorComponentTickFunction* ThisTickFunction) override;
private:
// ── Internal event handlers ───────────────────────────────────────────────
UFUNCTION()
void HandleConnected(const FElevenLabsConversationInfo& Info);
UFUNCTION()
void HandleDisconnected(int32 StatusCode, const FString& Reason);
UFUNCTION()
void HandleError(const FString& ErrorMessage);
UFUNCTION()
void HandleAudioReceived(const TArray<uint8>& PCMData);
UFUNCTION()
void HandleTranscript(const FElevenLabsTranscriptSegment& Segment);
UFUNCTION()
void HandleAgentResponse(const FString& ResponseText);
UFUNCTION()
void HandleInterrupted();
// ── Audio playback ────────────────────────────────────────────────────────
void InitAudioPlayback();
void EnqueueAgentAudio(const TArray<uint8>& PCMData);
void StopAgentAudio();
/** Called by USoundWaveProcedural when it needs more PCM data. */
void OnProceduralUnderflow(USoundWaveProcedural* InProceduralWave, const int32 SamplesRequired);
// ── Microphone streaming ──────────────────────────────────────────────────
void OnMicrophoneDataCaptured(const TArray<float>& FloatPCM);
/** Convert float PCM to int16 little-endian bytes for ElevenLabs. */
static TArray<uint8> FloatPCMToInt16Bytes(const TArray<float>& FloatPCM);
// ── Sub-objects ───────────────────────────────────────────────────────────
UPROPERTY()
UElevenLabsWebSocketProxy* WebSocketProxy = nullptr;
UPROPERTY()
UAudioComponent* AudioPlaybackComponent = nullptr;
UPROPERTY()
USoundWaveProcedural* ProceduralSoundWave = nullptr;
// ── State ─────────────────────────────────────────────────────────────────
bool bIsListening = false;
bool bAgentSpeaking = false;
// Accumulates incoming PCM bytes until the audio component needs data.
TArray<uint8> AudioQueue;
FCriticalSection AudioQueueLock;
// Simple heuristic: if we haven't received audio data for this many ticks,
// consider the agent done speaking.
int32 SilentTickCount = 0;
static constexpr int32 SilenceThresholdTicks = 30; // ~0.5s at 60fps
};
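The private `FloatPCMToInt16Bytes` helper above converts captured float samples into the wire format ElevenLabs expects (16-bit signed little-endian). A minimal standalone sketch of that conversion, with `std::vector` standing in for `TArray`; the clamping and rounding behavior are assumptions, since the header only declares the signature:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Convert float32 PCM in [-1, 1] to 16-bit signed little-endian bytes.
std::vector<uint8_t> FloatPCMToInt16Bytes(const std::vector<float>& FloatPCM)
{
    std::vector<uint8_t> Out;
    Out.reserve(FloatPCM.size() * 2);
    for (float Sample : FloatPCM)
    {
        // Clamp before scaling so out-of-range samples do not wrap.
        const float Clamped = std::clamp(Sample, -1.0f, 1.0f);
        const int16_t Value =
            static_cast<int16_t>(std::lround(Clamped * 32767.0f));
        Out.push_back(static_cast<uint8_t>(Value & 0xFF));        // low byte first (LE)
        Out.push_back(static_cast<uint8_t>((Value >> 8) & 0xFF)); // high byte
    }
    return Out;
}
```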

View File

@@ -1,109 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
#pragma once
#include "CoreMinimal.h"
#include "ElevenLabsDefinitions.generated.h"
// ─────────────────────────────────────────────────────────────────────────────
// Connection state
// ─────────────────────────────────────────────────────────────────────────────
UENUM(BlueprintType)
enum class EElevenLabsConnectionState : uint8
{
Disconnected UMETA(DisplayName = "Disconnected"),
Connecting UMETA(DisplayName = "Connecting"),
Connected UMETA(DisplayName = "Connected"),
Error UMETA(DisplayName = "Error"),
};
// ─────────────────────────────────────────────────────────────────────────────
// Agent turn mode
// ─────────────────────────────────────────────────────────────────────────────
UENUM(BlueprintType)
enum class EElevenLabsTurnMode : uint8
{
/** ElevenLabs server decides when the user has finished speaking (default). */
Server UMETA(DisplayName = "Server VAD"),
/** Client explicitly signals turn start/end (manual push-to-talk). */
Client UMETA(DisplayName = "Client Controlled"),
};
// ─────────────────────────────────────────────────────────────────────────────
// WebSocket message type helpers (internal, not exposed to Blueprint)
// ─────────────────────────────────────────────────────────────────────────────
namespace ElevenLabsMessageType
{
// Client → Server
static const FString AudioChunk = TEXT("user_audio_chunk");
// Client turn mode: signal user is currently active/speaking
static const FString UserActivity = TEXT("user_activity");
// Client turn mode: send a text message without audio
static const FString UserMessage = TEXT("user_message");
static const FString Interrupt = TEXT("interrupt");
static const FString ClientToolResult = TEXT("client_tool_result");
static const FString ConversationClientData = TEXT("conversation_initiation_client_data");
// Server → Client
static const FString ConversationInitiation = TEXT("conversation_initiation_metadata");
static const FString AudioResponse = TEXT("audio");
// User speech-to-text transcript (speaker is always the user)
static const FString UserTranscript = TEXT("user_transcript");
static const FString AgentResponse = TEXT("agent_response");
static const FString AgentResponseCorrection = TEXT("agent_response_correction");
static const FString InterruptionEvent = TEXT("interruption");
static const FString PingEvent = TEXT("ping");
static const FString ClientToolCall = TEXT("client_tool_call");
static const FString InternalTentativeAgent = TEXT("internal_tentative_agent_response");
}
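The server-to-client type strings above are dispatched by the WebSocket proxy later in this changeset (`HandleConversationInitiation`, `HandleAudioResponse`, and friends). A plain-C++ sketch of that routing, with a `std::map` standing in for the proxy's `FJsonObject`-based dispatch; the handler names mirror the proxy's private methods, while the map itself is purely illustrative:

```cpp
#include <map>
#include <string>

// Route an incoming "type" field to the name of the handler that would
// process it. In the plugin this dispatch lives in OnWsMessage.
std::string RouteMessageType(const std::string& Type)
{
    static const std::map<std::string, std::string> Handlers = {
        {"conversation_initiation_metadata", "HandleConversationInitiation"},
        {"audio",                            "HandleAudioResponse"},
        {"user_transcript",                  "HandleTranscript"},
        {"agent_response",                   "HandleAgentResponse"},
        {"interruption",                     "HandleInterruption"},
        {"ping",                             "HandlePing"},
    };
    const auto It = Handlers.find(Type);
    return It != Handlers.end() ? It->second : "Unhandled";
}
```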
// ─────────────────────────────────────────────────────────────────────────────
// Audio format exchanged with ElevenLabs
// PCM 16-bit signed, 16000 Hz, mono, little-endian.
// ─────────────────────────────────────────────────────────────────────────────
namespace ElevenLabsAudio
{
static constexpr int32 SampleRate = 16000;
static constexpr int32 Channels = 1;
static constexpr int32 BitsPerSample = 16;
// Chunk size sent per WebSocket frame: 100 ms of audio
static constexpr int32 ChunkSamples = SampleRate / 10; // 1600 samples = 3200 bytes
}
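The constants above pin down the chunk arithmetic: 100 ms at 16 kHz mono int16 is 1600 samples, or 3200 bytes per WebSocket frame. A compile-time restatement in plain C++ (with `int32_t` standing in for `int32`):

```cpp
#include <cstdint>

namespace ElevenLabsAudio
{
    constexpr int32_t SampleRate    = 16000; // Hz
    constexpr int32_t Channels      = 1;     // mono
    constexpr int32_t BitsPerSample = 16;    // int16 LE
    constexpr int32_t ChunkSamples  = SampleRate / 10; // 100 ms of audio

    // Bytes per WebSocket frame: samples x channels x bytes-per-sample.
    constexpr int32_t ChunkBytes = ChunkSamples * Channels * (BitsPerSample / 8);
}

static_assert(ElevenLabsAudio::ChunkSamples == 1600, "100 ms at 16 kHz");
static_assert(ElevenLabsAudio::ChunkBytes   == 3200, "int16 mono");
```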
// ─────────────────────────────────────────────────────────────────────────────
// Conversation metadata received on successful connection
// ─────────────────────────────────────────────────────────────────────────────
USTRUCT(BlueprintType)
struct PS_AI_AGENT_ELEVENLABS_API FElevenLabsConversationInfo
{
GENERATED_BODY()
/** Unique ID of this conversation session assigned by ElevenLabs. */
UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
FString ConversationID;
/** Agent ID that is responding. */
UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
FString AgentID;
};
// ─────────────────────────────────────────────────────────────────────────────
// Transcript segment
// ─────────────────────────────────────────────────────────────────────────────
USTRUCT(BlueprintType)
struct PS_AI_AGENT_ELEVENLABS_API FElevenLabsTranscriptSegment
{
GENERATED_BODY()
/** Transcribed text. */
UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
FString Text;
/** "user" or "agent". */
UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
FString Speaker;
/** Whether this is a final transcript or a tentative (in-progress) one. */
UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
bool bIsFinal = false;
};

View File

@@ -1,73 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
#pragma once
#include "CoreMinimal.h"
#include "Components/ActorComponent.h"
#include "AudioCapture.h"
#include "ElevenLabsMicrophoneCaptureComponent.generated.h"
// Delivers captured float PCM samples (16000 Hz mono, resampled from device rate).
DECLARE_MULTICAST_DELEGATE_OneParam(FOnElevenLabsAudioCaptured, const TArray<float>& /*FloatPCM*/);
/**
* Lightweight microphone capture component.
* Captures from the default audio input device, resamples to 16000 Hz mono,
* and delivers chunks via FOnElevenLabsAudioCaptured.
*
* Modelled after Convai's ConvaiAudioCaptureComponent but stripped to the
* minimal functionality needed for the ElevenLabs Conversational AI API.
*/
UCLASS(ClassGroup = "ElevenLabs", meta = (BlueprintSpawnableComponent),
DisplayName = "ElevenLabs Microphone Capture")
class PS_AI_AGENT_ELEVENLABS_API UElevenLabsMicrophoneCaptureComponent : public UActorComponent
{
GENERATED_BODY()
public:
UElevenLabsMicrophoneCaptureComponent();
/** Volume multiplier applied to captured samples before forwarding. */
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs|Microphone",
meta = (ClampMin = "0.0", ClampMax = "4.0"))
float VolumeMultiplier = 1.0f;
/**
* Delegate fired on the game thread each time a new chunk of PCM audio
* is captured. Samples are float32, resampled to 16000 Hz mono.
*/
FOnElevenLabsAudioCaptured OnAudioCaptured;
/** Open the default capture device and begin streaming audio. */
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void StartCapture();
/** Stop streaming and close the capture device. */
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void StopCapture();
UFUNCTION(BlueprintPure, Category = "ElevenLabs")
bool IsCapturing() const { return bCapturing; }
// ─────────────────────────────────────────────────────────────────────────
// UActorComponent overrides
// ─────────────────────────────────────────────────────────────────────────
virtual void EndPlay(const EEndPlayReason::Type EndPlayReason) override;
private:
/** Called by the audio capture callback on a background thread. Raw void* per UE 5.3+ API. */
void OnAudioGenerate(const void* InAudio, int32 NumFrames,
int32 InNumChannels, int32 InSampleRate, double StreamTime, bool bOverflow);
/** Simple linear resample from InSampleRate to 16000 Hz. Input is float32 frames. */
static TArray<float> ResampleTo16000(const float* InAudio, int32 NumFrames,
int32 InChannels, int32 InSampleRate);
Audio::FAudioCapture AudioCapture;
Audio::FAudioCaptureDeviceParams DeviceParams;
bool bCapturing = false;
// Device sample rate discovered on StartCapture
int32 DeviceSampleRate = 44100;
int32 DeviceChannels = 1;
};
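`ResampleTo16000` is only declared above; a standalone sketch of one plausible implementation follows (linear interpolation, average-downmix for multi-channel input). `std::vector` stands in for `TArray`, and the exact edge handling is an assumption:

```cpp
#include <cstdint>
#include <vector>

// Linear-interpolation resample of interleaved float32 frames to 16000 Hz mono.
// Multi-channel input is downmixed by averaging the channels of each frame.
std::vector<float> ResampleTo16000(const float* InAudio, int32_t NumFrames,
                                   int32_t InChannels, int32_t InSampleRate)
{
    const int32_t OutRate = 16000;
    if (NumFrames <= 0 || InChannels <= 0 || InSampleRate <= 0)
    {
        return {};
    }

    // Downmix to mono first so interpolation works on a single channel.
    std::vector<float> Mono(NumFrames);
    for (int32_t Frame = 0; Frame < NumFrames; ++Frame)
    {
        float Sum = 0.0f;
        for (int32_t Ch = 0; Ch < InChannels; ++Ch)
        {
            Sum += InAudio[Frame * InChannels + Ch];
        }
        Mono[Frame] = Sum / InChannels;
    }

    if (InSampleRate == OutRate)
    {
        return Mono; // already at the target rate
    }

    const int64_t OutFrames =
        static_cast<int64_t>(NumFrames) * OutRate / InSampleRate;
    std::vector<float> Out(OutFrames);
    const double Step = static_cast<double>(InSampleRate) / OutRate;
    for (int64_t i = 0; i < OutFrames; ++i)
    {
        const double SrcPos = i * Step;
        const int64_t Idx = static_cast<int64_t>(SrcPos);
        const double Frac = SrcPos - Idx;
        const float A = Mono[Idx];
        const float B = (Idx + 1 < NumFrames) ? Mono[Idx + 1] : A;
        Out[i] = static_cast<float>(A + (B - A) * Frac);
    }
    return Out;
}
```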

View File

@@ -1,186 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
#pragma once
#include "CoreMinimal.h"
#include "UObject/NoExportTypes.h"
#include "ElevenLabsDefinitions.h"
#include "IWebSocket.h"
#include "ElevenLabsWebSocketProxy.generated.h"
// ─────────────────────────────────────────────────────────────────────────────
// Delegates (all Blueprint-assignable)
// ─────────────────────────────────────────────────────────────────────────────
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsConnected,
const FElevenLabsConversationInfo&, ConversationInfo);
DECLARE_DYNAMIC_MULTICAST_DELEGATE_TwoParams(FOnElevenLabsDisconnected,
int32, StatusCode, const FString&, Reason);
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsError,
const FString&, ErrorMessage);
/** Fired when a PCM audio chunk arrives from the agent. Raw bytes, 16-bit signed 16kHz mono. */
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsAudioReceived,
const TArray<uint8>&, PCMData);
/** Fired for user or agent transcript segments. */
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsTranscript,
const FElevenLabsTranscriptSegment&, Segment);
/** Fired with the final text response from the agent. */
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsAgentResponse,
const FString&, ResponseText);
/** Fired when the agent interrupts the user. */
DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnElevenLabsInterrupted);
// ─────────────────────────────────────────────────────────────────────────────
// WebSocket Proxy
// Manages the lifecycle of a single ElevenLabs Conversational AI WebSocket session.
// Instantiate via UElevenLabsConversationalAgentComponent (the component manages
// one proxy at a time), or create manually through Blueprints.
// ─────────────────────────────────────────────────────────────────────────────
UCLASS(BlueprintType, Blueprintable)
class PS_AI_AGENT_ELEVENLABS_API UElevenLabsWebSocketProxy : public UObject
{
GENERATED_BODY()
public:
// ── Events ────────────────────────────────────────────────────────────────
/** Called once the WebSocket handshake succeeds and the agent sends its initiation metadata. */
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnElevenLabsConnected OnConnected;
/** Called when the WebSocket closes (graceful or remote). */
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnElevenLabsDisconnected OnDisconnected;
/** Called on any connection or protocol error. */
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnElevenLabsError OnError;
/** Raw PCM audio coming from the agent — feed this into your audio component. */
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnElevenLabsAudioReceived OnAudioReceived;
/** User or agent transcript (may be tentative while the conversation is ongoing). */
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnElevenLabsTranscript OnTranscript;
/** Final text response from the agent (complements audio). */
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnElevenLabsAgentResponse OnAgentResponse;
/** The agent was interrupted by new user speech. */
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
FOnElevenLabsInterrupted OnInterrupted;
// ── Lifecycle ─────────────────────────────────────────────────────────────
/**
* Open a WebSocket connection to ElevenLabs.
* Uses settings from Project Settings unless overridden by the parameters.
*
* @param AgentID ElevenLabs agent ID. Overrides the project-level default when non-empty.
* @param APIKey API key. Overrides the project-level default when non-empty.
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void Connect(const FString& AgentID = TEXT(""), const FString& APIKey = TEXT(""));
/**
* Gracefully close the WebSocket connection.
* OnDisconnected will fire after the server acknowledges.
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void Disconnect();
/** Current connection state. */
UFUNCTION(BlueprintPure, Category = "ElevenLabs")
EElevenLabsConnectionState GetConnectionState() const { return ConnectionState; }
UFUNCTION(BlueprintPure, Category = "ElevenLabs")
bool IsConnected() const { return ConnectionState == EElevenLabsConnectionState::Connected; }
// ── Audio sending ─────────────────────────────────────────────────────────
/**
* Send a chunk of raw PCM audio to ElevenLabs.
* Audio must be 16-bit signed, 16000 Hz, mono, little-endian.
* The data is Base64-encoded and sent as a JSON message.
* Call this repeatedly while the microphone is capturing.
*
* @param PCMData Raw PCM bytes (16-bit LE, 16kHz, mono).
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void SendAudioChunk(const TArray<uint8>& PCMData);
// ── Turn control (only relevant in Client turn mode) ──────────────────────
/**
* Signal that the user is actively speaking (Client turn mode).
* Sends a { "type": "user_activity" } message to the server.
* Call this periodically while the user is speaking (e.g. every audio chunk).
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void SendUserTurnStart();
/**
* Signal that the user has finished speaking (Client turn mode).
* No explicit API message is sent; the client simply stops sending user_activity.
* The server detects silence and hands the turn to the agent.
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void SendUserTurnEnd();
/**
* Send a text message to the agent (no microphone needed).
* Useful for testing or text-only interaction.
* Sends: { "type": "user_message", "text": "..." }
*/
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void SendTextMessage(const FString& Text);
/** Ask the agent to stop the current utterance. */
UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
void SendInterrupt();
// ── Info ──────────────────────────────────────────────────────────────────
UFUNCTION(BlueprintPure, Category = "ElevenLabs")
const FElevenLabsConversationInfo& GetConversationInfo() const { return ConversationInfo; }
// ─────────────────────────────────────────────────────────────────────────
// Internal
// ─────────────────────────────────────────────────────────────────────────
private:
void OnWsConnected();
void OnWsConnectionError(const FString& Error);
void OnWsClosed(int32 StatusCode, const FString& Reason, bool bWasClean);
void OnWsMessage(const FString& Message);
void OnWsBinaryMessage(const void* Data, SIZE_T Size, SIZE_T BytesRemaining);
void HandleConversationInitiation(const TSharedPtr<FJsonObject>& Payload);
void HandleAudioResponse(const TSharedPtr<FJsonObject>& Payload);
void HandleTranscript(const TSharedPtr<FJsonObject>& Payload);
void HandleAgentResponse(const TSharedPtr<FJsonObject>& Payload);
void HandleInterruption(const TSharedPtr<FJsonObject>& Payload);
void HandlePing(const TSharedPtr<FJsonObject>& Payload);
/** Build and send a JSON text frame to the server. */
void SendJsonMessage(const TSharedPtr<FJsonObject>& JsonObj);
/** Resolve the WebSocket URL from settings / parameters. */
FString BuildWebSocketURL(const FString& AgentID, const FString& APIKey) const;
TSharedPtr<IWebSocket> WebSocket;
EElevenLabsConnectionState ConnectionState = EElevenLabsConnectionState::Disconnected;
FElevenLabsConversationInfo ConversationInfo;
// Accumulation buffer for multi-fragment binary WebSocket frames.
// ElevenLabs sends JSON as binary frames; large messages arrive in fragments.
TArray<uint8> BinaryFrameBuffer;
};
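`SendTextMessage` documents its wire format as `{ "type": "user_message", "text": "..." }`. A standalone sketch of building that frame, with hand-rolled escaping of quotes and backslashes only; the plugin itself would go through `FJsonObject`/`FJsonSerializer` rather than string concatenation:

```cpp
#include <string>

// Build the user_message JSON text frame documented on SendTextMessage.
// Escaping here is deliberately minimal (quotes and backslashes).
std::string MakeUserMessage(const std::string& Text)
{
    std::string Escaped;
    for (char C : Text)
    {
        if (C == '"' || C == '\\')
        {
            Escaped += '\\';
        }
        Escaped += C;
    }
    return "{\"type\":\"user_message\",\"text\":\"" + Escaped + "\"}";
}
```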

View File

@@ -1,99 +0,0 @@
// Copyright ASTERION. All Rights Reserved.
#pragma once
#include "CoreMinimal.h"
#include "Modules/ModuleManager.h"
#include "PS_AI_Agent_ElevenLabs.generated.h"
// ─────────────────────────────────────────────────────────────────────────────
// Settings object exposed in Project Settings → Plugins → ElevenLabs AI Agent
// ─────────────────────────────────────────────────────────────────────────────
UCLASS(config = Engine, defaultconfig)
class PS_AI_AGENT_ELEVENLABS_API UElevenLabsSettings : public UObject
{
GENERATED_BODY()
public:
UElevenLabsSettings(const FObjectInitializer& ObjectInitializer)
: Super(ObjectInitializer)
{
API_Key = TEXT("");
AgentID = TEXT("");
bSignedURLMode = false;
}
/**
* ElevenLabs API key.
* Obtain it from https://elevenlabs.io; it is used to authenticate WebSocket connections.
* Keep this secret; do not hard-code the key in a shipping build.
*/
UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API")
FString API_Key;
/**
* The default ElevenLabs Conversational Agent ID to use when none is specified
* on the component. Create agents at https://elevenlabs.io/app/conversational-ai
*/
UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API")
FString AgentID;
/**
* When true, the plugin fetches a signed WebSocket URL from your own backend
* before connecting, so the API key is never exposed in the client.
* Set SignedURLEndpoint to point to your server that returns the signed URL.
*/
UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API|Security")
bool bSignedURLMode;
/**
* Your backend endpoint that returns a signed WebSocket URL for ElevenLabs.
* Only used when bSignedURLMode = true.
* Expected response body: { "signed_url": "wss://..." }
*/
UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API|Security",
meta = (EditCondition = "bSignedURLMode"))
FString SignedURLEndpoint;
/**
* Override the ElevenLabs WebSocket base URL. Leave empty to use the default:
* wss://api.elevenlabs.io/v1/convai/conversation
*/
UPROPERTY(Config, EditAnywhere, AdvancedDisplay, Category = "ElevenLabs API")
FString CustomWebSocketURL;
/** Log verbose WebSocket messages to the Output Log (useful during development). */
UPROPERTY(Config, EditAnywhere, AdvancedDisplay, Category = "ElevenLabs API")
bool bVerboseLogging = false;
};
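The proxy's `BuildWebSocketURL` resolves these settings into a connection URL. A sketch of the likely resolution order, assuming the agent ID travels as an `agent_id` query parameter; the header only documents the default base URL, so the query-string shape is an assumption (and in signed-URL mode the backend would return the full URL instead):

```cpp
#include <string>

// Resolve the WebSocket URL: custom override if set, otherwise the documented
// default base, with the agent ID appended as a query parameter. The API key
// is intentionally absent from the URL (it travels separately, or is replaced
// entirely by a signed URL).
std::string BuildWebSocketURL(const std::string& CustomBase,
                              const std::string& AgentID)
{
    const std::string Base = CustomBase.empty()
        ? "wss://api.elevenlabs.io/v1/convai/conversation"
        : CustomBase;
    return AgentID.empty() ? Base : Base + "?agent_id=" + AgentID;
}
```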
// ─────────────────────────────────────────────────────────────────────────────
// Module
// ─────────────────────────────────────────────────────────────────────────────
class PS_AI_AGENT_ELEVENLABS_API FPS_AI_Agent_ElevenLabsModule : public IModuleInterface
{
public:
/** IModuleInterface implementation */
virtual void StartupModule() override;
virtual void ShutdownModule() override;
virtual bool IsGameModule() const override { return true; }
/** Singleton access */
static inline FPS_AI_Agent_ElevenLabsModule& Get()
{
return FModuleManager::LoadModuleChecked<FPS_AI_Agent_ElevenLabsModule>("PS_AI_Agent_ElevenLabs");
}
static inline bool IsAvailable()
{
return FModuleManager::Get().IsModuleLoaded("PS_AI_Agent_ElevenLabs");
}
/** Access the settings object at runtime */
UElevenLabsSettings* GetSettings() const;
private:
UElevenLabsSettings* Settings = nullptr;
};