Update plugin documentation to v1.1.0

Reflects all bug fixes and new features added since initial release:

- Binary WS frame handling (JSON vs raw PCM discrimination)
- Corrected transcript message type and field names
- Corrected pong format (top-level event_id)
- Corrected client turn mode (user_activity, no explicit end message)
- New SendTextMessage feature documented with Blueprint + C++ examples
- Added Section 13: Changelog (v1.0.0 / v1.1.0)
- Updated audio pipeline diagram for raw binary PCM output path
- Added OnAgentConnected timing note (fires after initiation_metadata)
- Added FTranscriptSegment clarification (speaker always "user")
- Added API key / git workflow note in Security section
- New troubleshooting entries for binary frames and OnAgentConnected
- New "Test without microphone" common pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parent: 483456728d · Commit: e464cfe288
# PS_AI_Agent_ElevenLabs — Plugin Documentation

**Engine**: Unreal Engine 5.5
**Plugin version**: 1.1.0
**Status**: Beta — tested on UE 5.5 Win64, verified connection and audio pipeline
**API**: [ElevenLabs Conversational AI](https://elevenlabs.io/docs/eleven-agents/quickstart)

---
10. [Audio Pipeline](#10-audio-pipeline)
11. [Common Patterns](#11-common-patterns)
12. [Troubleshooting](#12-troubleshooting)
13. [Changelog](#13-changelog)

---
UElevenLabsMicrophoneCaptureComponent
        ▼
UElevenLabsConversationalAgentComponent
  • Converts float32 → int16 PCM bytes
  • Base64-encodes and sends via WebSocket
        │ (wss://api.elevenlabs.io/v1/convai/conversation)
        ▼
ElevenLabs Conversational AI Agent
        │
        ▼
UElevenLabsConversationalAgentComponent
  • Receives raw binary PCM audio frames
  • Feeds USoundWaveProcedural → UAudioComponent
        │
        ▼
Agent voice plays from the Actor's position in the world

- Blueprint-first: all events and controls are exposed to Blueprint
- Real-time bidirectional: audio streams in both directions simultaneously
- Server VAD (default) or push-to-talk
- Text input supported (no microphone needed for testing)

### Wire frame protocol notes

ElevenLabs sends **all WebSocket frames as binary** (not text frames). The plugin handles two binary frame types automatically:

- **JSON control frames** (start with `{`) — conversation init, transcripts, agent responses, ping/pong
- **Raw PCM audio frames** (binary) — agent speech audio, played directly via `USoundWaveProcedural`

---
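The two-way frame discrimination described above can be sketched in a few lines of plain C++. This is a minimal illustration of the peek-first-byte rule, not the plugin's actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative sketch: classify an incoming binary WebSocket frame by its
// first byte, as described above. JSON control frames start with '{';
// anything else is treated as raw int16 LE PCM audio.
enum class EFrameKind { Json, RawPcm, Empty };

EFrameKind ClassifyFrame(const uint8_t* Data, size_t Size)
{
    if (Size == 0)      return EFrameKind::Empty;
    if (Data[0] == '{') return EFrameKind::Json;   // UTF-8 JSON control frame
    return EFrameKind::RawPcm;                     // raw PCM audio bytes
}
```

In practice the JSON branch would hand the bytes to a JSON parser and the PCM branch would enqueue them for playback.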
Go to **Edit → Project Settings → Plugins → ElevenLabs AI Agent**.

| Setting | Description | Required |
|---|---|---|
| **API Key** | Your ElevenLabs API key. Find it at [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys) | Yes (unless using Signed URL Mode or a public agent) |
| **Agent ID** | Default agent ID. Find it in the URL when editing an agent: `elevenlabs.io/app/conversational-ai/agents/<AGENT_ID>` | Yes (unless set per-component) |
| **Signed URL Mode** | Fetch the WS URL from your own backend (keeps key off client). See [Section 9](#9-security--signed-url-mode) | No |
| **Signed URL Endpoint** | Your backend URL returning `{ "signed_url": "wss://..." }` | Only if Signed URL Mode = true |
| **Custom WebSocket URL** | Override the default `wss://api.elevenlabs.io/...` endpoint (debug only) | No |
| **Verbose Logging** | Log every WebSocket frame type and first bytes to Output Log | No |

> **Security note**: The API key set in Project Settings is saved to `DefaultEngine.ini`. **Never commit this file with the key in it** — strip the `[ElevenLabsSettings]` section before committing. Use Signed URL Mode for production builds.

> **Finding your Agent ID**: Go to [elevenlabs.io/app/conversational-ai](https://elevenlabs.io/app/conversational-ai), click your agent, and copy the ID from the URL bar or the agent's Overview/API tab.

---
Event BeginPlay
└─► [ElevenLabs Agent] Start Conversation

[ElevenLabs Agent] On Agent Connected
└─► Print String "Connected! ConvID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
└─► Set Text (UI widget) ← Response Text
```
Input Action "Talk" (Released)
└─► [ElevenLabs Agent] Stop Listening
```

### Step 5 — Testing without a microphone

Once connected, use **Send Text Message** instead of speaking:

```
[ElevenLabs Agent] On Agent Connected
└─► [ElevenLabs Agent] Send Text Message ← "Hello, who are you?"
```

The agent will reply with audio and text exactly as if it heard you speak.

---

## 5. Quick Start (C++)
```cpp
// Start the conversation:
ElevenLabsAgent->StartConversation();

// Send a text message (useful for testing without mic):
ElevenLabsAgent->SendTextMessage(TEXT("Hello, who are you?"));

// Later, to end:
ElevenLabsAgent->EndConversation();
```
The **main component** — attach this to any Actor that should be able to speak.

#### Properties

| Property | Type | Default | Description |
|---|---|---|---|
| `AgentID` | `FString` | `""` | Agent ID for this actor. Overrides the project-level default when non-empty. |
| `TurnMode` | `EElevenLabsTurnMode` | `Server` | How speaker turns are detected. See [Section 8](#8-turn-modes). |
| `bAutoStartListening` | `bool` | `true` | If true, starts mic capture automatically once the WebSocket is connected and ready. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartConversation()` | Callable | Opens the WebSocket connection. If `bAutoStartListening` is true, mic capture starts once `OnAgentConnected` fires. |
| `EndConversation()` | Callable | Closes the WebSocket, stops the mic, stops audio playback. |
| `StartListening()` | Callable | Starts microphone capture and streams to ElevenLabs. In Client mode, also sends `user_activity`. |
| `StopListening()` | Callable | Stops microphone capture. In Client mode, stops sending `user_activity`. |
| `SendTextMessage(Text)` | Callable | Sends a text message to the agent without using the microphone. Agent replies with full audio + text. Useful for testing. |
| `InterruptAgent()` | Callable | Stops the agent's current utterance immediately and clears the audio queue. |
| `IsConnected()` | Pure | Returns true if the WebSocket is open and the conversation is active. |
| `IsListening()` | Pure | Returns true if the microphone is currently capturing. |
| `IsAgentSpeaking()` | Pure | Returns true if agent audio is currently playing. |
| Event | Parameters | Fired when |
|---|---|---|
| `OnAgentConnected` | `FElevenLabsConversationInfo` | WebSocket handshake + agent initiation metadata received. Safe to call `SendTextMessage` here. |
| `OnAgentDisconnected` | `int32 StatusCode`, `FString Reason` | WebSocket closed (graceful or remote). |
| `OnAgentError` | `FString ErrorMessage` | Connection or protocol error. |
| `OnAgentTranscript` | `FElevenLabsTranscriptSegment` | User speech-to-text transcript received (speaker is always `"user"`). |
| `OnAgentTextResponse` | `FString ResponseText` | Final text response from the agent (mirrors the audio). |
| `OnAgentStartedSpeaking` | — | First audio chunk received from the agent (audio playback begins). |
| `OnAgentStoppedSpeaking` | — | Audio queue empty for ~0.5 s (heuristic — agent done speaking). |
| `OnAgentInterrupted` | — | Agent speech was interrupted (by user or by `InterruptAgent()`). |

---
| Property | Type | Default | Description |
|---|---|---|---|
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples before resampling. Range: 0.0 – 4.0. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartCapture()` | Callable | Opens the default audio input device and begins streaming. |
| `StopCapture()` | Callable | Stops streaming and closes the device. |
| `IsCapturing()` | Pure | True while actively capturing. |

#### Delegate

`OnAudioCaptured` — fires on the **game thread** with `TArray<float>` PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.

---
| Function | Description |
|---|---|
| `Connect(AgentID, APIKey)` | Open the WS connection. Parameters override project settings when non-empty. |
| `Disconnect()` | Send close frame and tear down the connection. |
| `SendAudioChunk(PCMData)` | Send raw int16 LE PCM bytes as a Base64 JSON frame. Called automatically by the agent component. |
| `SendTextMessage(Text)` | Send `{"type":"user_message","text":"..."}`. Agent replies as if it heard speech. |
| `SendUserTurnStart()` | Client turn mode: sends `{"type":"user_activity"}` to signal the user is speaking. |
| `SendUserTurnEnd()` | Client turn mode: stops sending `user_activity` (no explicit message — the server detects silence). |
| `SendInterrupt()` | Ask the agent to stop speaking: sends `{"type":"interrupt"}`. |
| `GetConnectionState()` | Returns `EElevenLabsConnectionState`. |
| `GetConversationInfo()` | Returns `FElevenLabsConversationInfo`. |
```
Disconnected — No active connection
Connecting  — WebSocket handshake in progress / awaiting conversation_initiation_metadata
Connected   — Conversation active and ready (fires OnAgentConnected)
Error       — Connection or protocol failure
```

> Note: State remains `Connecting` until the server sends `conversation_initiation_metadata`. `OnAgentConnected` fires on the transition to `Connected`.

### EElevenLabsTurnMode

```
Server — ElevenLabs server-side VAD decides when the user has finished speaking (default)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
```
### FElevenLabsConversationInfo

```
ConversationID  FString — Unique session ID assigned by ElevenLabs
AgentID         FString — The agent ID for this session
```

### FElevenLabsTranscriptSegment

```
Text      FString — Transcribed text
Speaker   FString — "user" (agent text comes via OnAgentTextResponse, not transcript)
bIsFinal  bool    — Always true for user transcripts (ElevenLabs sends final only)
```
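As an illustration of the corrected transcript field names documented in this plugin's v1.1.0 changelog, here is a minimal sketch that pulls the transcript text out of a `user_transcript` message with naive string search. The message shape is taken from this document; a real build would use UE's JSON module rather than substring scanning:

```cpp
#include <string>

// Illustrative sketch only. Assumed message shape (per this document):
//   {"type":"user_transcript","user_transcription_event":{"user_transcript":"..."}}
// No escape handling — a real implementation must use a proper JSON parser.
std::string ExtractUserTranscript(const std::string& Message)
{
    const std::string Key = "\"user_transcript\":\"";   // the inner text field
    const size_t KeyPos = Message.find(Key);
    if (KeyPos == std::string::npos) return "";
    const size_t Start = KeyPos + Key.size();
    const size_t End = Message.find('"', Start);        // first unescaped quote assumed
    if (End == std::string::npos) return "";
    return Message.substr(Start, End - Start);
}
```

Note the search key includes the trailing `":"`, so it does not false-match the `"type":"user_transcript"` value earlier in the message.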
---
## 8. Turn Modes

### Server VAD (default)

ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously and ElevenLabs decides when the user has finished speaking.

**When to use**: Casual conversation, hands-free interaction, natural dialogue.

```
StartConversation() → mic streams continuously (if bAutoStartListening = true)
ElevenLabs detects speech / silence automatically
Agent replies when it detects end-of-speech
```

### Client Controlled (push-to-talk)

Your code explicitly signals turn boundaries with `StartListening()` / `StopListening()`. The plugin sends `{"type":"user_activity"}` while the user is speaking; stopping it signals end of turn.

**When to use**: Noisy environments, precise control, walkie-talkie style UI.

```
Input Pressed  → StartListening() → streams audio + sends user_activity
Input Released → StopListening()  → stops audio (no explicit end message)
Server detects silence and hands turn to agent
```
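The client-mode wire messages described above can be sketched as simple JSON string builders. This is illustrative only — message shapes are taken from this document, and the escaping is deliberately minimal:

```cpp
#include <string>

// Illustrative sketch of the two client-mode control messages described
// above. Message shapes per this document, not the plugin's actual code.
std::string MakeUserActivityMessage()
{
    return "{\"type\":\"user_activity\"}";
}

std::string MakeUserTextMessage(const std::string& Text)
{
    // NOTE: assumes Text contains no characters that need JSON escaping.
    return "{\"type\":\"user_message\",\"text\":\"" + Text + "\"}";
}
```

In the real component these strings would be sent over the open WebSocket; the sketch only shows the payloads.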
---

## 9. Security — Signed URL Mode

By default, the API key is stored in Project Settings (`DefaultEngine.ini`). This is fine for development but **should not be shipped in packaged builds**, as the key could be extracted.

### Production setup
```json
{ "signed_url": "wss://..." }
```

4. The plugin fetches this URL before connecting — the API key never leaves your server.

### Development workflow (API key in project settings)

- Set the key in **Project Settings → Plugins → ElevenLabs AI Agent**
- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
- **Strip this section from `DefaultEngine.ini` before every git commit**
- Each developer sets the key locally — it does not go in version control
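For illustration, the section to strip looks roughly like this. The section header is the one named above; the `ApiKey` and `AgentId` property names are assumptions, since the exact keys depend on the plugin's property names:

```ini
; Hypothetical example — do not commit this section.
; Section name is from this document; property names are assumed.
[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]
ApiKey=sk_...
AgentId=agent_...
```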
---

## 10. Audio Pipeline

### Input (player → agent)
```
Device (any sample rate, any channels)
↓ FAudioCapture — UE built-in (UE 5.3+ API: OpenAudioCaptureStream)
↓ Callback: const void* → cast to float32 interleaved frames
↓ Downmix to mono (average all channels)
↓ Resample to 16000 Hz (linear interpolation)
↓ Apply VolumeMultiplier
↓ Dispatch to Game Thread (AsyncTask)
↓ Convert float32 → int16 signed, little-endian bytes
↓ Base64 encode
↓ Send as binary WebSocket frame: { "user_audio_chunk": "<base64>" }
```
### Output (agent → player)

```
Binary WebSocket frame arrives
↓ Peek first byte:
   • '{'   → UTF-8 JSON: parse type field, dispatch to handler
   • other → raw PCM audio bytes
↓ [Audio path] Raw int16 LE PCM bytes at 16000 Hz mono
↓ Enqueue in thread-safe AudioQueue (FCriticalSection)
↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
↓ UAudioComponent plays from the Actor's world position (3D spatialized)
```

**Audio format** (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.
### Silence detection heuristic

`OnAgentStoppedSpeaking` fires when the `AudioQueue` has been empty for **30 consecutive ticks** (~0.5 s at 60 fps). If the agent has natural pauses, increase `SilenceThresholdTicks` in the header:

```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s
```
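The heuristic above can be sketched as a simple tick counter in plain C++ (illustrative only, not the plugin's code):

```cpp
// Illustrative sketch of the tick-based silence heuristic: count consecutive
// ticks with an empty audio queue and fire once the threshold is reached.
class FSilenceDetector
{
public:
    explicit FSilenceDetector(int InThresholdTicks) : ThresholdTicks(InThresholdTicks) {}

    // Call once per tick. Returns true exactly once, on the tick the
    // agent is considered done speaking.
    bool Tick(bool bQueueEmpty)
    {
        if (!bQueueEmpty) { EmptyTicks = 0; return false; }  // audio still flowing
        ++EmptyTicks;
        return EmptyTicks == ThresholdTicks;                 // fire once at threshold
    }

private:
    int ThresholdTicks = 30;   // ~0.5 s at 60 fps
    int EmptyTicks = 0;
};
```

Any non-empty tick resets the counter, so brief pauses shorter than the threshold never fire the event.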
---

## 11. Common Patterns

### Test the connection without a microphone

```
BeginPlay → StartConversation()

OnAgentConnected → SendTextMessage("Hello, introduce yourself")

OnAgentTextResponse → Print String (confirms the text pipeline works)
OnAgentStartedSpeaking → (confirms the audio pipeline works)
```

### Show subtitles in UI

```
OnAgentTranscript:
Segment → Text → show in player subtitle widget (speaker always "user")

OnAgentTextResponse:
ResponseText → show in NPC speech bubble
```
### Interrupt the agent when the player starts speaking

In Server VAD mode ElevenLabs handles this automatically. For manual control:

```
OnAgentStartedSpeaking → set "agent is speaking" flag
Input Action (any) → if agent is speaking → InterruptAgent()
```

### Multiple NPCs with different agents

Each NPC Blueprint has its own `UElevenLabsConversationalAgentComponent`. Set a different `AgentID` on each component. WebSocket connections are fully independent.

### Only start the conversation when the player is nearby
```
On Begin Overlap (trigger sphere)
└─► [ElevenLabs Agent] Start Conversation

On End Overlap
└─► [ElevenLabs Agent] End Conversation
```

### Adjust microphone volume

Get the `UElevenLabsMicrophoneCaptureComponent` from the owner and set `VolumeMultiplier`.
### WebSocket connection fails immediately

- Check the **API Key** is set correctly in Project Settings.
- Check the **Agent ID** exists in your ElevenLabs account (find it in the dashboard URL or via `GET /v1/convai/agents`).
- Enable **Verbose Logging** in Project Settings and check Output Log for the exact WS URL and error.
- Ensure port 443 (WSS) is not blocked by your firewall.

### `OnAgentConnected` never fires

- Connection was made but `conversation_initiation_metadata` not received yet — check Verbose Logging.
- If you see `"Binary audio frame"` logs but no `"Conversation initiated"` — the initiation JSON frame may be arriving as a non-`{` binary frame. Check the hex prefix logged at Verbose level.

### No audio from the microphone

- Windows may require microphone permission. Check **Settings → Privacy → Microphone**.
- Try setting `VolumeMultiplier` to `2.0` on the `MicrophoneCaptureComponent`.
- Check Output Log for `"Failed to open default audio capture stream"`.

### Agent audio is choppy or silent

- The `USoundWaveProcedural` queue may be underflowing due to network jitter. Check latency.
- Verify the audio format matches: the plugin expects raw PCM 16-bit 16 kHz mono from the server. If ElevenLabs sends a different format (e.g. mp3_44100), audio will sound garbled — check `agent_output_audio_format` in the `conversation_initiation_metadata` via Verbose Logging.
- Ensure no other component is using the same `UAudioComponent`.

### `OnAgentStoppedSpeaking` fires too early

Increase `SilenceThresholdTicks` in `ElevenLabsConversationalAgentComponent.h`:

```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0s at 60fps
```

### Build error: "Plugin AudioCapture not found"

Make sure the `AudioCapture` plugin is enabled. It should be auto-enabled via the `.uplugin` dependency, but you can add it manually to `.uproject`:

```json
{ "Name": "AudioCapture", "Enabled": true }
```

### `"Received unexpected binary WebSocket frame"` in the log

This warning no longer appears in v1.1.0+. If you see it, you are running an older build — recompile the plugin.

---

## 13. Changelog

### v1.1.0 — 2026-02-19

**Bug fixes:**

- **Binary WebSocket frames**: ElevenLabs sends all frames as binary (not text). All frames were previously discarded. Now correctly handled — JSON control frames decoded as UTF-8, raw PCM audio frames routed directly to the audio queue.
- **Transcript message**: Wrong message type (`"transcript"` → `"user_transcript"`), wrong event key (`"transcript_event"` → `"user_transcription_event"`), wrong text field (`"message"` → `"user_transcript"`).
- **Pong format**: `event_id` was nested inside a `pong_event` object; corrected to a top-level field per the API spec.
- **Client turn mode**: `user_turn_start`/`user_turn_end` are not valid API messages; replaced with `user_activity` (start) and implicit silence (end).

**New features:**

- `SendTextMessage(Text)` on both `UElevenLabsConversationalAgentComponent` and `UElevenLabsWebSocketProxy` — send text to the agent without a microphone. Useful for testing.
- Verbose logging shows a binary frame hex preview and a JSON frame content prefix.
- The JSON parse error log now shows the first 80 characters of the failing message.

### v1.0.0 — 2026-02-19

Initial implementation. Plugin compiles cleanly on UE 5.5 Win64.

---

*Documentation updated 2026-02-19 — Plugin v1.1.0 — UE 5.5*