Update plugin documentation to v1.1.0
Reflects all bug fixes and new features added since the initial release:

- Binary WS frame handling (JSON vs raw PCM discrimination)
- Corrected transcript message type and field names
- Corrected pong format (top-level event_id)
- Corrected client turn mode (user_activity, no explicit end message)
- New SendTextMessage feature documented with Blueprint + C++ examples
- Added Section 13: Changelog (v1.0.0 / v1.1.0)
- Updated audio pipeline diagram for raw binary PCM output path
- Added OnAgentConnected timing note (fires after initiation_metadata)
- Added FTranscriptSegment clarification (speaker always "user")
- Added API key / git workflow note in Security section
- New troubleshooting entries for binary frames and OnAgentConnected
- New "Test without microphone" common pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in: parent 483456728d, commit e464cfe288
# PS_AI_Agent_ElevenLabs — Plugin Documentation

**Engine**: Unreal Engine 5.5
**Plugin version**: 1.1.0
**Status**: Beta — tested on UE 5.5 Win64, verified connection and audio pipeline
**API**: [ElevenLabs Conversational AI](https://elevenlabs.io/docs/eleven-agents/quickstart)

---
10. [Audio Pipeline](#10-audio-pipeline)
11. [Common Patterns](#11-common-patterns)
12. [Troubleshooting](#12-troubleshooting)
13. [Changelog](#13-changelog)

---
```
UElevenLabsMicrophoneCaptureComponent
        │
        ▼
UElevenLabsConversationalAgentComponent
  • Converts float32 → int16 PCM bytes
  • Base64-encodes and sends via WebSocket
        │ (wss://api.elevenlabs.io/v1/convai/conversation)
        ▼
ElevenLabs Conversational AI Agent
```
```
        │
        ▼
UElevenLabsConversationalAgentComponent
  • Receives raw binary PCM audio frames
  • Feeds USoundWaveProcedural → UAudioComponent
        │
        ▼
Agent voice plays from the Actor's position in the world
```

- Blueprint-first: all events and controls are exposed to Blueprint
- Real-time bidirectional: audio streams in both directions simultaneously
- Server VAD (default) or push-to-talk
- Text input supported (no microphone needed for testing)

### Wire frame protocol notes

ElevenLabs sends **all WebSocket frames as binary** (not text frames). The plugin handles two binary frame types automatically:

- **JSON control frames** (start with `{`) — conversation init, transcripts, agent responses, ping/pong
- **Raw PCM audio frames** (binary) — agent speech audio, played directly via `USoundWaveProcedural`
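The dispatch rule above is simple enough to sketch in standalone C++. This is a minimal sketch, not the plugin's actual internals — `ClassifyFrame` and `FrameKind` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class FrameKind { Json, RawPcm };

// Mirrors the rule above: a binary frame whose first byte is '{' is treated
// as a UTF-8 JSON control frame; anything else is raw int16 PCM audio.
FrameKind ClassifyFrame(const uint8_t* Data, size_t Size)
{
    if (Size > 0 && Data[0] == '{')
    {
        return FrameKind::Json;
    }
    return FrameKind::RawPcm;
}
```

An empty frame falls through to the PCM path here; a real implementation would likely drop zero-length frames before classification.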
---
Go to **Edit → Project Settings → Plugins → ElevenLabs AI Agent**.

| Setting | Description | Required |
|---|---|---|
| **API Key** | Your ElevenLabs API key. Find it at [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys) | Yes (unless using Signed URL Mode or a public agent) |
| **Agent ID** | Default agent ID. Find it in the URL when editing an agent: `elevenlabs.io/app/conversational-ai/agents/<AGENT_ID>` | Yes (unless set per-component) |
| **Signed URL Mode** | Fetch the WS URL from your own backend (keeps key off client). See [Section 9](#9-security--signed-url-mode) | No |
| **Signed URL Endpoint** | Your backend URL returning `{ "signed_url": "wss://..." }` | Only if Signed URL Mode = true |
| **Custom WebSocket URL** | Override the default `wss://api.elevenlabs.io/...` endpoint (debug only) | No |
| **Verbose Logging** | Log every WebSocket frame type and first bytes to Output Log | No |

> **Security note**: The API key set in Project Settings is saved to `DefaultEngine.ini`. **Never commit this file with the key in it** — strip the `[ElevenLabsSettings]` section before committing. Use Signed URL Mode for production builds.

> **Finding your Agent ID**: Go to [elevenlabs.io/app/conversational-ai](https://elevenlabs.io/app/conversational-ai), click your agent, and copy the ID from the URL bar or the agent's Overview/API tab.

---
```
Event BeginPlay
  └─► [ElevenLabs Agent] Start Conversation

[ElevenLabs Agent] On Agent Connected
  └─► Print String "Connected! ConvID: " + Conversation Info → Conversation ID

[ElevenLabs Agent] On Agent Text Response
  └─► Set Text (UI widget) ← Response Text
```
```
Input Action "Talk" (Released)
  └─► [ElevenLabs Agent] Stop Listening
```

### Step 5 — Testing without a microphone

Once connected, use **Send Text Message** instead of speaking:

```
[ElevenLabs Agent] On Agent Connected
  └─► [ElevenLabs Agent] Send Text Message ← "Hello, who are you?"
```

The agent will reply with audio and text exactly as if it had heard you speak.

---
## 5. Quick Start (C++)
```cpp
ElevenLabsAgent->OnAgentStartedSpeaking.AddDynamic(/* ... */);

// Start the conversation:
ElevenLabsAgent->StartConversation();

// Send a text message (useful for testing without mic):
ElevenLabsAgent->SendTextMessage(TEXT("Hello, who are you?"));

// Later, to end:
ElevenLabsAgent->EndConversation();
```
The **main component** — attach this to any Actor that should be able to speak.

| Property | Type | Default | Description |
|---|---|---|---|
| `AgentID` | `FString` | `""` | Agent ID for this actor. Overrides the project-level default when non-empty. |
| `TurnMode` | `EElevenLabsTurnMode` | `Server` | How speaker turns are detected. See [Section 8](#8-turn-modes). |
| `bAutoStartListening` | `bool` | `true` | If true, starts mic capture automatically once the WebSocket is connected and ready. |

#### Functions

| Function | Blueprint | Description |
|---|---|---|
| `StartConversation()` | Callable | Opens the WebSocket connection. If `bAutoStartListening` is true, mic capture starts once `OnAgentConnected` fires. |
| `EndConversation()` | Callable | Closes the WebSocket, stops the mic, stops audio playback. |
| `StartListening()` | Callable | Starts microphone capture and streams to ElevenLabs. In Client mode, also sends `user_activity`. |
| `StopListening()` | Callable | Stops microphone capture. In Client mode, stops sending `user_activity`. |
| `SendTextMessage(Text)` | Callable | Sends a text message to the agent without using the microphone. The agent replies with full audio + text. Useful for testing. |
| `InterruptAgent()` | Callable | Stops the agent's current utterance immediately and clears the audio queue. |
| `IsConnected()` | Pure | Returns true if the WebSocket is open and the conversation is active. |
| `IsListening()` | Pure | Returns true if the microphone is currently capturing. |
| `IsAgentSpeaking()` | Pure | Returns true if agent audio is currently playing. |
#### Events

| Event | Parameters | Fired when |
|---|---|---|
| `OnAgentConnected` | `FElevenLabsConversationInfo` | WebSocket handshake + agent initiation metadata received. Safe to call `SendTextMessage` here. |
| `OnAgentDisconnected` | `int32 StatusCode`, `FString Reason` | WebSocket closed (gracefully or by the remote). |
| `OnAgentError` | `FString ErrorMessage` | Connection or protocol error. |
| `OnAgentTranscript` | `FElevenLabsTranscriptSegment` | User speech-to-text transcript received (speaker is always `"user"`). |
| `OnAgentTextResponse` | `FString ResponseText` | Final text response from the agent (mirrors the audio). |
| `OnAgentStartedSpeaking` | — | First audio chunk received from the agent (audio playback begins). |
| `OnAgentStoppedSpeaking` | — | Audio queue empty for ~0.5 s (heuristic — agent done speaking). |
| `OnAgentInterrupted` | — | Agent speech was interrupted (by the user or by `InterruptAgent()`). |

---
||||
@ -292,19 +316,19 @@ A lightweight microphone capture component. Managed automatically by `UElevenLab
|
||||
|
||||
| Property | Type | Default | Description |
|
||||
|---|---|---|---|
|
||||
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples. Range: 0.0 – 4.0. |
|
||||
| `VolumeMultiplier` | `float` | `1.0` | Gain applied to captured samples before resampling. Range: 0.0 – 4.0. |
|
||||
|
||||
#### Functions
|
||||
|
||||
| Function | Blueprint | Description |
|
||||
|---|---|---|
|
||||
| `StartCapture()` | Callable | Opens the default audio input device and starts streaming. |
|
||||
| `StartCapture()` | Callable | Opens the default audio input device and begins streaming. |
|
||||
| `StopCapture()` | Callable | Stops streaming and closes the device. |
|
||||
| `IsCapturing()` | Pure | True while actively capturing. |
|
||||
|
||||
#### Delegate
|
||||
|
||||
`OnAudioCaptured` — fires on the game thread with `TArray<float>` PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.
|
||||
`OnAudioCaptured` — fires on the **game thread** with `TArray<float>` PCM samples at 16 kHz mono. Bind to this if you want to process or forward audio manually.
|
||||
|
||||
---
|
||||
|
||||
Low-level WebSocket session manager. Used internally by `UElevenLabsConversationalAgentComponent`.

| Function | Description |
|---|---|
| `Connect(AgentID, APIKey)` | Open the WS connection. Parameters override project settings when non-empty. |
| `Disconnect()` | Send a close frame and tear down the connection. |
| `SendAudioChunk(PCMData)` | Send raw int16 LE PCM bytes as a Base64 JSON frame. Called automatically by the agent component. |
| `SendTextMessage(Text)` | Send `{"type":"user_message","text":"..."}`. The agent replies as if it heard speech. |
| `SendUserTurnStart()` | Client turn mode: sends `{"type":"user_activity"}` to signal the user is speaking. |
| `SendUserTurnEnd()` | Client turn mode: stops sending `user_activity` (no explicit message — the server detects silence). |
| `SendInterrupt()` | Ask the agent to stop speaking: sends `{"type":"interrupt"}`. |
| `GetConnectionState()` | Returns `EElevenLabsConnectionState`. |
| `GetConversationInfo()` | Returns `FElevenLabsConversationInfo`. |
### EElevenLabsConnectionState

```
Disconnected — No active connection
Connecting   — WebSocket handshake in progress / awaiting conversation_initiation_metadata
Connected    — Conversation active and ready (fires OnAgentConnected)
Error        — Connection or protocol failure
```

> Note: State remains `Connecting` until the server sends `conversation_initiation_metadata`. `OnAgentConnected` fires on the transition to `Connected`.

### EElevenLabsTurnMode

```
Server — ElevenLabs VAD decides when the user's turn ends (hands-free)
Client — Your code calls StartListening/StopListening to define turns (push-to-talk)
```

### FElevenLabsConversationInfo

```
ConversationID FString — Unique session ID assigned by ElevenLabs
AgentID        FString — The agent ID for this session
```

### FElevenLabsTranscriptSegment

```
Text     FString — Transcribed text
Speaker  FString — "user" (agent text comes via OnAgentTextResponse, not transcript)
bIsFinal bool    — Always true for user transcripts (ElevenLabs sends final only)
```

---
## 8. Turn Modes

### Server VAD (default)

ElevenLabs runs Voice Activity Detection on the server. The plugin streams microphone audio continuously, and ElevenLabs decides when the user has finished speaking.

**When to use**: Casual conversation, hands-free interaction, natural dialogue.

```
StartConversation() → mic streams continuously (if bAutoStartListening = true)
ElevenLabs detects speech / silence automatically
Agent replies when it detects end-of-speech
```

### Client Controlled (push-to-talk)

Your code explicitly signals turn boundaries with `StartListening()` / `StopListening()`. The plugin sends `{"type":"user_activity"}` while the user is speaking; stopping signals the end of the turn.

**When to use**: Noisy environments, precise control, walkie-talkie-style UI.

```
Input Pressed  → StartListening() → streams audio + sends user_activity
Input Released → StopListening() → stops audio (no explicit end message)
Server detects silence and hands the turn to the agent
```
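The asymmetry of the client turn mode — an explicit start signal but no explicit end message — can be sketched in plain C++. This is an illustrative sketch, not the plugin's internals; `ClientTurnSession` and its members are invented names:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of the client-turn-mode message flow: StartListening sends
// "user_activity"; StopListening sends nothing — the server detects
// the end of the turn from silence once audio stops arriving.
struct ClientTurnSession
{
    std::vector<std::string> Sent; // messages that went out on the socket
    bool bListening = false;

    void StartListening()
    {
        bListening = true;
        Sent.push_back("{\"type\":\"user_activity\"}");
    }

    void StopListening()
    {
        bListening = false; // no explicit end-of-turn message
    }
};
```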
---
## 9. Security — Signed URL Mode

By default, the API key is stored in Project Settings (`DefaultEngine.ini`). This is fine for development, but it **should not ship in packaged builds**, as the key could be extracted.

### Production setup
4. The plugin fetches this URL before connecting — the API key never leaves your server.

### Development workflow (API key in project settings)

- Set the key in **Project Settings → Plugins → ElevenLabs AI Agent**
- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
- **Strip this section from `DefaultEngine.ini` before every git commit**
- Each developer sets the key locally — it does not go in version control
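For reference, the section to strip looks roughly like this. The property names below are assumptions for illustration — check the actual keys UE writes in your own `DefaultEngine.ini`:

```ini
; DefaultEngine.ini — strip this whole section before committing.
; Property names are illustrative; verify against your generated file.
[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]
ApiKey=sk_...
AgentId=agent_...
```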
---
## 10. Audio Pipeline
### Input (player → agent)

```
Device (any sample rate, any channels)
↓ FAudioCapture — UE built-in (UE 5.3+ API: OpenAudioCaptureStream)
↓ Callback: const void* → cast to float32 interleaved frames
↓ Downmix to mono (average all channels)
↓ Resample to 16000 Hz (linear interpolation)
↓ Apply VolumeMultiplier
↓ Dispatch to Game Thread (AsyncTask)
↓ Convert float32 → int16 signed, little-endian bytes
↓ Base64 encode
↓ Send as binary WebSocket frame: { "user_audio_chunk": "<base64>" }
```
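The per-chunk math in the pipeline above (downmix, linear resample, float→int16 LE) can be sketched in standalone C++. A minimal sketch with illustrative helper names — the plugin's actual internals may differ:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Downmix interleaved float32 frames to mono by averaging channels.
std::vector<float> DownmixToMono(const std::vector<float>& Interleaved, int NumChannels)
{
    std::vector<float> Mono;
    Mono.reserve(Interleaved.size() / NumChannels);
    for (size_t i = 0; i + NumChannels <= Interleaved.size(); i += NumChannels)
    {
        float Sum = 0.0f;
        for (int c = 0; c < NumChannels; ++c) Sum += Interleaved[i + c];
        Mono.push_back(Sum / NumChannels);
    }
    return Mono;
}

// Linear-interpolation resample from SourceRate down (or up) to 16000 Hz.
std::vector<float> ResampleTo16k(const std::vector<float>& In, int SourceRate)
{
    const int TargetRate = 16000;
    const size_t OutCount = In.size() * TargetRate / SourceRate;
    std::vector<float> Out(OutCount);
    for (size_t i = 0; i < OutCount; ++i)
    {
        const double Pos = static_cast<double>(i) * SourceRate / TargetRate;
        const size_t i0 = static_cast<size_t>(Pos);
        const size_t i1 = std::min(i0 + 1, In.size() - 1);
        const double Frac = Pos - i0;
        Out[i] = static_cast<float>(In[i0] * (1.0 - Frac) + In[i1] * Frac);
    }
    return Out;
}

// Convert float32 in [-1, 1] to int16 little-endian bytes, with clamping.
std::vector<uint8_t> FloatToInt16LE(const std::vector<float>& Samples)
{
    std::vector<uint8_t> Bytes;
    Bytes.reserve(Samples.size() * 2);
    for (float s : Samples)
    {
        const float Clamped = std::clamp(s, -1.0f, 1.0f);
        const int16_t v = static_cast<int16_t>(Clamped * 32767.0f);
        Bytes.push_back(static_cast<uint8_t>(v & 0xFF));         // low byte first (LE)
        Bytes.push_back(static_cast<uint8_t>((v >> 8) & 0xFF));  // then high byte
    }
    return Bytes;
}
```

Linear interpolation is cheap but aliases slightly when downsampling; it is adequate for 16 kHz speech input.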
### Output (agent → player)

```
Binary WebSocket frame arrives
↓ Peek first byte:
   • '{'   → UTF-8 JSON: parse type field, dispatch to handler
   • other → raw PCM audio bytes
↓ [Audio path] Raw int16 LE PCM bytes at 16000 Hz mono
↓ Enqueue in thread-safe AudioQueue (FCriticalSection)
↓ USoundWaveProcedural::OnSoundWaveProceduralUnderflow pulls from queue
↓ UAudioComponent plays from the Actor's world position (3D spatialized)
```

**Audio format** (both directions): PCM 16-bit signed, 16000 Hz, mono, little-endian.
### Silence detection heuristic

`OnAgentStoppedSpeaking` fires when the `AudioQueue` has been empty for **30 consecutive ticks** (~0.5 s at 60 fps). If the agent has natural pauses, increase `SilenceThresholdTicks` in the header:

```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0 s
```
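The tick-counting logic behind the heuristic can be sketched in standalone C++. Illustrative names, not the plugin's actual code:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the heuristic: count consecutive ticks with an empty audio
// queue and fire "stopped speaking" once when the threshold is reached.
struct SilenceDetector
{
    int32_t SilenceThresholdTicks = 30; // ~0.5 s at 60 fps
    int32_t EmptyTicks = 0;
    bool bSpeaking = false;

    // Call once per tick. Returns true on the tick where the
    // stopped-speaking event should fire.
    bool Tick(bool bQueueHasAudio)
    {
        if (bQueueHasAudio)
        {
            bSpeaking = true;  // audio is flowing again
            EmptyTicks = 0;
            return false;
        }
        if (bSpeaking && ++EmptyTicks >= SilenceThresholdTicks)
        {
            bSpeaking = false; // fire once, then reset
            EmptyTicks = 0;
            return true;
        }
        return false;
    }
};
```

Note the counter resets whenever a chunk arrives, which is why short network gaps below the threshold do not trigger a false "stopped speaking".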
---
## 11. Common Patterns
### Test the connection without a microphone

```
BeginPlay → StartConversation()

OnAgentConnected → SendTextMessage("Hello, introduce yourself")

OnAgentTextResponse → Print string (confirms text pipeline works)
OnAgentStartedSpeaking → (confirms audio pipeline works)
```
### Show subtitles in UI

```
OnAgentTranscript:
  Segment → Text → show in player subtitle widget (speaker always "user")

OnAgentTextResponse:
  ResponseText → show in NPC speech bubble
```
### Interrupt the agent when the player starts speaking

In Server VAD mode ElevenLabs handles this automatically. For manual control:

```
OnAgentStartedSpeaking → set "agent is speaking" flag
Input Action (any) → if agent is speaking → InterruptAgent()
```
### Multiple NPCs with different agents

Each NPC Blueprint has its own `UElevenLabsConversationalAgentComponent`. Set a different `AgentID` on each component. WebSocket connections are fully independent.
### Only start the conversation when the player is nearby

```
On End Overlap
  └─► [ElevenLabs Agent] End Conversation
```

### Adjust microphone volume

Get the `UElevenLabsMicrophoneCaptureComponent` from the owner and set `VolumeMultiplier`.
## 12. Troubleshooting

Ensure the plugin is enabled in `.uproject` and the project was recompiled afterwards.

### WebSocket connection fails immediately

- Check the **API Key** is set correctly in Project Settings.
- Check the **Agent ID** exists in your ElevenLabs account (find it in the dashboard URL or via `GET /v1/convai/agents`).
- Enable **Verbose Logging** in Project Settings and check the Output Log for the exact WS URL and error.
- Ensure port 443 (WSS) is not blocked by your firewall.
### `OnAgentConnected` never fires

- The connection was made but `conversation_initiation_metadata` has not been received yet — check Verbose Logging.
- If you see `"Binary audio frame"` logs but no `"Conversation initiated"`, the initiation JSON frame may be arriving as a non-`{` binary frame. Check the hex prefix logged at Verbose level.
### No audio from the microphone

- Windows may require microphone permission. Check **Settings → Privacy → Microphone**.
- Try setting `VolumeMultiplier` to `2.0` on the `MicrophoneCaptureComponent`.
- Check the Output Log for `"Failed to open default audio capture stream"`.
### Agent audio is choppy or silent

- The `USoundWaveProcedural` queue may be underflowing due to network jitter. Check latency.
- Verify the audio format matches: the plugin expects raw PCM 16-bit 16 kHz mono from the server. If ElevenLabs sends a different format (e.g. mp3_44100), audio will sound garbled — check `agent_output_audio_format` in the `conversation_initiation_metadata` via Verbose Logging.
- Ensure no other component is using the same `UAudioComponent`.
### `OnAgentStoppedSpeaking` fires too early

Increase `SilenceThresholdTicks` in `ElevenLabsConversationalAgentComponent.h`:

```cpp
static constexpr int32 SilenceThresholdTicks = 60; // ~1.0 s at 60 fps
```
### Build error: "Plugin AudioCapture not found"

Make sure the `AudioCapture` plugin is enabled. It should be auto-enabled via the `.uplugin` dependency, but you can add it manually to `.uproject`:

```json
{ "Name": "AudioCapture", "Enabled": true }
```
### `"Received unexpected binary WebSocket frame"` in the log

This warning no longer appears in v1.1.0+. If you see it, you are running an older build — recompile the plugin.

---
## 13. Changelog
### v1.1.0 — 2026-02-19

**Bug fixes:**

- **Binary WebSocket frames**: ElevenLabs sends all frames as binary (not text). All frames were previously discarded. Now correctly handled — JSON control frames decoded as UTF-8, raw PCM audio frames routed directly to the audio queue.
- **Transcript message**: Wrong message type (`"transcript"` → `"user_transcript"`), wrong event key (`"transcript_event"` → `"user_transcription_event"`), wrong text field (`"message"` → `"user_transcript"`).
- **Pong format**: `event_id` was nested inside a `pong_event` object; corrected to a top-level field per the API spec.
- **Client turn mode**: `user_turn_start`/`user_turn_end` are not valid API messages; replaced with `user_activity` (start) and implicit silence (end).
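Reconstructed from the fix notes above, the corrected message shapes look like this on the wire (field values are examples; consult the ElevenLabs API reference for the authoritative schemas):

```json
{ "type": "user_transcript", "user_transcription_event": { "user_transcript": "..." } }

{ "type": "pong", "event_id": 12345 }

{ "type": "user_activity" }
```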
**New features:**

- `SendTextMessage(Text)` on both `UElevenLabsConversationalAgentComponent` and `UElevenLabsWebSocketProxy` — send text to the agent without a microphone. Useful for testing.
- Verbose logging shows a binary-frame hex preview and a JSON-frame content prefix.
- The JSON parse error log now shows the first 80 characters of the failing message.

### v1.0.0 — 2026-02-19

Initial implementation. Plugin compiles cleanly on UE 5.5 Win64.

---

*Documentation updated 2026-02-19 — Plugin v1.1.0 — UE 5.5*