j.foucher 2026-02-19 15:02:55 +01:00
parent 99017f4067
commit 302337b573
2 changed files with 89 additions and 3 deletions


@@ -43,6 +43,20 @@
## Plugin Status
- **PS_AI_Agent_ElevenLabs**: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
- v1.1.0 — all 3 protocol bugs fixed (transcript fields, pong format, client turn mode)
- Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
- First-byte discrimination: `{` = JSON control message, else = raw PCM audio
- `SendTextMessage()` added to both WebSocketProxy and ConversationalAgentComponent
- Connection confirmed working end-to-end; audio receive path functional
## ElevenLabs WebSocket Protocol Notes
- **ALL frames are binary**: `OnRawMessage` handles everything; `OnMessage` (text) never fires
- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
- Fragment reassembly: accumulate into `BinaryFrameBuffer` until `BytesRemaining == 0`
- Pong: `{"type":"pong","event_id":N}`, where `event_id` is **top-level**, NOT nested
- Transcript: type=`user_transcript`, key=`user_transcription_event`, field=`user_transcript`
- Client turn mode: `{"type":"user_activity"}` to signal speaking; no explicit end message
- Text input: `{"type":"user_message","text":"..."}` — agent replies with audio + text
## API Keys / Secrets
- ElevenLabs API key is set in **Project Settings → Plugins → ElevenLabs AI Agent** in the Editor


@@ -89,6 +89,68 @@ Commit: `1b72026`
---
## Session 2 — 2026-02-19 (continued context)
### 10. API vs Implementation Cross-Check (3 bugs found and fixed)
Cross-referenced `elevenlabs_api_reference.md` against plugin source. Found 3 protocol bugs:
**Bug 1 — Transcript fields wrong:**
- Type: `"transcript"` → `"user_transcript"`
- Event key: `"transcript_event"` → `"user_transcription_event"`
- Field: `"message"` → `"user_transcript"`
**Bug 2 — Pong format wrong:**
- `event_id` was nested in `pong_event{}` → must be top-level
**Bug 3 — Client turn mode messages don't exist:**
- `"user_turn_start"` / `"user_turn_end"` are not valid API types
- Replaced: start → `"user_activity"`, end → no-op (server detects silence)
Commit: `ae2c9b9`
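
The corrected outgoing messages from Bugs 2 and 3 can be sketched as two small builders. This is a standalone illustration using only the C++ standard library, not the plugin's actual code; the function names `MakePongMessage` and `MakeUserActivityMessage` are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Builds the reply to a server ping. Per the notes above, event_id sits at
// the top level of the object, not inside a nested "pong_event" object.
// (Hypothetical helper name, for illustration only.)
std::string MakePongMessage(std::int64_t EventId)
{
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(EventId) + "}";
}

// Signals that the user is speaking in client turn mode. There is no
// matching "end" message: the server detects silence on its own.
// (Hypothetical helper name, for illustration only.)
std::string MakeUserActivityMessage()
{
    return "{\"type\":\"user_activity\"}";
}
```

For example, `MakePongMessage(42)` yields `{"type":"pong","event_id":42}`.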
### 11. SendTextMessage Added
User asked for text input to agent for testing (without mic).
Added `SendTextMessage(FString)` to `UElevenLabsWebSocketProxy` and `UElevenLabsConversationalAgentComponent`.
Sends `{"type":"user_message","text":"..."}` — agent replies with audio + text.
Commit: `b489d11`
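
A minimal sketch of how the `user_message` payload could be assembled, assuming the text is UTF-8 and must be escaped before being embedded in JSON. It uses only the C++ standard library (`std::string` rather than `FString`), and the helpers `EscapeJson` and `MakeUserMessage` are hypothetical names; inside the plugin a JSON serializer would more likely do this work.

```cpp
#include <cstdio>
#include <string>

// Escapes a UTF-8 string so it can sit inside a JSON string literal.
// (Hypothetical helper, for illustration only.)
std::string EscapeJson(const std::string& In)
{
    std::string Out;
    Out.reserve(In.size());
    for (unsigned char C : In)
    {
        switch (C)
        {
        case '"':  Out += "\\\""; break;
        case '\\': Out += "\\\\"; break;
        case '\n': Out += "\\n";  break;
        case '\r': Out += "\\r";  break;
        case '\t': Out += "\\t";  break;
        default:
            if (C < 0x20)
            {
                // Remaining control characters need the \u00XX form.
                char Buf[7];
                std::snprintf(Buf, sizeof(Buf), "\\u%04x", static_cast<unsigned>(C));
                Out += Buf;
            }
            else
            {
                // Printable ASCII and UTF-8 multi-byte sequences pass through.
                Out += static_cast<char>(C);
            }
        }
    }
    return Out;
}

// Builds the {"type":"user_message","text":"..."} payload described above.
std::string MakeUserMessage(const std::string& Text)
{
    return "{\"type\":\"user_message\",\"text\":\"" + EscapeJson(Text) + "\"}";
}
```

Escaping matters here because a quote or newline typed by the tester would otherwise produce invalid JSON.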
### 12. Binary WebSocket Frame Fix
User reported: `"Received unexpected binary WebSocket frame"` warnings.
Root cause: ElevenLabs sends **ALL WebSocket frames as binary**, never text.
`OnMessage` (text handler) never fires. `OnRawMessage` must handle everything.
Fix: Implemented `OnWsBinaryMessage` with fragment reassembly (`BinaryFrameBuffer`).
Commit: `669c503`
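
The fragment reassembly described above can be sketched as follows. This is a standalone approximation using standard-library types, assuming a callback shaped like a raw-message handler that receives a data pointer, a size, and a bytes-remaining count; the class name `FrameAssembler` is hypothetical and only `BinaryFrameBuffer` comes from the notes.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Accumulates WebSocket fragments until a full message has arrived.
// (Hypothetical class, for illustration only.)
class FrameAssembler
{
public:
    using FrameHandler = std::function<void(const std::vector<std::uint8_t>&)>;

    explicit FrameAssembler(FrameHandler InOnFrame) : OnFrame(std::move(InOnFrame)) {}

    // Call once per fragment. BytesRemaining is the number of bytes of the
    // current message still to come; 0 means this fragment is the last one.
    void OnFragment(const void* Data, std::size_t Size, std::size_t BytesRemaining)
    {
        const auto* Bytes = static_cast<const std::uint8_t*>(Data);
        BinaryFrameBuffer.insert(BinaryFrameBuffer.end(), Bytes, Bytes + Size);

        if (BytesRemaining == 0)
        {
            // Hand off the completed message and leave the buffer empty,
            // ready for the next message.
            std::vector<std::uint8_t> Complete = std::move(BinaryFrameBuffer);
            BinaryFrameBuffer.clear();
            if (OnFrame)
            {
                OnFrame(Complete);
            }
        }
    }

private:
    FrameHandler OnFrame;
    std::vector<std::uint8_t> BinaryFrameBuffer;
};
```

Moving the buffer out before dispatching means a handler that triggers further fragments does not see stale bytes from the previous message.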
### 13. JSON vs PCM Discrimination Fix
After binary fix: `"Failed to parse WebSocket message as JSON"` errors.
Root cause: Binary frames contain BOTH JSON control messages AND raw PCM audio.
Fix: Peek at byte[0] of assembled buffer:
- `'{'` (0x7B) → UTF-8 JSON → route to `OnWsMessage()`
- anything else → raw PCM audio → broadcast to `OnAudioReceived`
Commit: `4834567`
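
The first-byte check can be sketched as a small classifier over the assembled buffer. This is a standalone illustration with standard-library types; `EFrameKind` and `ClassifyFrame` are hypothetical names, and the empty-frame case is an assumption added for safety rather than something the notes describe.

```cpp
#include <cstdint>
#include <vector>

// Result of inspecting an assembled binary frame.
// (Hypothetical enum, for illustration only.)
enum class EFrameKind { Empty, Json, Pcm };

// Peeks at byte[0]: '{' (0x7B) means a UTF-8 JSON control message,
// anything else is treated as raw PCM audio.
EFrameKind ClassifyFrame(const std::vector<std::uint8_t>& Frame)
{
    if (Frame.empty())
    {
        return EFrameKind::Empty; // Assumption: nothing to route.
    }
    return Frame[0] == 0x7B ? EFrameKind::Json : EFrameKind::Pcm;
}
```

A caller would route `Json` frames to the JSON handler and `Pcm` frames to the audio broadcast. Note that this is a heuristic: a PCM frame whose first byte happens to be 0x7B would be classified as JSON, so falling back to the audio path when JSON parsing fails may be worth considering.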
### 14. Documentation Updated to v1.1.0
Full rewrite of `.claude/PS_AI_Agent_ElevenLabs_Documentation.md`:
- Added Changelog section (v1.0.0 / v1.1.0)
- Updated audio pipeline (binary PCM path, not Base64 JSON)
- Added `SendTextMessage` to all function tables and examples
- Corrected turn mode docs, transcript docs, `OnAgentConnected` timing
- New troubleshooting entries
Commit: `e464cfe`
### 15. Test Blueprint Asset Updated
`test_AI_Actor.uasset` updated in UE Editor.
Commit: `99017f4`
---
## Git History (this session)
| Hash | Message |
@@ -100,6 +162,14 @@ Commit: `1b72026`
| `3b98edc` | Update memory: document confirmed UE 5.5 API patterns and plugin compile status |
| `c833ccd` | Add plugin documentation for PS_AI_Agent_ElevenLabs |
| `1b72026` | Add PowerPoint documentation and update .gitignore |
| `bbeb429` | ElevenLabs API reference doc |
| `dbd6161` | TestMap, test actor, DefaultEngine.ini, memory update |
| `ae2c9b9` | Fix 3 WebSocket protocol bugs |
| `b489d11` | Add SendTextMessage |
| `669c503` | Fix binary WebSocket frames |
| `4834567` | Fix JSON vs binary frame discrimination |
| `e464cfe` | Update documentation to v1.1.0 |
| `99017f4` | Update test_AI_Actor blueprint asset |
---
@@ -114,14 +184,16 @@ Commit: `1b72026`
| `EAllowShrinking::No` | Bool overload of `RemoveAt` deprecated in UE 5.5 |
| `USoundWaveProcedural` for playback | Allows pushing raw PCM bytes at runtime without file I/O |
| Silence threshold = 30 ticks | Heuristic (~0.5 s at 60 fps) for detecting that the agent has finished speaking |
| Binary frame handling | ElevenLabs sends ALL WS frames as binary; peek byte[0] to discriminate JSON vs PCM |
| `user_activity` for client turn | `user_turn_start`/`user_turn_end` don't exist in ElevenLabs API |
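
The silence-threshold decision in the table can be illustrated with a tick counter. This is a sketch under assumptions, not the plugin's implementation: the class `SilenceDetector` and its methods are hypothetical, and it assumes one `Tick()` call per rendered frame.

```cpp
// Counts consecutive ticks without incoming audio and reports when the
// agent is judged to have finished speaking.
// (Hypothetical class, for illustration only.)
class SilenceDetector
{
public:
    explicit SilenceDetector(int InThresholdTicks = 30)
        : ThresholdTicks(InThresholdTicks) {}

    // Call whenever an audio chunk arrives from the agent.
    void OnAudioReceived()
    {
        SilentTicks = 0;
        bSpeaking = true;
    }

    // Call once per frame tick. Returns true only on the tick where the
    // silence threshold is reached after the agent had been speaking.
    bool Tick()
    {
        if (!bSpeaking)
        {
            return false;
        }
        if (++SilentTicks >= ThresholdTicks)
        {
            bSpeaking = false;
            SilentTicks = 0;
            return true;
        }
        return false;
    }

private:
    int ThresholdTicks;
    int SilentTicks = 0;
    bool bSpeaking = false;
};
```

Because the threshold is counted in ticks, the wall-clock delay depends on frame rate: 30 ticks is 30 / 60 = 0.5 s at 60 fps but 1 s at 30 fps. Accumulating elapsed time instead of ticks would make the delay frame-rate independent.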
---
## Next Steps (not done yet)
- [ ] Open in UE 5.5 Editor and test with a real ElevenLabs agent
- [ ] Verify mic audio actually reaches ElevenLabs (enable Verbose Logging, test in Editor)
- [ ] Test `USoundWaveProcedural` underflow behaviour in practice (check for audio glitches)
- [ ] Test `SendTextMessage` end-to-end in Blueprint
- [ ] Add lip-sync support (future)
- [ ] Add session memory / conversation history (future, matching Convai)
- [ ] Add environment/action context support (future)