diff --git a/.claude/MEMORY.md b/.claude/MEMORY.md
index 6c93f3e..79744ba 100644
--- a/.claude/MEMORY.md
+++ b/.claude/MEMORY.md
@@ -43,6 +43,20 @@
 
 ## Plugin Status
 - **PS_AI_Agent_ElevenLabs**: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
+- v1.1.0 — all 3 protocol bugs fixed (transcript fields, pong format, client turn mode)
+- Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
+- First-byte discrimination: `{` = JSON control message, else = raw PCM audio
+- `SendTextMessage()` added to both WebSocketProxy and ConversationalAgentComponent
+- Connection confirmed working end-to-end; audio receive path functional
+
+## ElevenLabs WebSocket Protocol Notes
+- **ALL frames are binary** — `OnRawMessage` handles everything; `OnMessage` (text) never fires
+- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
+- Fragment reassembly: accumulate into `BinaryFrameBuffer` until `BytesRemaining == 0`
+- Pong: `{"type":"pong","event_id":N}` — `event_id` is **top-level**, NOT nested
+- Transcript: type=`user_transcript`, key=`user_transcription_event`, field=`user_transcript`
+- Client turn mode: `{"type":"user_activity"}` to signal speaking; no explicit end message
+- Text input: `{"type":"user_message","text":"..."}` — agent replies with audio + text
 
 ## API Keys / Secrets
 - ElevenLabs API key is set in **Project Settings → Plugins → ElevenLabs AI Agent** in the Editor
diff --git a/.claude/session_log_2026-02-19.md b/.claude/session_log_2026-02-19.md
index 8079f0d..4c56ce2 100644
--- a/.claude/session_log_2026-02-19.md
+++ b/.claude/session_log_2026-02-19.md
@@ -89,6 +89,68 @@ Commit: `1b72026`
 
 ---
 
+## Session 2 — 2026-02-19 (continued context)
+
+### 10. API vs Implementation Cross-Check (3 bugs found and fixed)
+Cross-referenced `elevenlabs_api_reference.md` against plugin source. Found 3 protocol bugs:
+
+**Bug 1 — Transcript fields wrong:**
+- Type: `"transcript"` → `"user_transcript"`
+- Event key: `"transcript_event"` → `"user_transcription_event"`
+- Field: `"message"` → `"user_transcript"`
+
+**Bug 2 — Pong format wrong:**
+- `event_id` was nested in `pong_event{}` → must be top-level
+
+**Bug 3 — Client turn mode messages don't exist:**
+- `"user_turn_start"` / `"user_turn_end"` are not valid API types
+- Replaced: start → `"user_activity"`, end → no-op (server detects silence)
+
+Commit: `ae2c9b9`
+
+### 11. SendTextMessage Added
+User asked for text input to agent for testing (without mic).
+Added `SendTextMessage(FString)` to `UElevenLabsWebSocketProxy` and `UElevenLabsConversationalAgentComponent`.
+Sends `{"type":"user_message","text":"..."}` — agent replies with audio + text.
+
+Commit: `b489d11`
+
+### 12. Binary WebSocket Frame Fix
+User reported: `"Received unexpected binary WebSocket frame"` warnings.
+Root cause: ElevenLabs sends **ALL WebSocket frames as binary**, never text.
+`OnMessage` (text handler) never fires. `OnRawMessage` must handle everything.
+
+Fix: Implemented `OnWsBinaryMessage` with fragment reassembly (`BinaryFrameBuffer`).
+
+Commit: `669c503`
+
+### 13. JSON vs PCM Discrimination Fix
+After binary fix: `"Failed to parse WebSocket message as JSON"` errors.
+Root cause: Binary frames contain BOTH JSON control messages AND raw PCM audio.
+
+Fix: Peek at byte[0] of assembled buffer:
+- `'{'` (0x7B) → UTF-8 JSON → route to `OnWsMessage()`
+- anything else → raw PCM audio → broadcast to `OnAudioReceived`
+
+Commit: `4834567`
+
+### 14. Documentation Updated to v1.1.0
+Full rewrite of `.claude/PS_AI_Agent_ElevenLabs_Documentation.md`:
+- Added Changelog section (v1.0.0 / v1.1.0)
+- Updated audio pipeline (binary PCM path, not Base64 JSON)
+- Added `SendTextMessage` to all function tables and examples
+- Corrected turn mode docs, transcript docs, `OnAgentConnected` timing
+- New troubleshooting entries
+
+Commit: `e464cfe`
+
+### 15. Test Blueprint Asset Updated
+`test_AI_Actor.uasset` updated in UE Editor.
+
+Commit: `99017f4`
+
+---
+
 ## Git History (this session)
 
 | Hash | Message |
@@ -100,6 +162,14 @@ Commit: `1b72026`
 | `3b98edc` | Update memory: document confirmed UE 5.5 API patterns and plugin compile status |
 | `c833ccd` | Add plugin documentation for PS_AI_Agent_ElevenLabs |
 | `1b72026` | Add PowerPoint documentation and update .gitignore |
+| `bbeb429` | ElevenLabs API reference doc |
+| `dbd6161` | TestMap, test actor, DefaultEngine.ini, memory update |
+| `ae2c9b9` | Fix 3 WebSocket protocol bugs |
+| `b489d11` | Add SendTextMessage |
+| `669c503` | Fix binary WebSocket frames |
+| `4834567` | Fix JSON vs binary frame discrimination |
+| `e464cfe` | Update documentation to v1.1.0 |
+| `99017f4` | Update test_AI_Actor blueprint asset |
 
 ---
 
@@ -114,14 +184,16 @@ Commit: `1b72026`
 | `EAllowShrinking::No` | Bool overload of `RemoveAt` deprecated in UE 5.5 |
 | `USoundWaveProcedural` for playback | Allows pushing raw PCM bytes at runtime without file I/O |
 | Silence threshold = 30 ticks | ~0.5s at 60fps heuristic to detect agent finished speaking |
+| Binary frame handling | ElevenLabs sends ALL WS frames as binary; peek byte[0] to discriminate JSON vs PCM |
+| `user_activity` for client turn | `user_turn_start`/`user_turn_end` don't exist in ElevenLabs API |
 
 ---
 
 ## Next Steps (not done yet)
 
-- [ ] Open in UE 5.5 Editor and test with a real ElevenLabs agent
-- [ ] Verify mic audio actually reaches ElevenLabs (enable Verbose Logging)
-- [ ] Test `USoundWaveProcedural` underflow behaviour in practice
+- [ ] Verify mic audio actually reaches ElevenLabs (enable Verbose Logging, test in Editor)
+- [ ] Test `USoundWaveProcedural` underflow behaviour in practice (check for audio glitches)
+- [ ] Test `SendTextMessage` end-to-end in Blueprint
 - [ ] Add lip-sync support (future)
 - [ ] Add session memory / conversation history (future, matching Convai)
 - [ ] Add environment/action context support (future)
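
The following plain C++ sketch covers several rules from the protocol notes in this patch: fragment reassembly until `BytesRemaining == 0`, first-byte JSON/PCM discrimination, the top-level `event_id` in pongs, and the `user_activity` / `user_message` payloads. It is a minimal illustration under stated assumptions, not the plugin's code. It uses standard-library types instead of Unreal ones, and every name in it (`FrameKind`, `BinaryFrameAssembler`, `ClassifyFrame`, `JsonEscape`, `MakePong`, `MakeUserActivity`, `MakeUserMessage`) is hypothetical. The `OnFragment` parameters are assumed to mirror the (data, size, bytes remaining) triple that UE's `IWebSocket::OnRawMessage` event delivers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <vector>

// Hypothetical plain-C++ sketch of the receive/send rules; not the plugin's UE implementation.
enum class FrameKind { Empty, Json, Pcm };

// Accumulates binary fragments until the transport reports BytesRemaining == 0
// (assumed to match the (Data, Size, BytesRemaining) triple of UE's OnRawMessage).
class BinaryFrameAssembler {
public:
    // Returns the complete message on the last fragment, std::nullopt otherwise.
    std::optional<std::vector<std::uint8_t>> OnFragment(const void* Data, std::size_t Size,
                                                        std::size_t BytesRemaining) {
        assert(Data != nullptr || Size == 0);  // a null pointer is only valid for an empty fragment
        const auto* Bytes = static_cast<const std::uint8_t*>(Data);
        Buffer.insert(Buffer.end(), Bytes, Bytes + Size);
        if (BytesRemaining > 0) {
            return std::nullopt;  // more fragments of this message are still coming
        }
        std::vector<std::uint8_t> Complete;
        Complete.swap(Buffer);  // hand the message out and leave the buffer empty for the next one
        return Complete;
    }

private:
    std::vector<std::uint8_t> Buffer;
};

// Peek at byte[0]: '{' (0x7B) => UTF-8 JSON control message, anything else => raw PCM audio.
FrameKind ClassifyFrame(const std::vector<std::uint8_t>& Message) {
    if (Message.empty()) {
        return FrameKind::Empty;
    }
    return Message[0] == 0x7B ? FrameKind::Json : FrameKind::Pcm;
}

// Minimal JSON string escaping for outgoing text (quotes, backslashes, control characters).
std::string JsonEscape(const std::string& In) {
    std::string Out;
    for (char C : In) {
        switch (C) {
            case '"':  Out += "\\\""; break;
            case '\\': Out += "\\\\"; break;
            case '\n': Out += "\\n"; break;
            case '\r': Out += "\\r"; break;
            case '\t': Out += "\\t"; break;
            default:
                if (static_cast<unsigned char>(C) < 0x20) {
                    char Hex[7];  // "\u" + 4 hex digits + terminator
                    std::snprintf(Hex, sizeof(Hex), "\\u%04x",
                                  static_cast<unsigned>(static_cast<unsigned char>(C)));
                    Out += Hex;
                } else {
                    Out += C;  // printable ASCII and UTF-8 bytes pass through unchanged
                }
        }
    }
    return Out;
}

// Pong reply: event_id is a top-level field, not nested under pong_event.
std::string MakePong(long long EventId) {
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(EventId) + "}";
}

// Client turn mode: signals that the user is speaking (no matching end message is sent).
std::string MakeUserActivity() { return "{\"type\":\"user_activity\"}"; }

// Text input message; per the notes, the agent replies with audio + text.
std::string MakeUserMessage(const std::string& Text) {
    return "{\"type\":\"user_message\",\"text\":\"" + JsonEscape(Text) + "\"}";
}
```

In a real handler, a `FrameKind::Json` result would be decoded as UTF-8 and routed to the JSON path (`OnWsMessage()` in the plugin), while `FrameKind::Pcm` bytes would go to `OnAudioReceived`. Because the check looks only at byte[0], a PCM chunk whose first byte happens to be 0x7B would be misrouted. The sketch keeps that limitation of the first-byte heuristic visible rather than hiding it.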