diff --git a/.claude/MEMORY.md b/.claude/MEMORY.md
index fe7eae2..0181b98 100644
--- a/.claude/MEMORY.md
+++ b/.claude/MEMORY.md
@@ -17,69 +17,70 @@
 ## Plugins
 | Plugin | Path | Purpose |
 |--------|------|---------|
-| Convai (reference) | `/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Has ElevenLabs voice type enum in `ConvaiDefinitions.h`. Used as architectural reference. |
-| **PS_AI_Agent_ElevenLabs** | `/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See `.claude/elevenlabs_plugin.md` for full details. |
+| Convai (reference) | `/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Used as architectural reference. |
+| **PS_AI_ConvAgent** | `/Unreal/PS_AI_Agent/Plugins/PS_AI_ConvAgent/` | Main plugin — ElevenLabs Conversational AI, posture, gaze, lip sync, facial expressions. |
 
 ## User Preferences
-- Plugin naming: `PS_AI_Agent_` (e.g. `PS_AI_Agent_ElevenLabs`)
+- Plugin naming: `PS_AI_ConvAgent` (renamed from PS_AI_Agent_ElevenLabs)
 - Save memory frequently during long sessions
-- Goal: ElevenLabs Conversational AI integration — simpler than Convai, no gRPC
-- Full original ask + intent: see `.claude/project_context.md`
 - Git remote is a **private server** — no public exposure risk
+- Full original ask + intent: see `.claude/project_context.md`
+
+## Current Branch & Work
+- **Branch**: `main`
+- **Recent merges**: `feature/multi-player-shared-agent` merged to main
+
+### Latency Debug HUD (just implemented)
+- Separate `bDebugLatency` property + CVar `ps.ai.ConvAgent.Debug.Latency`
+- All metrics anchored to `GenerationStartTime` (`agent_response_started` event)
+- Metrics: Gen>Audio (LLM+TTS), Pre-buffer, Gen>Ear (user-perceived)
+- Reset per turn in `HandleAgentResponseStarted()`
+- `DrawLatencyHUD()` separate from `DrawDebugHUD()`
+
+### Future: Server-Side Latency from ElevenLabs API
+**TODO — high-value improvement parked for later:**
+- `GET /v1/convai/conversations/{conversation_id}` returns:
+  - `conversation_turn_metrics` with `elapsed_time` per metric (STT, LLM, TTS breakdown!)
+  - `tool_latency_secs`, `step_latency_secs`, `rag_latency_secs`
+  - `time_in_call_secs` per message
+- `ping` WS event has `ping_ms` (network round-trip) — could display on HUD
+- `vad_score` WS event (0.0-1.0) — could detect real speech start client-side
+- Docs: https://elevenlabs.io/docs/api-reference/conversations/get
+
+### Multi-Player Shared Agent — Key Design
+- **Old model**: exclusive lock (one player per agent via `NetConversatingPawn`)
+- **New model**: shared array (`NetConnectedPawns`) + active speaker (`NetActiveSpeakerPawn`)
+- Speaker arbitration: server-side with `SpeakerSwitchHysteresis` (0.3s) + `SpeakerIdleTimeout` (3.0s)
+- In standalone (≤1 player): speaker arbitration bypassed, audio sent directly to WebSocket
+- Internal mic (WASAPI thread): direct WebSocket send, no game-thread state access
+- `GetCurrentBlendshapes()` thread-safe via `ThreadSafeBlendshapes` snapshot + `BlendshapeLock`
 
 ## Key UE5 Plugin Patterns
 - Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule`
-- Module startup: `NewObject(..., RF_Standalone)` + `AddToRoot()`
 - WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)`
-  - `WebSockets` is a **module** (Build.cs only) — NOT a plugin, don't put it in `.uplugin`
-- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+, replaces deprecated `OpenCaptureStream`)
-  - `AudioCapture` IS a plugin — declare it in `.uplugin` Plugins array
-  - Callback type: `FOnAudioCaptureFunction` = `TFunction`
-  - Cast `const void*` to `const float*` inside — device sends float32 interleaved
-- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow` delegate
-- Audio capture callbacks arrive on a **background thread** — always marshal to game thread with `AsyncTask(ENamedThreads::GameThread, ...)`
+- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+)
+  - Callback arrives on **background thread** — marshal to game thread
+- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow`
 - Resample mic audio to **16000 Hz mono** before sending to ElevenLabs
 - `TArray::RemoveAt(idx, count, EAllowShrinking::No)` — bool overload deprecated in UE 5.5
 
-## Plugin Status
-- **PS_AI_Agent_ElevenLabs**: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
-- v1.5.0 — mic audio chunk size fixed: WASAPI 5ms callbacks accumulated to 100ms before sending
-- v1.4.0 — push-to-talk fully fixed: bAutoStartListening now ignored in Client turn mode
-- Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
-- First-byte discrimination: `{` = JSON control message, else = raw PCM audio
-- `SendTextMessage()` added to both WebSocketProxy and ConversationalAgentComponent
-- `conversation_initiation_client_data` now sent immediately on WS connect (required for mic + latency)
-
-## Audio Chunk Size — CRITICAL
-- WASAPI fires mic callbacks every ~5ms → **158 bytes** at 16kHz 16-bit mono
-- ElevenLabs VAD/STT requires **≥3200 bytes (100ms)** per chunk; smaller chunks are silently ignored
-- Fix: `MicAccumulationBuffer` in component accumulates chunks; sends only when `>= MicChunkMinBytes` (3200)
-- `StopListening()` flushes remainder so final partial chunk is never dropped before end-of-turn
-
 ## ElevenLabs WebSocket Protocol Notes
-- **ALL frames are binary** — bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text) — UE fires both for same frame → double audio bug
+- **ALL frames are binary** — bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text)
 - Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
-- Fragment reassembly: accumulate into `BinaryFrameBuffer` until `BytesRemaining == 0`
 - Pong: `{"type":"pong","event_id":N}` — `event_id` is **top-level**, NOT nested
-- Transcript: type=`user_transcript`, key=`user_transcription_event`, field=`user_transcript`
-- Client turn mode (`client_vad`): send `user_activity` **with every audio chunk** (not just once) — server needs continuous signal to know user is speaking; stopping chunks = silence detected = agent responds
-- Text input: `{"type":"user_message","text":"..."}` — agent replies with audio + text
-- **MUST send `conversation_initiation_client_data` immediately after WS connect** — without it, server won't process client audio (mic appears dead)
-- `conversation_initiation_client_data` payload: `conversation_config_override.agent.turn.mode`, `conversation_config_override.tts.optimize_streaming_latency`, `custom_llm_extra_body.enable_intermediate_response`
-- `enable_intermediate_response: true` in `custom_llm_extra_body` reduces time-to-first-audio latency
+- `user_transcript` arrives AFTER `agent_response_started` in Server VAD mode
+- **MUST send `conversation_initiation_client_data` immediately after WS connect**
 
 ## API Keys / Secrets
-- ElevenLabs API key is set in **Project Settings → Plugins → ElevenLabs AI Agent** in the Editor
-- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
-- **The key is stripped from `DefaultEngine.ini` before every commit** — do not commit it
-- Each developer sets the key locally; it does not go in git
+- ElevenLabs API key: **Project Settings → Plugins → ElevenLabs AI Agent**
+- Saved to `DefaultEngine.ini` — **stripped before every commit**
 
 ## Claude Memory Files in This Repo
 | File | Contents |
 |------|----------|
 | `.claude/MEMORY.md` | This file — project structure, patterns, status |
 | `.claude/elevenlabs_plugin.md` | Plugin file map, ElevenLabs WS protocol, design decisions |
-| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL, Agent ID location) |
+| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL) |
 | `.claude/project_context.md` | Original ask, intent, short/long-term goals |
-| `.claude/session_log_2026-02-19.md` | Full session record: steps, commits, technical decisions, next steps |
+| `.claude/session_log_2026-02-19.md` | Session record: steps, commits, technical decisions |
 | `.claude/PS_AI_Agent_ElevenLabs_Documentation.md` | User-facing Markdown reference doc |
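
Reviewer note: the "ElevenLabs WebSocket Protocol Notes" retained in this diff describe two small rules, first-byte frame discrimination and a pong reply with a top-level `event_id`. The sketch below illustrates them in plain C++ with no Unreal types. The names `FrameKind`, `ClassifyFrame`, and `MakePong` are hypothetical and not taken from the plugin source; the only facts assumed are the ones stated in the notes (`'{'` / 0x7B means JSON, anything else is raw PCM, and the pong shape `{"type":"pong","event_id":N}`).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper names for illustration; the plugin's real code may differ.
enum class FrameKind { Empty, JsonControl, PcmAudio };

// Peek at byte[0] of a binary WebSocket frame: '{' (0x7B) marks a JSON
// control message, any other first byte is treated as raw PCM audio.
FrameKind ClassifyFrame(const std::uint8_t* Data, std::size_t Size)
{
    if (Data == nullptr || Size == 0)
    {
        return FrameKind::Empty;
    }
    return Data[0] == 0x7B ? FrameKind::JsonControl : FrameKind::PcmAudio;
}

// Build a pong reply with event_id at the top level of the object,
// matching the shape given in the notes.
std::string MakePong(long long EventId)
{
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(EventId) + "}";
}
```

In an Unreal handler bound to `OnRawMessage`, the same check would presumably run on the incoming byte pointer before either parsing JSON or queueing audio. That wiring is omitted here because it depends on plugin code not shown in this diff.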