PS_AI_Agent/.claude/MEMORY.md
j.foucher b888f7fcb6 Update memory: document v1.5.0 mic chunk size fix
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 18:42:47 +01:00

# Project Memory: PS_AI_Agent
> This file is committed to the repository so it is available on any machine.
> Claude Code reads it automatically at session start (via the auto-memory system)
> when the working directory is inside this repo.
> **Keep it under ~180 lines**; lines beyond 200 are truncated by the system.
---
## Project Location
- Repo root: `<repo_root>/` (wherever this is cloned)
- UE5 project: `<repo_root>/Unreal/PS_AI_Agent/`
- `.uproject`: `<repo_root>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject`
- Engine: **Unreal Engine 5.5** — Win64 primary target
- Default test map: `/Game/TestMap.TestMap`
## Plugins
| Plugin | Path | Purpose |
|--------|------|---------|
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Has ElevenLabs voice type enum in `ConvaiDefinitions.h`. Used as architectural reference. |
| **PS_AI_Agent_ElevenLabs** | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See `.claude/elevenlabs_plugin.md` for full details. |
## User Preferences
- Plugin naming: `PS_AI_Agent_<Service>` (e.g. `PS_AI_Agent_ElevenLabs`)
- Save memory frequently during long sessions
- Goal: ElevenLabs Conversational AI integration — simpler than Convai, no gRPC
- Full original ask + intent: see `.claude/project_context.md`
- Git remote is a **private server** — no public exposure risk
## Key UE5 Plugin Patterns
- Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule`
- Module startup: `NewObject<USettings>(..., RF_Standalone)` + `AddToRoot()`
- WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)`
- `WebSockets` is a **module** (Build.cs only) — NOT a plugin, don't put it in `.uplugin`
- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+, replaces deprecated `OpenCaptureStream`)
- `AudioCapture` IS a plugin — declare it in `.uplugin` Plugins array
- Callback type: `FOnAudioCaptureFunction` = `TFunction<void(const void*, int32, int32, int32, double, bool)>`
- Cast `const void*` to `const float*` inside — device sends float32 interleaved
- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow` delegate
- Audio capture callbacks arrive on a **background thread** — always marshal to game thread with `AsyncTask(ENamedThreads::GameThread, ...)`
- Resample mic audio to **16000 Hz mono** before sending to ElevenLabs
- `TArray::RemoveAt(idx, count, EAllowShrinking::No)` — bool overload deprecated in UE 5.5
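
The capture-side conversion described above (float32 interleaved in, 16000 Hz mono out) can be sketched in plain standard C++ with no engine types. This is a minimal illustration, not the plugin's code: `ConvertToPcm16Mono16k` is a hypothetical name, the parameter order (frame count, channel count, sample rate) is an assumption about what the callback's `int32` arguments mean, and linear interpolation stands in for whatever resampler the plugin actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the plugin source): converts interleaved
// float32 capture data to 16 kHz mono signed 16-bit PCM.
std::vector<int16_t> ConvertToPcm16Mono16k(const float* Interleaved,
                                           int32_t NumFrames,
                                           int32_t NumChannels,
                                           int32_t SourceRate)
{
    constexpr int32_t TargetRate = 16000;
    assert(NumChannels > 0 && SourceRate > 0);

    std::vector<int16_t> Out;
    if (Interleaved == nullptr || NumFrames <= 0) {
        return Out;
    }

    // 1) Downmix: average all channels of each frame into one mono sample.
    std::vector<float> Mono(static_cast<size_t>(NumFrames));
    for (int32_t Frame = 0; Frame < NumFrames; ++Frame) {
        float Sum = 0.0f;
        for (int32_t Ch = 0; Ch < NumChannels; ++Ch) {
            Sum += Interleaved[static_cast<size_t>(Frame) * NumChannels + Ch];
        }
        Mono[static_cast<size_t>(Frame)] = Sum / static_cast<float>(NumChannels);
    }

    // 2) Resample by linear interpolation (no anti-aliasing filter here).
    const double Step = static_cast<double>(SourceRate) / TargetRate;
    const size_t OutFrames = static_cast<size_t>(NumFrames / Step);
    Out.reserve(OutFrames);
    for (size_t i = 0; i < OutFrames; ++i) {
        const double Pos = static_cast<double>(i) * Step;
        const size_t i0 = static_cast<size_t>(Pos);
        const size_t i1 = std::min(i0 + 1, Mono.size() - 1);
        const float Frac = static_cast<float>(Pos - static_cast<double>(i0));
        const float Sample = Mono[i0] + (Mono[i1] - Mono[i0]) * Frac;

        // 3) Clamp to [-1, 1] and scale to the int16 range.
        const float Clamped = std::max(-1.0f, std::min(1.0f, Sample));
        Out.push_back(static_cast<int16_t>(std::lround(Clamped * 32767.0f)));
    }
    return Out;
}
```

In the real plugin this would run inside the capture callback after casting `const void*` to `const float*`, with the resulting buffer handed to the game thread rather than sent from the audio thread.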
## Plugin Status
- **PS_AI_Agent_ElevenLabs**: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
- v1.5.0 — mic audio chunk size fixed: WASAPI 5ms callbacks accumulated to 100ms before sending
- v1.4.0 — push-to-talk fully fixed: bAutoStartListening now ignored in Client turn mode
- Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
- First-byte discrimination: `{` = JSON control message, else = raw PCM audio
- `SendTextMessage()` added to both WebSocketProxy and ConversationalAgentComponent
- `conversation_initiation_client_data` now sent immediately on WS connect (required for mic + latency)
## Audio Chunk Size — CRITICAL
- WASAPI fires mic callbacks every ~5ms → **158 bytes** at 16kHz 16-bit mono
- ElevenLabs VAD/STT requires **≥3200 bytes (100ms)** per chunk; smaller chunks are silently ignored
- Fix: `MicAccumulationBuffer` in component accumulates chunks; sends only when `>= MicChunkMinBytes` (3200)
- `StopListening()` flushes remainder so final partial chunk is never dropped before end-of-turn
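
The 3200-byte threshold follows from the format: 16000 samples/s × 2 bytes × 0.1 s = 3200 bytes. The accumulate-then-flush behaviour can be sketched as below in standard C++. The class name `MicChunkAccumulator`, its methods, and the `SendFn` callback are hypothetical stand-ins; only `MicChunkMinBytes` and the flush-on-stop idea come from the notes above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of the accumulation logic (not the plugin's class).
class MicChunkAccumulator {
public:
    using SendFn = std::function<void(const std::vector<uint8_t>&)>;

    MicChunkAccumulator(size_t InMinBytes, SendFn InSend)
        : MicChunkMinBytes(InMinBytes), Send(std::move(InSend))
    {
        assert(MicChunkMinBytes > 0);
    }

    // Called for every small capture callback (about 158 bytes each).
    void Push(const uint8_t* Data, size_t NumBytes)
    {
        if (Data != nullptr && NumBytes > 0) {
            Buffer.insert(Buffer.end(), Data, Data + NumBytes);
        }
        if (Buffer.size() >= MicChunkMinBytes) {
            Emit();
        }
    }

    // Called on stop: sends whatever is left, even if below the threshold.
    void Flush()
    {
        if (!Buffer.empty()) {
            Emit();
        }
    }

    size_t Pending() const { return Buffer.size(); }

private:
    void Emit()
    {
        Send(Buffer);
        Buffer.clear();
    }

    size_t MicChunkMinBytes;
    SendFn Send;
    std::vector<uint8_t> Buffer;
};
```

With 158-byte callbacks, twenty pushes total 3160 bytes and stay buffered; the twenty-first brings the buffer to 3318 bytes and triggers one send. Sending the whole buffer rather than exactly 3200 bytes keeps the sketch simple; a fixed-size variant would need to carry the remainder forward.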
## ElevenLabs WebSocket Protocol Notes
- **ALL frames are binary** — bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text) — UE fires both for same frame → double audio bug
- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
- Fragment reassembly: accumulate into `BinaryFrameBuffer` until `BytesRemaining == 0`
- Pong: `{"type":"pong","event_id":N}`; `event_id` is **top-level**, NOT nested
- Transcript: type=`user_transcript`, key=`user_transcription_event`, field=`user_transcript`
- Client turn mode (`client_vad`): send `user_activity` **with every audio chunk** (not just once) — server needs continuous signal to know user is speaking; stopping chunks = silence detected = agent responds
- Text input: `{"type":"user_message","text":"..."}` — agent replies with audio + text
- **MUST send `conversation_initiation_client_data` immediately after WS connect** — without it, server won't process client audio (mic appears dead)
- `conversation_initiation_client_data` payload: `conversation_config_override.agent.turn.mode`, `conversation_config_override.tts.optimize_streaming_latency`, `custom_llm_extra_body.enable_intermediate_response`
- `enable_intermediate_response: true` in `custom_llm_extra_body` reduces time-to-first-audio latency
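
The receive-side rules above (reassemble fragments, then peek byte 0) and the pong shape can be sketched in standard C++ as follows. This is an illustration under assumptions: `BinaryFrameAssembler`, `CompleteFrame`, `FrameKind`, and `MakePong` are hypothetical names, and the `(Data, Size, BytesRemaining)` parameters mirror what a raw-message handler is assumed to receive. Only `BinaryFrameBuffer`, the `0x7B` check, and the pong JSON layout are taken from the notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

enum class FrameKind { Json, Pcm };

struct CompleteFrame {
    FrameKind Kind = FrameKind::Pcm;
    std::vector<uint8_t> Payload;
};

// Hypothetical sketch: collects fragments until BytesRemaining == 0,
// then classifies the whole frame by its first byte.
class BinaryFrameAssembler {
public:
    // Returns true and fills OutFrame when the final fragment has arrived.
    bool OnFragment(const void* Data, size_t Size, size_t BytesRemaining,
                    CompleteFrame& OutFrame)
    {
        if (Data != nullptr && Size > 0) {
            const uint8_t* Bytes = static_cast<const uint8_t*>(Data);
            BinaryFrameBuffer.insert(BinaryFrameBuffer.end(), Bytes, Bytes + Size);
        }
        if (BytesRemaining != 0) {
            return false;  // more fragments of this frame are still coming
        }

        // '{' (0x7B) marks a JSON control message; anything else is raw PCM.
        // An empty frame falls through to Pcm with an empty payload.
        const bool bIsJson = !BinaryFrameBuffer.empty() && BinaryFrameBuffer[0] == 0x7B;
        OutFrame.Kind = bIsJson ? FrameKind::Json : FrameKind::Pcm;
        OutFrame.Payload = std::move(BinaryFrameBuffer);
        BinaryFrameBuffer.clear();  // leave the moved-from buffer in a known state
        return true;
    }

private:
    std::vector<uint8_t> BinaryFrameBuffer;
};

// Builds the pong reply with event_id at the top level of the object.
std::string MakePong(int64_t EventId)
{
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(EventId) + "}";
}
```

Classifying only after reassembly matters because a continuation fragment of a JSON message does not start with `{` and would otherwise be mistaken for audio.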
## API Keys / Secrets
- ElevenLabs API key is set in **Project Settings → Plugins → ElevenLabs AI Agent** in the Editor
- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
- **The key is stripped from `DefaultEngine.ini` before every commit** — do not commit it
- Each developer sets the key locally; it does not go in git
## Claude Memory Files in This Repo
| File | Contents |
|------|----------|
| `.claude/MEMORY.md` | This file — project structure, patterns, status |
| `.claude/elevenlabs_plugin.md` | Plugin file map, ElevenLabs WS protocol, design decisions |
| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL, Agent ID location) |
| `.claude/project_context.md` | Original ask, intent, short/long-term goals |
| `.claude/session_log_2026-02-19.md` | Full session record: steps, commits, technical decisions, next steps |
| `.claude/PS_AI_Agent_ElevenLabs_Documentation.md` | User-facing Markdown reference doc |