# Project Memory – PS_AI_Agent

> This file is committed to the repository so it is available on any machine.
> Claude Code reads it automatically at session start (via the auto-memory system)
> when the working directory is inside this repo.
> **Keep it under ~180 lines** – lines beyond 200 are truncated by the system.

---

## Project Location
- Repo root: `<repo_root>/` (wherever this is cloned)
- UE5 project: `<repo_root>/Unreal/PS_AI_Agent/`
- `.uproject`: `<repo_root>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject`
- Engine: **Unreal Engine 5.5** – Win64 primary target
- Default test map: `/Game/TestMap.TestMap`

## Plugins
| Plugin | Path | Purpose |
|--------|------|---------|
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Has ElevenLabs voice type enum in `ConvaiDefinitions.h`. Used as architectural reference. |
| **PS_AI_Agent_ElevenLabs** | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See `.claude/elevenlabs_plugin.md` for full details. |

## User Preferences
- Plugin naming: `PS_AI_Agent_<Service>` (e.g. `PS_AI_Agent_ElevenLabs`)
- Save memory frequently during long sessions
- Goal: ElevenLabs Conversational AI integration – simpler than Convai, no gRPC
- Full original ask + intent: see `.claude/project_context.md`
- Git remote is a **private server** – no public exposure risk

## Key UE5 Plugin Patterns
- Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule`
- Module startup: `NewObject<USettings>(..., RF_Standalone)` + `AddToRoot()`
- WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)`
- `WebSockets` is a **module** (Build.cs only), NOT a plugin; don't list it in `.uplugin`
- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+, replaces deprecated `OpenCaptureStream`)
- `AudioCapture` IS a plugin; declare it in the `.uplugin` `Plugins` array
- Callback type: `FOnAudioCaptureFunction` = `TFunction<void(const void*, int32, int32, int32, double, bool)>`
- Cast `const void*` to `const float*` inside the callback; the device delivers interleaved float32 samples
- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow` delegate
- Audio capture callbacks arrive on a **background thread**; always marshal to the game thread with `AsyncTask(ENamedThreads::GameThread, ...)`
- Resample mic audio to **16000 Hz mono** before sending to ElevenLabs
- `TArray::RemoveAt(idx, count, EAllowShrinking::No)`: the `bool` overload is deprecated in UE 5.5
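
The capture-side notes above (interleaved float32 input, 16000 Hz mono output) can be sketched outside the engine roughly as follows. This is a minimal, engine-agnostic illustration and not the plugin's actual code: `std::vector` stands in for `TArray`, the helper names `DownmixToMono`, `ResampleLinear`, and `FloatToPcm16` are hypothetical, and plain linear interpolation with no anti-aliasing filter is an assumption that may not match what the plugin does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: average all channels of each interleaved frame into one mono sample.
std::vector<float> DownmixToMono(const float* Interleaved, std::size_t NumFrames, int NumChannels)
{
    assert(Interleaved != nullptr && NumChannels > 0);
    std::vector<float> Mono(NumFrames);
    for (std::size_t Frame = 0; Frame < NumFrames; ++Frame)
    {
        float Sum = 0.0f;
        for (int Ch = 0; Ch < NumChannels; ++Ch)
        {
            Sum += Interleaved[Frame * static_cast<std::size_t>(NumChannels) + Ch];
        }
        Mono[Frame] = Sum / static_cast<float>(NumChannels);
    }
    return Mono;
}

// Hypothetical helper: linear-interpolation resampler (assumption: no anti-aliasing filter).
std::vector<float> ResampleLinear(const std::vector<float>& In, int InRate, int OutRate)
{
    assert(InRate > 0 && OutRate > 0);
    if (In.empty())
    {
        return {};
    }
    const std::size_t OutCount =
        In.size() * static_cast<std::size_t>(OutRate) / static_cast<std::size_t>(InRate);
    const double Step = static_cast<double>(InRate) / static_cast<double>(OutRate);
    std::vector<float> Out(OutCount);
    for (std::size_t i = 0; i < OutCount; ++i)
    {
        const double Pos = static_cast<double>(i) * Step;
        const std::size_t i0 = std::min(static_cast<std::size_t>(Pos), In.size() - 1);
        const std::size_t i1 = std::min(i0 + 1, In.size() - 1);
        const float Frac = static_cast<float>(Pos - static_cast<double>(i0));
        Out[i] = In[i0] + (In[i1] - In[i0]) * Frac;
    }
    return Out;
}

// Hypothetical helper: convert float samples to signed 16-bit PCM, clamping to [-1, 1] first.
std::vector<std::int16_t> FloatToPcm16(const std::vector<float>& In)
{
    std::vector<std::int16_t> Out(In.size());
    for (std::size_t i = 0; i < In.size(); ++i)
    {
        const float Clamped = std::max(-1.0f, std::min(1.0f, In[i]));
        Out[i] = static_cast<std::int16_t>(std::lround(Clamped * 32767.0f));
    }
    return Out;
}
```

Inside a capture callback, the three helpers would be chained in that order on the cast `const float*` buffer, using the frame count, channel count, and sample rate that the callback reports.
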

## Plugin Status
- **PS_AI_Agent_ElevenLabs**: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
- v1.5.0: mic audio chunk size fixed; WASAPI 5ms callbacks are accumulated to 100ms before sending
- v1.4.0: push-to-talk fully fixed; `bAutoStartListening` is now ignored in Client turn mode
- Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
- First-byte discrimination: `{` = JSON control message, anything else = raw PCM audio
- `SendTextMessage()` added to both WebSocketProxy and ConversationalAgentComponent
- `conversation_initiation_client_data` now sent immediately on WS connect (required for mic + latency)

## Audio Chunk Size – CRITICAL
- WASAPI fires mic callbacks every ~5ms → **158 bytes** at 16kHz 16-bit mono
- ElevenLabs VAD/STT requires **≥3200 bytes (100ms)** per chunk; smaller chunks are silently ignored
- Fix: `MicAccumulationBuffer` in the component accumulates chunks and sends only when `>= MicChunkMinBytes` (3200)
- `StopListening()` flushes the remainder so the final partial chunk is never dropped before end-of-turn
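
The accumulate-then-flush behavior described above can be sketched in plain C++ as below. It is a minimal illustration under stated assumptions, not the component's real code: `FMicAccumulator` and its `Send` callback are hypothetical names, `std::vector` stands in for `TArray`, and sending the whole buffer once the threshold is reached (rather than exact 3200-byte slices) is an assumption. Only the buffer name, the `MicChunkMinBytes` threshold, and the flush-on-stop rule come from the notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// 100 ms of 16 kHz, 16-bit mono PCM: 16000 samples/s * 2 bytes/sample / 10 = 3200 bytes.
constexpr std::size_t MicChunkMinBytes = 16000 * 2 / 10;

// Hypothetical stand-in for the component's mic buffering logic.
class FMicAccumulator
{
public:
    using FSendFn = std::function<void(const std::vector<std::uint8_t>&)>;

    explicit FMicAccumulator(FSendFn InSend) : Send(std::move(InSend))
    {
        assert(static_cast<bool>(Send));
    }

    // Called for every small capture callback; forwards data only once the threshold is met.
    void Push(const std::uint8_t* Data, std::size_t NumBytes)
    {
        MicAccumulationBuffer.insert(MicAccumulationBuffer.end(), Data, Data + NumBytes);
        if (MicAccumulationBuffer.size() >= MicChunkMinBytes)
        {
            Flush();
        }
    }

    // Called when listening stops: sends whatever is left, even if it is below the threshold.
    void Flush()
    {
        if (MicAccumulationBuffer.empty())
        {
            return;
        }
        Send(MicAccumulationBuffer);
        MicAccumulationBuffer.clear();
    }

private:
    FSendFn Send;
    std::vector<std::uint8_t> MicAccumulationBuffer;
};
```

With 158-byte callbacks, nothing is sent until enough callbacks have arrived to reach 3200 bytes, and a final `Flush()` delivers the short tail so the end of the utterance is not lost.
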

## ElevenLabs WebSocket Protocol Notes
- **ALL frames are binary**: bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text). UE fires both for the same frame, which causes a double-audio bug
- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
- Fragment reassembly: accumulate into `BinaryFrameBuffer` until `BytesRemaining == 0`
- Pong: `{"type":"pong","event_id":N}`; `event_id` is **top-level**, NOT nested
- Transcript: type=`user_transcript`, key=`user_transcription_event`, field=`user_transcript`
- Client turn mode (`client_vad`): send `user_activity` **with every audio chunk** (not just once). The server needs a continuous signal to know the user is speaking; when chunks stop, it detects silence and the agent responds
- Text input: `{"type":"user_message","text":"..."}`; the agent replies with audio + text
- **MUST send `conversation_initiation_client_data` immediately after WS connect**; without it, the server won't process client audio (mic appears dead)
- `conversation_initiation_client_data` payload: `conversation_config_override.agent.turn.mode`, `conversation_config_override.tts.optimize_streaming_latency`, `custom_llm_extra_body.enable_intermediate_response`
- `enable_intermediate_response: true` in `custom_llm_extra_body` reduces time-to-first-audio latency
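
The receive path and two of the outgoing messages described above can be sketched in standalone C++ as follows. This is an illustration under assumptions, not the plugin's implementation: `FBinaryFrameAssembler`, `ClassifyFrame`, `MakePong`, and `MakeInitiationPayload` are hypothetical names, `Append` only mirrors the (data, size, bytes remaining) shape of a raw-message callback, and the payload builder assumes the message carries a top-level `type` field like the other messages and that the dotted paths in the notes map to nested JSON objects. The notes do not state which values to send, so they are left as parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class EFrameKind { Empty, Json, Audio };

// Hypothetical helper: accumulates fragments until the socket reports no bytes remaining.
class FBinaryFrameAssembler
{
public:
    // Returns true and fills OutFrame once the final fragment of a frame has arrived.
    bool Append(const std::uint8_t* Data, std::size_t Size, std::size_t BytesRemaining,
                std::vector<std::uint8_t>& OutFrame)
    {
        BinaryFrameBuffer.insert(BinaryFrameBuffer.end(), Data, Data + Size);
        if (BytesRemaining != 0)
        {
            return false;
        }
        OutFrame.swap(BinaryFrameBuffer);
        BinaryFrameBuffer.clear();
        return true;
    }

private:
    std::vector<std::uint8_t> BinaryFrameBuffer;
};

// First-byte discrimination: '{' (0x7B) means JSON, anything else is treated as raw PCM.
EFrameKind ClassifyFrame(const std::vector<std::uint8_t>& Frame)
{
    if (Frame.empty())
    {
        return EFrameKind::Empty;
    }
    return Frame[0] == 0x7B ? EFrameKind::Json : EFrameKind::Audio;
}

// Pong reply with event_id at the top level, as the notes require.
std::string MakePong(long long EventId)
{
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(EventId) + "}";
}

// Assumed nesting for the initiation message; TurnMode must not need JSON escaping.
std::string MakeInitiationPayload(const std::string& TurnMode, int OptimizeStreamingLatency,
                                  bool bEnableIntermediateResponse)
{
    return std::string("{\"type\":\"conversation_initiation_client_data\",")
        + "\"conversation_config_override\":{"
        + "\"agent\":{\"turn\":{\"mode\":\"" + TurnMode + "\"}},"
        + "\"tts\":{\"optimize_streaming_latency\":" + std::to_string(OptimizeStreamingLatency) + "}},"
        + "\"custom_llm_extra_body\":{\"enable_intermediate_response\":"
        + (bEnableIntermediateResponse ? "true" : "false") + "}}";
}
```

One consequence of the first-byte rule is worth keeping in mind: a PCM frame whose first byte happens to be 0x7B would be classified as JSON, so a JSON parse failure on such a frame is a reasonable signal to fall back to treating it as audio.
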

## API Keys / Secrets
- ElevenLabs API key is set in **Project Settings → Plugins → ElevenLabs AI Agent** in the Editor
- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
- **The key is stripped from `DefaultEngine.ini` before every commit** – do not commit it
- Each developer sets the key locally; it does not go in git

## Claude Memory Files in This Repo
| File | Contents |
|------|----------|
| `.claude/MEMORY.md` | This file – project structure, patterns, status |
| `.claude/elevenlabs_plugin.md` | Plugin file map, ElevenLabs WS protocol, design decisions |
| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL, Agent ID location) |
| `.claude/project_context.md` | Original ask, intent, short/long-term goals |
| `.claude/session_log_2026-02-19.md` | Full session record: steps, commits, technical decisions, next steps |
| `.claude/PS_AI_Agent_ElevenLabs_Documentation.md` | User-facing Markdown reference doc |