# Project Memory – PS_AI_Agent

> This file is committed to the repository so it is available on any machine.
> Claude Code reads it automatically at session start (via the auto-memory system)
> when the working directory is inside this repo.
> **Keep it under ~180 lines** – lines beyond 200 are truncated by the system.

---

## Project Location
- Repo root: `<repo_root>/` (wherever this is cloned)
- UE5 project: `<repo_root>/Unreal/PS_AI_Agent/`
- `.uproject`: `<repo_root>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject`
- Engine: **Unreal Engine 5.5**, Win64 primary target
- Default test map: `/Game/TestMap.TestMap`

## Plugins
| Plugin | Path | Purpose |
|--------|------|---------|
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Used as architectural reference. |
| **PS_AI_ConvAgent** | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_ConvAgent/` | Main plugin: ElevenLabs Conversational AI, posture, gaze, lip sync, facial expressions. |

## User Preferences
- Plugin naming: `PS_AI_ConvAgent` (renamed from `PS_AI_Agent_ElevenLabs`)
- Save memory frequently during long sessions
- Git remote is a **private server** (no public exposure risk)
- Full original ask + intent: see `.claude/project_context.md`

## Current Branch & Work
- **Branch**: `main`
- **Recent merges**: `feature/multi-player-shared-agent` merged to `main`

### Latency Debug HUD (just implemented)
- Separate `bDebugLatency` property + CVar `ps.ai.ConvAgent.Debug.Latency`
- All metrics anchored to `GenerationStartTime` (`agent_response_started` event)
- Metrics: Gen>Audio (LLM+TTS), Pre-buffer, Gen>Ear (user-perceived)
- Reset per turn in `HandleAgentResponseStarted()`
- `DrawLatencyHUD()` separate from `DrawDebugHUD()`

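As a rough illustration of how the three HUD metrics relate to the shared anchor, here is a minimal standalone sketch. The struct and member names are hypothetical and not taken from the plugin. It also assumes that Gen>Audio ends when the first audio chunk arrives and that Pre-buffer ends when playback becomes audible, which the notes above do not spell out.

```cpp
#include <optional>

// Hypothetical per-turn latency tracker (not the plugin's actual type).
// All timestamps are seconds on a single monotonic clock.
struct TurnLatency {
    double generationStart = 0.0;              // time of agent_response_started
    std::optional<double> firstAudioReceived;  // assumed: first audio chunk arrives
    std::optional<double> playbackStarted;     // assumed: pre-buffer done, audio audible

    // Called at the start of every agent turn so stale values never leak over.
    void Reset(double now) {
        generationStart = now;
        firstAudioReceived.reset();
        playbackStarted.reset();
    }

    // Gen>Audio: generation start to first audio chunk (LLM + TTS).
    std::optional<double> GenToAudio() const {
        if (!firstAudioReceived) return std::nullopt;
        return *firstAudioReceived - generationStart;
    }

    // Pre-buffer: first audio chunk to start of playback.
    std::optional<double> PreBuffer() const {
        if (!firstAudioReceived || !playbackStarted) return std::nullopt;
        return *playbackStarted - *firstAudioReceived;
    }

    // Gen>Ear: generation start to audible playback (user-perceived).
    std::optional<double> GenToEar() const {
        if (!playbackStarted) return std::nullopt;
        return *playbackStarted - generationStart;
    }
};
```

Under these assumptions Gen>Ear is simply Gen>Audio plus Pre-buffer, and a metric stays empty until both of its endpoints have been observed in the current turn.
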
### Future: Server-Side Latency from ElevenLabs API
**TODO (high-value improvement parked for later):**
- `GET /v1/convai/conversations/{conversation_id}` returns:
  - `conversation_turn_metrics` with `elapsed_time` per metric (STT, LLM, TTS breakdown!)
  - `tool_latency_secs`, `step_latency_secs`, `rag_latency_secs`
  - `time_in_call_secs` per message
- `ping` WS event has `ping_ms` (network round-trip); could display on HUD
- `vad_score` WS event (0.0-1.0); could detect real speech start client-side
- Docs: https://elevenlabs.io/docs/api-reference/conversations/get

### Multi-Player Shared Agent: Key Design
- **Old model**: exclusive lock (one player per agent via `NetConversatingPawn`)
- **New model**: shared array (`NetConnectedPawns`) + active speaker (`NetActiveSpeakerPawn`)
- Speaker arbitration: server-side with `SpeakerSwitchHysteresis` (0.3s) + `SpeakerIdleTimeout` (3.0s)
- In standalone (≤1 player): speaker arbitration bypassed, audio sent directly to WebSocket
- Internal mic (WASAPI thread): direct WebSocket send, no game-thread state access
- `GetCurrentBlendshapes()` thread-safe via `ThreadSafeBlendshapes` snapshot + `BlendshapeLock`

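The speaker arbitration described above could look roughly like the following standalone sketch. The class, the `PawnId` alias, and the exact rules are assumptions made for illustration. Here the hysteresis is read as the minimum silence from the current speaker before another player may take over, and the idle timeout as the silence after which the active speaker slot is cleared. The plugin may define either differently.

```cpp
#include <optional>

// Hypothetical stand-in for a pawn reference; the plugin would use actor pointers.
using PawnId = int;

// Hypothetical server-side arbiter; times are seconds on one monotonic clock.
class SpeakerArbiter {
public:
    explicit SpeakerArbiter(double switchHysteresis = 0.3, double idleTimeout = 3.0)
        : hysteresis_(switchHysteresis), idleTimeout_(idleTimeout) {}

    // Report voice audio from `pawn` at time `now`.
    // Returns true if that audio should be forwarded to the agent.
    bool OnVoice(PawnId pawn, double now) {
        if (active_ && now - lastVoice_ >= idleTimeout_) {
            active_.reset();  // previous speaker timed out
        }
        if (!active_ || *active_ == pawn) {
            active_ = pawn;   // free floor, or the current speaker keeps talking
            lastVoice_ = now;
            return true;
        }
        // Someone else holds the floor: switch only after they have been quiet
        // for at least the hysteresis window (assumed meaning).
        if (now - lastVoice_ >= hysteresis_) {
            active_ = pawn;
            lastVoice_ = now;
            return true;
        }
        return false;
    }

    // Current active speaker, or empty once the idle timeout has elapsed.
    std::optional<PawnId> ActiveSpeaker(double now) const {
        if (active_ && now - lastVoice_ >= idleTimeout_) return std::nullopt;
        return active_;
    }

private:
    double hysteresis_;
    double idleTimeout_;
    std::optional<PawnId> active_;
    double lastVoice_ = 0.0;
};
```

With the default values, a second player who talks over the active speaker is ignored until the active speaker has been silent for 0.3 s, and the slot empties after 3.0 s without any voice.
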
## Key UE5 Plugin Patterns
- Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule`
- WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)`
- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+)
  - Callback arrives on **background thread**, so marshal to game thread
- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow`
- Resample mic audio to **16000 Hz mono** before sending to ElevenLabs
- `TArray::RemoveAt(idx, count, EAllowShrinking::No)` (bool overload deprecated in UE 5.5)

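To make the resampling step concrete, here is a minimal engine-independent sketch that downmixes interleaved audio to mono and resamples it to 16000 Hz with linear interpolation. The function name and the float sample format are assumptions for illustration. Plain linear interpolation also applies no low-pass filter, so a production path would likely want a proper resampler.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: interleaved float samples in, 16 kHz mono float samples out.
std::vector<float> ToMono16k(const std::vector<float>& interleaved,
                             int channels, int sourceRate) {
    constexpr std::size_t kTargetRate = 16000;
    if (channels <= 0 || sourceRate <= 0) return {};
    const std::size_t ch = static_cast<std::size_t>(channels);
    const std::size_t rate = static_cast<std::size_t>(sourceRate);
    const std::size_t frames = interleaved.size() / ch;
    if (frames == 0) return {};

    // Step 1: downmix by averaging the channels of each frame.
    std::vector<float> mono(frames);
    for (std::size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < ch; ++c) sum += interleaved[f * ch + c];
        mono[f] = sum / static_cast<float>(ch);
    }

    // Step 2: resample by linear interpolation between neighbouring frames.
    const std::size_t outFrames = frames * kTargetRate / rate;
    const double step = static_cast<double>(rate) / static_cast<double>(kTargetRate);
    std::vector<float> out(outFrames);
    for (std::size_t i = 0; i < outFrames; ++i) {
        const double pos = static_cast<double>(i) * step;
        const std::size_t i0 = std::min(static_cast<std::size_t>(pos), frames - 1);
        const std::size_t i1 = std::min(i0 + 1, frames - 1);
        const double frac = pos - static_cast<double>(i0);
        out[i] = static_cast<float>(mono[i0] * (1.0 - frac) + mono[i1] * frac);
    }
    return out;
}
```

For example, six stereo frames captured at 48000 Hz come out as two mono frames, because every third input frame lines up with an output sample.
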
## ElevenLabs WebSocket Protocol Notes
- **ALL frames are binary**: bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text)
- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
- Pong: `{"type":"pong","event_id":N}`, where `event_id` is **top-level**, NOT nested
- `user_transcript` arrives AFTER `agent_response_started` in Server VAD mode
- **MUST send `conversation_initiation_client_data` immediately after WS connect**

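The frame discrimination and pong format above can be sketched without any engine types. The function and enum names below are hypothetical; only the first-byte check and the pong JSON shape come from the notes. In the plugin, this logic would sit inside the `OnRawMessage` handler.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

enum class FrameKind { Empty, Json, PcmAudio };

// Classify a raw WebSocket payload by peeking at its first byte:
// '{' (0x7B) means a JSON control message, anything else is treated as PCM audio.
FrameKind ClassifyFrame(const std::uint8_t* data, std::size_t size) {
    if (data == nullptr || size == 0) return FrameKind::Empty;
    return data[0] == 0x7B ? FrameKind::Json : FrameKind::PcmAudio;
}

// Build the pong reply. event_id sits at the top level of the object,
// not inside a nested event object.
std::string MakePong(long long eventId) {
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(eventId) + "}";
}
```

One caveat of the first-byte heuristic is that an audio frame whose first byte happens to be 0x7B would be classified as JSON, so a JSON parse failure on such a frame is probably best treated as a fallback to audio.
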
## API Keys / Secrets
- ElevenLabs API key: **Project Settings → Plugins → ElevenLabs AI Agent**
- Saved to `DefaultEngine.ini`; **stripped before every commit**

## Claude Memory Files in This Repo
| File | Contents |
|------|----------|
| `.claude/MEMORY.md` | This file: project structure, patterns, status |
| `.claude/elevenlabs_plugin.md` | Plugin file map, ElevenLabs WS protocol, design decisions |
| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL) |
| `.claude/project_context.md` | Original ask, intent, short/long-term goals |
| `.claude/session_log_2026-02-19.md` | Session record: steps, commits, technical decisions |
| `.claude/PS_AI_Agent_ElevenLabs_Documentation.md` | User-facing Markdown reference doc |