j.foucher 2169c58cd7 Update memory: latency HUD status + future server-side metrics TODO
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 18:47:11 +01:00

# Project Memory: PS_AI_Agent
> This file is committed to the repository so it is available on any machine.
> Claude Code reads it automatically at session start (via the auto-memory system)
> when the working directory is inside this repo.
> **Keep it under ~180 lines**; lines beyond 200 are truncated by the system.
---
## Project Location
- Repo root: `<repo_root>/` (wherever this is cloned)
- UE5 project: `<repo_root>/Unreal/PS_AI_Agent/`
- `.uproject`: `<repo_root>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject`
- Engine: **Unreal Engine 5.5** — Win64 primary target
- Default test map: `/Game/TestMap.TestMap`
## Plugins
| Plugin | Path | Purpose |
|--------|------|---------|
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Used as architectural reference. |
| **PS_AI_ConvAgent** | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_ConvAgent/` | Main plugin — ElevenLabs Conversational AI, posture, gaze, lip sync, facial expressions. |
## User Preferences
- Plugin naming: `PS_AI_ConvAgent` (renamed from PS_AI_Agent_ElevenLabs)
- Save memory frequently during long sessions
- Git remote is a **private server** — no public exposure risk
- Full original ask + intent: see `.claude/project_context.md`
## Current Branch & Work
- **Branch**: `main`
- **Recent merges**: `feature/multi-player-shared-agent` merged to main
### Latency Debug HUD (just implemented)
- Separate `bDebugLatency` property + CVar `ps.ai.ConvAgent.Debug.Latency`
- All metrics anchored to `GenerationStartTime` (`agent_response_started` event)
- Metrics: Gen>Audio (LLM+TTS), Pre-buffer, Gen>Ear (user-perceived)
- Reset per turn in `HandleAgentResponseStarted()`
- `DrawLatencyHUD()` separate from `DrawDebugHUD()`
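
A minimal standalone sketch of how the three metrics above could be derived from timestamps anchored to the generation start. `FTurnLatency`, its method names, and the exact meaning assigned to each metric (first audio chunk received, playback start after pre-buffering) are assumptions for illustration, not the plugin's actual code.

```cpp
#include <cassert>

// Sentinel for "not measured yet" (hypothetical convention).
constexpr double UnsetTime = -1.0;

// Hypothetical per-turn latency tracker. Timestamps are in seconds.
struct FTurnLatency
{
    double GenerationStartTime = UnsetTime; // agent_response_started
    double FirstAudioTime = UnsetTime;      // first audio chunk received (assumed)
    double PlaybackStartTime = UnsetTime;   // playback begins after pre-buffer (assumed)

    // Called at the start of every turn, mirroring the per-turn reset.
    void Reset(double Now)
    {
        assert(Now >= 0.0);
        GenerationStartTime = Now;
        FirstAudioTime = UnsetTime;
        PlaybackStartTime = UnsetTime;
    }

    // Only the first occurrence in a turn is recorded.
    void MarkFirstAudio(double Now)    { if (FirstAudioTime < 0.0) FirstAudioTime = Now; }
    void MarkPlaybackStart(double Now) { if (PlaybackStartTime < 0.0) PlaybackStartTime = Now; }

    // Each getter returns milliseconds, or a negative value if not yet known.
    double GenToAudioMs() const { return DeltaMs(GenerationStartTime, FirstAudioTime); }
    double PreBufferMs() const  { return DeltaMs(FirstAudioTime, PlaybackStartTime); }
    double GenToEarMs() const   { return DeltaMs(GenerationStartTime, PlaybackStartTime); }

private:
    static double DeltaMs(double From, double To)
    {
        return (From < 0.0 || To < 0.0) ? UnsetTime : (To - From) * 1000.0;
    }
};
```

Under these assumptions Gen>Ear is always the sum of Gen>Audio and Pre-buffer, which makes it easy to sanity-check the HUD values.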
### Future: Server-Side Latency from ElevenLabs API
**TODO — high-value improvement parked for later:**
- `GET /v1/convai/conversations/{conversation_id}` returns:
  - `conversation_turn_metrics` with `elapsed_time` per metric (STT, LLM, TTS breakdown)
  - `tool_latency_secs`, `step_latency_secs`, `rag_latency_secs`
  - `time_in_call_secs` per message
- `ping` WS event has `ping_ms` (network round-trip) — could display on HUD
- `vad_score` WS event (0.0-1.0) — could detect real speech start client-side
- Docs: https://elevenlabs.io/docs/api-reference/conversations/get
### Multi-Player Shared Agent — Key Design
- **Old model**: exclusive lock (one player per agent via `NetConversatingPawn`)
- **New model**: shared array (`NetConnectedPawns`) + active speaker (`NetActiveSpeakerPawn`)
- Speaker arbitration: server-side with `SpeakerSwitchHysteresis` (0.3s) + `SpeakerIdleTimeout` (3.0s)
- In standalone (≤1 player): speaker arbitration bypassed, audio sent directly to WebSocket
- Internal mic (WASAPI thread): direct WebSocket send, no game-thread state access
- `GetCurrentBlendshapes()` thread-safe via `ThreadSafeBlendshapes` snapshot + `BlendshapeLock`
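
One possible reading of the speaker arbitration above, as a standalone sketch. The rule used here is an assumption: a different pawn may take over only once the active speaker has been silent for `SwitchHysteresis`, and the slot is released after `IdleTimeout` of silence. `FSpeakerArbiter`, integer pawn IDs, and the method names are hypothetical; the plugin's real logic may differ.

```cpp
#include <cassert>

constexpr int NoSpeaker = -1;

// Hypothetical server-side arbiter. Times are in seconds.
struct FSpeakerArbiter
{
    double SwitchHysteresis = 0.3; // matches SpeakerSwitchHysteresis
    double IdleTimeout = 3.0;      // matches SpeakerIdleTimeout
    int ActiveSpeaker = NoSpeaker;
    double LastActiveSpeechTime = 0.0;

    // Called when audio from PawnId arrives at time Now.
    // Returns true if that audio should be forwarded to the agent.
    bool OnSpeech(int PawnId, double Now)
    {
        assert(PawnId >= 0);
        const bool bSlotFree = (ActiveSpeaker == NoSpeaker);
        const bool bActiveQuiet =
            !bSlotFree && (Now - LastActiveSpeechTime) >= SwitchHysteresis;

        // Take the slot if it is free, or if the current speaker went quiet.
        if (bSlotFree || (PawnId != ActiveSpeaker && bActiveQuiet))
        {
            ActiveSpeaker = PawnId;
        }
        if (PawnId != ActiveSpeaker)
        {
            return false; // someone else holds the floor; drop this audio
        }
        LastActiveSpeechTime = Now;
        return true;
    }

    // Called periodically; releases the slot after prolonged silence.
    void Tick(double Now)
    {
        if (ActiveSpeaker != NoSpeaker &&
            (Now - LastActiveSpeechTime) >= IdleTimeout)
        {
            ActiveSpeaker = NoSpeaker;
        }
    }
};
```

In the standalone case described above this arbiter would simply be skipped, since there is only one possible speaker.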
## Key UE5 Plugin Patterns
- Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule`
- WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)`
- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+)
  - Callback arrives on a **background thread**; marshal to the game thread
- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow`
- Resample mic audio to **16000 Hz mono** before sending to ElevenLabs
- `TArray::RemoveAt(idx, count, EAllowShrinking::No)` — bool overload deprecated in UE 5.5
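
The 16000 Hz mono conversion noted above can be illustrated with a plain C++ sketch that does not depend on the engine. `DownmixToMono` and `ResampleLinear` are hypothetical helpers. Linear interpolation is used only for brevity: it applies no low-pass filter, so downsampling this way can alias, and a production path would likely use a proper resampler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average interleaved channels into a single mono channel.
std::vector<float> DownmixToMono(const std::vector<float>& Interleaved, int NumChannels)
{
    assert(NumChannels > 0);
    const std::size_t NumFrames = Interleaved.size() / NumChannels;
    std::vector<float> Mono(NumFrames);
    for (std::size_t Frame = 0; Frame < NumFrames; ++Frame)
    {
        float Sum = 0.0f;
        for (int Ch = 0; Ch < NumChannels; ++Ch)
        {
            Sum += Interleaved[Frame * NumChannels + Ch];
        }
        Mono[Frame] = Sum / NumChannels;
    }
    return Mono;
}

// Resample mono audio by linear interpolation between neighbouring samples.
std::vector<float> ResampleLinear(const std::vector<float>& In, int InRate, int OutRate)
{
    assert(InRate > 0 && OutRate > 0);
    if (In.empty())
    {
        return {};
    }
    const std::size_t OutCount = In.size() * OutRate / InRate;
    const double Step = static_cast<double>(InRate) / OutRate;
    std::vector<float> Out(OutCount);
    for (std::size_t i = 0; i < OutCount; ++i)
    {
        const double Pos = i * Step; // fractional read position in In
        const std::size_t Idx = static_cast<std::size_t>(Pos);
        const std::size_t Next = (Idx + 1 < In.size()) ? Idx + 1 : Idx;
        const float Frac = static_cast<float>(Pos - Idx);
        Out[i] = In[Idx] + (In[Next] - In[Idx]) * Frac;
    }
    return Out;
}
```

A typical call chain would be `ResampleLinear(DownmixToMono(Captured, 2), 48000, 16000)` for a 48 kHz stereo capture device; the device format here is an example, not a value taken from the plugin.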
## ElevenLabs WebSocket Protocol Notes
- **ALL frames are binary** — bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text)
- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
- Pong: `{"type":"pong","event_id":N}`; `event_id` is **top-level**, NOT nested
- `user_transcript` arrives AFTER `agent_response_started` in Server VAD mode
- **MUST send `conversation_initiation_client_data` immediately after WS connect**
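
The frame discrimination and pong format above can be sketched in engine-independent C++. `EFrameKind`, `ClassifyFrame`, and `MakePongJson` are hypothetical names; in the plugin this logic would sit inside the `OnRawMessage` handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

enum class EFrameKind { Empty, Json, PcmAudio };

// Peek at the first byte: '{' (0x7B) means JSON, anything else is raw PCM.
EFrameKind ClassifyFrame(const std::uint8_t* Data, std::size_t Size)
{
    assert(Data != nullptr || Size == 0);
    if (Size == 0)
    {
        return EFrameKind::Empty;
    }
    return (Data[0] == 0x7B) ? EFrameKind::Json : EFrameKind::PcmAudio;
}

// Build the pong reply with event_id at the top level of the object.
std::string MakePongJson(long long EventId)
{
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(EventId) + "}";
}
```

For example, `MakePongJson(42)` yields `{"type":"pong","event_id":42}`, which matches the top-level layout noted above.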
## API Keys / Secrets
- ElevenLabs API key: **Project Settings → Plugins → ElevenLabs AI Agent**
- Saved to `DefaultEngine.ini`; **stripped before every commit**
## Claude Memory Files in This Repo
| File | Contents |
|------|----------|
| `.claude/MEMORY.md` | This file — project structure, patterns, status |
| `.claude/elevenlabs_plugin.md` | Plugin file map, ElevenLabs WS protocol, design decisions |
| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL) |
| `.claude/project_context.md` | Original ask, intent, short/long-term goals |
| `.claude/session_log_2026-02-19.md` | Session record: steps, commits, technical decisions |
| `.claude/PS_AI_Agent_ElevenLabs_Documentation.md` | User-facing Markdown reference doc |