# Project Memory – PS_AI_Agent

This file is committed to the repository so it is available on any machine. Claude Code reads it automatically at session start (via the auto-memory system) when the working directory is inside this repo. Keep it under ~180 lines – lines beyond 200 are truncated by the system.
## Project Location

- Repo root: `<repo_root>/` (wherever this is cloned)
- UE5 project: `<repo_root>/Unreal/PS_AI_Agent/`
- .uproject: `<repo_root>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject`
- Engine: Unreal Engine 5.5 — Win64 primary target
- Default test map: `/Game/TestMap.TestMap`
## Plugins

| Plugin | Path | Purpose |
|---|---|---|
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to the Convai API. Has ElevenLabs voice type enum in `ConvaiDefinitions.h`. Used as architectural reference. |
| PS_AI_Agent_ElevenLabs | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See `.claude/elevenlabs_plugin.md` for full details. |
## User Preferences

- Plugin naming: `PS_AI_Agent_<Service>` (e.g. `PS_AI_Agent_ElevenLabs`)
- Save memory frequently during long sessions
- Goal: ElevenLabs Conversational AI integration — simpler than Convai, no gRPC
- Full original ask + intent: see `.claude/project_context.md`
- Git remote is a private server — no public exposure risk
## Key UE5 Plugin Patterns

- Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule`
- Module startup: `NewObject<USettings>(..., RF_Standalone)` + `AddToRoot()`
- WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)`. `WebSockets` is a module (Build.cs only) — NOT a plugin, don't put it in `.uplugin`
- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+, replaces deprecated `OpenCaptureStream`). `AudioCapture` IS a plugin — declare it in the `.uplugin` `Plugins` array
- Callback type: `FOnAudioCaptureFunction = TFunction<void(const void*, int32, int32, int32, double, bool)>`
- Cast `const void*` to `const float*` inside — the device sends float32 interleaved
- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow` delegate
- Audio capture callbacks arrive on a background thread — always marshal to the game thread with `AsyncTask(ENamedThreads::GameThread, ...)`
- Resample mic audio to 16000 Hz mono before sending to ElevenLabs
- `TArray::RemoveAt(Idx, Count, EAllowShrinking::No)` — the bool overload is deprecated in UE 5.5
## Plugin Status

- PS_AI_Agent_ElevenLabs: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
- v1.5.0 — mic audio chunk size fixed: WASAPI 5 ms callbacks accumulated to 100 ms before sending
- v1.4.0 — push-to-talk fully fixed: `bAutoStartListening` now ignored in Client turn mode
- Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
- First-byte discrimination: `{` = JSON control message, else = raw PCM audio
- `SendTextMessage()` added to both WebSocketProxy and ConversationalAgentComponent
- `conversation_initiation_client_data` now sent immediately on WS connect (required for mic + latency)
## Audio Chunk Size — CRITICAL

- WASAPI fires mic callbacks every ~5 ms → 158 bytes at 16 kHz 16-bit mono
- ElevenLabs VAD/STT requires ≥3200 bytes (100 ms) per chunk; smaller chunks are silently ignored
- Fix: `MicAccumulationBuffer` in the component accumulates chunks; sends only when `>= MicChunkMinBytes` (3200)
- `StopListening()` flushes the remainder so the final partial chunk is never dropped before end-of-turn
## ElevenLabs WebSocket Protocol Notes

- ALL frames are binary — bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text) — UE fires both for the same frame → double-audio bug
- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
- Fragment reassembly: accumulate into `BinaryFrameBuffer` until `BytesRemaining == 0`
- Pong: `{"type":"pong","event_id":N}` — `event_id` is top-level, NOT nested
- Transcript: type=`user_transcript`, key=`user_transcription_event`, field=`user_transcript`
- Client turn mode (`client_vad`): send `user_activity` with every audio chunk (not just once) — the server needs a continuous signal to know the user is speaking; stopping chunks = silence detected = agent responds
- Text input: `{"type":"user_message","text":"..."}` — agent replies with audio + text
- MUST send `conversation_initiation_client_data` immediately after WS connect — without it, the server won't process client audio (mic appears dead)
- `conversation_initiation_client_data` payload: `conversation_config_override.agent.turn.mode`, `conversation_config_override.tts.optimize_streaming_latency`, `custom_llm_extra_body.enable_intermediate_response`
- `enable_intermediate_response: true` in `custom_llm_extra_body` reduces time-to-first-audio latency
## API Keys / Secrets

- The ElevenLabs API key is set in Project Settings → Plugins → ElevenLabs AI Agent in the Editor
- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
- The key is stripped from `DefaultEngine.ini` before every commit — do not commit it
- Each developer sets the key locally; it does not go in git
## Claude Memory Files in This Repo

| File | Contents |
|---|---|
| `.claude/MEMORY.md` | This file — project structure, patterns, status |
| `.claude/elevenlabs_plugin.md` | Plugin file map, ElevenLabs WS protocol, design decisions |
| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL, Agent ID location) |
| `.claude/project_context.md` | Original ask, intent, short/long-term goals |
| `.claude/session_log_2026-02-19.md` | Full session record: steps, commits, technical decisions, next steps |
| `.claude/PS_AI_Agent_ElevenLabs_Documentation.md` | User-facing Markdown reference doc |