j.foucher b888f7fcb6 Update memory: document v1.5.0 mic chunk size fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-19 18:42:47 +01:00

Project Memory – PS_AI_Agent

This file is committed to the repository so it is available on any machine. Claude Code reads it automatically at session start (via the auto-memory system) when the working directory is inside this repo. Keep it under ~180 lines – lines beyond 200 are truncated by the system.

Project Location

Repo root: <repo_root>/ (wherever this is cloned)
UE5 project: <repo_root>/Unreal/PS_AI_Agent/
.uproject: <repo_root>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject
Engine: Unreal Engine 5.5 — Win64 primary target
Default test map: /Game/TestMap.TestMap

Plugins

| Plugin | Path | Purpose |
| --- | --- | --- |
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Has ElevenLabs voice type enum in `ConvaiDefinitions.h`. Used as architectural reference. |
| PS_AI_Agent_ElevenLabs | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See `.claude/elevenlabs_plugin.md` for full details. |

User Preferences

Plugin naming: PS_AI_Agent_<Service> (e.g. PS_AI_Agent_ElevenLabs)
Save memory frequently during long sessions
Goal: ElevenLabs Conversational AI integration — simpler than Convai, no gRPC
Full original ask + intent: see .claude/project_context.md
Git remote is a private server — no public exposure risk

Key UE5 Plugin Patterns

Settings object: UCLASS(config=Engine, defaultconfig) inheriting UObject, registered via ISettingsModule
Module startup: NewObject<USettings>(..., RF_Standalone) + AddToRoot()
WebSocket: FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)
- WebSockets is a module (Build.cs only) — NOT a plugin, don't put it in .uplugin
Audio capture: Audio::FAudioCapture::OpenAudioCaptureStream() (UE 5.3+, replaces deprecated OpenCaptureStream)
- AudioCapture IS a plugin — declare it in .uplugin Plugins array
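- Callback type: FOnAudioCaptureFunction = TFunction<void(const void*, int32, int32, int32, double, bool)> (arguments in order: audio data, frame count, channel count, sample rate, stream time, overflow flag)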
- Cast const void* to const float* inside — device sends float32 interleaved
Procedural audio playback: USoundWaveProcedural + OnSoundWaveProceduralUnderflow delegate
Audio capture callbacks arrive on a background thread — always marshal to game thread with AsyncTask(ENamedThreads::GameThread, ...)
Resample mic audio to 16000 Hz mono before sending to ElevenLabs
TArray::RemoveAt(idx, count, EAllowShrinking::No) — bool overload deprecated in UE 5.5
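
The "cast to float, resample to 16000 Hz mono" step above can be sketched in engine-free C++ so it compiles on its own. `ToMono16k` is a hypothetical helper, not a function from the plugin. It assumes the device delivers interleaved float32 in the range [-1, 1]. Channel averaging and linear interpolation are simple stand-ins here: the notes do not say which resampler the plugin uses, and this sketch has no anti-aliasing filter.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the plugin source).
// Converts interleaved float32 frames to 16-bit mono at targetRate by
// averaging channels and resampling with linear interpolation.
std::vector<int16_t> ToMono16k(const float* interleaved, int numFrames,
                               int numChannels, int sourceRate,
                               int targetRate = 16000)
{
    std::vector<int16_t> out;
    if (interleaved == nullptr || numFrames <= 0 || numChannels <= 0 ||
        sourceRate <= 0 || targetRate <= 0)
    {
        return out;
    }

    // Step 1: average all channels of each frame into one mono sample.
    std::vector<float> mono(static_cast<size_t>(numFrames));
    for (int f = 0; f < numFrames; ++f)
    {
        float sum = 0.0f;
        for (int c = 0; c < numChannels; ++c)
        {
            sum += interleaved[static_cast<size_t>(f) * numChannels + c];
        }
        mono[static_cast<size_t>(f)] = sum / static_cast<float>(numChannels);
    }

    // Step 2: resample to the target rate with linear interpolation.
    const size_t outFrames = static_cast<size_t>(
        static_cast<int64_t>(numFrames) * targetRate / sourceRate);
    const double step = static_cast<double>(sourceRate) / targetRate;
    out.reserve(outFrames);
    for (size_t i = 0; i < outFrames; ++i)
    {
        const double pos = static_cast<double>(i) * step;
        const size_t i0 = static_cast<size_t>(pos);
        const size_t i1 = (i0 + 1 < mono.size()) ? i0 + 1 : i0;
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        float s = mono[i0] + (mono[i1] - mono[i0]) * frac;

        // Step 3: clamp to [-1, 1] and convert to signed 16-bit PCM.
        s = std::fmax(-1.0f, std::fmin(1.0f, s));
        out.push_back(static_cast<int16_t>(std::lround(s * 32767.0f)));
    }
    return out;
}
```

In the plugin this conversion would run on the data handed to the capture callback, before the result is marshalled to the game thread. For a hypothetical 48 kHz stereo device, a 5 ms callback of 240 frames becomes 80 mono samples, i.e. 160 bytes.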

Plugin Status

PS_AI_Agent_ElevenLabs: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
v1.5.0 — mic audio chunk size fixed: WASAPI 5ms callbacks accumulated to 100ms before sending
v1.4.0 — push-to-talk fully fixed: bAutoStartListening now ignored in Client turn mode
Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
First-byte discrimination: { = JSON control message, else = raw PCM audio
SendTextMessage() added to both WebSocketProxy and ConversationalAgentComponent
conversation_initiation_client_data now sent immediately on WS connect (required for mic + latency)

Audio Chunk Size — CRITICAL

WASAPI fires mic callbacks every ~5ms → ~158 bytes per callback at 16kHz 16-bit mono (an exact 5ms would be 80 samples × 2 bytes = 160 bytes)
ElevenLabs VAD/STT requires ≥3200 bytes (100ms) per chunk; smaller chunks are silently ignored
Fix: MicAccumulationBuffer in component accumulates chunks; sends only when >= MicChunkMinBytes (3200)
StopListening() flushes remainder so final partial chunk is never dropped before end-of-turn
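
The 3200-byte threshold follows from 16000 samples/s × 2 bytes × 0.1 s. A minimal, engine-free sketch of the accumulate-then-flush behaviour described above is shown here. `MicChunkAccumulator` and its `SendFn` callback are hypothetical names standing in for the component's `MicAccumulationBuffer` logic, and the real plugin presumably uses `TArray<uint8>` rather than `std::vector`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the component's MicAccumulationBuffer logic.
class MicChunkAccumulator
{
public:
    using SendFn = std::function<void(const std::vector<uint8_t>&)>;

    MicChunkAccumulator(size_t minBytes, SendFn send)
        : MinBytes(minBytes), Send(std::move(send)) {}

    // Called for every small capture callback (already 16 kHz 16-bit mono).
    // Nothing is sent until at least MinBytes have been collected.
    void Push(const uint8_t* data, size_t numBytes)
    {
        if (data == nullptr || numBytes == 0)
        {
            return;
        }
        Buffer.insert(Buffer.end(), data, data + numBytes);
        if (Buffer.size() >= MinBytes)
        {
            Emit();
        }
    }

    // Called when listening stops: send the remainder even if it is short,
    // so the tail of the utterance is not dropped before end-of-turn.
    void Flush()
    {
        if (!Buffer.empty())
        {
            Emit();
        }
    }

private:
    void Emit()
    {
        Send(Buffer);
        Buffer.clear();
    }

    size_t MinBytes;
    SendFn Send;
    std::vector<uint8_t> Buffer;
};
```

With 160-byte callbacks and a 3200-byte minimum, the 20th `Push` triggers the first send. A later `Flush()` sends whatever partial chunk is pending and does nothing if the buffer is already empty.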

ElevenLabs WebSocket Protocol Notes

ALL frames are binary: bind ONLY OnRawMessage and NEVER bind OnMessage (text). UE fires both handlers for the same frame, so binding both plays every audio chunk twice (double audio bug)
Binary frame discrimination: peek byte[0] → '{' (0x7B) = JSON, else = raw PCM audio
Fragment reassembly: accumulate into BinaryFrameBuffer until BytesRemaining == 0
Pong: {"type":"pong","event_id":N} — event_id is top-level, NOT nested
Transcript: type=user_transcript, key=user_transcription_event, field=user_transcript
Client turn mode (client_vad): send user_activity with every audio chunk, not just once. The server needs a continuous signal to know the user is speaking; when chunks stop, it treats that as silence and the agent responds
Text input: {"type":"user_message","text":"..."} — agent replies with audio + text
MUST send conversation_initiation_client_data immediately after WS connect — without it, server won't process client audio (mic appears dead)
conversation_initiation_client_data payload: conversation_config_override.agent.turn.mode, conversation_config_override.tts.optimize_streaming_latency, custom_llm_extra_body.enable_intermediate_response
enable_intermediate_response: true in custom_llm_extra_body reduces time-to-first-audio latency
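
The fragment reassembly, first-byte discrimination, and pong format noted above can be sketched without engine types as follows. `BinaryFrameAssembler`, `FrameKind`, and `MakePong` are hypothetical names. `OnFragment` takes the same three values the notes describe for a raw-message handler (data pointer, size, bytes remaining), and in the plugin the equivalent logic would sit in the handler bound to OnRawMessage.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class FrameKind { Incomplete, Empty, JsonControl, PcmAudio };

// Hypothetical reassembler: collects fragments until bytesRemaining == 0,
// then classifies the completed frame by its first byte.
class BinaryFrameAssembler
{
public:
    FrameKind OnFragment(const void* data, size_t size, size_t bytesRemaining)
    {
        const uint8_t* bytes = static_cast<const uint8_t*>(data);
        if (bytes != nullptr && size > 0)
        {
            Buffer.insert(Buffer.end(), bytes, bytes + size);
        }
        if (bytesRemaining != 0)
        {
            return FrameKind::Incomplete;  // more fragments still to come
        }

        // Frame complete: move it out and reset the buffer for the next one.
        Frame.swap(Buffer);
        Buffer.clear();
        if (Frame.empty())
        {
            return FrameKind::Empty;
        }
        // '{' (0x7B) marks a JSON control message, anything else is raw PCM.
        return (Frame[0] == 0x7B) ? FrameKind::JsonControl : FrameKind::PcmAudio;
    }

    // The most recently completed frame.
    const std::vector<uint8_t>& LastFrame() const { return Frame; }

private:
    std::vector<uint8_t> Buffer;  // fragments of the frame in progress
    std::vector<uint8_t> Frame;   // last completed frame
};

// Builds the pong reply with event_id at the top level, as noted above.
std::string MakePong(long long eventId)
{
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(eventId) + "}";
}
```

One caveat of the first-byte heuristic: a PCM frame whose first byte happens to be 0x7B would be classified as JSON. The notes do not say how the plugin guards against that, so a fallback to audio when JSON parsing fails may be worth considering.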

API Keys / Secrets

ElevenLabs API key is set in Project Settings → Plugins → ElevenLabs AI Agent in the Editor
UE saves it to DefaultEngine.ini under [/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]
The key is stripped from DefaultEngine.ini before every commit — do not commit it
Each developer sets the key locally; it does not go in git

Claude Memory Files in This Repo

| File | Contents |
| --- | --- |
| `.claude/MEMORY.md` | This file: project structure, patterns, status |
| `.claude/elevenlabs_plugin.md` | Plugin file map, ElevenLabs WS protocol, design decisions |
| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL, Agent ID location) |
| `.claude/project_context.md` | Original ask, intent, short/long-term goals |
| `.claude/session_log_2026-02-19.md` | Full session record: steps, commits, technical decisions, next steps |
| `.claude/PS_AI_Agent_ElevenLabs_Documentation.md` | User-facing Markdown reference doc |
