PS_AI_Agent/.claude/MEMORY.md
j.foucher b888f7fcb6 Update memory: document v1.5.0 mic chunk size fix
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 18:42:47 +01:00

Project Memory PS_AI_Agent

This file is committed to the repository so it is available on any machine. Claude Code reads it automatically at session start (via the auto-memory system) when the working directory is inside this repo. Keep it under ~180 lines; lines beyond 200 are truncated by the system.


Project Location

  • Repo root: <repo_root>/ (wherever this is cloned)
  • UE5 project: <repo_root>/Unreal/PS_AI_Agent/
  • .uproject: <repo_root>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject
  • Engine: Unreal Engine 5.5 — Win64 primary target
  • Default test map: /Game/TestMap.TestMap

Plugins

| Plugin | Path | Purpose |
|---|---|---|
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Has an ElevenLabs voice type enum in ConvaiDefinitions.h. Used as architectural reference. |
| PS_AI_Agent_ElevenLabs | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See .claude/elevenlabs_plugin.md for full details. |

User Preferences

  • Plugin naming: PS_AI_Agent_<Service> (e.g. PS_AI_Agent_ElevenLabs)
  • Save memory frequently during long sessions
  • Goal: ElevenLabs Conversational AI integration — simpler than Convai, no gRPC
  • Full original ask + intent: see .claude/project_context.md
  • Git remote is a private server — no public exposure risk

Key UE5 Plugin Patterns

  • Settings object: UCLASS(config=Engine, defaultconfig) inheriting UObject, registered via ISettingsModule
  • Module startup: NewObject<USettings>(..., RF_Standalone) + AddToRoot()
  • WebSocket: FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)
    • WebSockets is a module (Build.cs only) — NOT a plugin, don't put it in .uplugin
  • Audio capture: Audio::FAudioCapture::OpenAudioCaptureStream() (UE 5.3+, replaces deprecated OpenCaptureStream)
    • AudioCapture IS a plugin — declare it in .uplugin Plugins array
    • Callback type: FOnAudioCaptureFunction = TFunction<void(const void*, int32, int32, int32, double, bool)>
    • Cast const void* to const float* inside — device sends float32 interleaved
  • Procedural audio playback: USoundWaveProcedural + OnSoundWaveProceduralUnderflow delegate
  • Audio capture callbacks arrive on a background thread — always marshal to game thread with AsyncTask(ENamedThreads::GameThread, ...)
  • Resample mic audio to 16000 Hz mono before sending to ElevenLabs
  • TArray::RemoveAt(idx, count, EAllowShrinking::No) — bool overload deprecated in UE 5.5
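The capture-side conversion described above (cast the float32 interleaved buffer, downmix to mono, resample to 16000 Hz, then produce 16-bit PCM) can be sketched in plain C++. This is a minimal sketch, not the plugin's code. `ToPcm16kMono` is a hypothetical name. The parameters mirror the first four arguments of `FOnAudioCaptureFunction`. Linear interpolation is a deliberately simple swapped-in resampler with no anti-alias filter. Little-endian byte order is an assumption about what ElevenLabs expects. In the plugin, work like this would run in the capture callback on the background thread, before the result is marshaled to the game thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an engine or plugin API): converts one capture
// callback buffer (float32 interleaved, any channel count and rate) into
// 16 kHz mono 16-bit little-endian PCM bytes.
std::vector<uint8_t> ToPcm16kMono(const void* Audio, int32_t NumFrames,
                                  int32_t NumChannels, int32_t SampleRate) {
    assert(NumFrames >= 0 && NumChannels > 0 && SampleRate > 0);
    const float* In = static_cast<const float*>(Audio);  // device sends float32

    // 1. Downmix to mono by averaging the channels of each frame.
    std::vector<float> Mono(static_cast<std::size_t>(NumFrames));
    for (int32_t f = 0; f < NumFrames; ++f) {
        float Sum = 0.0f;
        for (int32_t c = 0; c < NumChannels; ++c) Sum += In[f * NumChannels + c];
        Mono[f] = Sum / static_cast<float>(NumChannels);
    }

    // 2. Resample to 16 kHz by linear interpolation (no anti-alias filter).
    constexpr int32_t TargetRate = 16000;
    const int64_t OutFrames = static_cast<int64_t>(NumFrames) * TargetRate / SampleRate;
    const double Step = static_cast<double>(SampleRate) / TargetRate;
    std::vector<uint8_t> Bytes;
    Bytes.reserve(static_cast<std::size_t>(OutFrames) * 2);
    for (int64_t i = 0; i < OutFrames; ++i) {
        const double Pos = static_cast<double>(i) * Step;
        const int64_t I0 = static_cast<int64_t>(Pos);
        const int64_t I1 = std::min<int64_t>(I0 + 1, NumFrames - 1);
        const double Frac = Pos - static_cast<double>(I0);
        const double S = Mono[I0] * (1.0 - Frac) + Mono[I1] * Frac;

        // 3. Clamp to [-1, 1], quantize to int16, write the low byte first.
        const auto V = static_cast<int16_t>(std::lround(std::clamp(S, -1.0, 1.0) * 32767.0));
        const auto U = static_cast<uint16_t>(V);
        Bytes.push_back(static_cast<uint8_t>(U & 0xFF));
        Bytes.push_back(static_cast<uint8_t>(U >> 8));
    }
    return Bytes;
}
```

As an example, a 48 kHz stereo device delivering a 5 ms callback (240 frames) yields 80 output samples, which is 160 bytes. That shows why a single callback is far below the 3200-byte chunk size ElevenLabs needs. A production build would likely low-pass the signal before decimating.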

Plugin Status

  • PS_AI_Agent_ElevenLabs: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
  • v1.5.0 — mic audio chunk size fixed: WASAPI 5ms callbacks accumulated to 100ms before sending
  • v1.4.0 — push-to-talk fully fixed: bAutoStartListening now ignored in Client turn mode
  • Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
  • First-byte discrimination: { = JSON control message, else = raw PCM audio
  • SendTextMessage() added to both WebSocketProxy and ConversationalAgentComponent
  • conversation_initiation_client_data now sent immediately on WS connect (required for the server to process mic audio; it also carries the latency settings)

Audio Chunk Size — CRITICAL

  • WASAPI fires mic callbacks every ~5ms → 158 bytes at 16kHz 16-bit mono
  • ElevenLabs VAD/STT requires ≥3200 bytes (100ms) per chunk; smaller chunks are silently ignored
  • Fix: MicAccumulationBuffer in component accumulates chunks; sends only when >= MicChunkMinBytes (3200)
  • StopListening() flushes remainder so final partial chunk is never dropped before end-of-turn
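The accumulate-then-send rule above can be modeled as a small standalone class. This is a minimal sketch, not the component's actual code. `MicAccumulator` and the `Send` callback are hypothetical stand-ins for the component and its WebSocket send. `MicAccumulationBuffer` and `MicChunkMinBytes` reuse the names from the notes. In UE this logic would run on the game thread, after the capture callback has been marshaled with AsyncTask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical standalone model of the component's mic accumulation logic.
class MicAccumulator {
public:
    // 16000 samples/s * 2 bytes/sample * 0.1 s = 3200 bytes (100 ms)
    static constexpr std::size_t MicChunkMinBytes = 16000 * 2 / 10;

    using FSendFn = std::function<void(const std::vector<uint8_t>&)>;
    explicit MicAccumulator(FSendFn SendFn) : Send(std::move(SendFn)) {}

    // Called for every small (~5 ms, ~160 byte) PCM chunk from the capture path.
    void OnMicChunk(const uint8_t* Data, std::size_t Size) {
        assert(Data != nullptr || Size == 0);
        MicAccumulationBuffer.insert(MicAccumulationBuffer.end(), Data, Data + Size);
        if (MicAccumulationBuffer.size() >= MicChunkMinBytes) {
            Flush();  // sends the whole buffer, so a chunk may slightly exceed 3200
        }
    }

    // Mirrors StopListening(): send the remainder even if it is < 3200 bytes,
    // so the tail of the utterance is not dropped before end-of-turn.
    void StopListening() {
        if (!MicAccumulationBuffer.empty()) Flush();
    }

private:
    void Flush() {
        Send(MicAccumulationBuffer);
        MicAccumulationBuffer.clear();
    }

    FSendFn Send;
    std::vector<uint8_t> MicAccumulationBuffer;
};
```

With 160-byte callbacks, 20 of them fill exactly one 3200-byte chunk. With 158-byte callbacks, the 21st callback triggers a send of 3318 bytes.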

ElevenLabs WebSocket Protocol Notes

  • ALL frames are binary — bind ONLY OnRawMessage; NEVER bind OnMessage (text) — UE fires both for same frame → double audio bug
  • Binary frame discrimination: peek byte[0] → '{' (0x7B) = JSON, else = raw PCM audio
  • Fragment reassembly: accumulate into BinaryFrameBuffer until BytesRemaining == 0
  • Pong: {"type":"pong","event_id":N}; event_id is top-level, NOT nested
  • Transcript: type=user_transcript, key=user_transcription_event, field=user_transcript
  • Client turn mode (client_vad): send user_activity with every audio chunk, not just once. The server needs a continuous signal to know the user is still speaking; when the chunks stop, it treats that as silence and the agent responds
  • Text input: {"type":"user_message","text":"..."} — agent replies with audio + text
  • MUST send conversation_initiation_client_data immediately after WS connect — without it, server won't process client audio (mic appears dead)
  • conversation_initiation_client_data payload: conversation_config_override.agent.turn.mode, conversation_config_override.tts.optimize_streaming_latency, custom_llm_extra_body.enable_intermediate_response
  • enable_intermediate_response: true in custom_llm_extra_body reduces time-to-first-audio latency
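The binary-frame rules above (fragment reassembly, first-byte discrimination, top-level pong field) can be sketched in standard C++. This is a minimal sketch under assumptions, not the WebSocketProxy's code. `FFrameAssembler`, `Classify` and `MakePong` are hypothetical names. The fragment arguments are assumed to match UE's `OnRawMessage` payload (data pointer, size, bytes remaining). An empty message is classified as audio here only for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class EFrameKind { Json, PcmAudio };

// Hypothetical model of BinaryFrameBuffer reassembly.
struct FFrameAssembler {
    std::vector<uint8_t> BinaryFrameBuffer;

    // Appends one fragment; returns true once BytesRemaining == 0,
    // at which point Out holds the complete message.
    bool OnRawFragment(const void* Data, std::size_t Size, std::size_t BytesRemaining,
                       std::vector<uint8_t>& Out) {
        assert(Data != nullptr || Size == 0);
        const auto* Bytes = static_cast<const uint8_t*>(Data);
        BinaryFrameBuffer.insert(BinaryFrameBuffer.end(), Bytes, Bytes + Size);
        if (BytesRemaining != 0) return false;  // more fragments to come
        Out.swap(BinaryFrameBuffer);
        BinaryFrameBuffer.clear();  // ready for the next message
        return true;
    }
};

// First-byte discrimination: '{' (0x7B) = JSON control message, else raw PCM.
EFrameKind Classify(const std::vector<uint8_t>& Message) {
    return (!Message.empty() && Message[0] == 0x7B) ? EFrameKind::Json
                                                    : EFrameKind::PcmAudio;
}

// Pong reply: event_id is a top-level field, not nested.
std::string MakePong(long long EventId) {
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(EventId) + "}";
}
```

Because every frame, JSON or audio, goes through this single raw path, binding only OnRawMessage avoids the double-audio bug noted above.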

API Keys / Secrets

  • ElevenLabs API key is set in Project Settings → Plugins → ElevenLabs AI Agent in the Editor
  • UE saves it to DefaultEngine.ini under [/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]
  • The key is stripped from DefaultEngine.ini before every commit — do not commit it
  • Each developer sets the key locally; it does not go in git

Claude Memory Files in This Repo

| File | Contents |
|---|---|
| .claude/MEMORY.md | This file: project structure, patterns, status |
| .claude/elevenlabs_plugin.md | Plugin file map, ElevenLabs WS protocol, design decisions |
| .claude/elevenlabs_api_reference.md | Full ElevenLabs API reference (WS messages, REST, signed URL, Agent ID location) |
| .claude/project_context.md | Original ask, intent, short/long-term goals |
| .claude/session_log_2026-02-19.md | Full session record: steps, commits, technical decisions, next steps |
| .claude/PS_AI_Agent_ElevenLabs_Documentation.md | User-facing Markdown reference doc |