# Project Context & Original Ask

## What the user wants to build

A **UE5 plugin** that integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, allowing an in-game NPC (or any Actor) to hold a real-time voice conversation with a player.

### The original request (paraphrased)

> "I want to create a plugin to use ElevenLabs Conversational Agent in Unreal Engine 5.5.
> I previously used the Convai plugin, which does what I want, but I prefer ElevenLabs' quality.
> The goal is to create a plugin in the existing Unreal project as a first step toward integration.
> The Convai AI plugin may be too big in terms of functionality for the new project, but it is the final goal.
> You can use the Convai source code to find the right way to build the ElevenLabs version —
> it should be very similar."

### Plugin name

`PS_AI_Agent_ElevenLabs`

---

## User's mental model / intent

1. **Short-term**: A working first-step plugin — minimal but functional — that can:
   - Connect to ElevenLabs Conversational AI via WebSocket
   - Capture microphone audio from the player
   - Stream it to ElevenLabs in real time
   - Play back the agent's voice response
   - Surface key events (transcript, agent text, speaking state) to Blueprint
2. **Long-term**: Match the full feature set of Convai — character IDs, session memory, actions/environment context, lip-sync, etc. — but powered by ElevenLabs instead.
3. **Key preference**: Simpler than Convai. No gRPC, no protobuf, no ThirdParty precompiled libraries. ElevenLabs' Conversational AI API uses plain WebSocket + JSON, which maps naturally to UE's built-in `WebSockets` module.
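The "stream it to ElevenLabs" step boils down to framing captured PCM16 audio as a JSON text message over the WebSocket. A minimal plain-C++ sketch of that framing is below; the `user_audio_chunk` field name follows the ElevenLabs Conversational AI docs, while the helper and function names are illustrative, not names from the actual plugin (in the real plugin this would use UE's `FBase64` and JSON utilities):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Base64-encode a raw byte buffer (standard alphabet, '=' padding).
static std::string Base64Encode(const std::vector<uint8_t>& Bytes)
{
    static const char Table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    size_t i = 0;
    for (; i + 2 < Bytes.size(); i += 3)
    {
        uint32_t n = (Bytes[i] << 16) | (Bytes[i + 1] << 8) | Bytes[i + 2];
        Out += Table[(n >> 18) & 63]; Out += Table[(n >> 12) & 63];
        Out += Table[(n >> 6) & 63];  Out += Table[n & 63];
    }
    if (i + 1 == Bytes.size())           // one trailing byte
    {
        uint32_t n = Bytes[i] << 16;
        Out += Table[(n >> 18) & 63]; Out += Table[(n >> 12) & 63];
        Out += "==";
    }
    else if (i + 2 == Bytes.size())      // two trailing bytes
    {
        uint32_t n = (Bytes[i] << 16) | (Bytes[i + 1] << 8);
        Out += Table[(n >> 18) & 63]; Out += Table[(n >> 12) & 63];
        Out += Table[(n >> 6) & 63];  Out += '=';
    }
    return Out;
}

// Frame one chunk of 16 kHz mono PCM16 audio as the JSON text message the
// client sends over the WebSocket ("user_audio_chunk" per the ElevenLabs
// Conversational AI docs; the samples are serialized little-endian).
std::string MakeUserAudioChunkMessage(const std::vector<int16_t>& Pcm16)
{
    std::vector<uint8_t> Bytes(Pcm16.size() * 2);
    for (size_t i = 0; i < Pcm16.size(); ++i)
    {
        Bytes[i * 2]     = static_cast<uint8_t>(Pcm16[i] & 0xFF);        // low byte
        Bytes[i * 2 + 1] = static_cast<uint8_t>((Pcm16[i] >> 8) & 0xFF); // high byte
    }
    return "{\"user_audio_chunk\":\"" + Base64Encode(Bytes) + "\"}";
}
```

The resulting string is sent with `IWebSocket::Send()` as a plain text frame, which is what makes the WebSocket + JSON approach so much lighter than Convai's gRPC/protobuf pipeline.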
---

## How we used Convai as a reference

We studied the Convai plugin source (`ConvAI/Convai/`) to understand:

- **Module structure**: `UConvaiSettings` + `IModuleInterface` + `ISettingsModule` registration
- **Audio capture pattern**: `Audio::FAudioCapture`, ring buffers, thread-safe dispatch to the game thread
- **Audio playback pattern**: `USoundWaveProcedural` fed from a queue
- **Component architecture**: `UConvaiChatbotComponent` (NPC side) + `UConvaiPlayerComponent` (player side)
- **HTTP proxy pattern**: `UConvaiAPIBaseProxy` base class for async REST calls
- **Voice type enum**: Convai already had `EVoiceType::ElevenLabsVoices` — confirming ElevenLabs is a natural fit

We then replaced gRPC/protobuf with **WebSocket + JSON** to match the ElevenLabs API, and simplified the architecture to the minimum needed for a first working version.

---

## What was built (Session 1 — 2026-02-19)

All source files created and registered. See `.claude/elevenlabs_plugin.md` for the full file map and protocol details.
### Components created

| Class | Role |
|---|---|
| `UElevenLabsSettings` | Project Settings UI — API key, Agent ID, security options |
| `UElevenLabsWebSocketProxy` | Manages one WS session: connect, send audio, handle all server message types |
| `UElevenLabsConversationalAgentComponent` | ActorComponent to attach to any NPC — orchestrates mic + WS + playback |
| `UElevenLabsMicrophoneCaptureComponent` | Wraps `Audio::FAudioCapture`, resamples to 16 kHz mono |

### Not yet done (next sessions)

- Compile & test in the UE 5.5 Editor
- Verify the `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature for UE 5.5
- Add lip-sync support (future)
- Add session memory / conversation history (future)
- Add environment/action context support (future, matching Convai's full feature set)

---

## Notes on the ElevenLabs API

- Docs: https://elevenlabs.io/docs/conversational-ai
- Create agents at: https://elevenlabs.io/app/conversational-ai
- API keys: ElevenLabs dashboard at https://elevenlabs.io
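The table above says the microphone component resamples capture audio to 16 kHz mono before it is streamed. A plain-C++ sketch of that conversion using channel averaging and linear interpolation, assuming float interleaved input as `Audio::FAudioCapture` delivers it (the function name is illustrative, and a production resampler would use a proper filter rather than linear interpolation):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Downmix interleaved float capture audio to mono and linearly resample it
// to 16 kHz PCM16, the format streamed to ElevenLabs over the WebSocket.
// Plain-C++ stand-in for the UElevenLabsMicrophoneCaptureComponent logic.
std::vector<int16_t> ToMono16k(const float* Interleaved, size_t NumFrames,
                               int NumChannels, int InSampleRate)
{
    const int OutRate = 16000;
    const size_t OutFrames =
        static_cast<size_t>(NumFrames * static_cast<double>(OutRate) / InSampleRate);
    std::vector<int16_t> Out(OutFrames);

    // Average all channels to mono at a given input frame.
    auto MonoAt = [&](size_t Frame) {
        float Sum = 0.f;
        for (int c = 0; c < NumChannels; ++c)
            Sum += Interleaved[Frame * NumChannels + c];
        return Sum / NumChannels;
    };

    for (size_t o = 0; o < OutFrames; ++o)
    {
        // Fractional source position for this output frame.
        double Pos  = o * static_cast<double>(InSampleRate) / OutRate;
        size_t i0   = static_cast<size_t>(Pos);
        size_t i1   = (i0 + 1 < NumFrames) ? i0 + 1 : i0;
        double Frac = Pos - static_cast<double>(i0);

        // Linear interpolation between the two neighbouring mono samples.
        float Sample = static_cast<float>(
            MonoAt(i0) * (1.0 - Frac) + MonoAt(i1) * Frac);

        // Clamp and convert to 16-bit PCM.
        if (Sample >  1.f) Sample =  1.f;
        if (Sample < -1.f) Sample = -1.f;
        Out[o] = static_cast<int16_t>(Sample * 32767.f);
    }
    return Out;
}
```

With typical 48 kHz stereo capture this produces one output frame per three input frames; the result feeds directly into the audio-chunk framing the WebSocket proxy sends.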