PS_AI_Agent/.claude/project_context.md
j.foucher f0055e85ed Add PS_AI_Agent_ElevenLabs plugin (initial implementation)
Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent
via WebSocket. No gRPC or third-party libs required.

Plugin components:
- UElevenLabsSettings: API key + Agent ID in Project Settings
- UElevenLabsWebSocketProxy: full WS session lifecycle, JSON message handling,
  ping/pong keepalive, Base64 PCM audio send/receive
- UElevenLabsConversationalAgentComponent: ActorComponent for NPC voice
  conversation, orchestrates mic capture -> WS -> procedural audio playback
- UElevenLabsMicrophoneCaptureComponent: wraps Audio::FAudioCapture,
  resamples to 16kHz mono, dispatches on game thread

Also adds .claude/ memory files (project context, plugin notes, patterns)
so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 12:57:48 +01:00


Project Context & Original Ask

What the user wants to build

A UE5 plugin that integrates the ElevenLabs Conversational AI Agent API into Unreal Engine 5.5, allowing an in-game NPC (or any Actor) to hold a real-time voice conversation with a player.

The original request (paraphrased)

"I want to create a plugin to use the ElevenLabs Conversational Agent in Unreal Engine 5.5. I previously used the Convai plugin, which does what I want, but I prefer ElevenLabs' quality. The goal is to create a plugin in the existing Unreal project as a first step toward integration. The Convai plugin may be too big in terms of functionality for the new project, but reaching that feature set is the final goal. You can use the Convai source code to find the right way to build the ElevenLabs version — it should be very similar."

Plugin name

PS_AI_Agent_ElevenLabs


User's mental model / intent

  1. Short-term: A working first-step plugin — minimal but functional — that can:

    • Connect to ElevenLabs Conversational AI via WebSocket
    • Capture microphone audio from the player
    • Stream it to ElevenLabs in real time
    • Play back the agent's voice response
    • Surface key events (transcript, agent text, speaking state) to Blueprint
  2. Long-term: Match the full feature set of Convai — character IDs, session memory, actions/environment context, lip-sync, etc. — but powered by ElevenLabs instead.

  3. Key preference: Simpler than Convai. No gRPC, no protobuf, no ThirdParty precompiled libraries. ElevenLabs' Conversational AI API uses plain WebSocket + JSON, which maps naturally to UE's built-in WebSockets module.
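The audio-send half of that protocol reduces to Base64-encoding each 16-bit PCM chunk and wrapping it in a small JSON message. A minimal standalone sketch follows; the `user_audio_chunk` field name reflects our reading of the ElevenLabs Conversational AI WebSocket docs and should be verified against the current API, and both function names are illustrative, not taken from the plugin:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal Base64 encoder (standard alphabet, '=' padding).
std::string Base64Encode(const uint8_t* Data, size_t Len)
{
    static const char Table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    Out.reserve(((Len + 2) / 3) * 4);
    for (size_t i = 0; i < Len; i += 3)
    {
        uint32_t Block = static_cast<uint32_t>(Data[i]) << 16;
        if (i + 1 < Len) Block |= static_cast<uint32_t>(Data[i + 1]) << 8;
        if (i + 2 < Len) Block |= Data[i + 2];
        Out += Table[(Block >> 18) & 0x3F];
        Out += Table[(Block >> 12) & 0x3F];
        Out += (i + 1 < Len) ? Table[(Block >> 6) & 0x3F] : '=';
        Out += (i + 2 < Len) ? Table[Block & 0x3F] : '=';
    }
    return Out;
}

// Wrap one 16-bit PCM chunk in the JSON message the agent expects.
// "user_audio_chunk" is assumed from the ElevenLabs WS docs.
std::string MakeAudioChunkMessage(const std::vector<int16_t>& Pcm)
{
    const uint8_t* Bytes = reinterpret_cast<const uint8_t*>(Pcm.data());
    return "{\"user_audio_chunk\":\"" +
           Base64Encode(Bytes, Pcm.size() * sizeof(int16_t)) + "\"}";
}
```

In the plugin itself the string would be handed to UE's IWebSocket::Send, and UE's own FBase64 could replace the hand-rolled encoder.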


How we used Convai as a reference

We studied the Convai plugin source (ConvAI/Convai/) to understand:

  • Module structure: UConvaiSettings + IModuleInterface + ISettingsModule registration
  • Audio capture pattern: Audio::FAudioCapture, ring buffers, thread-safe dispatch to game thread
  • Audio playback pattern: USoundWaveProcedural fed from a queue
  • Component architecture: UConvaiChatbotComponent (NPC side) + UConvaiPlayerComponent (player side)
  • HTTP proxy pattern: UConvaiAPIBaseProxy base class for async REST calls
  • Voice type enum: Convai already had EVoiceType::ElevenLabsVoices — confirming ElevenLabs is a natural fit
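The capture pattern above (audio thread producing, game thread consuming) can be sketched as a single-producer/single-consumer ring buffer. This is a standalone illustration of the idea, not code from either plugin; the class and method names are invented:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// SPSC ring buffer: the audio-device thread pushes samples, the game
// thread drains them. Capacity must be a power of two so the index
// mask works. Lock-free: each index is written by exactly one thread.
class FSampleRingBuffer
{
public:
    explicit FSampleRingBuffer(size_t CapacityPow2)
        : Buffer(CapacityPow2), Mask(CapacityPow2 - 1) {}

    // Audio-thread side: returns false when the buffer is full.
    bool Push(int16_t Sample)
    {
        const size_t W = Write.load(std::memory_order_relaxed);
        if (W - Read.load(std::memory_order_acquire) == Buffer.size())
            return false;
        Buffer[W & Mask] = Sample;
        Write.store(W + 1, std::memory_order_release);
        return true;
    }

    // Game-thread side: drains everything currently available.
    std::vector<int16_t> DrainAll()
    {
        std::vector<int16_t> Out;
        const size_t W = Write.load(std::memory_order_acquire);
        size_t R = Read.load(std::memory_order_relaxed);
        while (R != W) Out.push_back(Buffer[R++ & Mask]);
        Read.store(R, std::memory_order_release);
        return Out;
    }

private:
    std::vector<int16_t> Buffer;
    const size_t Mask;
    std::atomic<size_t> Write{0};
    std::atomic<size_t> Read{0};
};
```

In UE the drain would typically happen inside an AsyncTask(ENamedThreads::GameThread, ...) or a component tick, which is the "thread-safe dispatch to game thread" step noted above.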

We then replaced gRPC/protobuf with WebSocket + JSON to match the ElevenLabs API, and simplified the architecture to the minimum needed for a first working version.
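The receive side then becomes a dispatch on the message's top-level "type" field. The sketch below uses crude string scanning so it stays standalone; the real proxy would parse with UE's FJsonSerializer. The "ping"/"pong" shape and the event_id echo follow our reading of the ElevenLabs docs and should be verified against the current API:

```cpp
#include <cstddef>
#include <string>

// Pull the top-level "type" value out of a server message. Stand-in
// for proper JSON parsing (FJsonSerializer in the actual plugin).
std::string ExtractMessageType(const std::string& Json)
{
    const std::string Key = "\"type\":\"";
    const size_t Start = Json.find(Key);
    if (Start == std::string::npos) return "";
    const size_t ValueStart = Start + Key.size();
    const size_t End = Json.find('"', ValueStart);
    return Json.substr(ValueStart, End - ValueStart);
}

// A ping must be answered with a pong echoing the same event_id,
// otherwise the server eventually drops the session (assumed keepalive
// contract; confirm against the ElevenLabs API reference).
std::string MakePongMessage(int EventId)
{
    return "{\"type\":\"pong\",\"event_id\":" + std::to_string(EventId) + "}";
}
```

The proxy's handler then branches on the extracted type: "audio" feeds the playback queue, "ping" triggers MakePongMessage, transcript and agent-response types surface as Blueprint events.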


What was built (Session 1 — 2026-02-19)

All source files created and registered. See .claude/elevenlabs_plugin.md for full file map and protocol details.

Components created

| Class | Role |
| --- | --- |
| UElevenLabsSettings | Project Settings UI — API key, Agent ID, security options |
| UElevenLabsWebSocketProxy | Manages one WS session: connect, send audio, handle all server message types |
| UElevenLabsConversationalAgentComponent | ActorComponent to attach to any NPC — orchestrates mic + WS + playback |
| UElevenLabsMicrophoneCaptureComponent | Wraps Audio::FAudioCapture, resamples to 16 kHz mono |
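The resample step in the capture component can be illustrated with the simplest correct case: assuming the device delivers 48 kHz interleaved float audio (the common default; the real component must branch on the actual capture rate), 48 kHz to 16 kHz is an exact 3:1 ratio, so each group of three frames is downmixed to mono and averaged. The function name is illustrative, not the plugin's:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Downmix interleaved float frames ([-1, 1]) to mono and decimate
// 3:1 (48 kHz -> 16 kHz) by averaging, emitting int16 PCM as the
// ElevenLabs input format expects. Averaging doubles as a crude
// low-pass filter; a production resampler would filter properly.
std::vector<int16_t> DownmixAndDecimateTo16k(
    const float* Interleaved, size_t NumFrames, int NumChannels)
{
    constexpr size_t Ratio = 3; // 48000 / 16000
    std::vector<int16_t> Out;
    Out.reserve(NumFrames / Ratio);
    for (size_t Frame = 0; Frame + Ratio <= NumFrames; Frame += Ratio)
    {
        float Acc = 0.f;
        for (size_t f = 0; f < Ratio; ++f)
            for (int c = 0; c < NumChannels; ++c)
                Acc += Interleaved[(Frame + f) * NumChannels + c];
        const float Mono = Acc / (Ratio * NumChannels);
        const float Clamped = Mono < -1.f ? -1.f : (Mono > 1.f ? 1.f : Mono);
        Out.push_back(static_cast<int16_t>(Clamped * 32767.f));
    }
    return Out;
}
```

For non-integer ratios (e.g. a 44.1 kHz device) the component would need linear interpolation or a proper resampler instead of plain decimation.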

Not yet done (next sessions)

  • Compile & test in UE 5.5 Editor
  • Verify USoundWaveProcedural::OnSoundWaveProceduralUnderflow delegate signature for UE 5.5
  • Add lip-sync support (future)
  • Add session memory / conversation history (future)
  • Add environment/action context support (future, matching Convai's full feature set)

Notes on the ElevenLabs API