Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent via WebSocket. No gRPC or third-party libs required.

Plugin components:

- `UElevenLabsSettings`: API key + Agent ID in Project Settings
- `UElevenLabsWebSocketProxy`: full WS session lifecycle, JSON message handling, ping/pong keepalive, Base64 PCM audio send/receive
- `UElevenLabsConversationalAgentComponent`: ActorComponent for NPC voice conversation; orchestrates mic capture -> WS -> procedural audio playback
- `UElevenLabsMicrophoneCaptureComponent`: wraps `Audio::FAudioCapture`, resamples to 16 kHz mono, dispatches on the game thread

Also adds `.claude/` memory files (project context, plugin notes, patterns) so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Project Context & Original Ask

## What the user wants to build

A **UE5 plugin** that integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, allowing an in-game NPC (or any Actor) to hold a real-time voice conversation with a player.
### The original request (paraphrased)

> "I want to create a plugin to use ElevenLabs Conversational Agent in Unreal Engine 5.5.
> I previously used the Convai plugin, which does what I want, but I prefer ElevenLabs' quality.
> The goal is to create a plugin in the existing Unreal project as a first step toward integration.
> The Convai AI plugin may be too big in terms of functionality for the new project, but its feature set is the final goal.
> You can use the Convai source code to find the right way to build the ElevenLabs version —
> it should be very similar."

### Plugin name

`PS_AI_Agent_ElevenLabs`

---

## User's mental model / intent

1. **Short-term**: a working first-step plugin — minimal but functional — that can:
   - Connect to ElevenLabs Conversational AI via WebSocket
   - Capture microphone audio from the player
   - Stream it to ElevenLabs in real time
   - Play back the agent's voice response
   - Surface key events (transcript, agent text, speaking state) to Blueprint

2. **Long-term**: match the full feature set of Convai — character IDs, session memory, actions/environment context, lip-sync, etc. — but powered by ElevenLabs instead.

3. **Key preference**: simpler than Convai. No gRPC, no protobuf, no precompiled ThirdParty libraries. ElevenLabs' Conversational AI API uses plain WebSocket + JSON, which maps naturally to UE's built-in `WebSockets` module.
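Because the wire format is plain JSON, the core client messages are easy to sketch independently of UE. The field names below (`user_audio_chunk`, `ping`/`pong`, `audio_event`, etc.) are taken from the ElevenLabs Conversational AI docs as we understood them and should be verified against the current API reference; the Python here is a language-agnostic illustration, not plugin code.

```python
import base64
import json

def make_audio_chunk_message(pcm16: bytes) -> str:
    """Client -> server: one chunk of 16 kHz mono 16-bit PCM, Base64-encoded."""
    return json.dumps({"user_audio_chunk": base64.b64encode(pcm16).decode("ascii")})

def make_pong_message(event_id: int) -> str:
    """Client -> server: keepalive reply to a server 'ping' event."""
    return json.dumps({"type": "pong", "event_id": event_id})

def handle_server_message(raw: str) -> str:
    """Dispatch a server message by its 'type' field (assumed message shapes)."""
    msg = json.loads(raw)
    msg_type = msg.get("type")
    if msg_type == "ping":
        # Server pings carry an event_id we must echo back to keep the session alive.
        return make_pong_message(msg["ping_event"]["event_id"])
    if msg_type == "audio":
        # msg["audio_event"]["audio_base_64"] holds Base64 PCM to queue for playback.
        return "queue_audio"
    if msg_type == "user_transcript":
        return "broadcast_transcript"
    if msg_type == "agent_response":
        return "broadcast_agent_text"
    return "ignore"
```

In the plugin, `UElevenLabsWebSocketProxy` performs the equivalent dispatch inside the `IWebSocket::OnMessage()` handler.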
---
|
|
|
|
## How we used Convai as a reference

We studied the Convai plugin source (`ConvAI/Convai/`) to understand:

- **Module structure**: `UConvaiSettings` + `IModuleInterface` + `ISettingsModule` registration
- **Audio capture pattern**: `Audio::FAudioCapture`, ring buffers, thread-safe dispatch to the game thread
- **Audio playback pattern**: `USoundWaveProcedural` fed from a queue
- **Component architecture**: `UConvaiChatbotComponent` (NPC side) + `UConvaiPlayerComponent` (player side)
- **HTTP proxy pattern**: `UConvaiAPIBaseProxy` base class for async REST calls
- **Voice type enum**: Convai already had `EVoiceType::ElevenLabsVoices`, confirming ElevenLabs is a natural fit
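The capture pattern above reduces to a producer/consumer hand-off: the audio capture thread pushes sample blocks into a lock-guarded buffer, and the game thread drains it once per tick. A minimal Python sketch of that idea (the UE version works with `Audio::FAudioCapture` callbacks and typically marshals to the game thread via `AsyncTask`; the class and method names here are illustrative only):

```python
import threading
from collections import deque

class CaptureBuffer:
    """Audio thread pushes sample blocks; game thread drains them once per tick."""

    def __init__(self):
        self._lock = threading.Lock()
        self._blocks = deque()

    def push(self, samples):
        # Called from the audio capture thread for each incoming block.
        with self._lock:
            self._blocks.append(samples)

    def drain(self):
        # Called from the game thread each tick; returns and clears all
        # pending blocks in one short critical section.
        with self._lock:
            blocks = list(self._blocks)
            self._blocks.clear()
        return blocks
```

Keeping the critical section to a copy-and-clear means the capture callback never blocks on downstream work such as Base64 encoding or WebSocket sends.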
We then replaced gRPC/protobuf with **WebSocket + JSON** to match the ElevenLabs API, and

We then replaced gRPC/protobuf with **WebSocket + JSON** to match the ElevenLabs API, and simplified the architecture to the minimum needed for a first working version.
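On the receive side of that WebSocket + JSON flow, the server's `audio` events carry Base64-encoded PCM that must be decoded back into raw samples before being queued for `USoundWaveProcedural`. A standalone sketch of the decode step, assuming signed 16-bit little-endian mono (the same format we send; the agent's configured output format should be verified):

```python
import base64
import struct

def decode_audio_event(audio_base_64: str) -> list[int]:
    """Decode a Base64 PCM payload into signed 16-bit little-endian samples."""
    raw = base64.b64decode(audio_base_64)
    count = len(raw) // 2          # two bytes per int16 sample
    return list(struct.unpack(f"<{count}h", raw[: count * 2]))
```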
---

---

## What was built (Session 1 — 2026-02-19)

All source files created and registered. See `.claude/elevenlabs_plugin.md` for the full file map and protocol details.
### Components created

| Class | Role |
|---|---|
| `UElevenLabsSettings` | Project Settings UI — API key, Agent ID, security options |
| `UElevenLabsWebSocketProxy` | Manages one WS session: connect, send audio, handle all server message types |
| `UElevenLabsConversationalAgentComponent` | ActorComponent to attach to any NPC — orchestrates mic + WS + playback |
| `UElevenLabsMicrophoneCaptureComponent` | Wraps `Audio::FAudioCapture`, resamples to 16 kHz mono |

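The microphone component's resample step (for example, a 48 kHz stereo device feed reduced to the 16 kHz mono the API expects) is a channel downmix followed by linear interpolation. The UE implementation operates on the float buffers delivered by `Audio::FAudioCapture`; this standalone sketch shows only the arithmetic:

```python
def downmix_to_mono(interleaved, num_channels):
    """Average interleaved channel samples into a single mono float stream."""
    return [
        sum(interleaved[i:i + num_channels]) / num_channels
        for i in range(0, len(interleaved), num_channels)
    ]

def resample_linear(mono, src_rate, dst_rate):
    """Resample a mono float stream via linear interpolation, e.g. 48000 -> 16000."""
    if src_rate == dst_rate or not mono:
        return list(mono)
    out_len = int(len(mono) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate   # fractional position in the source stream
        j = int(pos)
        frac = pos - j
        nxt = mono[j + 1] if j + 1 < len(mono) else mono[j]
        out.append(mono[j] * (1.0 - frac) + nxt * frac)
    return out
```

Linear interpolation is adequate for speech-to-agent input; a windowed-sinc resampler would be the upgrade path if capture quality ever becomes an issue.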
### Not yet done (next sessions)

- Compile & test in the UE 5.5 Editor
- Verify the `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature for UE 5.5
- Add lip-sync support (future)
- Add session memory / conversation history (future)
- Add environment/action context support (future, matching Convai's full feature set)

---
## Notes on the ElevenLabs API

- Docs: https://elevenlabs.io/docs/conversational-ai
- Create agents at: https://elevenlabs.io/app/conversational-ai
- API keys: https://elevenlabs.io (dashboard)