j.foucher f0055e85ed Add PS_AI_Agent_ElevenLabs plugin (initial implementation)
Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent
via WebSocket. No gRPC or third-party libs required.

Plugin components:
- UElevenLabsSettings: API key + Agent ID in Project Settings
- UElevenLabsWebSocketProxy: full WS session lifecycle, JSON message handling,
  ping/pong keepalive, Base64 PCM audio send/receive
- UElevenLabsConversationalAgentComponent: ActorComponent for NPC voice
  conversation, orchestrates mic capture -> WS -> procedural audio playback
- UElevenLabsMicrophoneCaptureComponent: wraps Audio::FAudioCapture,
  resamples to 16kHz mono, dispatches on game thread

Also adds .claude/ memory files (project context, plugin notes, patterns)
so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 12:57:48 +01:00


# Project Context & Original Ask
## What the user wants to build
A **UE5 plugin** that integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5,
allowing an in-game NPC (or any Actor) to hold a real-time voice conversation with a player.
### The original request (paraphrased)
> "I want to create a plugin to use the ElevenLabs Conversational Agent in Unreal Engine 5.5.
> I previously used the Convai plugin, which does what I want, but I prefer ElevenLabs' quality.
> The goal is to create a plugin in the existing Unreal project as a first step toward integration.
> The Convai plugin may be too big in terms of functionality for the new project, but its feature
> set is the final goal. You can use the Convai source code to find the right way to build the
> ElevenLabs version — it should be very similar."
### Plugin name
`PS_AI_Agent_ElevenLabs`
---
## User's mental model / intent
1. **Short-term**: A working first-step plugin — minimal but functional — that can:
   - Connect to ElevenLabs Conversational AI via WebSocket
   - Capture microphone audio from the player
   - Stream it to ElevenLabs in real time
   - Play back the agent's voice response
   - Surface key events (transcript, agent text, speaking state) to Blueprint
2. **Long-term**: Match the full feature set of Convai — character IDs, session memory,
   actions/environment context, lip-sync, etc. — but powered by ElevenLabs instead.
3. **Key preference**: Simpler than Convai. No gRPC, no protobuf, no ThirdParty precompiled
   libraries. ElevenLabs' Conversational AI API uses plain WebSocket + JSON, which maps
   naturally to UE's built-in `WebSockets` module.
---
## How we used Convai as a reference
We studied the Convai plugin source (`ConvAI/Convai/`) to understand:
- **Module structure**: `UConvaiSettings` + `IModuleInterface` + `ISettingsModule` registration
- **Audio capture pattern**: `Audio::FAudioCapture`, ring buffers, thread-safe dispatch to game thread
- **Audio playback pattern**: `USoundWaveProcedural` fed from a queue
- **Component architecture**: `UConvaiChatbotComponent` (NPC side) + `UConvaiPlayerComponent` (player side)
- **HTTP proxy pattern**: `UConvaiAPIBaseProxy` base class for async REST calls
- **Voice type enum**: Convai already had `EVoiceType::ElevenLabsVoices` — confirming ElevenLabs
is a natural fit
We then replaced gRPC/protobuf with **WebSocket + JSON** to match the ElevenLabs API, and
simplified the architecture to the minimum needed for a first working version.
---
## What was built (Session 1 — 2026-02-19)
All source files created and registered. See `.claude/elevenlabs_plugin.md` for full file map and protocol details.
### Components created
| Class | Role |
|---|---|
| `UElevenLabsSettings` | Project Settings UI — API key, Agent ID, security options |
| `UElevenLabsWebSocketProxy` | Manages one WS session: connect, send audio, handle all server message types |
| `UElevenLabsConversationalAgentComponent` | ActorComponent to attach to any NPC — orchestrates mic + WS + playback |
| `UElevenLabsMicrophoneCaptureComponent` | Wraps `Audio::FAudioCapture`, resamples to 16 kHz mono |
### Not yet done (next sessions)
- Compile & test in UE 5.5 Editor
- Verify `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature for UE 5.5
- Add lip-sync support (future)
- Add session memory / conversation history (future)
- Add environment/action context support (future, matching Convai's full feature set)
---
## Notes on the ElevenLabs API
- Docs: https://elevenlabs.io/docs/conversational-ai
- Create agents at: https://elevenlabs.io/app/conversational-ai
- API keys at: https://elevenlabs.io (dashboard)