Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent via WebSocket. No gRPC or third-party libs required.

Plugin components:

- `UElevenLabsSettings`: API key + Agent ID in Project Settings
- `UElevenLabsWebSocketProxy`: full WS session lifecycle, JSON message handling, ping/pong keepalive, Base64 PCM audio send/receive
- `UElevenLabsConversationalAgentComponent`: ActorComponent for NPC voice conversation; orchestrates mic capture -> WS -> procedural audio playback
- `UElevenLabsMicrophoneCaptureComponent`: wraps `Audio::FAudioCapture`, resamples to 16 kHz mono, dispatches on the game thread

Also adds `.claude/` memory files (project context, plugin notes, patterns) so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Project Context & Original Ask

## What the user wants to build

A **UE5 plugin** that integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, allowing an in-game NPC (or any Actor) to hold a real-time voice conversation with a player.
### The original request (paraphrased)

> "I want to create a plugin to use ElevenLabs Conversational Agent in Unreal Engine 5.5.
> I previously used the Convai plugin, which does what I want, but I prefer ElevenLabs' quality.
> The goal is to create a plugin in the existing Unreal project as a first step toward integration.
> The Convai AI plugin may be too big in terms of functionality for the new project, but its feature set is the final goal.
> You can use the Convai source code to find the right way to build the ElevenLabs version —
> it should be very similar."

### Plugin name

`PS_AI_Agent_ElevenLabs`

---

## User's mental model / intent

1. **Short-term**: a working first-step plugin — minimal but functional — that can:
   - Connect to ElevenLabs Conversational AI via WebSocket
   - Capture microphone audio from the player
   - Stream it to ElevenLabs in real time
   - Play back the agent's voice response
   - Surface key events (transcript, agent text, speaking state) to Blueprint

2. **Long-term**: match the full feature set of Convai — character IDs, session memory, actions/environment context, lip-sync, etc. — but powered by ElevenLabs instead.

3. **Key preference**: simpler than Convai. No gRPC, no protobuf, no precompiled ThirdParty libraries. ElevenLabs' Conversational AI API uses plain WebSocket + JSON, which maps naturally to UE's built-in `WebSockets` module.
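Because the wire format is plain JSON, the core client messages are easy to sketch independently of UE. The field names below (`user_audio_chunk`, `ping`/`pong`, `audio_event`, etc.) are taken from the ElevenLabs Conversational AI docs as we understood them and should be verified against the current API reference; the Python here is a language-agnostic illustration, not plugin code.

```python
import base64
import json

def make_audio_chunk_message(pcm16: bytes) -> str:
    """Client -> server: one chunk of 16 kHz mono 16-bit PCM, Base64-encoded."""
    return json.dumps({"user_audio_chunk": base64.b64encode(pcm16).decode("ascii")})

def make_pong_message(event_id: int) -> str:
    """Client -> server: keepalive reply to a server 'ping' event."""
    return json.dumps({"type": "pong", "event_id": event_id})

def handle_server_message(raw: str) -> str:
    """Dispatch a server message by its 'type' field (assumed message shapes)."""
    msg = json.loads(raw)
    msg_type = msg.get("type")
    if msg_type == "ping":
        # Server pings carry an event_id we must echo back to keep the session alive.
        return make_pong_message(msg["ping_event"]["event_id"])
    if msg_type == "audio":
        # msg["audio_event"]["audio_base_64"] holds Base64 PCM to queue for playback.
        return "queue_audio"
    if msg_type == "user_transcript":
        return "broadcast_transcript"
    if msg_type == "agent_response":
        return "broadcast_agent_text"
    return "ignore"
```

In the plugin, `UElevenLabsWebSocketProxy` performs the equivalent dispatch inside the `IWebSocket::OnMessage()` handler.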
---
|
|
|
|
## How we used Convai as a reference

We studied the Convai plugin source (`ConvAI/Convai/`) to understand:

- **Module structure**: `UConvaiSettings` + `IModuleInterface` + `ISettingsModule` registration
- **Audio capture pattern**: `Audio::FAudioCapture`, ring buffers, thread-safe dispatch to the game thread
- **Audio playback pattern**: `USoundWaveProcedural` fed from a queue
- **Component architecture**: `UConvaiChatbotComponent` (NPC side) + `UConvaiPlayerComponent` (player side)
- **HTTP proxy pattern**: `UConvaiAPIBaseProxy` base class for async REST calls
- **Voice type enum**: Convai already had `EVoiceType::ElevenLabsVoices`, confirming ElevenLabs is a natural fit
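The capture pattern above reduces to a producer/consumer hand-off: the audio capture thread pushes sample blocks into a lock-guarded buffer, and the game thread drains it once per tick. A minimal Python sketch of that idea (the UE version works with `Audio::FAudioCapture` callbacks and typically marshals to the game thread via `AsyncTask`; the class and method names here are illustrative only):

```python
import threading
from collections import deque

class CaptureBuffer:
    """Audio thread pushes sample blocks; game thread drains them once per tick."""

    def __init__(self):
        self._lock = threading.Lock()
        self._blocks = deque()

    def push(self, samples):
        # Called from the audio capture thread for each incoming block.
        with self._lock:
            self._blocks.append(samples)

    def drain(self):
        # Called from the game thread each tick; returns and clears all
        # pending blocks in one short critical section.
        with self._lock:
            blocks = list(self._blocks)
            self._blocks.clear()
        return blocks
```

Keeping the critical section to a copy-and-clear means the capture callback never blocks on downstream work such as Base64 encoding or WebSocket sends.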
We then replaced gRPC/protobuf with **WebSocket + JSON** to match the ElevenLabs API, and

We then replaced gRPC/protobuf with **WebSocket + JSON** to match the ElevenLabs API, and simplified the architecture to the minimum needed for a first working version.
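On the receive side of that WebSocket + JSON flow, the server's `audio` events carry Base64-encoded PCM that must be decoded back into raw samples before being queued for `USoundWaveProcedural`. A standalone sketch of the decode step, assuming signed 16-bit little-endian mono (the same format we send; the agent's configured output format should be verified):

```python
import base64
import struct

def decode_audio_event(audio_base_64: str) -> list[int]:
    """Decode a Base64 PCM payload into signed 16-bit little-endian samples."""
    raw = base64.b64decode(audio_base_64)
    count = len(raw) // 2          # two bytes per int16 sample
    return list(struct.unpack(f"<{count}h", raw[: count * 2]))
```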
---

---

## What was built (Session 1 — 2026-02-19)

All source files created and registered. See `.claude/elevenlabs_plugin.md` for the full file map and protocol details.
### Components created

| Class | Role |
|---|---|
| `UElevenLabsSettings` | Project Settings UI — API key, Agent ID, security options |
| `UElevenLabsWebSocketProxy` | Manages one WS session: connect, send audio, handle all server message types |
| `UElevenLabsConversationalAgentComponent` | ActorComponent to attach to any NPC — orchestrates mic + WS + playback |
| `UElevenLabsMicrophoneCaptureComponent` | Wraps `Audio::FAudioCapture`, resamples to 16 kHz mono |

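The microphone component's resample step (for example, a 48 kHz stereo device feed reduced to the 16 kHz mono the API expects) is a channel downmix followed by linear interpolation. The UE implementation operates on the float buffers delivered by `Audio::FAudioCapture`; this standalone sketch shows only the arithmetic:

```python
def downmix_to_mono(interleaved, num_channels):
    """Average interleaved channel samples into a single mono float stream."""
    return [
        sum(interleaved[i:i + num_channels]) / num_channels
        for i in range(0, len(interleaved), num_channels)
    ]

def resample_linear(mono, src_rate, dst_rate):
    """Resample a mono float stream via linear interpolation, e.g. 48000 -> 16000."""
    if src_rate == dst_rate or not mono:
        return list(mono)
    out_len = int(len(mono) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate   # fractional position in the source stream
        j = int(pos)
        frac = pos - j
        nxt = mono[j + 1] if j + 1 < len(mono) else mono[j]
        out.append(mono[j] * (1.0 - frac) + nxt * frac)
    return out
```

Linear interpolation is adequate for speech-to-agent input; a windowed-sinc resampler would be the upgrade path if capture quality ever becomes an issue.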
### Not yet done (next sessions)

- Compile & test in the UE 5.5 Editor
- Verify the `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature for UE 5.5
- Add lip-sync support (future)
- Add session memory / conversation history (future)
- Add environment/action context support (future, matching Convai's full feature set)

---
## Notes on the ElevenLabs API

- Docs: https://elevenlabs.io/docs/conversational-ai
- Create agents at: https://elevenlabs.io/app/conversational-ai
- API keys: https://elevenlabs.io (dashboard)