Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent via WebSocket. No gRPC or third-party libs required.

Plugin components:
- UElevenLabsSettings: API key + Agent ID in Project Settings
- UElevenLabsWebSocketProxy: full WS session lifecycle, JSON message handling, ping/pong keepalive, Base64 PCM audio send/receive
- UElevenLabsConversationalAgentComponent: ActorComponent for NPC voice conversation, orchestrates mic capture -> WS -> procedural audio playback
- UElevenLabsMicrophoneCaptureComponent: wraps Audio::FAudioCapture, resamples to 16kHz mono, dispatches on game thread

Also adds .claude/ memory files (project context, plugin notes, patterns) so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
62 lines
3.1 KiB
Markdown
# PS_AI_Agent_ElevenLabs Plugin

## Location
`Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/`

## File Map
```
PS_AI_Agent_ElevenLabs.uplugin
Source/PS_AI_Agent_ElevenLabs/
  PS_AI_Agent_ElevenLabs.Build.cs
  Public/
    PS_AI_Agent_ElevenLabs.h – FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings
    ElevenLabsDefinitions.h – Enums, structs, ElevenLabsMessageType/Audio constants
    ElevenLabsWebSocketProxy.h/.cpp – UObject managing one WS session
    ElevenLabsConversationalAgentComponent.h/.cpp – Main ActorComponent (attach to NPC)
    ElevenLabsMicrophoneCaptureComponent.h/.cpp – Mic capture, resample, dispatch to game thread
  Private/
    (implementations of the above)
```
## ElevenLabs Conversational AI Protocol
- **WebSocket URL**: `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>`
- **Auth**: HTTP upgrade header `xi-api-key: <key>` (set in Project Settings)
- **All frames**: JSON text (no binary frames used by the API)
- **Audio format**: PCM 16-bit signed, 16000 Hz, mono, little-endian, Base64-encoded in JSON
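To make the audio format above concrete, here is a minimal standalone C++ sketch (plain standard library, not the plugin's actual code) that converts normalized float samples to 16-bit little-endian PCM bytes and computes how many bytes a chunk of a given duration occupies at 16000 Hz mono. The names `PcmBytesForMs` and `FloatToPcm16LE` are hypothetical helpers introduced for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Wire format from the notes above: 16-bit signed PCM, 16000 Hz, mono.
constexpr int kSampleRateHz = 16000;
constexpr int kBytesPerSample = 2;

// Hypothetical helper: bytes of PCM in a chunk of the given duration.
inline std::size_t PcmBytesForMs(int DurationMs)
{
    assert(DurationMs >= 0);
    return static_cast<std::size_t>(kSampleRateHz) * kBytesPerSample
         * static_cast<std::size_t>(DurationMs) / 1000;
}

// Hypothetical helper: convert float samples in [-1, 1] to little-endian
// 16-bit PCM bytes (low byte first, then high byte).
inline std::vector<std::uint8_t> FloatToPcm16LE(const std::vector<float>& Samples)
{
    std::vector<std::uint8_t> Bytes;
    Bytes.reserve(Samples.size() * kBytesPerSample);
    for (float S : Samples)
    {
        const float Clamped = std::clamp(S, -1.0f, 1.0f);
        const auto Value = static_cast<std::int16_t>(std::lround(Clamped * 32767.0f));
        const auto Raw = static_cast<std::uint16_t>(Value);
        Bytes.push_back(static_cast<std::uint8_t>(Raw & 0xFF));
        Bytes.push_back(static_cast<std::uint8_t>((Raw >> 8) & 0xFF));
    }
    return Bytes;
}
```

At this format one second of audio is 32000 bytes before Base64 encoding, so a 100 ms chunk carries 3200 bytes of PCM.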

### Client → Server messages
| Message (`type` value) | Payload |
|---|---|
| `user_audio_chunk` *(no `type` field; the key itself identifies the message)* | `{ "user_audio_chunk": "<base64 PCM>" }` |
| `user_turn_start` | `{ "type": "user_turn_start" }` |
| `user_turn_end` | `{ "type": "user_turn_end" }` |
| `interrupt` | `{ "type": "interrupt" }` |
| `pong` | `{ "type": "pong", "pong_event": { "event_id": N } }` |
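The client messages above can be sketched as plain string builders. This is a standalone illustration using only the C++ standard library, with payload shapes taken from the table as written in these notes. `Base64Encode`, `MakeUserAudioChunk`, and `MakeControlMessage` are hypothetical names. Inside the plugin, UE's own `FBase64::Encode` and the Json module would be the more natural tools.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard Base64 (RFC 4648 alphabet) with '=' padding.
inline std::string Base64Encode(const std::vector<std::uint8_t>& Data)
{
    static const char Table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    Out.reserve((Data.size() + 2) / 3 * 4);
    std::size_t i = 0;
    for (; i + 2 < Data.size(); i += 3)
    {
        const std::uint32_t N = (std::uint32_t(Data[i]) << 16)
                              | (std::uint32_t(Data[i + 1]) << 8)
                              |  std::uint32_t(Data[i + 2]);
        Out += Table[(N >> 18) & 63];
        Out += Table[(N >> 12) & 63];
        Out += Table[(N >> 6) & 63];
        Out += Table[N & 63];
    }
    const std::size_t Remaining = Data.size() - i;
    if (Remaining == 1)
    {
        const std::uint32_t N = std::uint32_t(Data[i]) << 16;
        Out += Table[(N >> 18) & 63];
        Out += Table[(N >> 12) & 63];
        Out += "==";
    }
    else if (Remaining == 2)
    {
        const std::uint32_t N = (std::uint32_t(Data[i]) << 16)
                              | (std::uint32_t(Data[i + 1]) << 8);
        Out += Table[(N >> 18) & 63];
        Out += Table[(N >> 12) & 63];
        Out += Table[(N >> 6) & 63];
        Out += '=';
    }
    return Out;
}

// Audio message: no "type" field, the key itself identifies the message.
// Base64 output contains no characters that need JSON escaping.
inline std::string MakeUserAudioChunk(const std::vector<std::uint8_t>& Pcm)
{
    return "{\"user_audio_chunk\":\"" + Base64Encode(Pcm) + "\"}";
}

// Control messages such as user_turn_start, user_turn_end, interrupt.
// Assumes Type is a plain identifier that needs no JSON escaping.
inline std::string MakeControlMessage(const std::string& Type)
{
    assert(!Type.empty());
    return "{\"type\":\"" + Type + "\"}";
}
```

For example, `MakeControlMessage("user_turn_end")` yields `{"type":"user_turn_end"}`, which would be sent as a text frame.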

### Server → Client messages (field: `type`)
| `type` value | Key nested object | Notes |
|---|---|---|
| `conversation_initiation_metadata` | `conversation_initiation_metadata_event.conversation_id` | Marks WS ready |
| `audio` | `audio_event.audio_base_64` | Base64 PCM from agent |
| `transcript` | `transcript_event.{speaker, message, is_final}` | User or agent speech |
| `agent_response` | `agent_response_event.agent_response` | Final agent text |
| `interruption` | *(none)* | Agent stopped mid-sentence |
| `ping` | `ping_event.event_id` | Must reply with pong |
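The ping/pong keepalive in the table above could look roughly like the following. This is a deliberately naive standalone sketch: it pulls fields out with `std::regex` instead of a real JSON parser, and it uses the message shapes exactly as recorded in these notes. `ExtractType`, `ExtractEventId`, and `MakePongReply` are hypothetical names. The plugin itself would presumably parse frames with UE's Json module.

```cpp
#include <optional>
#include <regex>
#include <string>

// Naive extraction of the first "type" string in a frame.
// Illustration only; a real implementation should parse the JSON properly.
inline std::optional<std::string> ExtractType(const std::string& Json)
{
    static const std::regex Re(R"re("type"\s*:\s*"([^"]+)")re");
    std::smatch Match;
    if (std::regex_search(Json, Match, Re))
    {
        return Match[1].str();
    }
    return std::nullopt;
}

// Naive extraction of the first integer "event_id" in a frame.
inline std::optional<long long> ExtractEventId(const std::string& Json)
{
    static const std::regex Re(R"re("event_id"\s*:\s*(-?\d+))re");
    std::smatch Match;
    if (std::regex_search(Json, Match, Re))
    {
        return std::stoll(Match[1].str());
    }
    return std::nullopt;
}

// Returns the pong reply for a ping frame, or nullopt for any other frame
// (or for a ping that carries no event_id).
inline std::optional<std::string> MakePongReply(const std::string& ServerFrame)
{
    const auto Type = ExtractType(ServerFrame);
    if (!Type || *Type != "ping")
    {
        return std::nullopt;
    }
    const auto EventId = ExtractEventId(ServerFrame);
    if (!EventId)
    {
        return std::nullopt;
    }
    return "{\"type\":\"pong\",\"pong_event\":{\"event_id\":"
         + std::to_string(*EventId) + "}}";
}
```

Echoing the `event_id` back unchanged is the important part; every other frame type falls through and would be routed to its own handler.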

## Key Design Decisions
- **No gRPC / no ThirdParty libs**: pure UE WebSockets + HTTP, builds out of the box
- Audio resampled in-plugin: device rate → 16000 Hz mono (linear interpolation)
- `USoundWaveProcedural` for real-time agent audio playback (queue-driven)
- Silence heuristic: 30 game-thread ticks (~0.5 s at 60 fps) with no new audio → agent done speaking
- `bSignedURLMode` setting: fetch a signed WS URL from your own backend (keeps API key off client)
- Two turn modes: `Server VAD` (ElevenLabs detects speech end) and `Client Controlled` (push-to-talk)
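The resampling decision above (device rate → 16000 Hz mono via linear interpolation) can be illustrated with a small standalone sketch. This is not the plugin's implementation. `DownmixToMono` and `ResampleLinear` are hypothetical helpers, and the downmix simply averages the channels of each frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average interleaved channels into a single mono channel.
inline std::vector<float> DownmixToMono(const std::vector<float>& Interleaved, int NumChannels)
{
    assert(NumChannels > 0);
    const std::size_t Channels = static_cast<std::size_t>(NumChannels);
    const std::size_t Frames = Interleaved.size() / Channels;
    std::vector<float> Mono(Frames);
    for (std::size_t f = 0; f < Frames; ++f)
    {
        float Sum = 0.0f;
        for (std::size_t c = 0; c < Channels; ++c)
        {
            Sum += Interleaved[f * Channels + c];
        }
        Mono[f] = Sum / static_cast<float>(NumChannels);
    }
    return Mono;
}

// Resample mono audio by linearly interpolating between neighbouring samples.
inline std::vector<float> ResampleLinear(const std::vector<float>& In, int InRate, int OutRate)
{
    if (In.empty() || InRate <= 0 || OutRate <= 0)
    {
        return {};
    }
    const auto OutCount = static_cast<std::size_t>(
        static_cast<std::uint64_t>(In.size()) * static_cast<std::uint64_t>(OutRate)
        / static_cast<std::uint64_t>(InRate));
    std::vector<float> Out(OutCount);
    const double Step = static_cast<double>(InRate) / static_cast<double>(OutRate);
    for (std::size_t i = 0; i < OutCount; ++i)
    {
        const double Pos = static_cast<double>(i) * Step;
        const std::size_t I0 = std::min(static_cast<std::size_t>(Pos), In.size() - 1);
        const std::size_t I1 = std::min(I0 + 1, In.size() - 1);
        const float Frac = static_cast<float>(Pos - static_cast<double>(I0));
        Out[i] = In[I0] + (In[I1] - In[I0]) * Frac;
    }
    return Out;
}
```

Linear interpolation applies no low-pass filter, so downsampling from 44.1 or 48 kHz can alias high-frequency content. Whether that is acceptable for speech input is a trade-off of this approach.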
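The silence heuristic from the design decisions (30 ticks without new audio means the agent has finished speaking) amounts to a small counter. The sketch below is a standalone illustration with a hypothetical class name, `FSilenceDetector`. It assumes `Tick()` is called once per game-thread tick and `OnAudioReceived()` whenever an `audio` frame arrives.

```cpp
#include <cassert>

// Tick-count based end-of-speech detector (hypothetical, for illustration).
class FSilenceDetector
{
public:
    explicit FSilenceDetector(int InThresholdTicks = 30)
        : ThresholdTicks(InThresholdTicks)
    {
        assert(ThresholdTicks > 0);
    }

    // Call whenever a new audio chunk arrives from the agent.
    void OnAudioReceived()
    {
        bSpeaking = true;
        SilentTicks = 0;
    }

    // Call once per tick. Returns true only on the tick where the agent is
    // judged to have stopped speaking.
    bool Tick()
    {
        if (!bSpeaking)
        {
            return false;
        }
        if (++SilentTicks < ThresholdTicks)
        {
            return false;
        }
        bSpeaking = false;
        SilentTicks = 0;
        return true;
    }

    bool IsSpeaking() const { return bSpeaking; }

private:
    int ThresholdTicks;
    int SilentTicks = 0;
    bool bSpeaking = false;
};
```

Because the threshold is counted in ticks, the real-time delay depends on frame rate: 30 ticks is about 0.5 s at 60 fps but about 1 s at 30 fps. Accumulating delta time instead would make the threshold independent of frame rate.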
## Build Dependencies (Build.cs)
Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP,
AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing

## Status
- **Session 1** (2026-02-19): All source files written, registered in .uproject. Not yet compiled.
- **TODO**: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID.
- **Watch out**: Verify `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature vs UE 5.5 API.