Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent via WebSocket. No gRPC or third-party libs required.

Plugin components:
- UElevenLabsSettings: API key + Agent ID in Project Settings
- UElevenLabsWebSocketProxy: full WS session lifecycle, JSON message handling, ping/pong keepalive, Base64 PCM audio send/receive
- UElevenLabsConversationalAgentComponent: ActorComponent for NPC voice conversation, orchestrates mic capture -> WS -> procedural audio playback
- UElevenLabsMicrophoneCaptureComponent: wraps Audio::FAudioCapture, resamples to 16kHz mono, dispatches on game thread

Also adds .claude/ memory files (project context, plugin notes, patterns) so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3.1 KiB
PS_AI_Agent_ElevenLabs Plugin
Location
Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/
File Map
PS_AI_Agent_ElevenLabs.uplugin
Source/PS_AI_Agent_ElevenLabs/
  PS_AI_Agent_ElevenLabs.Build.cs
  Public/
    PS_AI_Agent_ElevenLabs.h – FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings
    ElevenLabsDefinitions.h – Enums, structs, ElevenLabsMessageType/Audio constants
    ElevenLabsWebSocketProxy.h/.cpp – UObject managing one WS session
    ElevenLabsConversationalAgentComponent.h/.cpp – Main ActorComponent (attach to NPC)
    ElevenLabsMicrophoneCaptureComponent.h/.cpp – Mic capture, resample, dispatch to game thread
  Private/
    (implementations of the above)
ElevenLabs Conversational AI Protocol
- WebSocket URL: wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>
- Auth: HTTP upgrade header xi-api-key: <key> (set in Project Settings)
- All frames: JSON text (no binary frames used by the API)
- Audio format: PCM 16-bit signed, 16000 Hz, mono, little-endian — Base64-encoded in JSON
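The audio format above (16-bit signed PCM, little-endian, Base64 inside JSON) can be sketched in plain C++ as follows. This is a minimal, engine-free illustration; the function names are hypothetical and not taken from the plugin source. Inside the plugin, Unreal's own FBase64 utility would presumably be the natural choice instead of a hand-written encoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Serialize 16-bit signed samples as little-endian bytes,
// regardless of the host CPU's byte order.
std::vector<uint8_t> PcmToLittleEndianBytes(const std::vector<int16_t>& Samples)
{
    std::vector<uint8_t> Bytes;
    Bytes.reserve(Samples.size() * 2);
    for (int16_t Sample : Samples)
    {
        const uint16_t Raw = static_cast<uint16_t>(Sample);
        Bytes.push_back(static_cast<uint8_t>(Raw & 0xFF)); // low byte first
        Bytes.push_back(static_cast<uint8_t>(Raw >> 8));   // then high byte
    }
    return Bytes;
}

// Standard Base64 (RFC 4648 alphabet) with '=' padding.
std::string Base64Encode(const std::vector<uint8_t>& Data)
{
    static const char Table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    Out.reserve(((Data.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters.
    for (; i + 2 < Data.size(); i += 3)
    {
        const uint32_t N = (static_cast<uint32_t>(Data[i]) << 16)
                         | (static_cast<uint32_t>(Data[i + 1]) << 8)
                         |  static_cast<uint32_t>(Data[i + 2]);
        Out += Table[(N >> 18) & 63];
        Out += Table[(N >> 12) & 63];
        Out += Table[(N >> 6) & 63];
        Out += Table[N & 63];
    }

    // A trailing 1 or 2 bytes is padded with '='.
    const std::size_t Remaining = Data.size() - i;
    if (Remaining == 1)
    {
        const uint32_t N = static_cast<uint32_t>(Data[i]) << 16;
        Out += Table[(N >> 18) & 63];
        Out += Table[(N >> 12) & 63];
        Out += "==";
    }
    else if (Remaining == 2)
    {
        const uint32_t N = (static_cast<uint32_t>(Data[i]) << 16)
                         | (static_cast<uint32_t>(Data[i + 1]) << 8);
        Out += Table[(N >> 18) & 63];
        Out += Table[(N >> 12) & 63];
        Out += Table[(N >> 6) & 63];
        Out += '=';
    }
    return Out;
}
```

For example, the two samples 1 and -2 serialize to the bytes 01 00 FE FF, which encode to "AQD+/w==".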
Client → Server messages
| Type field value | Payload |
|---|---|
| (none – key is the type) user_audio_chunk | { "user_audio_chunk": "<base64 PCM>" } |
| user_turn_start | { "type": "user_turn_start" } |
| user_turn_end | { "type": "user_turn_end" } |
| interrupt | { "type": "interrupt" } |
| pong | { "type": "pong", "pong_event": { "event_id": N } } |
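The payload shapes in the table above can be sketched with plain string building. This is an illustration only, with hypothetical helper names; the plugin itself lists Json and JsonUtilities as dependencies and would presumably build these messages through Unreal's JSON types rather than by concatenation.

```cpp
#include <cstdint>
#include <string>

// Audio chunks carry no "type" field; the key itself identifies the message.
// The Base64 alphabet (A-Z, a-z, 0-9, '+', '/', '=') needs no JSON escaping.
std::string MakeUserAudioChunkMessage(const std::string& Base64Pcm)
{
    return "{\"user_audio_chunk\":\"" + Base64Pcm + "\"}";
}

// Covers the payload-free messages: user_turn_start, user_turn_end, interrupt.
// Assumes Type is one of those fixed identifiers, so no escaping is applied.
std::string MakeTypedMessage(const std::string& Type)
{
    return "{\"type\":\"" + Type + "\"}";
}

// Reply to a server ping, echoing back the event_id it carried.
std::string MakePongMessage(int64_t EventId)
{
    return "{\"type\":\"pong\",\"pong_event\":{\"event_id\":"
         + std::to_string(EventId) + "}}";
}
```

Each helper returns one complete JSON text frame, matching the note above that the API uses no binary frames.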
Server → Client messages (field: type)
| type value | Key nested object | Notes |
|---|---|---|
| conversation_initiation_metadata | conversation_initiation_metadata_event.conversation_id | Marks WS ready |
| audio | audio_event.audio_base_64 | Base64 PCM from agent |
| transcript | transcript_event.{speaker, message, is_final} | User or agent speech |
| agent_response | agent_response_event.agent_response | Final agent text |
| interruption | (none) | Agent stopped mid-sentence |
| ping | ping_event.event_id | Must reply with pong |
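Dispatching on the type field can be sketched as a lookup from the type string to an enum, plus the name of the nested object that holds the payload. The enum and function names here are hypothetical and may not match what ElevenLabsDefinitions.h actually declares; JSON parsing is left out so the sketch stays engine-free.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical enum mirroring the rows of the table above.
enum class EServerMessageType
{
    ConversationInitiationMetadata,
    Audio,
    Transcript,
    AgentResponse,
    Interruption,
    Ping,
    Unknown
};

// Map the value of the "type" field to an enum; unrecognized types
// fall through to Unknown so new server messages do not break the client.
EServerMessageType ClassifyServerMessage(const std::string& Type)
{
    static const std::unordered_map<std::string, EServerMessageType> Lookup = {
        {"conversation_initiation_metadata", EServerMessageType::ConversationInitiationMetadata},
        {"audio", EServerMessageType::Audio},
        {"transcript", EServerMessageType::Transcript},
        {"agent_response", EServerMessageType::AgentResponse},
        {"interruption", EServerMessageType::Interruption},
        {"ping", EServerMessageType::Ping},
    };
    const auto It = Lookup.find(Type);
    return It != Lookup.end() ? It->second : EServerMessageType::Unknown;
}

// Name of the nested JSON object that carries the payload,
// or an empty string when the table lists none.
std::string PayloadObjectKey(EServerMessageType Type)
{
    switch (Type)
    {
    case EServerMessageType::ConversationInitiationMetadata:
        return "conversation_initiation_metadata_event";
    case EServerMessageType::Audio:         return "audio_event";
    case EServerMessageType::Transcript:    return "transcript_event";
    case EServerMessageType::AgentResponse: return "agent_response_event";
    case EServerMessageType::Ping:          return "ping_event";
    default:                                return "";
    }
}
```

A handler would typically switch on the classified type, read the nested object named by PayloadObjectKey, and for Ping send a pong frame carrying the same event_id.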
Key Design Decisions
- No gRPC / no ThirdParty libs — pure UE WebSockets + HTTP, builds out of the box
- Audio resampled in-plugin: device rate → 16000 Hz mono (linear interpolation)
- USoundWaveProcedural for real-time agent audio playback (queue-driven)
- Silence heuristic: 30 game-thread ticks (~0.5 s at 60 fps) with no new audio → agent done speaking
- bSignedURLMode setting: fetch a signed WS URL from your own backend (keeps API key off client)
- Two turn modes: Server VAD (ElevenLabs detects speech end) and Client Controlled (push-to-talk)
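Two of the decisions above, the linear-interpolation resampler and the tick-based silence heuristic, can be sketched without any engine types. All names below are hypothetical, and the sketch assumes float samples in [-1, 1] as the capture format; the plugin's actual code may be structured differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average interleaved channels down to one channel.
std::vector<float> DownmixToMono(const std::vector<float>& Interleaved, int NumChannels)
{
    if (NumChannels <= 0) return {};
    const std::size_t Frames = Interleaved.size() / NumChannels;
    std::vector<float> Mono(Frames);
    for (std::size_t f = 0; f < Frames; ++f)
    {
        float Sum = 0.0f;
        for (int c = 0; c < NumChannels; ++c) Sum += Interleaved[f * NumChannels + c];
        Mono[f] = Sum / static_cast<float>(NumChannels);
    }
    return Mono;
}

// Linear-interpolation resampler (e.g. 48000 Hz device rate -> 16000 Hz).
std::vector<float> ResampleLinear(const std::vector<float>& In, int InRate, int OutRate)
{
    if (In.empty() || InRate <= 0 || OutRate <= 0) return {};
    if (InRate == OutRate) return In;
    const std::size_t OutCount = static_cast<std::size_t>(
        (static_cast<uint64_t>(In.size()) * OutRate) / InRate);
    const double Step = static_cast<double>(InRate) / OutRate;
    std::vector<float> Out(OutCount);
    for (std::size_t i = 0; i < OutCount; ++i)
    {
        const double Pos = i * Step; // fractional position in the input
        const std::size_t I0 = std::min(static_cast<std::size_t>(Pos), In.size() - 1);
        const std::size_t I1 = std::min(I0 + 1, In.size() - 1);
        const float Frac = static_cast<float>(Pos - static_cast<double>(I0));
        Out[i] = In[I0] + (In[I1] - In[I0]) * Frac;
    }
    return Out;
}

// Convert a float sample to 16-bit signed PCM, clamping out-of-range input.
int16_t FloatToPcm16(float Sample)
{
    const float Clamped = std::max(-1.0f, std::min(1.0f, Sample));
    return static_cast<int16_t>(std::lround(Clamped * 32767.0f));
}

// Silence heuristic: the agent counts as done speaking after
// Threshold consecutive ticks with no new audio.
struct FSilenceDetector
{
    int Threshold = 30;
    int TicksWithoutAudio = 0;
    bool bSpeaking = false;

    void OnAudioReceived() { TicksWithoutAudio = 0; bSpeaking = true; }

    // Call once per game-thread tick; returns true exactly once,
    // on the tick where the agent is judged to have stopped.
    bool Tick()
    {
        if (!bSpeaking) return false;
        if (++TicksWithoutAudio >= Threshold) { bSpeaking = false; return true; }
        return false;
    }
};
```

Plain linear interpolation applies no low-pass filter, so downsampling this way can alias high-frequency content. Counting ticks also ties the 0.5 s window to frame rate: at 30 fps the same 30 ticks span about one second.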
Build Dependencies (Build.cs)
Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP, AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing
Status
- Session 1 (2026-02-19): All source files written, registered in .uproject. Not yet compiled.
- TODO: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID.
- Watch out: Verify the USoundWaveProcedural::OnSoundWaveProceduralUnderflow delegate signature against the UE 5.5 API.