# PS_AI_Agent_ElevenLabs Plugin

## Location

`Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/`

## File Map

```
PS_AI_Agent_ElevenLabs.uplugin
Source/PS_AI_Agent_ElevenLabs/
  PS_AI_Agent_ElevenLabs.Build.cs
  Public/
    PS_AI_Agent_ElevenLabs.h – FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings
    ElevenLabsDefinitions.h – Enums, structs, ElevenLabsMessageType/Audio constants
    ElevenLabsWebSocketProxy.h/.cpp – UObject managing one WS session
    ElevenLabsConversationalAgentComponent.h/.cpp – Main ActorComponent (attach to NPC)
    ElevenLabsMicrophoneCaptureComponent.h/.cpp – Mic capture, resample, dispatch to game thread
  Private/
    (implementations of the above)
```

## ElevenLabs Conversational AI Protocol

- **WebSocket URL**: `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=`
- **Auth**: HTTP upgrade header `xi-api-key: ` (set in Project Settings)
- **All frames**: JSON text (no binary frames used by the API)
- **Audio format**: PCM 16-bit signed, 16000 Hz, mono, little-endian, Base64-encoded in JSON

### Client → Server messages

| Type field value | Payload |
|---|---|
| *(none – key is the type)* `user_audio_chunk` | `{ "user_audio_chunk": "" }` |
| `user_turn_start` | `{ "type": "user_turn_start" }` |
| `user_turn_end` | `{ "type": "user_turn_end" }` |
| `interrupt` | `{ "type": "interrupt" }` |
| `pong` | `{ "type": "pong", "pong_event": { "event_id": N } }` |

### Server → Client messages (field: `type`)

| type value | Key nested object | Notes |
|---|---|---|
| `conversation_initiation_metadata` | `conversation_initiation_metadata_event.conversation_id` | Marks WS ready |
| `audio` | `audio_event.audio_base_64` | Base64 PCM from agent |
| `transcript` | `transcript_event.{speaker, message, is_final}` | User or agent speech |
| `agent_response` | `agent_response_event.agent_response` | Final agent text |
| `interruption` | *(none)* | Agent stopped mid-sentence |
| `ping` | `ping_event.event_id` | Must reply with pong |

## Key Design Decisions

- **No gRPC / no ThirdParty libs**: pure UE WebSockets + HTTP, builds out of the box
- Audio resampled in-plugin: device rate → 16000 Hz mono (linear interpolation)
- `USoundWaveProcedural` for real-time agent audio playback (queue-driven)
- Silence heuristic: 30 game-thread ticks (~0.5 s at 60 fps) with no new audio → agent done speaking
- `bSignedURLMode` setting: fetch a signed WS URL from your own backend (keeps API key off client)
- Two turn modes: `Server VAD` (ElevenLabs detects speech end) and `Client Controlled` (push-to-talk)

## Build Dependencies (Build.cs)

Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP, AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing

## Status

- **Session 1** (2026-02-19): All source files written, registered in .uproject. Not yet compiled.
- **TODO**: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID.
- **Watch out**: Verify `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature vs UE 5.5 API.
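
## Appendix: Resampling Sketch

The resampling step listed under Key Design Decisions (device rate → 16000 Hz mono, linear interpolation) can be sketched in plain C++ without any engine headers. This is an illustration only, not the plugin's actual code: the function name `ResampleToMono16k`, the interleaved float input layout, and the channel-averaging downmix are all assumptions. The output matches the 16-bit signed PCM format described in the protocol section, before Base64 encoding.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (name and signature are assumptions, not plugin API).
// Input:  interleaved float samples in [-1, 1] at SourceRate with NumChannels.
// Output: mono 16-bit signed PCM at 16000 Hz.
std::vector<std::int16_t> ResampleToMono16k(const std::vector<float>& Interleaved,
                                            int NumChannels, int SourceRate)
{
    constexpr int TargetRate = 16000;
    if (NumChannels <= 0 || SourceRate <= 0)
    {
        return {};
    }

    const std::size_t NumFrames = Interleaved.size() / static_cast<std::size_t>(NumChannels);
    if (NumFrames == 0)
    {
        return {};
    }

    // Step 1: downmix to mono by averaging the channels of each frame.
    std::vector<float> Mono(NumFrames);
    for (std::size_t Frame = 0; Frame < NumFrames; ++Frame)
    {
        float Sum = 0.0f;
        for (int Ch = 0; Ch < NumChannels; ++Ch)
        {
            Sum += Interleaved[Frame * NumChannels + Ch];
        }
        Mono[Frame] = Sum / static_cast<float>(NumChannels);
    }

    // Step 2: walk the source at a fractional step and interpolate linearly
    // between the two neighbouring source samples.
    const std::size_t OutFrames = static_cast<std::size_t>(
        static_cast<std::uint64_t>(NumFrames) * TargetRate / SourceRate);
    const double Step = static_cast<double>(SourceRate) / TargetRate;

    std::vector<std::int16_t> Out(OutFrames);
    for (std::size_t i = 0; i < OutFrames; ++i)
    {
        const double Pos = static_cast<double>(i) * Step;
        const std::size_t Idx = std::min(static_cast<std::size_t>(Pos), NumFrames - 1);
        const float Frac = static_cast<float>(Pos - static_cast<double>(Idx));

        const float A = Mono[Idx];
        const float B = Mono[std::min(Idx + 1, NumFrames - 1)]; // clamp at the tail
        float Sample = A + (B - A) * Frac;

        // Step 3: clamp and convert to 16-bit signed PCM.
        Sample = std::max(-1.0f, std::min(1.0f, Sample));
        Out[i] = static_cast<std::int16_t>(std::lround(Sample * 32767.0f));
    }
    return Out;
}
```

For example, 480 stereo frames captured at 48000 Hz would yield 160 mono samples. Linear interpolation applies no low-pass filter, so downsampling from high device rates can alias. That trade-off is usually acceptable for speech input, but it is worth checking by ear once the plugin compiles.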