j.foucher f0055e85ed Add PS_AI_Agent_ElevenLabs plugin (initial implementation)

Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent
via WebSocket. No gRPC or third-party libs required.

Plugin components:
- UElevenLabsSettings: API key + Agent ID in Project Settings
- UElevenLabsWebSocketProxy: full WS session lifecycle, JSON message handling,
  ping/pong keepalive, Base64 PCM audio send/receive
- UElevenLabsConversationalAgentComponent: ActorComponent for NPC voice
  conversation, orchestrates mic capture -> WS -> procedural audio playback
- UElevenLabsMicrophoneCaptureComponent: wraps Audio::FAudioCapture,
  resamples to 16kHz mono, dispatches on game thread

Also adds .claude/ memory files (project context, plugin notes, patterns)
so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-19 12:57:48 +01:00


PS_AI_Agent_ElevenLabs Plugin

Location

Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/

File Map

PS_AI_Agent_ElevenLabs.uplugin
Source/PS_AI_Agent_ElevenLabs/
  PS_AI_Agent_ElevenLabs.Build.cs
  Public/
    PS_AI_Agent_ElevenLabs.h                  – FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings
    ElevenLabsDefinitions.h                   – Enums, structs, ElevenLabsMessageType/Audio constants
    ElevenLabsWebSocketProxy.h/.cpp           – UObject managing one WS session
    ElevenLabsConversationalAgentComponent.h/.cpp  – Main ActorComponent (attach to NPC)
    ElevenLabsMicrophoneCaptureComponent.h/.cpp    – Mic capture, resample, dispatch to game thread
  Private/
    (.cpp implementations of the Public/ headers listed above)

ElevenLabs Conversational AI Protocol

WebSocket URL: wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>
Auth: HTTP upgrade header xi-api-key: <key> (set in Project Settings)
All frames: JSON text (no binary frames used by the API)
Audio format: PCM 16-bit signed, 16000 Hz, mono, little-endian — Base64-encoded in JSON
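The URL pattern and audio format above can be illustrated with a small engine-independent sketch in standard C++. The helper names (`BuildConversationUrl`, `PcmToLittleEndianBytes`, `Base64Encode`) are hypothetical and not taken from the plugin sources; inside Unreal the plugin would more likely rely on engine utilities for Base64, and the agent ID is assumed to need no percent-encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: fills the agent ID into the URL pattern documented above.
// Assumes the ID is already URL-safe (no percent-encoding is applied).
std::string BuildConversationUrl(const std::string& agentId) {
    assert(!agentId.empty());
    return "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=" + agentId;
}

// Serializes 16-bit signed samples as little-endian bytes,
// independent of the host's native byte order.
std::vector<std::uint8_t> PcmToLittleEndianBytes(const std::vector<std::int16_t>& samples) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(samples.size() * 2);
    for (std::int16_t s : samples) {
        const std::uint16_t u = static_cast<std::uint16_t>(s);
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));         // low byte first
        bytes.push_back(static_cast<std::uint8_t>((u >> 8) & 0xFF));  // then high byte
    }
    return bytes;
}

// Standard Base64 (RFC 4648 alphabet with '=' padding).
std::string Base64Encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters each.
    for (; i + 2 < data.size(); i += 3) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8) |
                                static_cast<std::uint32_t>(data[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    // A trailing group of 1 or 2 bytes is padded with '='.
    const std::size_t rest = data.size() - i;
    if (rest == 1) {
        const std::uint32_t n = static_cast<std::uint32_t>(data[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

Writing the bytes explicitly (low byte, then high byte) keeps the payload little-endian regardless of the platform the game runs on.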

Client → Server messages

| Type field value | Payload |
|---|---|
| (none; the `user_audio_chunk` key identifies the message) | `{ "user_audio_chunk": "<base64 PCM>" }` |
| `user_turn_start` | `{ "type": "user_turn_start" }` |
| `user_turn_end` | `{ "type": "user_turn_end" }` |
| `interrupt` | `{ "type": "interrupt" }` |
| `pong` | `{ "type": "pong", "pong_event": { "event_id": N } }` |
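As a rough sketch of the client messages listed above, the payloads can be assembled as plain strings. The function names are hypothetical, and the plugin itself presumably builds these through UE's Json module (listed under Build Dependencies) rather than by concatenation. The sketch also assumes the pong echoes the `event_id` received in the corresponding ping.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Audio chunks carry no "type" field: the key itself identifies the message.
// Base64 text never contains quotes or backslashes, so no JSON escaping is needed.
std::string MakeUserAudioChunk(const std::string& base64Pcm) {
    return "{\"user_audio_chunk\":\"" + base64Pcm + "\"}";
}

// Covers the payload-free messages: user_turn_start, user_turn_end, interrupt.
std::string MakeSimpleMessage(const std::string& type) {
    assert(type == "user_turn_start" || type == "user_turn_end" || type == "interrupt");
    return "{\"type\":\"" + type + "\"}";
}

// Keepalive reply; eventId is assumed to be the value taken from the server's ping.
std::string MakePong(std::int64_t eventId) {
    return "{\"type\":\"pong\",\"pong_event\":{\"event_id\":" +
           std::to_string(eventId) + "}}";
}
```

For example, `MakePong(42)` yields `{"type":"pong","pong_event":{"event_id":42}}`.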

Server → Client messages (field: `type`)

| type value | Key nested object | Notes |
|---|---|---|
| `conversation_initiation_metadata` | `conversation_initiation_metadata_event.conversation_id` | Marks WS ready |
| `audio` | `audio_event.audio_base_64` | Base64 PCM from agent |
| `transcript` | `transcript_event.{speaker, message, is_final}` | User or agent speech |
| `agent_response` | `agent_response_event.agent_response` | Final agent text |
| `interruption` | (none) | Agent stopped mid-sentence |
| `ping` | `ping_event.event_id` | Must reply with pong |
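One way to route the server messages above is to map the `type` string to an enum once and switch on it afterwards. The sketch below is illustrative only: `ServerMessage`, `ClassifyServerMessage` and `NestedObjectKey` are hypothetical names, and the actual enums and constants in ElevenLabsDefinitions.h may be organized differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum mirroring the documented "type" values.
enum class ServerMessage {
    ConversationInitiation,
    Audio,
    Transcript,
    AgentResponse,
    Interruption,
    Ping,
    Unknown
};

// Maps the JSON "type" field to the enum; unrecognized types map to Unknown
// so that new server message kinds do not break the session.
ServerMessage ClassifyServerMessage(const std::string& type) {
    if (type == "conversation_initiation_metadata") return ServerMessage::ConversationInitiation;
    if (type == "audio") return ServerMessage::Audio;
    if (type == "transcript") return ServerMessage::Transcript;
    if (type == "agent_response") return ServerMessage::AgentResponse;
    if (type == "interruption") return ServerMessage::Interruption;
    if (type == "ping") return ServerMessage::Ping;
    return ServerMessage::Unknown;
}

// Returns the name of the nested JSON object that carries the payload,
// or an empty string when the table lists no nested object.
std::string NestedObjectKey(ServerMessage message) {
    switch (message) {
        case ServerMessage::ConversationInitiation: return "conversation_initiation_metadata_event";
        case ServerMessage::Audio:                  return "audio_event";
        case ServerMessage::Transcript:             return "transcript_event";
        case ServerMessage::AgentResponse:          return "agent_response_event";
        case ServerMessage::Ping:                   return "ping_event";
        case ServerMessage::Interruption:
        case ServerMessage::Unknown:                return "";
    }
    assert(false && "unhandled ServerMessage value");
    return "";
}
```

A handler would then read the relevant field from the nested object, for instance `event_id` under `ping_event` before sending the pong.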

Key Design Decisions

- No gRPC and no ThirdParty libs: only UE's built-in WebSockets and HTTP modules are used, so the plugin builds out of the box
- Audio is resampled in-plugin from the capture device rate to 16000 Hz mono (linear interpolation)
- USoundWaveProcedural handles real-time agent audio playback (queue-driven)
- Silence heuristic: 30 consecutive game-thread ticks (~0.5 s at 60 fps) with no new audio are taken to mean the agent is done speaking; because the count is in ticks, the window scales with frame rate
- bSignedURLMode setting: fetch a signed WS URL from your own backend (keeps the API key off the client)
- Two turn modes: Server VAD (ElevenLabs detects the end of speech) and Client Controlled (push-to-talk)
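Two of the decisions above, linear-interpolation resampling and the tick-based silence heuristic, can be sketched in engine-independent C++. This is a minimal illustration under stated assumptions, not the plugin's code: `DownmixToMono`, `ResampleLinear` and `SilenceDetector` are hypothetical names, samples are assumed to be interleaved floats, and the detector assumes one `Tick()` call per game-thread tick.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Averages interleaved channels down to a single mono channel.
std::vector<float> DownmixToMono(const std::vector<float>& interleaved, int numChannels) {
    assert(numChannels > 0);
    const std::size_t channels = static_cast<std::size_t>(numChannels);
    const std::size_t frames = interleaved.size() / channels;
    std::vector<float> mono(frames);
    for (std::size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < channels; ++c) sum += interleaved[f * channels + c];
        mono[f] = sum / static_cast<float>(numChannels);
    }
    return mono;
}

// Resamples mono audio by interpolating linearly between neighbouring input samples.
std::vector<float> ResampleLinear(const std::vector<float>& in, int inRate, int outRate) {
    assert(inRate > 0 && outRate > 0);
    if (in.empty()) return {};
    const std::size_t outCount =
        in.size() * static_cast<std::size_t>(outRate) / static_cast<std::size_t>(inRate);
    const double step = static_cast<double>(inRate) / outRate;  // input samples per output sample
    std::vector<float> out(outCount);
    for (std::size_t i = 0; i < outCount; ++i) {
        const double pos = static_cast<double>(i) * step;
        const std::size_t i0 = std::min(static_cast<std::size_t>(pos), in.size() - 1);
        const std::size_t i1 = std::min(i0 + 1, in.size() - 1);  // clamp at the last sample
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        out[i] = in[i0] + (in[i1] - in[i0]) * frac;
    }
    return out;
}

// Reports "agent done speaking" after a run of ticks with no new audio.
class SilenceDetector {
public:
    explicit SilenceDetector(int thresholdTicks = 30) : threshold_(thresholdTicks) {
        assert(thresholdTicks > 0);
    }
    // Call whenever an audio message arrives from the agent.
    void OnAudioReceived() {
        ticksWithoutAudio_ = 0;
        speaking_ = true;
    }
    // Call once per tick; returns true only on the tick where the threshold is reached.
    bool Tick() {
        if (!speaking_) return false;
        if (++ticksWithoutAudio_ >= threshold_) {
            speaking_ = false;
            return true;
        }
        return false;
    }
private:
    int threshold_;
    int ticksWithoutAudio_ = 0;
    bool speaking_ = false;
};
```

Linear interpolation is cheap, but it applies no low-pass filter, so downsampling from a high device rate can alias. That trade-off is usually acceptable for speech; a filtered resampler would be the alternative if artifacts become audible.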

Build Dependencies (Build.cs)

Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP, AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing

Status

Session 1 (2026-02-19): All source files written, registered in .uproject. Not yet compiled.
TODO: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID.
Watch out: Verify USoundWaveProcedural::OnSoundWaveProceduralUnderflow delegate signature vs UE 5.5 API.
