PS_AI_Agent/.claude/elevenlabs_plugin.md
j.foucher f0055e85ed Add PS_AI_Agent_ElevenLabs plugin (initial implementation)
Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent
via WebSocket. No gRPC or third-party libs required.

Plugin components:
- UElevenLabsSettings: API key + Agent ID in Project Settings
- UElevenLabsWebSocketProxy: full WS session lifecycle, JSON message handling,
  ping/pong keepalive, Base64 PCM audio send/receive
- UElevenLabsConversationalAgentComponent: ActorComponent for NPC voice
  conversation, orchestrates mic capture -> WS -> procedural audio playback
- UElevenLabsMicrophoneCaptureComponent: wraps Audio::FAudioCapture,
  resamples to 16kHz mono, dispatches on game thread

Also adds .claude/ memory files (project context, plugin notes, patterns)
so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 12:57:48 +01:00


PS_AI_Agent_ElevenLabs Plugin

Location

Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/

File Map

PS_AI_Agent_ElevenLabs.uplugin
Source/PS_AI_Agent_ElevenLabs/
  PS_AI_Agent_ElevenLabs.Build.cs
  Public/
    PS_AI_Agent_ElevenLabs.h                   FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings
    ElevenLabsDefinitions.h                    Enums, structs, ElevenLabsMessageType/Audio constants
    ElevenLabsWebSocketProxy.h/.cpp            UObject managing one WS session
    ElevenLabsConversationalAgentComponent.h/.cpp   Main ActorComponent (attach to NPC)
    ElevenLabsMicrophoneCaptureComponent.h/.cpp     Mic capture, resample, dispatch to game thread
  Private/
    (implementations of the above)

ElevenLabs Conversational AI Protocol

  • WebSocket URL: wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>
  • Auth: HTTP upgrade header xi-api-key: <key> (set in Project Settings)
  • All frames: JSON text (no binary frames used by the API)
  • Audio format: PCM 16-bit signed, 16000 Hz, mono, little-endian — Base64-encoded in JSON
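The audio format above (16-bit signed PCM, little-endian, Base64 inside JSON) can be sketched in standard C++ as follows. This is an illustrative sketch only: `PcmToLittleEndianBytes` and `Base64Encode` are hypothetical helper names, not functions from the plugin. Inside the plugin, UE's own `FBase64::Encode` would likely be the natural choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serialize 16-bit signed samples as little-endian bytes,
// independent of the host byte order.
std::vector<std::uint8_t> PcmToLittleEndianBytes(const std::vector<std::int16_t>& samples) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(samples.size() * 2);
    for (std::int16_t s : samples) {
        const std::uint16_t u = static_cast<std::uint16_t>(s);
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));         // low byte first
        bytes.push_back(static_cast<std::uint8_t>((u >> 8) & 0xFF));  // then high byte
    }
    return bytes;
}

// Hypothetical helper: standard Base64 (RFC 4648 alphabet, '=' padding).
std::string Base64Encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters.
    for (; i + 2 < data.size(); i += 3) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8) |
                                static_cast<std::uint32_t>(data[i + 2]);
        out.push_back(kAlphabet[(n >> 18) & 63]);
        out.push_back(kAlphabet[(n >> 12) & 63]);
        out.push_back(kAlphabet[(n >> 6) & 63]);
        out.push_back(kAlphabet[n & 63]);
    }
    // A trailing 1 or 2 bytes are padded with '='.
    const std::size_t rest = data.size() - i;
    if (rest == 1) {
        const std::uint32_t n = static_cast<std::uint32_t>(data[i]) << 16;
        out.push_back(kAlphabet[(n >> 18) & 63]);
        out.push_back(kAlphabet[(n >> 12) & 63]);
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8);
        out.push_back(kAlphabet[(n >> 18) & 63]);
        out.push_back(kAlphabet[(n >> 12) & 63]);
        out.push_back(kAlphabet[(n >> 6) & 63]);
        out.push_back('=');
    }
    return out;
}
```

Calling `Base64Encode(PcmToLittleEndianBytes(samples))` yields the string that goes into the JSON payload. Writing the bytes explicitly, rather than copying raw memory, keeps the output little-endian on any host.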

Client → Server messages

| Type field value | Payload |
|---|---|
| (none; the key user_audio_chunk is the type) | { "user_audio_chunk": "<base64 PCM>" } |
| user_turn_start | { "type": "user_turn_start" } |
| user_turn_end | { "type": "user_turn_end" } |
| interrupt | { "type": "interrupt" } |
| pong | { "type": "pong", "pong_event": { "event_id": N } } |
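The client messages listed above are small enough to assemble with plain string building. A minimal sketch, assuming hypothetical helper names (`MakeUserAudioChunk`, `MakeTypedMessage`, `MakePong`) that are not taken from the plugin source; the plugin itself lists Json and JsonUtilities as dependencies and presumably serializes through those instead.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: audio chunks carry no "type" field; the key itself
// identifies the message. The Base64 alphabet (A-Z, a-z, 0-9, '+', '/', '=')
// contains no characters that need JSON escaping, so the payload is embedded as-is.
std::string MakeUserAudioChunk(const std::string& base64Pcm) {
    return "{\"user_audio_chunk\":\"" + base64Pcm + "\"}";
}

// Hypothetical helper for the payload-free messages:
// "user_turn_start", "user_turn_end", "interrupt".
// Assumes `type` is one of those fixed identifiers (no escaping is performed).
std::string MakeTypedMessage(const std::string& type) {
    return "{\"type\":\"" + type + "\"}";
}

// Hypothetical helper: reply to a server ping, echoing its event_id.
std::string MakePong(std::int64_t eventId) {
    return "{\"type\":\"pong\",\"pong_event\":{\"event_id\":" +
           std::to_string(eventId) + "}}";
}
```

For example, `MakePong(42)` produces `{"type":"pong","pong_event":{"event_id":42}}`, which would be sent as a text frame.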

Server → Client messages (field: type)

| type value | Key nested object | Notes |
|---|---|---|
| conversation_initiation_metadata | conversation_initiation_metadata_event.conversation_id | Marks WS ready |
| audio | audio_event.audio_base_64 | Base64 PCM from agent |
| transcript | transcript_event.{speaker, message, is_final} | User or agent speech |
| agent_response | agent_response_event.agent_response | Final agent text |
| interruption | | Agent stopped mid-sentence |
| ping | ping_event.event_id | Must reply with pong |
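Incoming frames can be routed by mapping the `type` string to an enum and looking up the nested object named in the list above. The sketch below is an assumption about how such a dispatch might look; `ServerMessageType`, `ClassifyServerMessage`, and `NestedEventKey` are hypothetical names and do not claim to match the contents of ElevenLabsDefinitions.h.

```cpp
#include <string>

// Hypothetical enum mirroring the server message types listed in these notes.
enum class ServerMessageType {
    ConversationInitiationMetadata,
    Audio,
    Transcript,
    AgentResponse,
    Interruption,
    Ping,
    Unknown
};

// Map the JSON "type" field to the enum; unrecognized values are not an error,
// so future message types can be ignored safely.
ServerMessageType ClassifyServerMessage(const std::string& type) {
    if (type == "conversation_initiation_metadata") return ServerMessageType::ConversationInitiationMetadata;
    if (type == "audio") return ServerMessageType::Audio;
    if (type == "transcript") return ServerMessageType::Transcript;
    if (type == "agent_response") return ServerMessageType::AgentResponse;
    if (type == "interruption") return ServerMessageType::Interruption;
    if (type == "ping") return ServerMessageType::Ping;
    return ServerMessageType::Unknown;
}

// Name of the nested JSON object that carries the payload for each type.
// Returns an empty string where these notes list no nested object.
std::string NestedEventKey(ServerMessageType t) {
    switch (t) {
        case ServerMessageType::ConversationInitiationMetadata:
            return "conversation_initiation_metadata_event";
        case ServerMessageType::Audio:         return "audio_event";
        case ServerMessageType::Transcript:    return "transcript_event";
        case ServerMessageType::AgentResponse: return "agent_response_event";
        case ServerMessageType::Ping:          return "ping_event";
        default:                               return "";
    }
}
```

A handler would typically switch on the result of `ClassifyServerMessage` and, for `Ping`, read `event_id` from the `ping_event` object and answer with a pong carrying the same id.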

Key Design Decisions

  • No gRPC / no ThirdParty libs — pure UE WebSockets + HTTP, builds out of the box
  • Audio resampled in-plugin: device rate → 16000 Hz mono (linear interpolation)
  • USoundWaveProcedural for real-time agent audio playback (queue-driven)
  • Silence heuristic: 30 game-thread ticks (~0.5 s at 60 fps) with no new audio → agent done speaking
  • bSignedURLMode setting: fetch a signed WS URL from your own backend (keeps API key off client)
  • Two turn modes: Server VAD (ElevenLabs detects speech end) and Client Controlled (push-to-talk)

Build Dependencies (Build.cs)

Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP, AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing

Status

  • Session 1 (2026-02-19): All source files written, registered in .uproject. Not yet compiled.
  • TODO: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID.
  • Watch out: Verify USoundWaveProcedural::OnSoundWaveProceduralUnderflow delegate signature vs UE 5.5 API.