PS_AI_Agent/.claude/elevenlabs_plugin.md
j.foucher f0055e85ed Add PS_AI_Agent_ElevenLabs plugin (initial implementation)
Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent
via WebSocket. No gRPC or third-party libs required.

Plugin components:
- UElevenLabsSettings: API key + Agent ID in Project Settings
- UElevenLabsWebSocketProxy: full WS session lifecycle, JSON message handling,
  ping/pong keepalive, Base64 PCM audio send/receive
- UElevenLabsConversationalAgentComponent: ActorComponent for NPC voice
  conversation, orchestrates mic capture -> WS -> procedural audio playback
- UElevenLabsMicrophoneCaptureComponent: wraps Audio::FAudioCapture,
  resamples to 16kHz mono, dispatches on game thread

Also adds .claude/ memory files (project context, plugin notes, patterns)
so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 12:57:48 +01:00


# PS_AI_Agent_ElevenLabs Plugin
## Location
`Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/`
## File Map
```
PS_AI_Agent_ElevenLabs.uplugin
Source/PS_AI_Agent_ElevenLabs/
  PS_AI_Agent_ElevenLabs.Build.cs
  Public/
    PS_AI_Agent_ElevenLabs.h                        FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings
    ElevenLabsDefinitions.h                         Enums, structs, ElevenLabsMessageType/Audio constants
    ElevenLabsWebSocketProxy.h/.cpp                 UObject managing one WS session
    ElevenLabsConversationalAgentComponent.h/.cpp   Main ActorComponent (attach to NPC)
    ElevenLabsMicrophoneCaptureComponent.h/.cpp     Mic capture, resample, dispatch to game thread
  Private/
    (implementations of the above)
```
## ElevenLabs Conversational AI Protocol
- **WebSocket URL**: `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>`
- **Auth**: HTTP upgrade header `xi-api-key: <key>` (set in Project Settings)
- **All frames**: JSON text (no binary frames used by the API)
- **Audio format**: PCM 16-bit signed, 16000 Hz, mono, little-endian — Base64-encoded in JSON
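A minimal sketch of the encoding step described above: float samples (as a capture API typically delivers them) are clamped, converted to signed 16-bit little-endian PCM, and Base64-encoded for the JSON payload. The function names `FloatToPcm16LE` and `Base64Encode` are hypothetical and not taken from the plugin, and the code uses plain standard C++ instead of UE types so it compiles on its own; engine code would more likely use UE's own Base64 utilities.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: normalized floats in [-1, 1] -> signed 16-bit little-endian PCM bytes.
std::vector<uint8_t> FloatToPcm16LE(const std::vector<float>& samples) {
    std::vector<uint8_t> bytes;
    bytes.reserve(samples.size() * 2);
    for (float s : samples) {
        // Clamp so out-of-range capture values cannot overflow int16.
        const float clamped = s < -1.0f ? -1.0f : (s > 1.0f ? 1.0f : s);
        const auto value = static_cast<int16_t>(std::lround(clamped * 32767.0f));
        const auto bits = static_cast<uint16_t>(value);
        bytes.push_back(static_cast<uint8_t>(bits & 0xFF));  // low byte first (little-endian)
        bytes.push_back(static_cast<uint8_t>(bits >> 8));    // then high byte
    }
    return bytes;
}

// Hypothetical helper: standard Base64 (RFC 4648 alphabet, '=' padding).
std::string Base64Encode(const std::vector<uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    size_t i = 0;
    // Each full 3-byte group becomes 4 characters.
    for (; i + 2 < data.size(); i += 3) {
        const uint32_t n = (uint32_t(data[i]) << 16) | (uint32_t(data[i + 1]) << 8) | data[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    // A trailing 1 or 2 bytes are zero-padded and marked with '='.
    const size_t rest = data.size() - i;
    if (rest > 0) {
        uint32_t n = uint32_t(data[i]) << 16;
        if (rest == 2) n |= uint32_t(data[i + 1]) << 8;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += rest == 2 ? kAlphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    assert(out.size() == (data.size() + 2) / 3 * 4);  // Base64 length invariant
    return out;
}
```

Each 16000 Hz mono sample costs two bytes, so one second of audio is 32000 bytes of PCM, or roughly 42.7 KB once Base64-encoded.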
### Client → Server messages
| `type` value | Payload |
|---|---|
| `user_audio_chunk` *(no `type` field; the top-level key identifies the message)* | `{ "user_audio_chunk": "<base64 PCM>" }` |
| `user_turn_start` | `{ "type": "user_turn_start" }` |
| `user_turn_end` | `{ "type": "user_turn_end" }` |
| `interrupt` | `{ "type": "interrupt" }` |
| `pong` | `{ "type": "pong", "pong_event": { "event_id": N } }` |
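The payload shapes in the table can be produced with a few builder functions. This is a sketch in standalone C++ whose function names are hypothetical; the JSON layouts are copied from the table, so they are only as accurate as the table itself and should be checked against the current ElevenLabs API reference. Since Build.cs lists Json/JsonUtilities, the real proxy presumably uses `FJsonObject`. Plain string concatenation is used here only to keep the example self-contained, and it is safe only because none of the inserted values (Base64 text, fixed type names, integer ids) need JSON escaping.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical builders mirroring the client -> server table.

// Audio chunk: no "type" field; the top-level key itself identifies the message.
std::string MakeUserAudioChunk(const std::string& base64Pcm) {
    assert(base64Pcm.find('"') == std::string::npos);  // Base64 never contains quotes
    return "{\"user_audio_chunk\":\"" + base64Pcm + "\"}";
}

// user_turn_start, user_turn_end and interrupt carry only a "type" field.
std::string MakeTypeOnlyMessage(const std::string& type) {
    assert(type.find('"') == std::string::npos);  // fixed identifiers, no escaping needed
    return "{\"type\":\"" + type + "\"}";
}

// Reply to a server ping, echoing its event_id in the shape listed in the table.
std::string MakePong(int64_t eventId) {
    return "{\"type\":\"pong\",\"pong_event\":{\"event_id\":" + std::to_string(eventId) + "}}";
}
```

In `Client Controlled` mode a push-to-talk press would plausibly map to `MakeTypeOnlyMessage("user_turn_start")`, streamed audio chunks, then `MakeTypeOnlyMessage("user_turn_end")`.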
### Server → Client messages (field: `type`)
| type value | Key nested object | Notes |
|---|---|---|
| `conversation_initiation_metadata` | `conversation_initiation_metadata_event.conversation_id` | Marks WS ready |
| `audio` | `audio_event.audio_base_64` | Base64 PCM from agent |
| `transcript` | `transcript_event.{speaker, message, is_final}` | User or agent speech |
| `agent_response` | `agent_response_event.agent_response` | Final agent text |
| `interruption` | — | Agent stopped mid-sentence |
| `ping` | `ping_event.event_id` | Must reply with pong |
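The server messages can be handled by dispatching on the `type` string. The sketch below assumes each frame has already been parsed into a hypothetical `ServerEvent` struct whose fields follow the nested paths in the table. `SessionState` is likewise illustrative, not the plugin's class. Clearing queued agent audio on `interruption` is an assumed policy, and the ping reply reuses the pong shape from the client table. Requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <string>

// Hypothetical, already-parsed view of one server frame.
struct ServerEvent {
    std::string type;            // top-level "type" field
    std::string conversationId;  // conversation_initiation_metadata_event.conversation_id
    std::string audioBase64;     // audio_event.audio_base_64
    int64_t pingEventId = 0;     // ping_event.event_id
};

class SessionState {
public:
    // Applies one event; returns a frame to send back when the protocol requires a reply.
    std::optional<std::string> Handle(const ServerEvent& e) {
        assert(!e.type.empty());
        if (e.type == "conversation_initiation_metadata") {
            ready_ = true;  // the session is usable only after this arrives
            conversationId_ = e.conversationId;
        } else if (e.type == "audio") {
            pendingAudio_.push_back(e.audioBase64);  // decoded and fed to playback later
        } else if (e.type == "interruption") {
            pendingAudio_.clear();  // assumption: drop agent speech that has not played yet
        } else if (e.type == "ping") {
            return "{\"type\":\"pong\",\"pong_event\":{\"event_id\":" +
                   std::to_string(e.pingEventId) + "}}";
        }
        // transcript / agent_response would be forwarded to gameplay or UI code (omitted).
        return std::nullopt;
    }

    bool IsReady() const { return ready_; }
    const std::string& ConversationId() const { return conversationId_; }
    size_t PendingAudioChunks() const { return pendingAudio_.size(); }

private:
    bool ready_ = false;
    std::string conversationId_;
    std::deque<std::string> pendingAudio_;
};
```

Returning the reply instead of sending it keeps the state logic independent of the socket, so the same handler can be exercised without a live connection.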
## Key Design Decisions
- **No gRPC / no ThirdParty libs** — pure UE WebSockets + HTTP, builds out of the box
- Audio resampled in-plugin: device rate → 16000 Hz mono (linear interpolation)
- `USoundWaveProcedural` for real-time agent audio playback (queue-driven)
- Silence heuristic: 30 game-thread ticks with no new audio → agent done speaking (~0.5 s at 60 fps; the threshold counts frames, not seconds, so it stretches to ~1 s at 30 fps)
- `bSignedURLMode` setting: fetch a signed WS URL from your own backend (keeps API key off client)
- Two turn modes: `Server VAD` (ElevenLabs detects speech end) and `Client Controlled` (push-to-talk)
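Two of the decisions above can be sketched in a few lines: downmix plus linear-interpolation resampling to 16000 Hz mono, and the tick-counting silence heuristic. All names (`DownmixToMono`, `ResampleLinear`, `AgentSilenceDetector`) are hypothetical, and the code is standalone C++, not the plugin's UE code. The resampler has no anti-aliasing filter, so downsampling from 48 kHz can fold content above 8 kHz back into the band; a low-pass filter before decimation would reduce that. It is also stateless per buffer, so a streaming version would need to carry the fractional read position across capture callbacks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average interleaved channels into one mono channel.
std::vector<float> DownmixToMono(const std::vector<float>& interleaved, int numChannels) {
    assert(numChannels > 0);
    const size_t channels = static_cast<size_t>(numChannels);
    const size_t frames = interleaved.size() / channels;
    std::vector<float> mono(frames);
    for (size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        for (size_t c = 0; c < channels; ++c) sum += interleaved[f * channels + c];
        mono[f] = sum / static_cast<float>(channels);
    }
    return mono;
}

// Linear interpolation from inRate to outRate (e.g. 48000 -> 16000).
std::vector<float> ResampleLinear(const std::vector<float>& in, int inRate, int outRate) {
    assert(inRate > 0 && outRate > 0);
    if (in.empty() || inRate == outRate) return in;
    const auto outLen = static_cast<size_t>(static_cast<double>(in.size()) * outRate / inRate);
    const double step = static_cast<double>(inRate) / outRate;  // input samples per output sample
    std::vector<float> out(outLen);
    for (size_t i = 0; i < outLen; ++i) {
        const double pos = static_cast<double>(i) * step;
        const size_t i0 = static_cast<size_t>(pos);
        const size_t i1 = i0 + 1 < in.size() ? i0 + 1 : i0;  // clamp at the last sample
        const float t = static_cast<float>(pos - static_cast<double>(i0));
        out[i] = in[i0] + (in[i1] - in[i0]) * t;
    }
    return out;
}

// Declares the agent done after N consecutive ticks without a new audio chunk.
class AgentSilenceDetector {
public:
    explicit AgentSilenceDetector(int thresholdTicks = 30) : threshold_(thresholdTicks) {}
    void OnAudioChunk() { speaking_ = true; silentTicks_ = 0; }
    // Call once per game-thread tick; returns true only on the tick speech is judged finished.
    bool Tick() {
        if (!speaking_) return false;
        if (++silentTicks_ >= threshold_) { speaking_ = false; return true; }
        return false;
    }
    bool IsSpeaking() const { return speaking_; }

private:
    int threshold_;
    int silentTicks_ = 0;
    bool speaking_ = false;
};
```

Counting ticks keeps the heuristic trivial, but, as noted above, the effective timeout depends on frame rate; accumulating `DeltaTime` against a seconds threshold would make it frame-rate independent.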
## Build Dependencies (Build.cs)
Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP,
AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing
## Status
- **Session 1** (2026-02-19): All source files written and the plugin registered in the .uproject. Not yet compiled.
- **TODO**: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID.
- **Watch out**: Verify `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature vs UE 5.5 API.