# PS_AI_Agent_ElevenLabs Plugin

## Location
`Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/`

## File Map
```
PS_AI_Agent_ElevenLabs.uplugin
Source/PS_AI_Agent_ElevenLabs/
  PS_AI_Agent_ElevenLabs.Build.cs
  Public/
    PS_AI_Agent_ElevenLabs.h                  – FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings
    ElevenLabsDefinitions.h                   – Enums, structs, ElevenLabsMessageType/Audio constants
    ElevenLabsWebSocketProxy.h                – UObject managing one WS session
    ElevenLabsConversationalAgentComponent.h  – Main ActorComponent (attach to NPC)
    ElevenLabsMicrophoneCaptureComponent.h    – Mic capture, resample, dispatch to game thread
  Private/
    (.cpp implementations of the above)
```

## ElevenLabs Conversational AI Protocol
- **WebSocket URL**: `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>`
- **Auth**: HTTP upgrade header `xi-api-key: <key>` (set in Project Settings)
- **All frames**: JSON text (no binary frames used by the API)
- **Audio format**: PCM 16-bit signed, 16000 Hz, mono, little-endian — Base64-encoded in JSON
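
The audio format above can be illustrated with a minimal standard-C++ sketch that turns float samples into little-endian PCM16 bytes and Base64-encodes them. The function names `FloatToPcm16LE` and `Base64Encode` are hypothetical, and float input in [-1, 1] is an assumption about what the capture side delivers. Inside the plugin, the engine's own `FBase64::Encode` would likely replace the hand-written encoder.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: float samples in [-1, 1] -> 16-bit signed PCM, little-endian.
std::vector<std::uint8_t> FloatToPcm16LE(const std::vector<float>& Samples)
{
    std::vector<std::uint8_t> Bytes;
    Bytes.reserve(Samples.size() * 2);
    for (float Sample : Samples)
    {
        // Clamp first so out-of-range input cannot overflow the int16 conversion.
        const float Clamped = std::min(1.0f, std::max(-1.0f, Sample));
        const auto Value = static_cast<std::int16_t>(std::lround(Clamped * 32767.0f));
        const auto Bits = static_cast<std::uint16_t>(Value);
        Bytes.push_back(static_cast<std::uint8_t>(Bits & 0xFF));        // low byte first
        Bytes.push_back(static_cast<std::uint8_t>((Bits >> 8) & 0xFF)); // then high byte
    }
    return Bytes;
}

// Hypothetical helper: standard Base64 (RFC 4648 alphabet) with '=' padding.
std::string Base64Encode(const std::vector<std::uint8_t>& Data)
{
    static const char Table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    Out.reserve(((Data.size() + 2) / 3) * 4);

    std::size_t i = 0;
    for (; i + 2 < Data.size(); i += 3)
    {
        const std::uint32_t N = (static_cast<std::uint32_t>(Data[i]) << 16)
                              | (static_cast<std::uint32_t>(Data[i + 1]) << 8)
                              |  static_cast<std::uint32_t>(Data[i + 2]);
        Out += Table[(N >> 18) & 63];
        Out += Table[(N >> 12) & 63];
        Out += Table[(N >> 6) & 63];
        Out += Table[N & 63];
    }

    const std::size_t Remaining = Data.size() - i;
    if (Remaining == 1)
    {
        const std::uint32_t N = static_cast<std::uint32_t>(Data[i]) << 16;
        Out += Table[(N >> 18) & 63];
        Out += Table[(N >> 12) & 63];
        Out += "==";
    }
    else if (Remaining == 2)
    {
        const std::uint32_t N = (static_cast<std::uint32_t>(Data[i]) << 16)
                              | (static_cast<std::uint32_t>(Data[i + 1]) << 8);
        Out += Table[(N >> 18) & 63];
        Out += Table[(N >> 12) & 63];
        Out += Table[(N >> 6) & 63];
        Out += '=';
    }
    return Out;
}
```

Calling `Base64Encode(FloatToPcm16LE(Samples))` on a 16000 Hz mono buffer yields the string that goes into the JSON audio frame.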

### Client → Server messages
| Message | Payload |
|---|---|
| `user_audio_chunk` (no `type` field; the key itself identifies the message) | `{ "user_audio_chunk": "<base64 PCM>" }` |
| `user_turn_start` | `{ "type": "user_turn_start" }` |
| `user_turn_end` | `{ "type": "user_turn_end" }` |
| `interrupt` | `{ "type": "interrupt" }` |
| `pong` | `{ "type": "pong", "pong_event": { "event_id": N } }` |
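
As an illustration of the payload shapes in this table, here is a small sketch that builds each client message as a JSON string with plain `std::string`. The helper names are hypothetical; the plugin itself would more plausibly assemble these through the engine's Json module. Only the field names are taken from the table.

```cpp
#include <string>

// Audio frames carry no "type" field; the top-level key identifies the message.
// Base64 text only contains [A-Za-z0-9+/=], so no JSON escaping is needed here.
std::string MakeUserAudioChunk(const std::string& Base64Pcm)
{
    return "{\"user_audio_chunk\":\"" + Base64Pcm + "\"}";
}

// Payload-free control messages: user_turn_start, user_turn_end, interrupt.
// The caller is expected to pass one of those fixed identifiers.
std::string MakeControlMessage(const std::string& Type)
{
    return "{\"type\":\"" + Type + "\"}";
}

// Reply to a server ping, echoing the event_id it carried.
std::string MakePong(int EventId)
{
    return "{\"type\":\"pong\",\"pong_event\":{\"event_id\":"
         + std::to_string(EventId) + "}}";
}
```

For example, `MakeControlMessage("interrupt")` produces `{"type":"interrupt"}`, which would then be sent as a text frame.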

### Server → Client messages (field: `type`)
| type value | Key nested object | Notes |
|---|---|---|
| `conversation_initiation_metadata` | `conversation_initiation_metadata_event.conversation_id` | Marks WS ready |
| `audio` | `audio_event.audio_base_64` | Base64 PCM from agent |
| `transcript` | `transcript_event.{speaker, message, is_final}` | User or agent speech |
| `agent_response` | `agent_response_event.agent_response` | Final agent text |
| `interruption` | — | Agent stopped mid-sentence |
| `ping` | `ping_event.event_id` | Must reply with pong |
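
The table above maps naturally onto a dispatch step: read the `type` string, then look inside the matching nested object. The sketch below shows only that mapping in standard C++. The enum and function names are hypothetical and are not the constants actually declared in `ElevenLabsDefinitions.h`.

```cpp
#include <string>

// Hypothetical enum mirroring the "type" values in the table above.
enum class EServerMessage
{
    ConversationInitiationMetadata,
    Audio,
    Transcript,
    AgentResponse,
    Interruption,
    Ping,
    Unknown
};

// Map the top-level "type" string to an enum value.
EServerMessage ClassifyServerMessage(const std::string& Type)
{
    if (Type == "conversation_initiation_metadata") return EServerMessage::ConversationInitiationMetadata;
    if (Type == "audio")          return EServerMessage::Audio;
    if (Type == "transcript")     return EServerMessage::Transcript;
    if (Type == "agent_response") return EServerMessage::AgentResponse;
    if (Type == "interruption")   return EServerMessage::Interruption;
    if (Type == "ping")           return EServerMessage::Ping;
    return EServerMessage::Unknown; // ignore types this table does not cover
}

// Name of the nested object that carries the payload.
// Returns an empty string where the table lists no nested object.
std::string PayloadKeyFor(EServerMessage Message)
{
    switch (Message)
    {
    case EServerMessage::ConversationInitiationMetadata: return "conversation_initiation_metadata_event";
    case EServerMessage::Audio:         return "audio_event";
    case EServerMessage::Transcript:    return "transcript_event";
    case EServerMessage::AgentResponse: return "agent_response_event";
    case EServerMessage::Ping:          return "ping_event";
    case EServerMessage::Interruption:
    case EServerMessage::Unknown:
        break;
    }
    return "";
}
```

Falling through to `Unknown` rather than failing keeps the session alive if the server sends a message type that these notes do not list.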

## Key Design Decisions
- **No gRPC / no ThirdParty libs** — pure UE WebSockets + HTTP, builds out of the box
- Audio resampled in-plugin: device rate → 16000 Hz mono (linear interpolation)
- `USoundWaveProcedural` for real-time agent audio playback (queue-driven)
- Silence heuristic: 30 consecutive game-thread ticks with no new audio (~0.5 s at 60 fps; the window scales with frame rate) → agent done speaking
- `bSignedURLMode` setting: fetch a signed WS URL from your own backend (keeps API key off client)
- Two turn modes: `Server VAD` (ElevenLabs detects speech end) and `Client Controlled` (push-to-talk)
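
Two of the decisions above, linear-interpolation resampling and the tick-based silence heuristic, are simple enough to sketch in standard C++. This is an illustration under assumptions, not the plugin's actual code: the input is assumed to be already mono, and `ResampleLinear` and `FSilenceDetector` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Resample a mono buffer from SrcRate to DstRate by linear interpolation.
// No low-pass filter is applied, so downsampling may alias.
std::vector<float> ResampleLinear(const std::vector<float>& In, int SrcRate, int DstRate)
{
    assert(SrcRate > 0 && DstRate > 0);
    if (In.empty() || SrcRate == DstRate)
    {
        return In;
    }

    const std::size_t OutCount = static_cast<std::size_t>(
        static_cast<std::uint64_t>(In.size()) * static_cast<std::uint64_t>(DstRate)
        / static_cast<std::uint64_t>(SrcRate));
    const double Step = static_cast<double>(SrcRate) / static_cast<double>(DstRate);

    std::vector<float> Out;
    Out.reserve(OutCount);
    for (std::size_t k = 0; k < OutCount; ++k)
    {
        const double Pos = static_cast<double>(k) * Step;           // position in source samples
        const std::size_t I0 = std::min(static_cast<std::size_t>(Pos), In.size() - 1);
        const std::size_t I1 = std::min(I0 + 1, In.size() - 1);     // clamp at the last sample
        const float Frac = static_cast<float>(Pos - static_cast<double>(I0));
        Out.push_back(In[I0] + (In[I1] - In[I0]) * Frac);
    }
    return Out;
}

// Tick-counting silence heuristic: the agent is considered done speaking after
// ThresholdTicks consecutive ticks without a new audio chunk.
struct FSilenceDetector
{
    int ThresholdTicks = 30;   // ~0.5 s at 60 fps
    int TicksWithoutAudio = 0;
    bool bAgentSpeaking = false;

    // Call whenever an agent audio chunk arrives.
    void OnAudioReceived()
    {
        TicksWithoutAudio = 0;
        bAgentSpeaking = true;
    }

    // Call once per game-thread tick. Returns true only on the tick where
    // the agent transitions from speaking to done.
    bool Tick()
    {
        if (!bAgentSpeaking)
        {
            return false;
        }
        if (++TicksWithoutAudio >= ThresholdTicks)
        {
            bAgentSpeaking = false;
            TicksWithoutAudio = 0;
            return true;
        }
        return false;
    }
};
```

Because the threshold is counted in ticks, the real-time window shrinks at higher frame rates. Accumulating delta time instead would make it frame-rate independent, at the cost of departing from the heuristic as described.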

## Build Dependencies (Build.cs)
Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP,
AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing

## Status
- **Session 1** (2026-02-19): All source files written, registered in .uproject. Not yet compiled.
- **TODO**: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID.
- **Watch out**: Verify `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature vs UE 5.5 API.