Add PS_AI_Agent_ElevenLabs plugin (initial implementation)
Adds a new UE5.5 plugin integrating the ElevenLabs Conversational AI Agent via WebSocket. No gRPC or third-party libs required.

Plugin components:
- UElevenLabsSettings: API key + Agent ID in Project Settings
- UElevenLabsWebSocketProxy: full WS session lifecycle, JSON message handling, ping/pong keepalive, Base64 PCM audio send/receive
- UElevenLabsConversationalAgentComponent: ActorComponent for NPC voice conversation, orchestrates mic capture -> WS -> procedural audio playback
- UElevenLabsMicrophoneCaptureComponent: wraps Audio::FAudioCapture, resamples to 16kHz mono, dispatches on game thread

Also adds .claude/ memory files (project context, plugin notes, patterns) so Claude Code can restore full context on any machine after a git pull.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Parent: 61710c9fde · Commit: f0055e85ed
.claude/MEMORY.md (new file, +35 lines)
@@ -0,0 +1,35 @@
# Project Memory – PS_AI_Agent

> This file is committed to the repository so it is available on any machine.
> Claude Code reads it automatically at session start (via the auto-memory system)
> when the working directory is inside this repo.
> **Keep it under ~180 lines** – lines beyond 200 are truncated by the system.

---

## Project Location

- Repo root: `<repo_root>/` (wherever this is cloned)
- UE5 project: `<repo_root>/Unreal/PS_AI_Agent/`
- `.uproject`: `<repo_root>/Unreal/PS_AI_Agent/PS_AI_Agent.uproject`
- Engine: **Unreal Engine 5.5** — Win64 primary target

## Plugins

| Plugin | Path | Purpose |
|--------|------|---------|
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Has ElevenLabs voice type enum in `ConvaiDefinitions.h`. Used as architectural reference. |
| **PS_AI_Agent_ElevenLabs** | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See `.claude/elevenlabs_plugin.md` for full details. |

## User Preferences

- Plugin naming: `PS_AI_Agent_<Service>` (e.g. `PS_AI_Agent_ElevenLabs`)
- Save memory frequently during long sessions
- Goal: ElevenLabs Conversational AI integration — simpler than Convai, no gRPC
- Full original ask + intent: see `.claude/project_context.md`

## Key UE5 Plugin Patterns

- Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule`
- Module startup: `NewObject<USettings>(..., RF_Standalone)` + `AddToRoot()`
- WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)`
- Audio capture: `Audio::FAudioCapture` from the `AudioCapture` module
- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow` delegate
- Audio capture callbacks arrive on a **background thread** — always marshal to game thread with `AsyncTask(ENamedThreads::GameThread, ...)`
- Resample mic audio to **16000 Hz mono** before sending to ElevenLabs
.claude/elevenlabs_plugin.md (new file, +61 lines)
@@ -0,0 +1,61 @@
# PS_AI_Agent_ElevenLabs Plugin

## Location

`Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/`

## File Map

```
PS_AI_Agent_ElevenLabs.uplugin
Source/PS_AI_Agent_ElevenLabs/
  PS_AI_Agent_ElevenLabs.Build.cs
  Public/
    PS_AI_Agent_ElevenLabs.h                      – FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings
    ElevenLabsDefinitions.h                       – Enums, structs, ElevenLabsMessageType/Audio constants
    ElevenLabsWebSocketProxy.h/.cpp               – UObject managing one WS session
    ElevenLabsConversationalAgentComponent.h/.cpp – Main ActorComponent (attach to NPC)
    ElevenLabsMicrophoneCaptureComponent.h/.cpp   – Mic capture, resample, dispatch to game thread
  Private/
    (implementations of the above)
```

## ElevenLabs Conversational AI Protocol

- **WebSocket URL**: `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>`
- **Auth**: HTTP upgrade header `xi-api-key: <key>` (set in Project Settings)
- **All frames**: JSON text (no binary frames used by the API)
- **Audio format**: PCM 16-bit signed, 16000 Hz, mono, little-endian — Base64-encoded in JSON

### Client → Server messages

| Type field value | Payload |
|---|---|
| *(none – key is the type)* `user_audio_chunk` | `{ "user_audio_chunk": "<base64 PCM>" }` |
| `user_turn_start` | `{ "type": "user_turn_start" }` |
| `user_turn_end` | `{ "type": "user_turn_end" }` |
| `interrupt` | `{ "type": "interrupt" }` |
| `pong` | `{ "type": "pong", "pong_event": { "event_id": N } }` |
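The `user_audio_chunk` payload is nothing more than the raw little-endian int16 PCM bytes run through standard Base64. A plain-C++ sketch of that encoding step (the plugin itself would use UE's `FBase64`; `Base64Encode` here is an illustrative standalone helper):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Standard Base64 (RFC 4648) over raw PCM bytes -- illustrative only;
// in-engine code would call FBase64::Encode instead.
std::string Base64Encode(const std::vector<uint8_t>& Data)
{
    static const char* Tbl =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    Out.reserve((Data.size() + 2) / 3 * 4);
    size_t i = 0;
    for (; i + 3 <= Data.size(); i += 3)
    {
        const uint32_t N = (Data[i] << 16) | (Data[i + 1] << 8) | Data[i + 2];
        Out += Tbl[(N >> 18) & 63]; Out += Tbl[(N >> 12) & 63];
        Out += Tbl[(N >> 6) & 63];  Out += Tbl[N & 63];
    }
    if (i + 1 == Data.size())       // one byte left -> two pad chars
    {
        const uint32_t N = Data[i] << 16;
        Out += Tbl[(N >> 18) & 63]; Out += Tbl[(N >> 12) & 63]; Out += "==";
    }
    else if (i + 2 == Data.size())  // two bytes left -> one pad char
    {
        const uint32_t N = (Data[i] << 16) | (Data[i + 1] << 8);
        Out += Tbl[(N >> 18) & 63]; Out += Tbl[(N >> 12) & 63];
        Out += Tbl[(N >> 6) & 63];  Out += '=';
    }
    return Out;
}
```

The resulting string becomes the value of the `user_audio_chunk` key; note that this message carries no `type` field at all.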

### Server → Client messages (field: `type`)

| type value | Key nested object | Notes |
|---|---|---|
| `conversation_initiation_metadata` | `conversation_initiation_metadata_event.conversation_id` | Marks WS ready |
| `audio` | `audio_event.audio_base_64` | Base64 PCM from agent |
| `transcript` | `transcript_event.{speaker, message, is_final}` | User or agent speech |
| `agent_response` | `agent_response_event.agent_response` | Final agent text |
| `interruption` | — | Agent stopped mid-sentence |
| `ping` | `ping_event.event_id` | Must reply with pong |
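The `ping`/`pong` exchange is the session keepalive: each `ping` carries an `event_id` that must be echoed back. A plain-C++ sketch of the reply's wire format (the plugin would build this with `FJsonObject`/`FJsonSerializer`; `MakePongMessage` is a hypothetical helper):

```cpp
#include <cstdio>
#include <string>

// Format the pong reply for a given ping event_id -- matches the shape
// of the pong row in the client->server message table.
std::string MakePongMessage(int EventId)
{
    char Buf[96];
    std::snprintf(Buf, sizeof(Buf),
                  "{\"type\":\"pong\",\"pong_event\":{\"event_id\":%d}}",
                  EventId);
    return std::string(Buf);
}
```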

## Key Design Decisions

- **No gRPC / no ThirdParty libs** — pure UE WebSockets + HTTP, builds out of the box
- Audio resampled in-plugin: device rate → 16000 Hz mono (linear interpolation)
- `USoundWaveProcedural` for real-time agent audio playback (queue-driven)
- Silence heuristic: 30 game-thread ticks (~0.5 s at 60 fps) with no new audio → agent done speaking
- `bSignedURLMode` setting: fetch a signed WS URL from your own backend (keeps API key off client)
- Two turn modes: `Server VAD` (ElevenLabs detects speech end) and `Client Controlled` (push-to-talk)
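The 30-tick silence heuristic reduces to a small counter that runs on each game-thread tick. A minimal standalone sketch (`FSilenceDetector` and its field names are illustrative; in the plugin this state lives directly on the agent component):

```cpp
// Counts consecutive ticks with no queued agent audio; once the threshold
// is reached, the agent is considered done speaking.
struct FSilenceDetector
{
    int SilentTicks = 0;
    int ThresholdTicks = 30;   // ~0.5 s at 60 fps
    bool bSpeaking = false;

    // Call once per tick with the current audio-queue size in bytes.
    // Returns true only on the tick where "speaking" transitions to "done".
    bool Tick(int QueuedBytes)
    {
        if (!bSpeaking) return false;
        if (QueuedBytes > 0) { SilentTicks = 0; return false; }
        if (++SilentTicks >= ThresholdTicks)
        {
            bSpeaking = false;
            SilentTicks = 0;
            return true;
        }
        return false;
    }
};
```

This is deliberately time-agnostic (ticks, not seconds), which is why the ~0.5 s figure only holds at the 60 fps tick interval the component configures.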

## Build Dependencies (Build.cs)

Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP,
AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing

## Status

- **Session 1** (2026-02-19): All source files written, registered in .uproject. Not yet compiled.
- **TODO**: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID.
- **Watch out**: Verify the `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature against the UE 5.5 API.
.claude/project_context.md (new file, +79 lines)
@@ -0,0 +1,79 @@
# Project Context & Original Ask

## What the user wants to build

A **UE5 plugin** that integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5,
allowing an in-game NPC (or any Actor) to hold a real-time voice conversation with a player.

### The original request (paraphrased)

> "I want to create a plugin to use the ElevenLabs Conversational Agent in Unreal Engine 5.5.
> I previously used the Convai plugin, which does what I want, but I prefer ElevenLabs' quality.
> The goal is to create a plugin in the existing Unreal project as a first step toward integration.
> The Convai AI plugin may be too big in terms of functionality for the new project, but it is the final goal.
> You can use the Convai source code to find the right way to make the ElevenLabs version —
> it should be very similar."

### Plugin name

`PS_AI_Agent_ElevenLabs`

---

## User's mental model / intent

1. **Short-term**: A working first-step plugin — minimal but functional — that can:
   - Connect to ElevenLabs Conversational AI via WebSocket
   - Capture microphone audio from the player
   - Stream it to ElevenLabs in real time
   - Play back the agent's voice response
   - Surface key events (transcript, agent text, speaking state) to Blueprint

2. **Long-term**: Match the full feature set of Convai — character IDs, session memory,
   actions/environment context, lip-sync, etc. — but powered by ElevenLabs instead.

3. **Key preference**: Simpler than Convai. No gRPC, no protobuf, no ThirdParty precompiled
   libraries. ElevenLabs' Conversational AI API uses plain WebSocket + JSON, which maps
   naturally to UE's built-in `WebSockets` module.

---

## How we used Convai as a reference

We studied the Convai plugin source (`ConvAI/Convai/`) to understand:

- **Module structure**: `UConvaiSettings` + `IModuleInterface` + `ISettingsModule` registration
- **Audio capture pattern**: `Audio::FAudioCapture`, ring buffers, thread-safe dispatch to the game thread
- **Audio playback pattern**: `USoundWaveProcedural` fed from a queue
- **Component architecture**: `UConvaiChatbotComponent` (NPC side) + `UConvaiPlayerComponent` (player side)
- **HTTP proxy pattern**: `UConvaiAPIBaseProxy` base class for async REST calls
- **Voice type enum**: Convai already had `EVoiceType::ElevenLabsVoices` — confirming ElevenLabs is a natural fit

We then replaced gRPC/protobuf with **WebSocket + JSON** to match the ElevenLabs API, and
simplified the architecture to the minimum needed for a first working version.

---

## What was built (Session 1 — 2026-02-19)

All source files created and registered. See `.claude/elevenlabs_plugin.md` for the full file map and protocol details.

### Components created

| Class | Role |
|---|---|
| `UElevenLabsSettings` | Project Settings UI — API key, Agent ID, security options |
| `UElevenLabsWebSocketProxy` | Manages one WS session: connect, send audio, handle all server message types |
| `UElevenLabsConversationalAgentComponent` | ActorComponent to attach to any NPC — orchestrates mic + WS + playback |
| `UElevenLabsMicrophoneCaptureComponent` | Wraps `Audio::FAudioCapture`, resamples to 16 kHz mono |

### Not yet done (next sessions)

- Compile & test in the UE 5.5 Editor
- Verify the `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature for UE 5.5
- Add lip-sync support (future)
- Add session memory / conversation history (future)
- Add environment/action context support (future, matching Convai's full feature set)

---

## Notes on the ElevenLabs API

- Docs: https://elevenlabs.io/docs/conversational-ai
- Create agents at: https://elevenlabs.io/app/conversational-ai
- API keys at: https://elevenlabs.io (dashboard)
.claude/settings.local.json (new file, +7 lines)
@@ -0,0 +1,7 @@
{
  "permissions": {
    "allow": [
      "Bash(dir /s \"E:\\\\ASTERION\\\\GIT\\\\PS_AI_Agent\")"
    ]
  }
}
@@ -17,6 +17,14 @@
            "TargetAllowList": [
                "Editor"
            ]
        },
        {
            "Name": "PS_AI_Agent_ElevenLabs",
            "Enabled": true
        },
        {
            "Name": "WebSockets",
            "Enabled": true
        }
    ]
}
@@ -0,0 +1,35 @@
{
    "FileVersion": 3,
    "Version": 1,
    "VersionName": "1.0.0",
    "FriendlyName": "PS AI Agent - ElevenLabs",
    "Description": "Integrates ElevenLabs Conversational AI Agent into Unreal Engine 5.5. Supports real-time voice conversation via WebSocket, microphone capture, and audio playback.",
    "Category": "AI",
    "CreatedBy": "ASTERION",
    "CreatedByURL": "",
    "DocsURL": "https://elevenlabs.io/docs/conversational-ai",
    "MarketplaceURL": "",
    "SupportURL": "",
    "CanContainContent": false,
    "IsBetaVersion": true,
    "IsExperimentalVersion": false,
    "Installed": false,
    "Modules": [
        {
            "Name": "PS_AI_Agent_ElevenLabs",
            "Type": "Runtime",
            "LoadingPhase": "PreDefault",
            "PlatformAllowList": [
                "Win64",
                "Mac",
                "Linux"
            ]
        }
    ],
    "Plugins": [
        {
            "Name": "WebSockets",
            "Enabled": true
        }
    ]
}
@@ -0,0 +1,40 @@
// Copyright ASTERION. All Rights Reserved.

using UnrealBuildTool;

public class PS_AI_Agent_ElevenLabs : ModuleRules
{
    public PS_AI_Agent_ElevenLabs(ReadOnlyTargetRules Target) : base(Target)
    {
        DefaultBuildSettings = BuildSettingsVersion.Latest;
        PCHUsage = PCHUsageMode.UseExplicitOrSharedPCHs;

        PublicDependencyModuleNames.AddRange(new string[]
        {
            "Core",
            "CoreUObject",
            "Engine",
            "InputCore",
            // JSON serialization for WebSocket message payloads
            "Json",
            "JsonUtilities",
            // WebSocket for ElevenLabs Conversational AI real-time API
            "WebSockets",
            // HTTP for REST calls (agent metadata, auth, etc.)
            "HTTP",
            // Audio capture (microphone input)
            "AudioMixer",
            "AudioCaptureCore",
            "AudioCapture",
            "Voice",
            "SignalProcessing",
        });

        PrivateDependencyModuleNames.AddRange(new string[]
        {
            "Projects",
            // For ISettingsModule (Project Settings integration)
            "Settings",
        });
    }
}
@@ -0,0 +1,335 @@
// Copyright ASTERION. All Rights Reserved.

#include "ElevenLabsConversationalAgentComponent.h"
#include "ElevenLabsMicrophoneCaptureComponent.h"
#include "PS_AI_Agent_ElevenLabs.h"

#include "Components/AudioComponent.h"
#include "Sound/SoundWaveProcedural.h"
#include "GameFramework/Actor.h"
#include "Engine/World.h"

DEFINE_LOG_CATEGORY_STATIC(LogElevenLabsAgent, Log, All);

// ─────────────────────────────────────────────────────────────────────────────
// Constructor
// ─────────────────────────────────────────────────────────────────────────────
UElevenLabsConversationalAgentComponent::UElevenLabsConversationalAgentComponent()
{
    PrimaryComponentTick.bCanEverTick = true;
    // Tick is used only to detect silence (agent stopped speaking).
    // Disable if not needed for perf.
    PrimaryComponentTick.TickInterval = 1.0f / 60.0f;
}

// ─────────────────────────────────────────────────────────────────────────────
// Lifecycle
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::BeginPlay()
{
    Super::BeginPlay();
    InitAudioPlayback();
}

void UElevenLabsConversationalAgentComponent::EndPlay(const EEndPlayReason::Type EndPlayReason)
{
    EndConversation();
    Super::EndPlay(EndPlayReason);
}

void UElevenLabsConversationalAgentComponent::TickComponent(float DeltaTime, ELevelTick TickType,
    FActorComponentTickFunction* ThisTickFunction)
{
    Super::TickComponent(DeltaTime, TickType, ThisTickFunction);

    if (bAgentSpeaking)
    {
        FScopeLock Lock(&AudioQueueLock);
        if (AudioQueue.Num() == 0)
        {
            SilentTickCount++;
            if (SilentTickCount >= SilenceThresholdTicks)
            {
                bAgentSpeaking = false;
                SilentTickCount = 0;
                OnAgentStoppedSpeaking.Broadcast();
            }
        }
        else
        {
            SilentTickCount = 0;
        }
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// Control
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::StartConversation()
{
    if (!WebSocketProxy)
    {
        WebSocketProxy = NewObject<UElevenLabsWebSocketProxy>(this);
        WebSocketProxy->OnConnected.AddDynamic(this,
            &UElevenLabsConversationalAgentComponent::HandleConnected);
        WebSocketProxy->OnDisconnected.AddDynamic(this,
            &UElevenLabsConversationalAgentComponent::HandleDisconnected);
        WebSocketProxy->OnError.AddDynamic(this,
            &UElevenLabsConversationalAgentComponent::HandleError);
        WebSocketProxy->OnAudioReceived.AddDynamic(this,
            &UElevenLabsConversationalAgentComponent::HandleAudioReceived);
        WebSocketProxy->OnTranscript.AddDynamic(this,
            &UElevenLabsConversationalAgentComponent::HandleTranscript);
        WebSocketProxy->OnAgentResponse.AddDynamic(this,
            &UElevenLabsConversationalAgentComponent::HandleAgentResponse);
        WebSocketProxy->OnInterrupted.AddDynamic(this,
            &UElevenLabsConversationalAgentComponent::HandleInterrupted);
    }

    WebSocketProxy->Connect(AgentID);
}

void UElevenLabsConversationalAgentComponent::EndConversation()
{
    StopListening();
    StopAgentAudio();

    if (WebSocketProxy)
    {
        WebSocketProxy->Disconnect();
        WebSocketProxy = nullptr;
    }
}

void UElevenLabsConversationalAgentComponent::StartListening()
{
    if (!IsConnected())
    {
        UE_LOG(LogElevenLabsAgent, Warning, TEXT("StartListening: not connected."));
        return;
    }

    if (bIsListening) return;
    bIsListening = true;

    if (TurnMode == EElevenLabsTurnMode::Client)
    {
        WebSocketProxy->SendUserTurnStart();
    }

    // Find the microphone component on our owner actor, or create one.
    UElevenLabsMicrophoneCaptureComponent* Mic =
        GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>();

    if (!Mic)
    {
        Mic = NewObject<UElevenLabsMicrophoneCaptureComponent>(GetOwner(),
            TEXT("ElevenLabsMicrophone"));
        Mic->RegisterComponent();
    }

    Mic->OnAudioCaptured.AddUObject(this,
        &UElevenLabsConversationalAgentComponent::OnMicrophoneDataCaptured);
    Mic->StartCapture();

    UE_LOG(LogElevenLabsAgent, Log, TEXT("Microphone capture started."));
}

void UElevenLabsConversationalAgentComponent::StopListening()
{
    if (!bIsListening) return;
    bIsListening = false;

    if (UElevenLabsMicrophoneCaptureComponent* Mic =
        GetOwner() ? GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>() : nullptr)
    {
        Mic->StopCapture();
        Mic->OnAudioCaptured.RemoveAll(this);
    }

    if (WebSocketProxy && TurnMode == EElevenLabsTurnMode::Client)
    {
        WebSocketProxy->SendUserTurnEnd();
    }

    UE_LOG(LogElevenLabsAgent, Log, TEXT("Microphone capture stopped."));
}

void UElevenLabsConversationalAgentComponent::InterruptAgent()
{
    if (WebSocketProxy) WebSocketProxy->SendInterrupt();
    StopAgentAudio();
}

// ─────────────────────────────────────────────────────────────────────────────
// State queries
// ─────────────────────────────────────────────────────────────────────────────
bool UElevenLabsConversationalAgentComponent::IsConnected() const
{
    return WebSocketProxy && WebSocketProxy->IsConnected();
}

const FElevenLabsConversationInfo& UElevenLabsConversationalAgentComponent::GetConversationInfo() const
{
    static FElevenLabsConversationInfo Empty;
    return WebSocketProxy ? WebSocketProxy->GetConversationInfo() : Empty;
}

// ─────────────────────────────────────────────────────────────────────────────
// WebSocket event handlers
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::HandleConnected(const FElevenLabsConversationInfo& Info)
{
    UE_LOG(LogElevenLabsAgent, Log, TEXT("Agent connected. ConversationID=%s"), *Info.ConversationID);
    OnAgentConnected.Broadcast(Info);

    if (bAutoStartListening)
    {
        StartListening();
    }
}

void UElevenLabsConversationalAgentComponent::HandleDisconnected(int32 StatusCode, const FString& Reason)
{
    UE_LOG(LogElevenLabsAgent, Log, TEXT("Agent disconnected. Code=%d Reason=%s"), StatusCode, *Reason);
    bIsListening = false;
    bAgentSpeaking = false;
    OnAgentDisconnected.Broadcast(StatusCode, Reason);
}

void UElevenLabsConversationalAgentComponent::HandleError(const FString& ErrorMessage)
{
    UE_LOG(LogElevenLabsAgent, Error, TEXT("Agent error: %s"), *ErrorMessage);
    OnAgentError.Broadcast(ErrorMessage);
}

void UElevenLabsConversationalAgentComponent::HandleAudioReceived(const TArray<uint8>& PCMData)
{
    EnqueueAgentAudio(PCMData);
}

void UElevenLabsConversationalAgentComponent::HandleTranscript(const FElevenLabsTranscriptSegment& Segment)
{
    OnAgentTranscript.Broadcast(Segment);
}

void UElevenLabsConversationalAgentComponent::HandleAgentResponse(const FString& ResponseText)
{
    OnAgentTextResponse.Broadcast(ResponseText);
}

void UElevenLabsConversationalAgentComponent::HandleInterrupted()
{
    StopAgentAudio();
    OnAgentInterrupted.Broadcast();
}

// ─────────────────────────────────────────────────────────────────────────────
// Audio playback
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::InitAudioPlayback()
{
    AActor* Owner = GetOwner();
    if (!Owner) return;

    // USoundWaveProcedural lets us push raw PCM data at runtime.
    ProceduralSoundWave = NewObject<USoundWaveProcedural>(this);
    ProceduralSoundWave->SetSampleRate(ElevenLabsAudio::SampleRate);
    ProceduralSoundWave->NumChannels = ElevenLabsAudio::Channels;
    ProceduralSoundWave->Duration = INDEFINITELY_LOOPING_DURATION;
    ProceduralSoundWave->SoundGroup = SOUNDGROUP_Voice;
    ProceduralSoundWave->bLooping = false;

    // Create the audio component attached to the owner.
    AudioPlaybackComponent = NewObject<UAudioComponent>(Owner, TEXT("ElevenLabsAudioPlayback"));
    AudioPlaybackComponent->RegisterComponent();
    AudioPlaybackComponent->bAutoActivate = false;
    AudioPlaybackComponent->SetSound(ProceduralSoundWave);

    // When the procedural sound wave needs more audio data, pull from our queue.
    ProceduralSoundWave->OnSoundWaveProceduralUnderflow =
        FOnSoundWaveProceduralUnderflow::CreateUObject(
            this, &UElevenLabsConversationalAgentComponent::OnProceduralUnderflow);
}

void UElevenLabsConversationalAgentComponent::OnProceduralUnderflow(
    USoundWaveProcedural* InProceduralWave, const int32 SamplesRequired)
{
    FScopeLock Lock(&AudioQueueLock);
    if (AudioQueue.Num() == 0) return;

    const int32 BytesRequired = SamplesRequired * sizeof(int16);
    const int32 BytesToPush = FMath::Min(AudioQueue.Num(), BytesRequired);

    InProceduralWave->QueueAudio(AudioQueue.GetData(), BytesToPush);
    AudioQueue.RemoveAt(0, BytesToPush, false);
}

void UElevenLabsConversationalAgentComponent::EnqueueAgentAudio(const TArray<uint8>& PCMData)
{
    {
        FScopeLock Lock(&AudioQueueLock);
        AudioQueue.Append(PCMData);
    }

    // Start playback if not already playing.
    if (!bAgentSpeaking)
    {
        bAgentSpeaking = true;
        SilentTickCount = 0;
        OnAgentStartedSpeaking.Broadcast();

        if (AudioPlaybackComponent && !AudioPlaybackComponent->IsPlaying())
        {
            AudioPlaybackComponent->Play();
        }
    }
}

void UElevenLabsConversationalAgentComponent::StopAgentAudio()
{
    if (AudioPlaybackComponent && AudioPlaybackComponent->IsPlaying())
    {
        AudioPlaybackComponent->Stop();
    }

    FScopeLock Lock(&AudioQueueLock);
    AudioQueue.Empty();

    if (bAgentSpeaking)
    {
        bAgentSpeaking = false;
        SilentTickCount = 0;
        OnAgentStoppedSpeaking.Broadcast();
    }
}

// ─────────────────────────────────────────────────────────────────────────────
// Microphone → WebSocket
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsConversationalAgentComponent::OnMicrophoneDataCaptured(const TArray<float>& FloatPCM)
{
    if (!IsConnected() || !bIsListening) return;

    TArray<uint8> PCMBytes = FloatPCMToInt16Bytes(FloatPCM);
    WebSocketProxy->SendAudioChunk(PCMBytes);
}

TArray<uint8> UElevenLabsConversationalAgentComponent::FloatPCMToInt16Bytes(const TArray<float>& FloatPCM)
{
    TArray<uint8> Out;
    Out.Reserve(FloatPCM.Num() * 2);

    for (float Sample : FloatPCM)
    {
        // Clamp to [-1,1] then scale to int16 range
        const float Clamped = FMath::Clamp(Sample, -1.0f, 1.0f);
        const int16 Int16Sample = static_cast<int16>(Clamped * 32767.0f);

        // Little-endian
        Out.Add(static_cast<uint8>(Int16Sample & 0xFF));
        Out.Add(static_cast<uint8>((Int16Sample >> 8) & 0xFF));
    }

    return Out;
}
@@ -0,0 +1,168 @@
// Copyright ASTERION. All Rights Reserved.

#include "ElevenLabsMicrophoneCaptureComponent.h"
#include "ElevenLabsDefinitions.h"

#include "AudioCaptureCore.h"
#include "Async/Async.h"

DEFINE_LOG_CATEGORY_STATIC(LogElevenLabsMic, Log, All);

// ─────────────────────────────────────────────────────────────────────────────
// Constructor
// ─────────────────────────────────────────────────────────────────────────────
UElevenLabsMicrophoneCaptureComponent::UElevenLabsMicrophoneCaptureComponent()
{
    PrimaryComponentTick.bCanEverTick = false;
}

// ─────────────────────────────────────────────────────────────────────────────
// Lifecycle
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsMicrophoneCaptureComponent::EndPlay(const EEndPlayReason::Type EndPlayReason)
{
    StopCapture();
    Super::EndPlay(EndPlayReason);
}

// ─────────────────────────────────────────────────────────────────────────────
// Capture control
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsMicrophoneCaptureComponent::StartCapture()
{
    if (bCapturing)
    {
        UE_LOG(LogElevenLabsMic, Warning, TEXT("StartCapture called while already capturing."));
        return;
    }

    // Open the default audio capture stream.
    // FAudioCapture discovers the default device and its sample rate automatically.
    Audio::FOnAudioCaptureFunction CaptureCallback =
        [this](const float* InAudio, int32 NumSamples, int32 InNumChannels,
               int32 InSampleRate, double StreamTime, bool bOverflow)
        {
            OnAudioGenerate(InAudio, NumSamples, InNumChannels, InSampleRate, StreamTime, bOverflow);
        };

    if (!AudioCapture.OpenDefaultCaptureStream(DeviceParams, MoveTemp(CaptureCallback), 1024))
    {
        UE_LOG(LogElevenLabsMic, Error, TEXT("Failed to open default audio capture stream."));
        return;
    }

    // Retrieve the actual device parameters after opening the stream.
    Audio::FCaptureDeviceInfo DeviceInfo;
    if (AudioCapture.GetCaptureDeviceInfo(DeviceInfo))
    {
        DeviceSampleRate = DeviceInfo.PreferredSampleRate;
        DeviceChannels = DeviceInfo.InputChannels;
        UE_LOG(LogElevenLabsMic, Log, TEXT("Capture device: %s | Rate=%d | Channels=%d"),
            *DeviceInfo.DeviceName, DeviceSampleRate, DeviceChannels);
    }

    AudioCapture.StartStream();
    bCapturing = true;
    UE_LOG(LogElevenLabsMic, Log, TEXT("Audio capture started."));
}

void UElevenLabsMicrophoneCaptureComponent::StopCapture()
{
    if (!bCapturing) return;

    AudioCapture.StopStream();
    AudioCapture.CloseStream();
    bCapturing = false;
    UE_LOG(LogElevenLabsMic, Log, TEXT("Audio capture stopped."));
}

// ─────────────────────────────────────────────────────────────────────────────
// Audio callback (background thread)
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsMicrophoneCaptureComponent::OnAudioGenerate(
    const float* InAudio, int32 NumSamples,
    int32 InNumChannels, int32 InSampleRate,
    double StreamTime, bool bOverflow)
{
    if (bOverflow)
    {
        UE_LOG(LogElevenLabsMic, Verbose, TEXT("Audio capture buffer overflow."));
    }

    // Resample + downmix to 16000 Hz mono.
    TArray<float> Resampled = ResampleTo16000(InAudio, NumSamples, InNumChannels, InSampleRate);

    // Apply volume multiplier.
    if (!FMath::IsNearlyEqual(VolumeMultiplier, 1.0f))
    {
        for (float& S : Resampled)
        {
            S *= VolumeMultiplier;
        }
    }

    // Fire the delegate on the game thread so subscribers don't need to be
    // thread-safe (WebSocket Send is not thread-safe in UE's implementation).
    AsyncTask(ENamedThreads::GameThread, [this, Data = MoveTemp(Resampled)]()
    {
        if (bCapturing)
        {
            OnAudioCaptured.Broadcast(Data);
        }
    });
}

// ─────────────────────────────────────────────────────────────────────────────
// Resampling
// ─────────────────────────────────────────────────────────────────────────────
TArray<float> UElevenLabsMicrophoneCaptureComponent::ResampleTo16000(
    const float* InAudio, int32 NumSamples,
    int32 InChannels, int32 InSampleRate)
{
    const int32 TargetRate = ElevenLabsAudio::SampleRate; // 16000

    // --- Step 1: Downmix to mono ---
    TArray<float> Mono;
|
||||||
|
if (InChannels == 1)
|
||||||
|
{
|
||||||
|
Mono = TArray<float>(InAudio, NumSamples);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
const int32 NumFrames = NumSamples / InChannels;
|
||||||
|
Mono.Reserve(NumFrames);
|
||||||
|
for (int32 i = 0; i < NumFrames; i++)
|
||||||
|
{
|
||||||
|
float Sum = 0.0f;
|
||||||
|
for (int32 c = 0; c < InChannels; c++)
|
||||||
|
{
|
||||||
|
Sum += InAudio[i * InChannels + c];
|
||||||
|
}
|
||||||
|
Mono.Add(Sum / static_cast<float>(InChannels));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// --- Step 2: Resample via linear interpolation ---
|
||||||
|
if (InSampleRate == TargetRate)
|
||||||
|
{
|
||||||
|
return Mono;
|
||||||
|
}
|
||||||
|
|
||||||
|
const float Ratio = static_cast<float>(InSampleRate) / static_cast<float>(TargetRate);
|
||||||
|
const int32 OutSamples = FMath::FloorToInt(static_cast<float>(Mono.Num()) / Ratio);
|
||||||
|
|
||||||
|
TArray<float> Out;
|
||||||
|
Out.Reserve(OutSamples);
|
||||||
|
|
||||||
|
for (int32 i = 0; i < OutSamples; i++)
|
||||||
|
{
|
||||||
|
const float SrcIndex = static_cast<float>(i) * Ratio;
|
||||||
|
const int32 SrcLow = FMath::FloorToInt(SrcIndex);
|
||||||
|
const int32 SrcHigh = FMath::Min(SrcLow + 1, Mono.Num() - 1);
|
||||||
|
const float Alpha = SrcIndex - static_cast<float>(SrcLow);
|
||||||
|
|
||||||
|
Out.Add(FMath::Lerp(Mono[SrcLow], Mono[SrcHigh], Alpha));
|
||||||
|
}
|
||||||
|
|
||||||
|
return Out;
|
||||||
|
}
|
||||||
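For reference outside the engine, the downmix-and-resample step above can be sketched with `std::vector` in place of `TArray` and `FMath`. This is an illustrative standalone version of the same linear-interpolation scheme, not code from the plugin; the function name and parameterized target rate are assumptions for the sketch.

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

// Downmix interleaved float PCM to mono, then resample to TargetRate using
// the same linear-interpolation scheme as ResampleTo16000 above.
std::vector<float> ResampleToRate(const float* InAudio, int NumSamples,
                                  int InChannels, int InSampleRate, int TargetRate)
{
    // Step 1: downmix to mono by averaging the channels of each frame.
    std::vector<float> Mono;
    if (InChannels == 1)
    {
        Mono.assign(InAudio, InAudio + NumSamples);
    }
    else
    {
        const int NumFrames = NumSamples / InChannels;
        Mono.reserve(NumFrames);
        for (int i = 0; i < NumFrames; ++i)
        {
            float Sum = 0.0f;
            for (int c = 0; c < InChannels; ++c)
                Sum += InAudio[i * InChannels + c];
            Mono.push_back(Sum / static_cast<float>(InChannels));
        }
    }

    // Step 2: linear-interpolation resample (identity if rates already match).
    if (InSampleRate == TargetRate)
        return Mono;

    const float Ratio = static_cast<float>(InSampleRate) / static_cast<float>(TargetRate);
    const int OutSamples = static_cast<int>(std::floor(Mono.size() / Ratio));

    std::vector<float> Out;
    Out.reserve(OutSamples);
    for (int i = 0; i < OutSamples; ++i)
    {
        const float SrcIndex = i * Ratio;
        const int SrcLow = static_cast<int>(std::floor(SrcIndex));
        const int SrcHigh = std::min(SrcLow + 1, static_cast<int>(Mono.size()) - 1);
        const float Alpha = SrcIndex - SrcLow;
        // Lerp between the two nearest source samples.
        Out.push_back(Mono[SrcLow] + Alpha * (Mono[SrcHigh] - Mono[SrcLow]));
    }
    return Out;
}
```

Linear interpolation is the simplest usable resampler for voice capture; a production path might add a low-pass filter before downsampling to reduce aliasing.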
@@ -0,0 +1,382 @@
// Copyright ASTERION. All Rights Reserved.

#include "ElevenLabsWebSocketProxy.h"
#include "PS_AI_Agent_ElevenLabs.h"

#include "WebSocketsModule.h"
#include "IWebSocket.h"

#include "Json.h"
#include "JsonUtilities.h"
#include "Misc/Base64.h"

DEFINE_LOG_CATEGORY_STATIC(LogElevenLabsWS, Log, All);

// ─────────────────────────────────────────────────────────────────────────────
// Helpers
// ─────────────────────────────────────────────────────────────────────────────
static void EL_LOG(bool bVerbose, const TCHAR* Format, ...)
{
    if (!bVerbose) return;
    va_list Args;
    va_start(Args, Format);
    // Forward to UE_LOG at Verbose level
    TCHAR Buffer[2048];
    FCString::GetVarArgs(Buffer, UE_ARRAY_COUNT(Buffer), Format, Args);
    va_end(Args);
    UE_LOG(LogElevenLabsWS, Verbose, TEXT("%s"), Buffer);
}

// ─────────────────────────────────────────────────────────────────────────────
// Connect / Disconnect
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::Connect(const FString& AgentIDOverride, const FString& APIKeyOverride)
{
    if (ConnectionState == EElevenLabsConnectionState::Connected ||
        ConnectionState == EElevenLabsConnectionState::Connecting)
    {
        UE_LOG(LogElevenLabsWS, Warning, TEXT("Connect called but already connecting/connected. Ignoring."));
        return;
    }

    if (!FModuleManager::Get().IsModuleLoaded("WebSockets"))
    {
        FModuleManager::LoadModuleChecked<FWebSocketsModule>("WebSockets");
    }

    const FString URL = BuildWebSocketURL(AgentIDOverride, APIKeyOverride);
    if (URL.IsEmpty())
    {
        const FString Msg = TEXT("Cannot connect: no Agent ID configured. Set it in Project Settings or pass it to Connect().");
        UE_LOG(LogElevenLabsWS, Error, TEXT("%s"), *Msg);
        OnError.Broadcast(Msg);
        ConnectionState = EElevenLabsConnectionState::Error;
        return;
    }

    UE_LOG(LogElevenLabsWS, Log, TEXT("Connecting to ElevenLabs: %s"), *URL);
    ConnectionState = EElevenLabsConnectionState::Connecting;

    // Headers: the ElevenLabs Conversational AI WS endpoint accepts the
    // xi-api-key header on the initial HTTP upgrade request.
    TMap<FString, FString> UpgradeHeaders;
    const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
    const FString ResolvedKey = APIKeyOverride.IsEmpty() ? Settings->API_Key : APIKeyOverride;
    if (!ResolvedKey.IsEmpty())
    {
        UpgradeHeaders.Add(TEXT("xi-api-key"), ResolvedKey);
    }

    WebSocket = FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), UpgradeHeaders);

    WebSocket->OnConnected().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsConnected);
    WebSocket->OnConnectionError().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsConnectionError);
    WebSocket->OnClosed().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsClosed);
    WebSocket->OnMessage().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsMessage);
    WebSocket->OnRawMessage().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsBinaryMessage);

    WebSocket->Connect();
}

void UElevenLabsWebSocketProxy::Disconnect()
{
    if (WebSocket.IsValid() && WebSocket->IsConnected())
    {
        WebSocket->Close(1000, TEXT("Client disconnected"));
    }
    ConnectionState = EElevenLabsConnectionState::Disconnected;
}

// ─────────────────────────────────────────────────────────────────────────────
// Audio & turn control
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::SendAudioChunk(const TArray<uint8>& PCMData)
{
    if (!IsConnected())
    {
        UE_LOG(LogElevenLabsWS, Warning, TEXT("SendAudioChunk: not connected."));
        return;
    }
    if (PCMData.Num() == 0) return;

    // ElevenLabs expects: { "user_audio_chunk": "<base64 PCM>" }
    const FString Base64Audio = FBase64::Encode(PCMData.GetData(), PCMData.Num());

    TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
    Msg->SetStringField(ElevenLabsMessageType::AudioChunk, Base64Audio);
    SendJsonMessage(Msg);
}

void UElevenLabsWebSocketProxy::SendUserTurnStart()
{
    if (!IsConnected()) return;
    TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
    Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::UserTurnStart);
    SendJsonMessage(Msg);
}

void UElevenLabsWebSocketProxy::SendUserTurnEnd()
{
    if (!IsConnected()) return;
    TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
    Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::UserTurnEnd);
    SendJsonMessage(Msg);
}

void UElevenLabsWebSocketProxy::SendInterrupt()
{
    if (!IsConnected()) return;
    TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
    Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::Interrupt);
    SendJsonMessage(Msg);
}

// ─────────────────────────────────────────────────────────────────────────────
// WebSocket callbacks
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::OnWsConnected()
{
    UE_LOG(LogElevenLabsWS, Log, TEXT("WebSocket connected. Waiting for conversation_initiation_metadata..."));
    // State stays Connecting until we receive the initiation metadata from the server.
}

void UElevenLabsWebSocketProxy::OnWsConnectionError(const FString& Error)
{
    UE_LOG(LogElevenLabsWS, Error, TEXT("WebSocket connection error: %s"), *Error);
    ConnectionState = EElevenLabsConnectionState::Error;
    OnError.Broadcast(Error);
}

void UElevenLabsWebSocketProxy::OnWsClosed(int32 StatusCode, const FString& Reason, bool bWasClean)
{
    UE_LOG(LogElevenLabsWS, Log, TEXT("WebSocket closed. Code=%d Reason=%s Clean=%d"), StatusCode, *Reason, bWasClean);
    ConnectionState = EElevenLabsConnectionState::Disconnected;
    WebSocket.Reset();
    OnDisconnected.Broadcast(StatusCode, Reason);
}

void UElevenLabsWebSocketProxy::OnWsMessage(const FString& Message)
{
    const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
    if (Settings->bVerboseLogging)
    {
        UE_LOG(LogElevenLabsWS, Verbose, TEXT(">> %s"), *Message);
    }

    TSharedPtr<FJsonObject> Root;
    TSharedRef<TJsonReader<>> Reader = TJsonReaderFactory<>::Create(Message);
    if (!FJsonSerializer::Deserialize(Reader, Root) || !Root.IsValid())
    {
        UE_LOG(LogElevenLabsWS, Warning, TEXT("Failed to parse WebSocket message as JSON."));
        return;
    }

    FString MsgType;
    // ElevenLabs wraps the type in a "type" field
    if (!Root->TryGetStringField(TEXT("type"), MsgType))
    {
        // Fallback: some messages use the top-level key as the type
        // e.g. { "user_audio_chunk": "..." } from ourselves (shouldn't arrive)
        UE_LOG(LogElevenLabsWS, Verbose, TEXT("Message has no 'type' field, ignoring."));
        return;
    }

    if (MsgType == ElevenLabsMessageType::ConversationInitiation)
    {
        HandleConversationInitiation(Root);
    }
    else if (MsgType == ElevenLabsMessageType::AudioResponse)
    {
        HandleAudioResponse(Root);
    }
    else if (MsgType == ElevenLabsMessageType::Transcript)
    {
        HandleTranscript(Root);
    }
    else if (MsgType == ElevenLabsMessageType::AgentResponse)
    {
        HandleAgentResponse(Root);
    }
    else if (MsgType == ElevenLabsMessageType::InterruptionEvent)
    {
        HandleInterruption(Root);
    }
    else if (MsgType == ElevenLabsMessageType::PingEvent)
    {
        HandlePing(Root);
    }
    else
    {
        UE_LOG(LogElevenLabsWS, Verbose, TEXT("Unhandled message type: %s"), *MsgType);
    }
}

void UElevenLabsWebSocketProxy::OnWsBinaryMessage(const void* Data, SIZE_T Size, SIZE_T BytesRemaining)
{
    // ElevenLabs Conversational AI uses text (JSON) frames only.
    // If binary frames arrive in future API versions, handle here.
    UE_LOG(LogElevenLabsWS, Warning, TEXT("Received unexpected binary WebSocket frame (%llu bytes)."), (uint64)Size);
}

// ─────────────────────────────────────────────────────────────────────────────
// Message handlers
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::HandleConversationInitiation(const TSharedPtr<FJsonObject>& Root)
{
    // Expected structure:
    // { "type": "conversation_initiation_metadata",
    //   "conversation_initiation_metadata_event": {
    //     "conversation_id": "...",
    //     "agent_output_audio_format": "pcm_16000"
    //   }
    // }
    const TSharedPtr<FJsonObject>* MetaObj = nullptr;
    if (Root->TryGetObjectField(TEXT("conversation_initiation_metadata_event"), MetaObj) && MetaObj)
    {
        (*MetaObj)->TryGetStringField(TEXT("conversation_id"), ConversationInfo.ConversationID);
    }

    UE_LOG(LogElevenLabsWS, Log, TEXT("Conversation initiated. ID=%s"), *ConversationInfo.ConversationID);
    ConnectionState = EElevenLabsConnectionState::Connected;
    OnConnected.Broadcast(ConversationInfo);
}

void UElevenLabsWebSocketProxy::HandleAudioResponse(const TSharedPtr<FJsonObject>& Root)
{
    // Expected structure:
    // { "type": "audio",
    //   "audio_event": { "audio_base_64": "<base64 PCM>", "event_id": 1 }
    // }
    const TSharedPtr<FJsonObject>* AudioEvent = nullptr;
    if (!Root->TryGetObjectField(TEXT("audio_event"), AudioEvent) || !AudioEvent)
    {
        UE_LOG(LogElevenLabsWS, Warning, TEXT("audio message missing 'audio_event' field."));
        return;
    }

    FString Base64Audio;
    if (!(*AudioEvent)->TryGetStringField(TEXT("audio_base_64"), Base64Audio))
    {
        UE_LOG(LogElevenLabsWS, Warning, TEXT("audio_event missing 'audio_base_64' field."));
        return;
    }

    TArray<uint8> PCMData;
    if (!FBase64::Decode(Base64Audio, PCMData))
    {
        UE_LOG(LogElevenLabsWS, Warning, TEXT("Failed to Base64-decode audio data."));
        return;
    }

    OnAudioReceived.Broadcast(PCMData);
}

void UElevenLabsWebSocketProxy::HandleTranscript(const TSharedPtr<FJsonObject>& Root)
{
    // Expected structure:
    // { "type": "transcript",
    //   "transcript_event": { "speaker": "user"|"agent", "message": "...", "event_id": 1 }
    // }
    const TSharedPtr<FJsonObject>* TranscriptEvent = nullptr;
    if (!Root->TryGetObjectField(TEXT("transcript_event"), TranscriptEvent) || !TranscriptEvent)
    {
        return;
    }

    FElevenLabsTranscriptSegment Segment;
    (*TranscriptEvent)->TryGetStringField(TEXT("speaker"), Segment.Speaker);
    (*TranscriptEvent)->TryGetStringField(TEXT("message"), Segment.Text);

    // ElevenLabs marks final vs. interim via "is_final"
    (*TranscriptEvent)->TryGetBoolField(TEXT("is_final"), Segment.bIsFinal);

    OnTranscript.Broadcast(Segment);
}

void UElevenLabsWebSocketProxy::HandleAgentResponse(const TSharedPtr<FJsonObject>& Root)
{
    // { "type": "agent_response",
    //   "agent_response_event": { "agent_response": "..." }
    // }
    const TSharedPtr<FJsonObject>* ResponseEvent = nullptr;
    if (!Root->TryGetObjectField(TEXT("agent_response_event"), ResponseEvent) || !ResponseEvent)
    {
        return;
    }

    FString ResponseText;
    (*ResponseEvent)->TryGetStringField(TEXT("agent_response"), ResponseText);
    OnAgentResponse.Broadcast(ResponseText);
}

void UElevenLabsWebSocketProxy::HandleInterruption(const TSharedPtr<FJsonObject>& Root)
{
    UE_LOG(LogElevenLabsWS, Log, TEXT("Agent interrupted."));
    OnInterrupted.Broadcast();
}

void UElevenLabsWebSocketProxy::HandlePing(const TSharedPtr<FJsonObject>& Root)
{
    // Reply with a pong to keep the connection alive.
    // { "type": "ping", "ping_event": { "event_id": 1 } }
    int32 EventID = 0;
    const TSharedPtr<FJsonObject>* PingEvent = nullptr;
    if (Root->TryGetObjectField(TEXT("ping_event"), PingEvent) && PingEvent)
    {
        (*PingEvent)->TryGetNumberField(TEXT("event_id"), EventID);
    }

    TSharedPtr<FJsonObject> Pong = MakeShareable(new FJsonObject());
    Pong->SetStringField(TEXT("type"), TEXT("pong"));
    TSharedPtr<FJsonObject> PongEvent = MakeShareable(new FJsonObject());
    PongEvent->SetNumberField(TEXT("event_id"), EventID);
    Pong->SetObjectField(TEXT("pong_event"), PongEvent);
    SendJsonMessage(Pong);
}

// ─────────────────────────────────────────────────────────────────────────────
// Helpers
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::SendJsonMessage(const TSharedPtr<FJsonObject>& JsonObj)
{
    if (!WebSocket.IsValid() || !WebSocket->IsConnected())
    {
        UE_LOG(LogElevenLabsWS, Warning, TEXT("SendJsonMessage: WebSocket not connected."));
        return;
    }

    FString Out;
    TSharedRef<TJsonWriter<>> Writer = TJsonWriterFactory<>::Create(&Out);
    FJsonSerializer::Serialize(JsonObj.ToSharedRef(), Writer);

    const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
    if (Settings->bVerboseLogging)
    {
        UE_LOG(LogElevenLabsWS, Verbose, TEXT("<< %s"), *Out);
    }

    WebSocket->Send(Out);
}

FString UElevenLabsWebSocketProxy::BuildWebSocketURL(const FString& AgentIDOverride, const FString& APIKeyOverride) const
{
    const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();

    // Custom URL override takes full precedence
    if (!Settings->CustomWebSocketURL.IsEmpty())
    {
        return Settings->CustomWebSocketURL;
    }

    const FString ResolvedAgentID = AgentIDOverride.IsEmpty() ? Settings->AgentID : AgentIDOverride;
    if (ResolvedAgentID.IsEmpty())
    {
        return FString();
    }

    // Official ElevenLabs Conversational AI WebSocket endpoint
    // wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<ID>
    return FString::Printf(
        TEXT("wss://api.elevenlabs.io/v1/convai/conversation?agent_id=%s"),
        *ResolvedAgentID);
}
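The keep-alive exchange handled by HandlePing is simple enough to verify on paper. A standalone sketch of the reply it serializes, using plain `std::string` instead of FJsonObject (the helper name is illustrative, not part of the plugin):

```cpp
#include <string>

// Build the keep-alive reply that HandlePing sends for a given ping event id.
// Field names mirror the message shapes documented in the handlers above:
// the server's { "type": "ping", "ping_event": { "event_id": N } } is answered
// with { "type": "pong", "pong_event": { "event_id": N } }.
std::string BuildPongMessage(int EventID)
{
    return std::string("{\"type\":\"pong\",\"pong_event\":{\"event_id\":")
         + std::to_string(EventID) + "}}";
}
```

Echoing the incoming `event_id` lets the server match each pong to the ping that prompted it.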
@@ -0,0 +1,50 @@
// Copyright ASTERION. All Rights Reserved.

#include "PS_AI_Agent_ElevenLabs.h"
#include "Developer/Settings/Public/ISettingsModule.h"
#include "UObject/UObjectGlobals.h"
#include "UObject/Package.h"

IMPLEMENT_MODULE(FPS_AI_Agent_ElevenLabsModule, PS_AI_Agent_ElevenLabs)

#define LOCTEXT_NAMESPACE "PS_AI_Agent_ElevenLabs"

void FPS_AI_Agent_ElevenLabsModule::StartupModule()
{
    Settings = NewObject<UElevenLabsSettings>(GetTransientPackage(), "ElevenLabsSettings", RF_Standalone);
    Settings->AddToRoot();

    if (ISettingsModule* SettingsModule = FModuleManager::GetModulePtr<ISettingsModule>("Settings"))
    {
        SettingsModule->RegisterSettings(
            "Project", "Plugins", "ElevenLabsAIAgent",
            LOCTEXT("SettingsName", "ElevenLabs AI Agent"),
            LOCTEXT("SettingsDescription", "Configure the ElevenLabs Conversational AI Agent plugin"),
            Settings);
    }
}

void FPS_AI_Agent_ElevenLabsModule::ShutdownModule()
{
    if (ISettingsModule* SettingsModule = FModuleManager::GetModulePtr<ISettingsModule>("Settings"))
    {
        SettingsModule->UnregisterSettings("Project", "Plugins", "ElevenLabsAIAgent");
    }

    if (!GExitPurge)
    {
        Settings->RemoveFromRoot();
    }
    else
    {
        Settings = nullptr;
    }
}

UElevenLabsSettings* FPS_AI_Agent_ElevenLabsModule::GetSettings() const
{
    check(Settings);
    return Settings;
}

#undef LOCTEXT_NAMESPACE
@@ -0,0 +1,225 @@
// Copyright ASTERION. All Rights Reserved.

#pragma once

#include "CoreMinimal.h"
#include "Components/ActorComponent.h"
#include "ElevenLabsDefinitions.h"
#include "ElevenLabsWebSocketProxy.h"
#include "Sound/SoundWaveProcedural.h"
#include "ElevenLabsConversationalAgentComponent.generated.h"

class UAudioComponent;
class UElevenLabsMicrophoneCaptureComponent;

// ─────────────────────────────────────────────────────────────────────────────
// Delegates exposed to Blueprint
// ─────────────────────────────────────────────────────────────────────────────
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentConnected,
    const FElevenLabsConversationInfo&, ConversationInfo);

DECLARE_DYNAMIC_MULTICAST_DELEGATE_TwoParams(FOnAgentDisconnected,
    int32, StatusCode, const FString&, Reason);

DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentError,
    const FString&, ErrorMessage);

DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentTranscript,
    const FElevenLabsTranscriptSegment&, Segment);

DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentTextResponse,
    const FString&, ResponseText);

DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnAgentStartedSpeaking);
DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnAgentStoppedSpeaking);
DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnAgentInterrupted);

// ─────────────────────────────────────────────────────────────────────────────
// UElevenLabsConversationalAgentComponent
//
// Attach this to any Actor (e.g. a character NPC) to give it a voice powered by
// the ElevenLabs Conversational AI API.
//
// Workflow:
//   1. Set AgentID (or rely on the project default).
//   2. Call StartConversation() to open the WebSocket.
//   3. Call StartListening() / StopListening() to control microphone capture.
//   4. React to events (OnAgentTranscript, OnAgentTextResponse, etc.) in Blueprint.
//   5. Call EndConversation() when done.
// ─────────────────────────────────────────────────────────────────────────────
UCLASS(ClassGroup = "ElevenLabs",
    meta = (BlueprintSpawnableComponent, DisplayName = "ElevenLabs Conversational Agent"))
class PS_AI_AGENT_ELEVENLABS_API UElevenLabsConversationalAgentComponent : public UActorComponent
{
    GENERATED_BODY()

public:
    UElevenLabsConversationalAgentComponent();

    // ── Configuration ─────────────────────────────────────────────────────────

    /**
     * ElevenLabs Agent ID. Overrides the project-level default in Project Settings.
     * Leave empty to use the project default.
     */
    UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
    FString AgentID;

    /**
     * Turn mode:
     * - Server VAD: ElevenLabs detects end-of-speech automatically (recommended).
     * - Client Controlled: you call StartListening/StopListening manually (push-to-talk).
     */
    UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
    EElevenLabsTurnMode TurnMode = EElevenLabsTurnMode::Server;

    /**
     * Automatically start listening (microphone capture) once the WebSocket is
     * connected and the conversation is initiated.
     */
    UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
    bool bAutoStartListening = true;

    // ── Events ────────────────────────────────────────────────────────────────

    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
    FOnAgentConnected OnAgentConnected;

    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
    FOnAgentDisconnected OnAgentDisconnected;

    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
    FOnAgentError OnAgentError;

    /** Fired for every transcript segment (user speech or agent speech, tentative and final). */
    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
    FOnAgentTranscript OnAgentTranscript;

    /** Final text response produced by the agent (mirrors the audio). */
    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
    FOnAgentTextResponse OnAgentTextResponse;

    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
    FOnAgentStartedSpeaking OnAgentStartedSpeaking;

    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
    FOnAgentStoppedSpeaking OnAgentStoppedSpeaking;

    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
    FOnAgentInterrupted OnAgentInterrupted;

    // ── Control ───────────────────────────────────────────────────────────────

    /**
     * Open the WebSocket connection and start the conversation.
     * If bAutoStartListening is true, microphone capture also starts once connected.
     */
    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
    void StartConversation();

    /** Close the WebSocket and stop all audio. */
    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
    void EndConversation();

    /**
     * Start capturing microphone audio and streaming it to ElevenLabs.
     * In Client turn mode, also sends a UserTurnStart signal.
     */
    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
    void StartListening();

    /**
     * Stop capturing microphone audio.
     * In Client turn mode, also sends a UserTurnEnd signal.
     */
    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
    void StopListening();

    /** Interrupt the agent's current utterance. */
    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
    void InterruptAgent();

    // ── State queries ─────────────────────────────────────────────────────────

    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
    bool IsConnected() const;

    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
    bool IsListening() const { return bIsListening; }

    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
    bool IsAgentSpeaking() const { return bAgentSpeaking; }

    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
    const FElevenLabsConversationInfo& GetConversationInfo() const;

    /** Access the underlying WebSocket proxy (advanced use). */
    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
    UElevenLabsWebSocketProxy* GetWebSocketProxy() const { return WebSocketProxy; }

    // ─────────────────────────────────────────────────────────────────────────
    // UActorComponent overrides
    // ─────────────────────────────────────────────────────────────────────────
    virtual void BeginPlay() override;
    virtual void EndPlay(const EEndPlayReason::Type EndPlayReason) override;
    virtual void TickComponent(float DeltaTime, ELevelTick TickType,
        FActorComponentTickFunction* ThisTickFunction) override;

private:
    // ── Internal event handlers ───────────────────────────────────────────────
    UFUNCTION()
    void HandleConnected(const FElevenLabsConversationInfo& Info);

    UFUNCTION()
    void HandleDisconnected(int32 StatusCode, const FString& Reason);

    UFUNCTION()
    void HandleError(const FString& ErrorMessage);

    UFUNCTION()
    void HandleAudioReceived(const TArray<uint8>& PCMData);

    UFUNCTION()
    void HandleTranscript(const FElevenLabsTranscriptSegment& Segment);

    UFUNCTION()
    void HandleAgentResponse(const FString& ResponseText);

    UFUNCTION()
    void HandleInterrupted();

    // ── Audio playback ────────────────────────────────────────────────────────
    void InitAudioPlayback();
    void EnqueueAgentAudio(const TArray<uint8>& PCMData);
    void StopAgentAudio();
    /** Called by USoundWaveProcedural when it needs more PCM data. */
    void OnProceduralUnderflow(USoundWaveProcedural* InProceduralWave, const int32 SamplesRequired);

    // ── Microphone streaming ──────────────────────────────────────────────────
    void OnMicrophoneDataCaptured(const TArray<float>& FloatPCM);
    /** Convert float PCM to int16 little-endian bytes for ElevenLabs. */
    static TArray<uint8> FloatPCMToInt16Bytes(const TArray<float>& FloatPCM);

    // ── Sub-objects ───────────────────────────────────────────────────────────
    UPROPERTY()
    UElevenLabsWebSocketProxy* WebSocketProxy = nullptr;

    UPROPERTY()
    UAudioComponent* AudioPlaybackComponent = nullptr;

    UPROPERTY()
    USoundWaveProcedural* ProceduralSoundWave = nullptr;

    // ── State ─────────────────────────────────────────────────────────────────
    bool bIsListening = false;
    bool bAgentSpeaking = false;

    // Accumulates incoming PCM bytes until the audio component needs data.
    TArray<uint8> AudioQueue;
    FCriticalSection AudioQueueLock;

    // Simple heuristic: if we haven't received audio data for this many ticks,
    // consider the agent done speaking.
    int32 SilentTickCount = 0;
    static constexpr int32 SilenceThresholdTicks = 30; // ~0.5s at 60fps
};

ElevenLabsDefinitions.h
Normal file
@@ -0,0 +1,104 @@
// Copyright ASTERION. All Rights Reserved.

#pragma once

#include "CoreMinimal.h"
#include "ElevenLabsDefinitions.generated.h"

// ─────────────────────────────────────────────────────────────────────────────
// Connection state
// ─────────────────────────────────────────────────────────────────────────────
UENUM(BlueprintType)
enum class EElevenLabsConnectionState : uint8
{
	Disconnected UMETA(DisplayName = "Disconnected"),
	Connecting   UMETA(DisplayName = "Connecting"),
	Connected    UMETA(DisplayName = "Connected"),
	Error        UMETA(DisplayName = "Error")
};

// ─────────────────────────────────────────────────────────────────────────────
// Agent turn mode
// ─────────────────────────────────────────────────────────────────────────────
UENUM(BlueprintType)
enum class EElevenLabsTurnMode : uint8
{
	/** ElevenLabs server decides when the user has finished speaking (default). */
	Server UMETA(DisplayName = "Server VAD"),
	/** Client explicitly signals turn start/end (manual push-to-talk). */
	Client UMETA(DisplayName = "Client Controlled")
};

// ─────────────────────────────────────────────────────────────────────────────
// WebSocket message type helpers (internal, not exposed to Blueprint)
// ─────────────────────────────────────────────────────────────────────────────
namespace ElevenLabsMessageType
{
	// Client → Server
	static const FString AudioChunk       = TEXT("user_audio_chunk");
	static const FString UserTurnStart    = TEXT("user_turn_start");
	static const FString UserTurnEnd      = TEXT("user_turn_end");
	static const FString Interrupt        = TEXT("interrupt");
	static const FString ClientToolResult = TEXT("client_tool_result");

	// Server → Client
	static const FString ConversationInitiation = TEXT("conversation_initiation_metadata");
	static const FString AudioResponse     = TEXT("audio");
	static const FString Transcript        = TEXT("transcript");
	static const FString AgentResponse     = TEXT("agent_response");
	static const FString InterruptionEvent = TEXT("interruption");
	static const FString PingEvent         = TEXT("ping");
	static const FString ClientToolCall    = TEXT("client_tool_call");
	static const FString InternalTentativeAgent = TEXT("internal_tentative_agent_response");
}

// ─────────────────────────────────────────────────────────────────────────────
// Audio format exchanged with ElevenLabs
// PCM 16-bit signed, 16000 Hz, mono, little-endian.
// ─────────────────────────────────────────────────────────────────────────────
namespace ElevenLabsAudio
{
	static constexpr int32 SampleRate    = 16000;
	static constexpr int32 Channels      = 1;
	static constexpr int32 BitsPerSample = 16;
	// Chunk size sent per WebSocket frame: 100 ms of audio
	static constexpr int32 ChunkSamples  = SampleRate / 10; // 1600 samples = 3200 bytes
}
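The 16-bit LE format above is what the component's `FloatPCMToInt16Bytes` helper has to produce from captured float samples. A minimal, UE-free sketch of that conversion (an assumption, not the plugin's actual implementation; `std::vector` stands in for `TArray`, and the clamp-then-scale-by-32767 approach is one common convention):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: clamp each float sample to [-1, 1], scale to the
// int16 range, and emit little-endian bytes as expected by ElevenLabs.
std::vector<uint8_t> FloatPCMToInt16Bytes(const std::vector<float>& FloatPCM)
{
    std::vector<uint8_t> Bytes;
    Bytes.reserve(FloatPCM.size() * 2);
    for (float Sample : FloatPCM)
    {
        // Clamp, then scale: 1.0f -> 32767, -1.0f -> -32767.
        const float Clamped = Sample < -1.0f ? -1.0f : (Sample > 1.0f ? 1.0f : Sample);
        const int16_t Value = static_cast<int16_t>(Clamped * 32767.0f);
        Bytes.push_back(static_cast<uint8_t>(Value & 0xFF));        // low byte first (LE)
        Bytes.push_back(static_cast<uint8_t>((Value >> 8) & 0xFF)); // high byte
    }
    return Bytes;
}
```

At 16 kHz mono this yields exactly `ChunkSamples * 2` = 3200 bytes per 100 ms frame.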
// ─────────────────────────────────────────────────────────────────────────────
// Conversation metadata received on successful connection
// ─────────────────────────────────────────────────────────────────────────────
USTRUCT(BlueprintType)
struct PS_AI_AGENT_ELEVENLABS_API FElevenLabsConversationInfo
{
	GENERATED_BODY()

	/** Unique ID of this conversation session, assigned by ElevenLabs. */
	UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
	FString ConversationID;

	/** ID of the agent that is responding. */
	UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
	FString AgentID;
};

// ─────────────────────────────────────────────────────────────────────────────
// Transcript segment
// ─────────────────────────────────────────────────────────────────────────────
USTRUCT(BlueprintType)
struct PS_AI_AGENT_ELEVENLABS_API FElevenLabsTranscriptSegment
{
	GENERATED_BODY()

	/** Transcribed text. */
	UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
	FString Text;

	/** "user" or "agent". */
	UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
	FString Speaker;

	/** Whether this is a final transcript or a tentative (in-progress) one. */
	UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
	bool bIsFinal = false;
};

ElevenLabsMicrophoneCaptureComponent.h
Normal file
@@ -0,0 +1,73 @@
// Copyright ASTERION. All Rights Reserved.

#pragma once

#include "CoreMinimal.h"
#include "Components/ActorComponent.h"
#include "AudioCapture.h"
#include "ElevenLabsMicrophoneCaptureComponent.generated.h"

// Delivers captured float PCM samples (16000 Hz mono, resampled from the device rate).
DECLARE_MULTICAST_DELEGATE_OneParam(FOnElevenLabsAudioCaptured, const TArray<float>& /*FloatPCM*/);

/**
 * Lightweight microphone capture component.
 * Captures from the default audio input device, resamples to 16000 Hz mono,
 * and delivers chunks via FOnElevenLabsAudioCaptured.
 *
 * Modelled after Convai's ConvaiAudioCaptureComponent but stripped to the
 * minimal functionality needed for the ElevenLabs Conversational AI API.
 */
UCLASS(ClassGroup = "ElevenLabs",
	meta = (BlueprintSpawnableComponent, DisplayName = "ElevenLabs Microphone Capture"))
class PS_AI_AGENT_ELEVENLABS_API UElevenLabsMicrophoneCaptureComponent : public UActorComponent
{
	GENERATED_BODY()

public:
	UElevenLabsMicrophoneCaptureComponent();

	/** Volume multiplier applied to captured samples before forwarding. */
	UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs|Microphone",
		meta = (ClampMin = "0.0", ClampMax = "4.0"))
	float VolumeMultiplier = 1.0f;

	/**
	 * Delegate fired on the game thread each time a new chunk of PCM audio
	 * is captured. Samples are float32, resampled to 16000 Hz mono.
	 */
	FOnElevenLabsAudioCaptured OnAudioCaptured;

	/** Open the default capture device and begin streaming audio. */
	UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
	void StartCapture();

	/** Stop streaming and close the capture device. */
	UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
	void StopCapture();

	UFUNCTION(BlueprintPure, Category = "ElevenLabs")
	bool IsCapturing() const { return bCapturing; }

	// ─────────────────────────────────────────────────────────────────────────
	// UActorComponent overrides
	// ─────────────────────────────────────────────────────────────────────────
	virtual void EndPlay(const EEndPlayReason::Type EndPlayReason) override;

private:
	/** Called by the audio capture callback on a background thread. */
	void OnAudioGenerate(const float* InAudio, int32 NumSamples,
		int32 InNumChannels, int32 InSampleRate, double StreamTime, bool bOverflow);

	/** Simple linear resample from InSampleRate to 16000 Hz. */
	static TArray<float> ResampleTo16000(const float* InAudio, int32 NumSamples,
		int32 InChannels, int32 InSampleRate);

	Audio::FAudioCapture AudioCapture;
	Audio::FAudioCaptureDeviceParams DeviceParams;
	bool bCapturing = false;

	// Device sample rate discovered on StartCapture
	int32 DeviceSampleRate = 44100;
	int32 DeviceChannels = 1;
};

ElevenLabsWebSocketProxy.h
Normal file
@@ -0,0 +1,166 @@
// Copyright ASTERION. All Rights Reserved.

#pragma once

#include "CoreMinimal.h"
#include "UObject/NoExportTypes.h"
#include "ElevenLabsDefinitions.h"
#include "IWebSocket.h"
#include "ElevenLabsWebSocketProxy.generated.h"

class FJsonObject;

// ─────────────────────────────────────────────────────────────────────────────
// Delegates (all Blueprint-assignable)
// ─────────────────────────────────────────────────────────────────────────────

DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsConnected,
	const FElevenLabsConversationInfo&, ConversationInfo);

DECLARE_DYNAMIC_MULTICAST_DELEGATE_TwoParams(FOnElevenLabsDisconnected,
	int32, StatusCode, const FString&, Reason);

DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsError,
	const FString&, ErrorMessage);

/** Fired when a PCM audio chunk arrives from the agent. Raw bytes: 16-bit signed, 16 kHz, mono. */
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsAudioReceived,
	const TArray<uint8>&, PCMData);

/** Fired for user or agent transcript segments. */
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsTranscript,
	const FElevenLabsTranscriptSegment&, Segment);

/** Fired with the final text response from the agent. */
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsAgentResponse,
	const FString&, ResponseText);

/** Fired when the agent is interrupted by the user. */
DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnElevenLabsInterrupted);

// ─────────────────────────────────────────────────────────────────────────────
// WebSocket Proxy
// Manages the lifecycle of a single ElevenLabs Conversational AI WebSocket session.
// Instantiate via UElevenLabsConversationalAgentComponent (the component manages
// one proxy at a time), or create manually through Blueprints.
// ─────────────────────────────────────────────────────────────────────────────
UCLASS(BlueprintType, Blueprintable)
class PS_AI_AGENT_ELEVENLABS_API UElevenLabsWebSocketProxy : public UObject
{
	GENERATED_BODY()

public:
	// ── Events ────────────────────────────────────────────────────────────────

	/** Called once the WebSocket handshake succeeds and the agent sends its initiation metadata. */
	UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
	FOnElevenLabsConnected OnConnected;

	/** Called when the WebSocket closes (graceful or remote). */
	UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
	FOnElevenLabsDisconnected OnDisconnected;

	/** Called on any connection or protocol error. */
	UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
	FOnElevenLabsError OnError;

	/** Raw PCM audio coming from the agent; feed this into your audio component. */
	UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
	FOnElevenLabsAudioReceived OnAudioReceived;

	/** User or agent transcript (may be tentative while the conversation is ongoing). */
	UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
	FOnElevenLabsTranscript OnTranscript;

	/** Final text response from the agent (complements the audio). */
	UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
	FOnElevenLabsAgentResponse OnAgentResponse;

	/** The agent was interrupted by new user speech. */
	UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
	FOnElevenLabsInterrupted OnInterrupted;

	// ── Lifecycle ─────────────────────────────────────────────────────────────

	/**
	 * Open a WebSocket connection to ElevenLabs.
	 * Uses settings from Project Settings unless overridden by the parameters.
	 *
	 * @param AgentID  ElevenLabs agent ID. Overrides the project-level default when non-empty.
	 * @param APIKey   API key. Overrides the project-level default when non-empty.
	 */
	UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
	void Connect(const FString& AgentID = TEXT(""), const FString& APIKey = TEXT(""));

	/**
	 * Gracefully close the WebSocket connection.
	 * OnDisconnected fires after the server acknowledges the close.
	 */
	UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
	void Disconnect();

	/** Current connection state. */
	UFUNCTION(BlueprintPure, Category = "ElevenLabs")
	EElevenLabsConnectionState GetConnectionState() const { return ConnectionState; }

	UFUNCTION(BlueprintPure, Category = "ElevenLabs")
	bool IsConnected() const { return ConnectionState == EElevenLabsConnectionState::Connected; }

	// ── Audio sending ─────────────────────────────────────────────────────────

	/**
	 * Send a chunk of raw PCM audio to ElevenLabs.
	 * Audio must be 16-bit signed, 16000 Hz, mono, little-endian.
	 * The data is Base64-encoded and sent as a JSON message.
	 * Call this repeatedly while the microphone is capturing.
	 *
	 * @param PCMData  Raw PCM bytes (16-bit LE, 16 kHz, mono).
	 */
	UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
	void SendAudioChunk(const TArray<uint8>& PCMData);

	// ── Turn control (only relevant in Client turn mode) ──────────────────────

	/** Signal that the user has started speaking (Client turn mode). */
	UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
	void SendUserTurnStart();

	/** Signal that the user has finished speaking (Client turn mode). */
	UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
	void SendUserTurnEnd();

	/** Ask the agent to stop the current utterance. */
	UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
	void SendInterrupt();

	// ── Info ──────────────────────────────────────────────────────────────────

	UFUNCTION(BlueprintPure, Category = "ElevenLabs")
	FElevenLabsConversationInfo GetConversationInfo() const { return ConversationInfo; }

	// ─────────────────────────────────────────────────────────────────────────
	// Internal
	// ─────────────────────────────────────────────────────────────────────────
private:
	void OnWsConnected();
	void OnWsConnectionError(const FString& Error);
	void OnWsClosed(int32 StatusCode, const FString& Reason, bool bWasClean);
	void OnWsMessage(const FString& Message);
	void OnWsBinaryMessage(const void* Data, SIZE_T Size, SIZE_T BytesRemaining);

	void HandleConversationInitiation(const TSharedPtr<FJsonObject>& Payload);
	void HandleAudioResponse(const TSharedPtr<FJsonObject>& Payload);
	void HandleTranscript(const TSharedPtr<FJsonObject>& Payload);
	void HandleAgentResponse(const TSharedPtr<FJsonObject>& Payload);
	void HandleInterruption(const TSharedPtr<FJsonObject>& Payload);
	void HandlePing(const TSharedPtr<FJsonObject>& Payload);

	/** Build and send a JSON text frame to the server. */
	void SendJsonMessage(const TSharedPtr<FJsonObject>& JsonObj);

	/** Resolve the WebSocket URL from settings / parameters. */
	FString BuildWebSocketURL(const FString& AgentID, const FString& APIKey) const;

	TSharedPtr<IWebSocket> WebSocket;
	EElevenLabsConnectionState ConnectionState = EElevenLabsConnectionState::Disconnected;
	FElevenLabsConversationInfo ConversationInfo;
};
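`SendAudioChunk` documents its wire format as Base64-encoded PCM inside a JSON message. A UE-free sketch of what that frame could look like, assuming (per the `ElevenLabsMessageType::AudioChunk` constant) the payload travels under the `user_audio_chunk` key; plain `std::` types stand in for UE's `FBase64` and `FJsonObject`:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Minimal standard Base64 encoder (with '=' padding).
static std::string Base64Encode(const std::vector<uint8_t>& Data)
{
    static const char* Table =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    for (size_t i = 0; i < Data.size(); i += 3)
    {
        uint32_t Chunk = Data[i] << 16;
        if (i + 1 < Data.size()) Chunk |= Data[i + 1] << 8;
        if (i + 2 < Data.size()) Chunk |= Data[i + 2];
        Out += Table[(Chunk >> 18) & 0x3F];
        Out += Table[(Chunk >> 12) & 0x3F];
        Out += (i + 1 < Data.size()) ? Table[(Chunk >> 6) & 0x3F] : '=';
        Out += (i + 2 < Data.size()) ? Table[Chunk & 0x3F] : '=';
    }
    return Out;
}

// Hypothetical frame builder: the JSON text sent per audio chunk.
std::string BuildAudioChunkMessage(const std::vector<uint8_t>& PCMData)
{
    return "{\"user_audio_chunk\":\"" + Base64Encode(PCMData) + "\"}";
}
```

Base64 doesn't escape any JSON metacharacters, so simple string concatenation is safe here; the real proxy would go through `SendJsonMessage` instead.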

PS_AI_Agent_ElevenLabs.h
Normal file
@@ -0,0 +1,99 @@
// Copyright ASTERION. All Rights Reserved.

#pragma once

#include "CoreMinimal.h"
#include "Modules/ModuleManager.h"
#include "PS_AI_Agent_ElevenLabs.generated.h"

// ─────────────────────────────────────────────────────────────────────────────
// Settings object – exposed in Project Settings → Plugins → ElevenLabs AI Agent
// ─────────────────────────────────────────────────────────────────────────────
UCLASS(config = Engine, defaultconfig)
class PS_AI_AGENT_ELEVENLABS_API UElevenLabsSettings : public UObject
{
	GENERATED_BODY()

public:
	UElevenLabsSettings(const FObjectInitializer& ObjectInitializer)
		: Super(ObjectInitializer)
	{
		API_Key = TEXT("");
		AgentID = TEXT("");
		bSignedURLMode = false;
	}

	/**
	 * ElevenLabs API key, used to authenticate WebSocket connections.
	 * Obtain one from https://elevenlabs.io
	 * Keep this secret; do not hard-code the key into a shipping build.
	 */
	UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API")
	FString API_Key;

	/**
	 * Default ElevenLabs Conversational Agent ID, used when none is specified
	 * on the component. Create agents at https://elevenlabs.io/app/conversational-ai
	 */
	UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API")
	FString AgentID;

	/**
	 * When true, the plugin fetches a signed WebSocket URL from your own backend
	 * before connecting, so the API key is never exposed in the client.
	 * Set SignedURLEndpoint to the server endpoint that returns the signed URL.
	 */
	UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API|Security")
	bool bSignedURLMode;

	/**
	 * Your backend endpoint that returns a signed WebSocket URL for ElevenLabs.
	 * Only used when bSignedURLMode is true.
	 * Expected response body: { "signed_url": "wss://..." }
	 */
	UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API|Security",
		meta = (EditCondition = "bSignedURLMode"))
	FString SignedURLEndpoint;

	/**
	 * Override the ElevenLabs WebSocket base URL. Leave empty to use the default:
	 * wss://api.elevenlabs.io/v1/convai/conversation
	 */
	UPROPERTY(Config, EditAnywhere, AdvancedDisplay, Category = "ElevenLabs API")
	FString CustomWebSocketURL;

	/** Log verbose WebSocket messages to the Output Log (useful during development). */
	UPROPERTY(Config, EditAnywhere, AdvancedDisplay, Category = "ElevenLabs API")
	bool bVerboseLogging = false;
};

// ─────────────────────────────────────────────────────────────────────────────
// Module
// ─────────────────────────────────────────────────────────────────────────────
class PS_AI_AGENT_ELEVENLABS_API FPS_AI_Agent_ElevenLabsModule : public IModuleInterface
{
public:
	/** IModuleInterface implementation */
	virtual void StartupModule() override;
	virtual void ShutdownModule() override;

	virtual bool IsGameModule() const override { return true; }

	/** Singleton access */
	static inline FPS_AI_Agent_ElevenLabsModule& Get()
	{
		return FModuleManager::LoadModuleChecked<FPS_AI_Agent_ElevenLabsModule>("PS_AI_Agent_ElevenLabs");
	}

	static inline bool IsAvailable()
	{
		return FModuleManager::Get().IsModuleLoaded("PS_AI_Agent_ElevenLabs");
	}

	/** Access the settings object at runtime */
	UElevenLabsSettings* GetSettings() const;

private:
	UElevenLabsSettings* Settings = nullptr;
};
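The proxy's `BuildWebSocketURL` has to combine the settings above (custom base URL, agent ID) into a connection URL. A hypothetical, UE-free sketch of that resolution, assuming the default endpoint stated in the `CustomWebSocketURL` comment and assuming the agent is selected via an `agent_id` query parameter (the parameter name is an assumption, not confirmed by this commit):

```cpp
#include <string>

// Hypothetical sketch of URL resolution: fall back to the documented default
// endpoint when no custom base URL is configured, then append the agent ID.
std::string BuildWebSocketURL(const std::string& CustomBaseURL,
                              const std::string& AgentID)
{
    const std::string Base = CustomBaseURL.empty()
        ? "wss://api.elevenlabs.io/v1/convai/conversation"
        : CustomBaseURL;
    return Base + "?agent_id=" + AgentID;
}
```

In signed-URL mode the plugin would skip this entirely and connect to whatever `signed_url` the backend returns.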