diff --git a/.claude/MEMORY.md b/.claude/MEMORY.md new file mode 100644 index 0000000..4ff24f8 --- /dev/null +++ b/.claude/MEMORY.md @@ -0,0 +1,35 @@ +# Project Memory – PS_AI_Agent + +> This file is committed to the repository so it is available on any machine. +> Claude Code reads it automatically at session start (via the auto-memory system) +> when the working directory is inside this repo. +> **Keep it under ~180 lines** – lines beyond 200 are truncated by the system. + +--- + +## Project Location +- Repo root: `/` (wherever this is cloned) +- UE5 project: `/Unreal/PS_AI_Agent/` +- `.uproject`: `/Unreal/PS_AI_Agent/PS_AI_Agent.uproject` +- Engine: **Unreal Engine 5.5** — Win64 primary target + +## Plugins +| Plugin | Path | Purpose | +|--------|------|---------| +| Convai (reference) | `/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Has ElevenLabs voice type enum in `ConvaiDefinitions.h`. Used as architectural reference. | +| **PS_AI_Agent_ElevenLabs** | `/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See `.claude/elevenlabs_plugin.md` for full details. | + +## User Preferences +- Plugin naming: `PS_AI_Agent_` (e.g. 
`PS_AI_Agent_ElevenLabs`) +- Save memory frequently during long sessions +- Goal: ElevenLabs Conversational AI integration — simpler than Convai, no gRPC +- Full original ask + intent: see `.claude/project_context.md` + +## Key UE5 Plugin Patterns +- Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule` +- Module startup: `NewObject(..., RF_Standalone)` + `AddToRoot()` +- WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)` +- Audio capture: `Audio::FAudioCapture` from the `AudioCapture` module +- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow` delegate +- Audio capture callbacks arrive on a **background thread** — always marshal to game thread with `AsyncTask(ENamedThreads::GameThread, ...)` +- Resample mic audio to **16000 Hz mono** before sending to ElevenLabs diff --git a/.claude/elevenlabs_plugin.md b/.claude/elevenlabs_plugin.md new file mode 100644 index 0000000..881ba64 --- /dev/null +++ b/.claude/elevenlabs_plugin.md @@ -0,0 +1,61 @@ +# PS_AI_Agent_ElevenLabs Plugin + +## Location +`Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` + +## File Map +``` +PS_AI_Agent_ElevenLabs.uplugin +Source/PS_AI_Agent_ElevenLabs/ + PS_AI_Agent_ElevenLabs.Build.cs + Public/ + PS_AI_Agent_ElevenLabs.h – FPS_AI_Agent_ElevenLabsModule + UElevenLabsSettings + ElevenLabsDefinitions.h – Enums, structs, ElevenLabsMessageType/Audio constants + ElevenLabsWebSocketProxy.h/.cpp – UObject managing one WS session + ElevenLabsConversationalAgentComponent.h/.cpp – Main ActorComponent (attach to NPC) + ElevenLabsMicrophoneCaptureComponent.h/.cpp – Mic capture, resample, dispatch to game thread + Private/ + (implementations of the above) +``` + +## ElevenLabs Conversational AI Protocol +- **WebSocket URL**: `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=` +- **Auth**: HTTP upgrade header `xi-api-key: ` (set in Project Settings) +- **All frames**: JSON text (no 
binary frames used by the API) +- **Audio format**: PCM 16-bit signed, 16000 Hz, mono, little-endian — Base64-encoded in JSON + +### Client → Server messages +| Type field value | Payload | +|---|---| +| *(none – key is the type)* `user_audio_chunk` | `{ "user_audio_chunk": "" }` | +| `user_turn_start` | `{ "type": "user_turn_start" }` | +| `user_turn_end` | `{ "type": "user_turn_end" }` | +| `interrupt` | `{ "type": "interrupt" }` | +| `pong` | `{ "type": "pong", "pong_event": { "event_id": N } }` | + +### Server → Client messages (field: `type`) +| type value | Key nested object | Notes | +|---|---|---| +| `conversation_initiation_metadata` | `conversation_initiation_metadata_event.conversation_id` | Marks WS ready | +| `audio` | `audio_event.audio_base_64` | Base64 PCM from agent | +| `transcript` | `transcript_event.{speaker, message, is_final}` | User or agent speech | +| `agent_response` | `agent_response_event.agent_response` | Final agent text | +| `interruption` | — | Agent stopped mid-sentence | +| `ping` | `ping_event.event_id` | Must reply with pong | + +## Key Design Decisions +- **No gRPC / no ThirdParty libs** — pure UE WebSockets + HTTP, builds out of the box +- Audio resampled in-plugin: device rate → 16000 Hz mono (linear interpolation) +- `USoundWaveProcedural` for real-time agent audio playback (queue-driven) +- Silence heuristic: 30 game-thread ticks (~0.5 s at 60 fps) with no new audio → agent done speaking +- `bSignedURLMode` setting: fetch a signed WS URL from your own backend (keeps API key off client) +- Two turn modes: `Server VAD` (ElevenLabs detects speech end) and `Client Controlled` (push-to-talk) + +## Build Dependencies (Build.cs) +Core, CoreUObject, Engine, InputCore, Json, JsonUtilities, WebSockets, HTTP, +AudioMixer, AudioCaptureCore, AudioCapture, Voice, SignalProcessing + +## Status +- **Session 1** (2026-02-19): All source files written, registered in .uproject. Not yet compiled. 
+- **TODO**: Open in UE 5.5 Editor → compile → test basic WS connection with a test agent ID. +- **Watch out**: Verify `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature vs UE 5.5 API. diff --git a/.claude/project_context.md b/.claude/project_context.md new file mode 100644 index 0000000..b64a740 --- /dev/null +++ b/.claude/project_context.md @@ -0,0 +1,79 @@ +# Project Context & Original Ask + +## What the user wants to build + +A **UE5 plugin** that integrates the **ElevenLabs Conversational AI Agent** API into Unreal Engine 5.5, +allowing an in-game NPC (or any Actor) to hold a real-time voice conversation with a player. + +### The original request (paraphrased) +> "I want to create a plugin to use ElevenLabs Conversational Agent in Unreal Engine 5.5. +> I previously used the Convai plugin which does what I want, but I prefer ElevenLabs quality. +> The goal is to create a plugin in the existing Unreal Project to make a first step for integration. +> Convai AI plugin may be too big in terms of functionality for the new project, but it is the final goal. +> You can use the Convai source code to find the right way to make the ElevenLabs version — +> it should be very similar." + +### Plugin name +`PS_AI_Agent_ElevenLabs` + +--- + +## User's mental model / intent + +1. **Short-term**: A working first-step plugin — minimal but functional — that can: + - Connect to ElevenLabs Conversational AI via WebSocket + - Capture microphone audio from the player + - Stream it to ElevenLabs in real time + - Play back the agent's voice response + - Surface key events (transcript, agent text, speaking state) to Blueprint + +2. **Long-term**: Match the full feature set of Convai — character IDs, session memory, + actions/environment context, lip-sync, etc. — but powered by ElevenLabs instead. + +3. **Key preference**: Simpler than Convai. No gRPC, no protobuf, no ThirdParty precompiled + libraries. 
ElevenLabs' Conversational AI API uses plain WebSocket + JSON, which maps + naturally to UE's built-in `WebSockets` module. + +--- + +## How we used Convai as a reference + +We studied the Convai plugin source (`ConvAI/Convai/`) to understand: +- **Module structure**: `UConvaiSettings` + `IModuleInterface` + `ISettingsModule` registration +- **Audio capture pattern**: `Audio::FAudioCapture`, ring buffers, thread-safe dispatch to game thread +- **Audio playback pattern**: `USoundWaveProcedural` fed from a queue +- **Component architecture**: `UConvaiChatbotComponent` (NPC side) + `UConvaiPlayerComponent` (player side) +- **HTTP proxy pattern**: `UConvaiAPIBaseProxy` base class for async REST calls +- **Voice type enum**: Convai already had `EVoiceType::ElevenLabsVoices` — confirming ElevenLabs + is a natural fit + +We then replaced gRPC/protobuf with **WebSocket + JSON** to match the ElevenLabs API, and +simplified the architecture to the minimum needed for a first working version. + +--- + +## What was built (Session 1 — 2026-02-19) + +All source files created and registered. See `.claude/elevenlabs_plugin.md` for full file map and protocol details. 
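
The mic-side audio path summarized above (downmix to mono → linear resample to 16 kHz → pack as little-endian int16 for Base64 encoding) can be sketched engine-free in plain C++. This is an illustrative standalone version, not the plugin's actual code; the function name `ToElevenLabsPcm` is hypothetical:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Convert interleaved float PCM at any device rate into the wire format
// ElevenLabs expects: 16 kHz, mono, signed 16-bit, little-endian.
std::vector<uint8_t> ToElevenLabsPcm(const float* in, int numSamples,
                                     int channels, int inRate,
                                     int outRate = 16000)
{
    // Step 1: downmix — average the channels of each interleaved frame.
    const int frames = numSamples / channels;
    std::vector<float> mono(frames);
    for (int i = 0; i < frames; ++i) {
        float sum = 0.0f;
        for (int c = 0; c < channels; ++c) sum += in[i * channels + c];
        mono[i] = sum / static_cast<float>(channels);
    }

    // Step 2: linear-interpolation resample to the target rate.
    const float ratio = static_cast<float>(inRate) / static_cast<float>(outRate);
    const int outFrames = static_cast<int>(mono.size() / ratio);
    std::vector<uint8_t> out;
    out.reserve(static_cast<size_t>(outFrames) * 2);
    for (int i = 0; i < outFrames; ++i) {
        const float src = i * ratio;
        const int lo = static_cast<int>(src);
        const int hi = std::min(lo + 1, frames - 1);
        const float alpha = src - lo;
        const float s = mono[lo] + (mono[hi] - mono[lo]) * alpha;

        // Step 3: clamp to [-1, 1], scale to int16, emit little-endian bytes.
        const int16_t q = static_cast<int16_t>(std::clamp(s, -1.0f, 1.0f) * 32767.0f);
        out.push_back(static_cast<uint8_t>(q & 0xFF));
        out.push_back(static_cast<uint8_t>((q >> 8) & 0xFF));
    }
    return out;
}
```

The resulting byte buffer is what gets Base64-encoded into a `user_audio_chunk` frame; linear interpolation trades a little quality for simplicity, matching the "minimal first step" goal.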
+ +### Components created +| Class | Role | +|---|---| +| `UElevenLabsSettings` | Project Settings UI — API key, Agent ID, security options | +| `UElevenLabsWebSocketProxy` | Manages one WS session: connect, send audio, handle all server message types | +| `UElevenLabsConversationalAgentComponent` | ActorComponent to attach to any NPC — orchestrates mic + WS + playback | +| `UElevenLabsMicrophoneCaptureComponent` | Wraps `Audio::FAudioCapture`, resamples to 16 kHz mono | + +### Not yet done (next sessions) +- Compile & test in UE 5.5 Editor +- Verify `USoundWaveProcedural::OnSoundWaveProceduralUnderflow` delegate signature for UE 5.5 +- Add lip-sync support (future) +- Add session memory / conversation history (future) +- Add environment/action context support (future, matching Convai's full feature set) + +--- + +## Notes on the ElevenLabs API +- Docs: https://elevenlabs.io/docs/conversational-ai +- Create agents at: https://elevenlabs.io/app/conversational-ai +- API keys at: https://elevenlabs.io (dashboard) diff --git a/.claude/settings.local.json b/.claude/settings.local.json new file mode 100644 index 0000000..87b376c --- /dev/null +++ b/.claude/settings.local.json @@ -0,0 +1,7 @@ +{ + "permissions": { + "allow": [ + "Bash(dir /s \"E:\\\\ASTERION\\\\GIT\\\\PS_AI_Agent\")" + ] + } +} diff --git a/Unreal/PS_AI_Agent/PS_AI_Agent.uproject b/Unreal/PS_AI_Agent/PS_AI_Agent.uproject index 3762467..6bbd5ee 100644 --- a/Unreal/PS_AI_Agent/PS_AI_Agent.uproject +++ b/Unreal/PS_AI_Agent/PS_AI_Agent.uproject @@ -17,6 +17,14 @@ "TargetAllowList": [ "Editor" ] + }, + { + "Name": "PS_AI_Agent_ElevenLabs", + "Enabled": true + }, + { + "Name": "WebSockets", + "Enabled": true } ] } \ No newline at end of file diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/PS_AI_Agent_ElevenLabs.uplugin b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/PS_AI_Agent_ElevenLabs.uplugin new file mode 100644 index 0000000..096b69b --- /dev/null +++ 
b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/PS_AI_Agent_ElevenLabs.uplugin @@ -0,0 +1,35 @@ +{ + "FileVersion": 3, + "Version": 1, + "VersionName": "1.0.0", + "FriendlyName": "PS AI Agent - ElevenLabs", + "Description": "Integrates ElevenLabs Conversational AI Agent into Unreal Engine 5.5. Supports real-time voice conversation via WebSocket, microphone capture, and audio playback.", + "Category": "AI", + "CreatedBy": "ASTERION", + "CreatedByURL": "", + "DocsURL": "https://elevenlabs.io/docs/conversational-ai", + "MarketplaceURL": "", + "SupportURL": "", + "CanContainContent": false, + "IsBetaVersion": true, + "IsExperimentalVersion": false, + "Installed": false, + "Modules": [ + { + "Name": "PS_AI_Agent_ElevenLabs", + "Type": "Runtime", + "LoadingPhase": "PreDefault", + "PlatformAllowList": [ + "Win64", + "Mac", + "Linux" + ] + } + ], + "Plugins": [ + { + "Name": "WebSockets", + "Enabled": true + } + ] +} diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/PS_AI_Agent_ElevenLabs.Build.cs b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/PS_AI_Agent_ElevenLabs.Build.cs new file mode 100644 index 0000000..c1ad7ad --- /dev/null +++ b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/PS_AI_Agent_ElevenLabs.Build.cs @@ -0,0 +1,40 @@ +// Copyright ASTERION. All Rights Reserved. + +using UnrealBuildTool; + +public class PS_AI_Agent_ElevenLabs : ModuleRules +{ + public PS_AI_Agent_ElevenLabs(ReadOnlyTargetRules Target) : base(Target) + { + DefaultBuildSettings = BuildSettingsVersion.Latest; + PCHUsage = PCHUsageMode.UseExplicitOrSharedPCHs; + + PublicDependencyModuleNames.AddRange(new string[] + { + "Core", + "CoreUObject", + "Engine", + "InputCore", + // JSON serialization for WebSocket message payloads + "Json", + "JsonUtilities", + // WebSocket for ElevenLabs Conversational AI real-time API + "WebSockets", + // HTTP for REST calls (agent metadata, auth, etc.) 
+ "HTTP", + // Audio capture (microphone input) + "AudioMixer", + "AudioCaptureCore", + "AudioCapture", + "Voice", + "SignalProcessing", + }); + + PrivateDependencyModuleNames.AddRange(new string[] + { + "Projects", + // For ISettingsModule (Project Settings integration) + "Settings", + }); + } +} diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/ElevenLabsConversationalAgentComponent.cpp b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/ElevenLabsConversationalAgentComponent.cpp new file mode 100644 index 0000000..182e999 --- /dev/null +++ b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/ElevenLabsConversationalAgentComponent.cpp @@ -0,0 +1,335 @@ +// Copyright ASTERION. All Rights Reserved. + +#include "ElevenLabsConversationalAgentComponent.h" +#include "ElevenLabsMicrophoneCaptureComponent.h" +#include "PS_AI_Agent_ElevenLabs.h" + +#include "Components/AudioComponent.h" +#include "Sound/SoundWaveProcedural.h" +#include "GameFramework/Actor.h" +#include "Engine/World.h" + +DEFINE_LOG_CATEGORY_STATIC(LogElevenLabsAgent, Log, All); + +// ───────────────────────────────────────────────────────────────────────────── +// Constructor +// ───────────────────────────────────────────────────────────────────────────── +UElevenLabsConversationalAgentComponent::UElevenLabsConversationalAgentComponent() +{ + PrimaryComponentTick.bCanEverTick = true; + // Tick is used only to detect silence (agent stopped speaking). + // Disable if not needed for perf. 
+ PrimaryComponentTick.TickInterval = 1.0f / 60.0f; +} + +// ───────────────────────────────────────────────────────────────────────────── +// Lifecycle +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsConversationalAgentComponent::BeginPlay() +{ + Super::BeginPlay(); + InitAudioPlayback(); +} + +void UElevenLabsConversationalAgentComponent::EndPlay(const EEndPlayReason::Type EndPlayReason) +{ + EndConversation(); + Super::EndPlay(EndPlayReason); +} + +void UElevenLabsConversationalAgentComponent::TickComponent(float DeltaTime, ELevelTick TickType, + FActorComponentTickFunction* ThisTickFunction) +{ + Super::TickComponent(DeltaTime, TickType, ThisTickFunction); + + if (bAgentSpeaking) + { + FScopeLock Lock(&AudioQueueLock); + if (AudioQueue.Num() == 0) + { + SilentTickCount++; + if (SilentTickCount >= SilenceThresholdTicks) + { + bAgentSpeaking = false; + SilentTickCount = 0; + OnAgentStoppedSpeaking.Broadcast(); + } + } + else + { + SilentTickCount = 0; + } + } +} + +// ───────────────────────────────────────────────────────────────────────────── +// Control +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsConversationalAgentComponent::StartConversation() +{ + if (!WebSocketProxy) + { + WebSocketProxy = NewObject<UElevenLabsWebSocketProxy>(this); + WebSocketProxy->OnConnected.AddDynamic(this, + &UElevenLabsConversationalAgentComponent::HandleConnected); + WebSocketProxy->OnDisconnected.AddDynamic(this, + &UElevenLabsConversationalAgentComponent::HandleDisconnected); + WebSocketProxy->OnError.AddDynamic(this, + &UElevenLabsConversationalAgentComponent::HandleError); + WebSocketProxy->OnAudioReceived.AddDynamic(this, + &UElevenLabsConversationalAgentComponent::HandleAudioReceived); + WebSocketProxy->OnTranscript.AddDynamic(this, + &UElevenLabsConversationalAgentComponent::HandleTranscript); + WebSocketProxy->OnAgentResponse.AddDynamic(this, +
&UElevenLabsConversationalAgentComponent::HandleAgentResponse); + WebSocketProxy->OnInterrupted.AddDynamic(this, + &UElevenLabsConversationalAgentComponent::HandleInterrupted); + } + + WebSocketProxy->Connect(AgentID); +} + +void UElevenLabsConversationalAgentComponent::EndConversation() +{ + StopListening(); + StopAgentAudio(); + + if (WebSocketProxy) + { + WebSocketProxy->Disconnect(); + WebSocketProxy = nullptr; + } +} + +void UElevenLabsConversationalAgentComponent::StartListening() +{ + if (!IsConnected()) + { + UE_LOG(LogElevenLabsAgent, Warning, TEXT("StartListening: not connected.")); + return; + } + + if (bIsListening) return; + bIsListening = true; + + if (TurnMode == EElevenLabsTurnMode::Client) + { + WebSocketProxy->SendUserTurnStart(); + } + + // Find the microphone component on our owner actor, or create one. + UElevenLabsMicrophoneCaptureComponent* Mic = + GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>(); + + if (!Mic) + { + Mic = NewObject<UElevenLabsMicrophoneCaptureComponent>(GetOwner(), + TEXT("ElevenLabsMicrophone")); + Mic->RegisterComponent(); + } + + Mic->OnAudioCaptured.AddUObject(this, + &UElevenLabsConversationalAgentComponent::OnMicrophoneDataCaptured); + Mic->StartCapture(); + + UE_LOG(LogElevenLabsAgent, Log, TEXT("Microphone capture started.")); +} + +void UElevenLabsConversationalAgentComponent::StopListening() +{ + if (!bIsListening) return; + bIsListening = false; + + if (UElevenLabsMicrophoneCaptureComponent* Mic = + GetOwner() ? 
GetOwner()->FindComponentByClass<UElevenLabsMicrophoneCaptureComponent>() : nullptr) + { + Mic->StopCapture(); + Mic->OnAudioCaptured.RemoveAll(this); + } + + if (WebSocketProxy && TurnMode == EElevenLabsTurnMode::Client) + { + WebSocketProxy->SendUserTurnEnd(); + } + + UE_LOG(LogElevenLabsAgent, Log, TEXT("Microphone capture stopped.")); +} + +void UElevenLabsConversationalAgentComponent::InterruptAgent() +{ + if (WebSocketProxy) WebSocketProxy->SendInterrupt(); + StopAgentAudio(); +} + +// ───────────────────────────────────────────────────────────────────────────── +// State queries +// ───────────────────────────────────────────────────────────────────────────── +bool UElevenLabsConversationalAgentComponent::IsConnected() const +{ + return WebSocketProxy && WebSocketProxy->IsConnected(); +} + +const FElevenLabsConversationInfo& UElevenLabsConversationalAgentComponent::GetConversationInfo() const +{ + static FElevenLabsConversationInfo Empty; + return WebSocketProxy ? WebSocketProxy->GetConversationInfo() : Empty; +} + +// ───────────────────────────────────────────────────────────────────────────── +// WebSocket event handlers +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsConversationalAgentComponent::HandleConnected(const FElevenLabsConversationInfo& Info) +{ + UE_LOG(LogElevenLabsAgent, Log, TEXT("Agent connected. ConversationID=%s"), *Info.ConversationID); + OnAgentConnected.Broadcast(Info); + + if (bAutoStartListening) + { + StartListening(); + } +} + +void UElevenLabsConversationalAgentComponent::HandleDisconnected(int32 StatusCode, const FString& Reason) +{ + UE_LOG(LogElevenLabsAgent, Log, TEXT("Agent disconnected. 
Code=%d Reason=%s"), StatusCode, *Reason); + bIsListening = false; + bAgentSpeaking = false; + OnAgentDisconnected.Broadcast(StatusCode, Reason); +} + +void UElevenLabsConversationalAgentComponent::HandleError(const FString& ErrorMessage) +{ + UE_LOG(LogElevenLabsAgent, Error, TEXT("Agent error: %s"), *ErrorMessage); + OnAgentError.Broadcast(ErrorMessage); +} + +void UElevenLabsConversationalAgentComponent::HandleAudioReceived(const TArray<uint8>& PCMData) +{ + EnqueueAgentAudio(PCMData); +} + +void UElevenLabsConversationalAgentComponent::HandleTranscript(const FElevenLabsTranscriptSegment& Segment) +{ + OnAgentTranscript.Broadcast(Segment); +} + +void UElevenLabsConversationalAgentComponent::HandleAgentResponse(const FString& ResponseText) +{ + OnAgentTextResponse.Broadcast(ResponseText); +} + +void UElevenLabsConversationalAgentComponent::HandleInterrupted() +{ + StopAgentAudio(); + OnAgentInterrupted.Broadcast(); +} + +// ───────────────────────────────────────────────────────────────────────────── +// Audio playback +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsConversationalAgentComponent::InitAudioPlayback() +{ + AActor* Owner = GetOwner(); + if (!Owner) return; + + // USoundWaveProcedural lets us push raw PCM data at runtime. + ProceduralSoundWave = NewObject<USoundWaveProcedural>(this); + ProceduralSoundWave->SetSampleRate(ElevenLabsAudio::SampleRate); + ProceduralSoundWave->NumChannels = ElevenLabsAudio::Channels; + ProceduralSoundWave->Duration = INDEFINITELY_LOOPING_DURATION; + ProceduralSoundWave->SoundGroup = SOUNDGROUP_Voice; + ProceduralSoundWave->bLooping = false; + + // Create the audio component attached to the owner. 
+ AudioPlaybackComponent = NewObject<UAudioComponent>(Owner, TEXT("ElevenLabsAudioPlayback")); + AudioPlaybackComponent->RegisterComponent(); + AudioPlaybackComponent->bAutoActivate = false; + AudioPlaybackComponent->SetSound(ProceduralSoundWave); + + // When the procedural sound wave needs more audio data, pull from our queue. + ProceduralSoundWave->OnSoundWaveProceduralUnderflow = + FOnSoundWaveProceduralUnderflow::CreateUObject( + this, &UElevenLabsConversationalAgentComponent::OnProceduralUnderflow); +} + +void UElevenLabsConversationalAgentComponent::OnProceduralUnderflow( + USoundWaveProcedural* InProceduralWave, const int32 SamplesRequired) +{ + FScopeLock Lock(&AudioQueueLock); + if (AudioQueue.Num() == 0) return; + + const int32 BytesRequired = SamplesRequired * sizeof(int16); + const int32 BytesToPush = FMath::Min(AudioQueue.Num(), BytesRequired); + + InProceduralWave->QueueAudio(AudioQueue.GetData(), BytesToPush); + // UE 5.4+ deprecates the bAllowShrinking bool in favor of EAllowShrinking. + AudioQueue.RemoveAt(0, BytesToPush, EAllowShrinking::No); +} + +void UElevenLabsConversationalAgentComponent::EnqueueAgentAudio(const TArray<uint8>& PCMData) +{ + { + FScopeLock Lock(&AudioQueueLock); + AudioQueue.Append(PCMData); + } + + // Start playback if not already playing. 
+ if (!bAgentSpeaking) + { + bAgentSpeaking = true; + SilentTickCount = 0; + OnAgentStartedSpeaking.Broadcast(); + + if (AudioPlaybackComponent && !AudioPlaybackComponent->IsPlaying()) + { + AudioPlaybackComponent->Play(); + } + } +} + +void UElevenLabsConversationalAgentComponent::StopAgentAudio() +{ + if (AudioPlaybackComponent && AudioPlaybackComponent->IsPlaying()) + { + AudioPlaybackComponent->Stop(); + } + + FScopeLock Lock(&AudioQueueLock); + AudioQueue.Empty(); + + if (bAgentSpeaking) + { + bAgentSpeaking = false; + SilentTickCount = 0; + OnAgentStoppedSpeaking.Broadcast(); + } +} + +// ───────────────────────────────────────────────────────────────────────────── +// Microphone → WebSocket +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsConversationalAgentComponent::OnMicrophoneDataCaptured(const TArray<float>& FloatPCM) +{ + if (!IsConnected() || !bIsListening) return; + + TArray<uint8> PCMBytes = FloatPCMToInt16Bytes(FloatPCM); + WebSocketProxy->SendAudioChunk(PCMBytes); +} + +TArray<uint8> UElevenLabsConversationalAgentComponent::FloatPCMToInt16Bytes(const TArray<float>& FloatPCM) +{ + TArray<uint8> Out; + Out.Reserve(FloatPCM.Num() * 2); + + for (float Sample : FloatPCM) + { + // Clamp to [-1,1] then scale to int16 range + const float Clamped = FMath::Clamp(Sample, -1.0f, 1.0f); + const int16 Int16Sample = static_cast<int16>(Clamped * 32767.0f); + + // Little-endian + Out.Add(static_cast<uint8>(Int16Sample & 0xFF)); + Out.Add(static_cast<uint8>((Int16Sample >> 8) & 0xFF)); + } + + return Out; +} diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/ElevenLabsMicrophoneCaptureComponent.cpp b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/ElevenLabsMicrophoneCaptureComponent.cpp new file mode 100644 index 0000000..d226e13 --- /dev/null +++ b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/ElevenLabsMicrophoneCaptureComponent.cpp @@ 
-0,0 +1,168 @@ +// Copyright ASTERION. All Rights Reserved. + +#include "ElevenLabsMicrophoneCaptureComponent.h" +#include "ElevenLabsDefinitions.h" + +#include "AudioCaptureCore.h" +#include "Async/Async.h" + +DEFINE_LOG_CATEGORY_STATIC(LogElevenLabsMic, Log, All); + +// ───────────────────────────────────────────────────────────────────────────── +// Constructor +// ───────────────────────────────────────────────────────────────────────────── +UElevenLabsMicrophoneCaptureComponent::UElevenLabsMicrophoneCaptureComponent() +{ + PrimaryComponentTick.bCanEverTick = false; +} + +// ───────────────────────────────────────────────────────────────────────────── +// Lifecycle +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsMicrophoneCaptureComponent::EndPlay(const EEndPlayReason::Type EndPlayReason) +{ + StopCapture(); + Super::EndPlay(EndPlayReason); +} + +// ───────────────────────────────────────────────────────────────────────────── +// Capture control +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsMicrophoneCaptureComponent::StartCapture() +{ + if (bCapturing) + { + UE_LOG(LogElevenLabsMic, Warning, TEXT("StartCapture called while already capturing.")); + return; + } + + // Open the default audio capture stream. + // FAudioCapture discovers the default device and its sample rate automatically. + Audio::FOnAudioCaptureFunction CaptureCallback = + [this](const float* InAudio, int32 NumSamples, int32 InNumChannels, + int32 InSampleRate, double StreamTime, bool bOverflow) + { + OnAudioGenerate(InAudio, NumSamples, InNumChannels, InSampleRate, StreamTime, bOverflow); + }; + + if (!AudioCapture.OpenDefaultCaptureStream(DeviceParams, MoveTemp(CaptureCallback), 1024)) + { + UE_LOG(LogElevenLabsMic, Error, TEXT("Failed to open default audio capture stream.")); + return; + } + + // Retrieve the actual device parameters after opening the stream. 
+ Audio::FCaptureDeviceInfo DeviceInfo; + if (AudioCapture.GetCaptureDeviceInfo(DeviceInfo)) + { + DeviceSampleRate = DeviceInfo.PreferredSampleRate; + DeviceChannels = DeviceInfo.InputChannels; + UE_LOG(LogElevenLabsMic, Log, TEXT("Capture device: %s | Rate=%d | Channels=%d"), + *DeviceInfo.DeviceName, DeviceSampleRate, DeviceChannels); + } + + AudioCapture.StartStream(); + bCapturing = true; + UE_LOG(LogElevenLabsMic, Log, TEXT("Audio capture started.")); +} + +void UElevenLabsMicrophoneCaptureComponent::StopCapture() +{ + if (!bCapturing) return; + + AudioCapture.StopStream(); + AudioCapture.CloseStream(); + bCapturing = false; + UE_LOG(LogElevenLabsMic, Log, TEXT("Audio capture stopped.")); +} + +// ───────────────────────────────────────────────────────────────────────────── +// Audio callback (background thread) +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsMicrophoneCaptureComponent::OnAudioGenerate( + const float* InAudio, int32 NumSamples, + int32 InNumChannels, int32 InSampleRate, + double StreamTime, bool bOverflow) +{ + if (bOverflow) + { + UE_LOG(LogElevenLabsMic, Verbose, TEXT("Audio capture buffer overflow.")); + } + + // Downmix + resample to 16000 Hz mono. + TArray<float> Resampled = ResampleTo16000(InAudio, NumSamples, InNumChannels, InSampleRate); + + // Apply volume multiplier. + if (!FMath::IsNearlyEqual(VolumeMultiplier, 1.0f)) + { + for (float& S : Resampled) + { + S *= VolumeMultiplier; + } + } + + // Fire the delegate on the game thread so subscribers don't need to be + // thread-safe (WebSocket Send is not thread-safe in UE's implementation). 
+ AsyncTask(ENamedThreads::GameThread, [this, Data = MoveTemp(Resampled)]() + { + if (bCapturing) + { + OnAudioCaptured.Broadcast(Data); + } + }); +} + +// ───────────────────────────────────────────────────────────────────────────── +// Resampling +// ───────────────────────────────────────────────────────────────────────────── +TArray<float> UElevenLabsMicrophoneCaptureComponent::ResampleTo16000( + const float* InAudio, int32 NumSamples, + int32 InChannels, int32 InSampleRate) +{ + const int32 TargetRate = ElevenLabsAudio::SampleRate; // 16000 + + // --- Step 1: Downmix to mono --- + TArray<float> Mono; + if (InChannels == 1) + { + Mono = TArray<float>(InAudio, NumSamples); + } + else + { + const int32 NumFrames = NumSamples / InChannels; + Mono.Reserve(NumFrames); + for (int32 i = 0; i < NumFrames; i++) + { + float Sum = 0.0f; + for (int32 c = 0; c < InChannels; c++) + { + Sum += InAudio[i * InChannels + c]; + } + Mono.Add(Sum / static_cast<float>(InChannels)); + } + } + + // --- Step 2: Resample via linear interpolation --- + if (InSampleRate == TargetRate) + { + return Mono; + } + + const float Ratio = static_cast<float>(InSampleRate) / static_cast<float>(TargetRate); + const int32 OutSamples = FMath::FloorToInt(static_cast<float>(Mono.Num()) / Ratio); + + TArray<float> Out; + Out.Reserve(OutSamples); + + for (int32 i = 0; i < OutSamples; i++) + { + const float SrcIndex = static_cast<float>(i) * Ratio; + const int32 SrcLow = FMath::FloorToInt(SrcIndex); + const int32 SrcHigh = FMath::Min(SrcLow + 1, Mono.Num() - 1); + const float Alpha = SrcIndex - static_cast<float>(SrcLow); + + Out.Add(FMath::Lerp(Mono[SrcLow], Mono[SrcHigh], Alpha)); + } + + return Out; +} diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/ElevenLabsWebSocketProxy.cpp b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/ElevenLabsWebSocketProxy.cpp new file mode 100644 index 0000000..296e037 --- /dev/null +++ 
b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/ElevenLabsWebSocketProxy.cpp @@ -0,0 +1,382 @@ +// Copyright ASTERION. All Rights Reserved. + +#include "ElevenLabsWebSocketProxy.h" +#include "PS_AI_Agent_ElevenLabs.h" + +#include "WebSocketsModule.h" +#include "IWebSocket.h" + +#include "Json.h" +#include "JsonUtilities.h" +#include "Misc/Base64.h" + +DEFINE_LOG_CATEGORY_STATIC(LogElevenLabsWS, Log, All); + +// ───────────────────────────────────────────────────────────────────────────── +// Helpers +// ───────────────────────────────────────────────────────────────────────────── +static void EL_LOG(bool bVerbose, const TCHAR* Format, ...) +{ + if (!bVerbose) return; + va_list Args; + va_start(Args, Format); + // Forward to UE_LOG at Verbose level + TCHAR Buffer[2048]; + FCString::GetVarArgs(Buffer, UE_ARRAY_COUNT(Buffer), Format, Args); + va_end(Args); + UE_LOG(LogElevenLabsWS, Verbose, TEXT("%s"), Buffer); +} + +// ───────────────────────────────────────────────────────────────────────────── +// Connect / Disconnect +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsWebSocketProxy::Connect(const FString& AgentIDOverride, const FString& APIKeyOverride) +{ + if (ConnectionState == EElevenLabsConnectionState::Connected || + ConnectionState == EElevenLabsConnectionState::Connecting) + { + UE_LOG(LogElevenLabsWS, Warning, TEXT("Connect called but already connecting/connected. Ignoring.")); + return; + } + + if (!FModuleManager::Get().IsModuleLoaded("WebSockets")) + { + FModuleManager::LoadModuleChecked<FWebSocketsModule>("WebSockets"); + } + + const FString URL = BuildWebSocketURL(AgentIDOverride, APIKeyOverride); + if (URL.IsEmpty()) + { + const FString Msg = TEXT("Cannot connect: no Agent ID configured. 
Set it in Project Settings or pass it to Connect()."); + UE_LOG(LogElevenLabsWS, Error, TEXT("%s"), *Msg); + OnError.Broadcast(Msg); + ConnectionState = EElevenLabsConnectionState::Error; + return; + } + + UE_LOG(LogElevenLabsWS, Log, TEXT("Connecting to ElevenLabs: %s"), *URL); + ConnectionState = EElevenLabsConnectionState::Connecting; + + // Headers: the ElevenLabs Conversational AI WS endpoint accepts the + // xi-api-key header on the initial HTTP upgrade request. + TMap<FString, FString> UpgradeHeaders; + const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings(); + const FString ResolvedKey = APIKeyOverride.IsEmpty() ? Settings->API_Key : APIKeyOverride; + if (!ResolvedKey.IsEmpty()) + { + UpgradeHeaders.Add(TEXT("xi-api-key"), ResolvedKey); + } + + WebSocket = FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), UpgradeHeaders); + + WebSocket->OnConnected().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsConnected); + WebSocket->OnConnectionError().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsConnectionError); + WebSocket->OnClosed().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsClosed); + WebSocket->OnMessage().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsMessage); + WebSocket->OnRawMessage().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsBinaryMessage); + + WebSocket->Connect(); +} + +void UElevenLabsWebSocketProxy::Disconnect() +{ + if (WebSocket.IsValid() && WebSocket->IsConnected()) + { + WebSocket->Close(1000, TEXT("Client disconnected")); + } + ConnectionState = EElevenLabsConnectionState::Disconnected; +} + +// ───────────────────────────────────────────────────────────────────────────── +// Audio & turn control +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsWebSocketProxy::SendAudioChunk(const TArray<uint8>& PCMData) +{ + if (!IsConnected()) + { + UE_LOG(LogElevenLabsWS, Warning, TEXT("SendAudioChunk: not connected.")); + return; + } + if (PCMData.Num() == 0) return; + + // 
ElevenLabs expects: { "user_audio_chunk": "" } + const FString Base64Audio = FBase64::Encode(PCMData.GetData(), PCMData.Num()); + + TSharedPtr Msg = MakeShareable(new FJsonObject()); + Msg->SetStringField(ElevenLabsMessageType::AudioChunk, Base64Audio); + SendJsonMessage(Msg); +} + +void UElevenLabsWebSocketProxy::SendUserTurnStart() +{ + if (!IsConnected()) return; + TSharedPtr Msg = MakeShareable(new FJsonObject()); + Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::UserTurnStart); + SendJsonMessage(Msg); +} + +void UElevenLabsWebSocketProxy::SendUserTurnEnd() +{ + if (!IsConnected()) return; + TSharedPtr Msg = MakeShareable(new FJsonObject()); + Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::UserTurnEnd); + SendJsonMessage(Msg); +} + +void UElevenLabsWebSocketProxy::SendInterrupt() +{ + if (!IsConnected()) return; + TSharedPtr Msg = MakeShareable(new FJsonObject()); + Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::Interrupt); + SendJsonMessage(Msg); +} + +// ───────────────────────────────────────────────────────────────────────────── +// WebSocket callbacks +// ───────────────────────────────────────────────────────────────────────────── +void UElevenLabsWebSocketProxy::OnWsConnected() +{ + UE_LOG(LogElevenLabsWS, Log, TEXT("WebSocket connected. Waiting for conversation_initiation_metadata...")); + // State stays Connecting until we receive the initiation metadata from the server. +} + +void UElevenLabsWebSocketProxy::OnWsConnectionError(const FString& Error) +{ + UE_LOG(LogElevenLabsWS, Error, TEXT("WebSocket connection error: %s"), *Error); + ConnectionState = EElevenLabsConnectionState::Error; + OnError.Broadcast(Error); +} + +void UElevenLabsWebSocketProxy::OnWsClosed(int32 StatusCode, const FString& Reason, bool bWasClean) +{ + UE_LOG(LogElevenLabsWS, Log, TEXT("WebSocket closed. 
Code=%d Reason=%s Clean=%d"), StatusCode, *Reason, bWasClean); + ConnectionState = EElevenLabsConnectionState::Disconnected; + WebSocket.Reset(); + OnDisconnected.Broadcast(StatusCode, Reason); +} + +void UElevenLabsWebSocketProxy::OnWsMessage(const FString& Message) +{ + const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings(); + if (Settings->bVerboseLogging) + { + UE_LOG(LogElevenLabsWS, Verbose, TEXT(">> %s"), *Message); + } + + TSharedPtr Root; + TSharedRef> Reader = TJsonReaderFactory<>::Create(Message); + if (!FJsonSerializer::Deserialize(Reader, Root) || !Root.IsValid()) + { + UE_LOG(LogElevenLabsWS, Warning, TEXT("Failed to parse WebSocket message as JSON.")); + return; + } + + FString MsgType; + // ElevenLabs wraps the type in a "type" field + if (!Root->TryGetStringField(TEXT("type"), MsgType)) + { + // Fallback: some messages use the top-level key as the type + // e.g. { "user_audio_chunk": "..." } from ourselves (shouldn't arrive) + UE_LOG(LogElevenLabsWS, Verbose, TEXT("Message has no 'type' field, ignoring.")); + return; + } + + if (MsgType == ElevenLabsMessageType::ConversationInitiation) + { + HandleConversationInitiation(Root); + } + else if (MsgType == ElevenLabsMessageType::AudioResponse) + { + HandleAudioResponse(Root); + } + else if (MsgType == ElevenLabsMessageType::Transcript) + { + HandleTranscript(Root); + } + else if (MsgType == ElevenLabsMessageType::AgentResponse) + { + HandleAgentResponse(Root); + } + else if (MsgType == ElevenLabsMessageType::InterruptionEvent) + { + HandleInterruption(Root); + } + else if (MsgType == ElevenLabsMessageType::PingEvent) + { + HandlePing(Root); + } + else + { + UE_LOG(LogElevenLabsWS, Verbose, TEXT("Unhandled message type: %s"), *MsgType); + } +} + +void UElevenLabsWebSocketProxy::OnWsBinaryMessage(const void* Data, SIZE_T Size, SIZE_T BytesRemaining) +{ + // ElevenLabs Conversational AI uses text (JSON) frames only. 
+    // If binary frames arrive in future API versions, handle here.
+    UE_LOG(LogElevenLabsWS, Warning, TEXT("Received unexpected binary WebSocket frame (%llu bytes)."), (uint64)Size);
+}
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Message handlers
+// ─────────────────────────────────────────────────────────────────────────────
+void UElevenLabsWebSocketProxy::HandleConversationInitiation(const TSharedPtr<FJsonObject>& Root)
+{
+    // Expected structure:
+    // { "type": "conversation_initiation_metadata",
+    //   "conversation_initiation_metadata_event": {
+    //     "conversation_id": "...",
+    //     "agent_output_audio_format": "pcm_16000"
+    //   }
+    // }
+    const TSharedPtr<FJsonObject>* MetaObj = nullptr;
+    if (Root->TryGetObjectField(TEXT("conversation_initiation_metadata_event"), MetaObj) && MetaObj)
+    {
+        (*MetaObj)->TryGetStringField(TEXT("conversation_id"), ConversationInfo.ConversationID);
+    }
+
+    UE_LOG(LogElevenLabsWS, Log, TEXT("Conversation initiated. ID=%s"), *ConversationInfo.ConversationID);
+    ConnectionState = EElevenLabsConnectionState::Connected;
+    OnConnected.Broadcast(ConversationInfo);
+}
+
+void UElevenLabsWebSocketProxy::HandleAudioResponse(const TSharedPtr<FJsonObject>& Root)
+{
+    // Expected structure:
+    // { "type": "audio",
+    //   "audio_event": { "audio_base_64": "<base64>", "event_id": 1 }
+    // }
+    const TSharedPtr<FJsonObject>* AudioEvent = nullptr;
+    if (!Root->TryGetObjectField(TEXT("audio_event"), AudioEvent) || !AudioEvent)
+    {
+        UE_LOG(LogElevenLabsWS, Warning, TEXT("audio message missing 'audio_event' field."));
+        return;
+    }
+
+    FString Base64Audio;
+    if (!(*AudioEvent)->TryGetStringField(TEXT("audio_base_64"), Base64Audio))
+    {
+        UE_LOG(LogElevenLabsWS, Warning, TEXT("audio_event missing 'audio_base_64' field."));
+        return;
+    }
+
+    TArray<uint8> PCMData;
+    if (!FBase64::Decode(Base64Audio, PCMData))
+    {
+        UE_LOG(LogElevenLabsWS, Warning, TEXT("Failed to Base64-decode audio data."));
+        return;
+    }
+
+    OnAudioReceived.Broadcast(PCMData);
+}
+
+void UElevenLabsWebSocketProxy::HandleTranscript(const TSharedPtr<FJsonObject>& Root)
+{
+    // Expected structure:
+    // { "type": "transcript",
+    //   "transcript_event": { "speaker": "user"|"agent", "message": "...", "event_id": 1 }
+    // }
+    const TSharedPtr<FJsonObject>* TranscriptEvent = nullptr;
+    if (!Root->TryGetObjectField(TEXT("transcript_event"), TranscriptEvent) || !TranscriptEvent)
+    {
+        return;
+    }
+
+    FElevenLabsTranscriptSegment Segment;
+    (*TranscriptEvent)->TryGetStringField(TEXT("speaker"), Segment.Speaker);
+    (*TranscriptEvent)->TryGetStringField(TEXT("message"), Segment.Text);
+
+    // ElevenLabs marks final vs. interim via "is_final"
+    (*TranscriptEvent)->TryGetBoolField(TEXT("is_final"), Segment.bIsFinal);
+
+    OnTranscript.Broadcast(Segment);
+}
+
+void UElevenLabsWebSocketProxy::HandleAgentResponse(const TSharedPtr<FJsonObject>& Root)
+{
+    // { "type": "agent_response",
+    //   "agent_response_event": { "agent_response": "..." }
+    // }
+    const TSharedPtr<FJsonObject>* ResponseEvent = nullptr;
+    if (!Root->TryGetObjectField(TEXT("agent_response_event"), ResponseEvent) || !ResponseEvent)
+    {
+        return;
+    }
+
+    FString ResponseText;
+    (*ResponseEvent)->TryGetStringField(TEXT("agent_response"), ResponseText);
+    OnAgentResponse.Broadcast(ResponseText);
+}
+
+void UElevenLabsWebSocketProxy::HandleInterruption(const TSharedPtr<FJsonObject>& Root)
+{
+    UE_LOG(LogElevenLabsWS, Log, TEXT("Agent interrupted."));
+    OnInterrupted.Broadcast();
+}
+
+void UElevenLabsWebSocketProxy::HandlePing(const TSharedPtr<FJsonObject>& Root)
+{
+    // Reply with a pong to keep the connection alive.
+    // { "type": "ping", "ping_event": { "event_id": 1 } }
+    int32 EventID = 0;
+    const TSharedPtr<FJsonObject>* PingEvent = nullptr;
+    if (Root->TryGetObjectField(TEXT("ping_event"), PingEvent) && PingEvent)
+    {
+        (*PingEvent)->TryGetNumberField(TEXT("event_id"), EventID);
+    }
+
+    TSharedPtr<FJsonObject> Pong = MakeShareable(new FJsonObject());
+    Pong->SetStringField(TEXT("type"), TEXT("pong"));
+    TSharedPtr<FJsonObject> PongEvent = MakeShareable(new FJsonObject());
+    PongEvent->SetNumberField(TEXT("event_id"), EventID);
+    Pong->SetObjectField(TEXT("pong_event"), PongEvent);
+    SendJsonMessage(Pong);
+}
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Helpers
+// ─────────────────────────────────────────────────────────────────────────────
+void UElevenLabsWebSocketProxy::SendJsonMessage(const TSharedPtr<FJsonObject>& JsonObj)
+{
+    if (!WebSocket.IsValid() || !WebSocket->IsConnected())
+    {
+        UE_LOG(LogElevenLabsWS, Warning, TEXT("SendJsonMessage: WebSocket not connected."));
+        return;
+    }
+
+    FString Out;
+    TSharedRef<TJsonWriter<>> Writer = TJsonWriterFactory<>::Create(&Out);
+    FJsonSerializer::Serialize(JsonObj.ToSharedRef(), Writer);
+
+    const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
+    if (Settings->bVerboseLogging)
+    {
+        UE_LOG(LogElevenLabsWS, Verbose, TEXT("<< %s"), *Out);
+    }
+
+    WebSocket->Send(Out);
+}
+
+FString UElevenLabsWebSocketProxy::BuildWebSocketURL(const FString& AgentIDOverride, const FString& APIKeyOverride) const
+{
+    const UElevenLabsSettings* Settings = FPS_AI_Agent_ElevenLabsModule::Get().GetSettings();
+
+    // Custom URL override takes full precedence
+    if (!Settings->CustomWebSocketURL.IsEmpty())
+    {
+        return Settings->CustomWebSocketURL;
+    }
+
+    const FString ResolvedAgentID = AgentIDOverride.IsEmpty() ? Settings->AgentID : AgentIDOverride;
+    if (ResolvedAgentID.IsEmpty())
+    {
+        return FString();
+    }
+
+    // Official ElevenLabs Conversational AI WebSocket endpoint
+    // wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<agent_id>
+    return FString::Printf(
+        TEXT("wss://api.elevenlabs.io/v1/convai/conversation?agent_id=%s"),
+        *ResolvedAgentID);
+}
diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/PS_AI_Agent_ElevenLabs.cpp b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/PS_AI_Agent_ElevenLabs.cpp
new file mode 100644
index 0000000..62baad0
--- /dev/null
+++ b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Private/PS_AI_Agent_ElevenLabs.cpp
@@ -0,0 +1,50 @@
+// Copyright ASTERION. All Rights Reserved.
+
+#include "PS_AI_Agent_ElevenLabs.h"
+#include "Developer/Settings/Public/ISettingsModule.h"
+#include "UObject/UObjectGlobals.h"
+#include "UObject/Package.h"
+
+IMPLEMENT_MODULE(FPS_AI_Agent_ElevenLabsModule, PS_AI_Agent_ElevenLabs)
+
+#define LOCTEXT_NAMESPACE "PS_AI_Agent_ElevenLabs"
+
+void FPS_AI_Agent_ElevenLabsModule::StartupModule()
+{
+    Settings = NewObject<UElevenLabsSettings>(GetTransientPackage(), "ElevenLabsSettings", RF_Standalone);
+    Settings->AddToRoot();
+
+    if (ISettingsModule* SettingsModule = FModuleManager::GetModulePtr<ISettingsModule>("Settings"))
+    {
+        SettingsModule->RegisterSettings(
+            "Project", "Plugins", "ElevenLabsAIAgent",
+            LOCTEXT("SettingsName", "ElevenLabs AI Agent"),
+            LOCTEXT("SettingsDescription", "Configure the ElevenLabs Conversational AI Agent plugin"),
+            Settings);
+    }
+}
+
+void FPS_AI_Agent_ElevenLabsModule::ShutdownModule()
+{
+    if (ISettingsModule* SettingsModule = FModuleManager::GetModulePtr<ISettingsModule>("Settings"))
+    {
+        SettingsModule->UnregisterSettings("Project", "Plugins", "ElevenLabsAIAgent");
+    }
+
+    if (!GExitPurge)
+    {
+        Settings->RemoveFromRoot();
+    }
+    else
+    {
+        Settings = nullptr;
+    }
+}
+
+UElevenLabsSettings* FPS_AI_Agent_ElevenLabsModule::GetSettings() const
+{
+    check(Settings);
+    return Settings;
+}
+
+#undef LOCTEXT_NAMESPACE
diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsConversationalAgentComponent.h b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsConversationalAgentComponent.h
new file mode 100644
index 0000000..d3f5b0a
--- /dev/null
+++ b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsConversationalAgentComponent.h
@@ -0,0 +1,225 @@
+// Copyright ASTERION. All Rights Reserved.
+
+#pragma once
+
+#include "CoreMinimal.h"
+#include "Components/ActorComponent.h"
+#include "ElevenLabsDefinitions.h"
+#include "ElevenLabsWebSocketProxy.h"
+#include "Sound/SoundWaveProcedural.h"
+#include "ElevenLabsConversationalAgentComponent.generated.h"
+
+class UAudioComponent;
+class UElevenLabsMicrophoneCaptureComponent;
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Delegates exposed to Blueprint
+// ─────────────────────────────────────────────────────────────────────────────
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentConnected,
+    const FElevenLabsConversationInfo&, ConversationInfo);
+
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_TwoParams(FOnAgentDisconnected,
+    int32, StatusCode, const FString&, Reason);
+
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentError,
+    const FString&, ErrorMessage);
+
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentTranscript,
+    const FElevenLabsTranscriptSegment&, Segment);
+
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnAgentTextResponse,
+    const FString&, ResponseText);
+
+DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnAgentStartedSpeaking);
+DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnAgentStoppedSpeaking);
+DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnAgentInterrupted);
+
+// ─────────────────────────────────────────────────────────────────────────────
+// UElevenLabsConversationalAgentComponent
+//
+// Attach this to any Actor (e.g. a character NPC) to give it a voice powered by
+// the ElevenLabs Conversational AI API.
+//
+// Workflow:
+//   1. Set AgentID (or rely on project default).
+//   2. Call StartConversation() to open the WebSocket.
+//   3. Call StartListening() / StopListening() to control microphone capture.
+//   4. React to events (OnAgentTranscript, OnAgentTextResponse, etc.) in Blueprint.
+//   5. Call EndConversation() when done.
+// ─────────────────────────────────────────────────────────────────────────────
+UCLASS(ClassGroup = "ElevenLabs", meta = (BlueprintSpawnableComponent),
+    DisplayName = "ElevenLabs Conversational Agent")
+class PS_AI_AGENT_ELEVENLABS_API UElevenLabsConversationalAgentComponent : public UActorComponent
+{
+    GENERATED_BODY()
+
+public:
+    UElevenLabsConversationalAgentComponent();
+
+    // ── Configuration ─────────────────────────────────────────────────────────
+
+    /**
+     * ElevenLabs Agent ID. Overrides the project-level default in Project Settings.
+     * Leave empty to use the project default.
+     */
+    UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
+    FString AgentID;
+
+    /**
+     * Turn mode:
+     *  - Server VAD: ElevenLabs detects end-of-speech automatically (recommended).
+     *  - Client Controlled: you call StartListening/StopListening manually (push-to-talk).
+     */
+    UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
+    EElevenLabsTurnMode TurnMode = EElevenLabsTurnMode::Server;
+
+    /**
+     * Automatically start listening (microphone capture) once the WebSocket is
+     * connected and the conversation is initiated.
+     */
+    UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
+    bool bAutoStartListening = true;
+
+    // ── Events ────────────────────────────────────────────────────────────────
+
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnAgentConnected OnAgentConnected;
+
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnAgentDisconnected OnAgentDisconnected;
+
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnAgentError OnAgentError;
+
+    /** Fired for every transcript segment (user speech or agent speech, tentative and final). */
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnAgentTranscript OnAgentTranscript;
+
+    /** Final text response produced by the agent (mirrors the audio). */
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnAgentTextResponse OnAgentTextResponse;
+
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnAgentStartedSpeaking OnAgentStartedSpeaking;
+
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnAgentStoppedSpeaking OnAgentStoppedSpeaking;
+
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnAgentInterrupted OnAgentInterrupted;
+
+    // ── Control ───────────────────────────────────────────────────────────────
+
+    /**
+     * Open the WebSocket connection and start the conversation.
+     * If bAutoStartListening is true, microphone capture also starts once connected.
+     */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void StartConversation();
+
+    /** Close the WebSocket and stop all audio. */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void EndConversation();
+
+    /**
+     * Start capturing microphone audio and streaming it to ElevenLabs.
+     * In Client turn mode, also sends a UserTurnStart signal.
+     */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void StartListening();
+
+    /**
+     * Stop capturing microphone audio.
+     * In Client turn mode, also sends a UserTurnEnd signal.
+     */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void StopListening();
+
+    /** Interrupt the agent's current utterance. */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void InterruptAgent();
+
+    // ── State queries ─────────────────────────────────────────────────────────
+
+    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
+    bool IsConnected() const;
+
+    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
+    bool IsListening() const { return bIsListening; }
+
+    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
+    bool IsAgentSpeaking() const { return bAgentSpeaking; }
+
+    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
+    const FElevenLabsConversationInfo& GetConversationInfo() const;
+
+    /** Access the underlying WebSocket proxy (advanced use). */
+    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
+    UElevenLabsWebSocketProxy* GetWebSocketProxy() const { return WebSocketProxy; }
+
+    // ─────────────────────────────────────────────────────────────────────────
+    // UActorComponent overrides
+    // ─────────────────────────────────────────────────────────────────────────
+    virtual void BeginPlay() override;
+    virtual void EndPlay(const EEndPlayReason::Type EndPlayReason) override;
+    virtual void TickComponent(float DeltaTime, ELevelTick TickType,
+        FActorComponentTickFunction* ThisTickFunction) override;
+
+private:
+    // ── Internal event handlers ───────────────────────────────────────────────
+    UFUNCTION()
+    void HandleConnected(const FElevenLabsConversationInfo& Info);
+
+    UFUNCTION()
+    void HandleDisconnected(int32 StatusCode, const FString& Reason);
+
+    UFUNCTION()
+    void HandleError(const FString& ErrorMessage);
+
+    UFUNCTION()
+    void HandleAudioReceived(const TArray<uint8>& PCMData);
+
+    UFUNCTION()
+    void HandleTranscript(const FElevenLabsTranscriptSegment& Segment);
+
+    UFUNCTION()
+    void HandleAgentResponse(const FString& ResponseText);
+
+    UFUNCTION()
+    void HandleInterrupted();
+    // ── Audio playback ────────────────────────────────────────────────────────
+    void InitAudioPlayback();
+    void EnqueueAgentAudio(const TArray<uint8>& PCMData);
+    void StopAgentAudio();
+    /** Called by USoundWaveProcedural when it needs more PCM data. */
+    void OnProceduralUnderflow(USoundWaveProcedural* InProceduralWave, const int32 SamplesRequired);
+
+    // ── Microphone streaming ──────────────────────────────────────────────────
+    void OnMicrophoneDataCaptured(const TArray<float>& FloatPCM);
+    /** Convert float PCM to int16 little-endian bytes for ElevenLabs. */
+    static TArray<uint8> FloatPCMToInt16Bytes(const TArray<float>& FloatPCM);
+
+    // ── Sub-objects ───────────────────────────────────────────────────────────
+    UPROPERTY()
+    UElevenLabsWebSocketProxy* WebSocketProxy = nullptr;
+
+    UPROPERTY()
+    UAudioComponent* AudioPlaybackComponent = nullptr;
+
+    UPROPERTY()
+    USoundWaveProcedural* ProceduralSoundWave = nullptr;
+
+    // ── State ─────────────────────────────────────────────────────────────────
+    bool bIsListening = false;
+    bool bAgentSpeaking = false;
+
+    // Accumulates incoming PCM bytes until the audio component needs data.
+    TArray<uint8> AudioQueue;
+    FCriticalSection AudioQueueLock;
+
+    // Simple heuristic: if we haven't received audio data for this many ticks,
+    // consider the agent done speaking.
+    int32 SilentTickCount = 0;
+    static constexpr int32 SilenceThresholdTicks = 30; // ~0.5s at 60fps
+};
diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsDefinitions.h b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsDefinitions.h
new file mode 100644
index 0000000..7f3dcd8
--- /dev/null
+++ b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsDefinitions.h
@@ -0,0 +1,104 @@
+// Copyright ASTERION. All Rights Reserved.
+
+#pragma once
+
+#include "CoreMinimal.h"
+#include "ElevenLabsDefinitions.generated.h"
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Connection state
+// ─────────────────────────────────────────────────────────────────────────────
+UENUM(BlueprintType)
+enum class EElevenLabsConnectionState : uint8
+{
+    Disconnected UMETA(DisplayName = "Disconnected"),
+    Connecting   UMETA(DisplayName = "Connecting"),
+    Connected    UMETA(DisplayName = "Connected"),
+    Error        UMETA(DisplayName = "Error"),
+};
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Agent turn mode
+// ─────────────────────────────────────────────────────────────────────────────
+UENUM(BlueprintType)
+enum class EElevenLabsTurnMode : uint8
+{
+    /** ElevenLabs server decides when the user has finished speaking (default). */
+    Server UMETA(DisplayName = "Server VAD"),
+    /** Client explicitly signals turn start/end (manual push-to-talk). */
+    Client UMETA(DisplayName = "Client Controlled"),
+};
+
+// ─────────────────────────────────────────────────────────────────────────────
+// WebSocket message type helpers (internal, not exposed to Blueprint)
+// ─────────────────────────────────────────────────────────────────────────────
+namespace ElevenLabsMessageType
+{
+    // Client → Server
+    static const FString AudioChunk       = TEXT("user_audio_chunk");
+    static const FString UserTurnStart    = TEXT("user_turn_start");
+    static const FString UserTurnEnd      = TEXT("user_turn_end");
+    static const FString Interrupt        = TEXT("interrupt");
+    static const FString ClientToolResult = TEXT("client_tool_result");
+
+    // Server → Client
+    static const FString ConversationInitiation = TEXT("conversation_initiation_metadata");
+    static const FString AudioResponse          = TEXT("audio");
+    static const FString Transcript             = TEXT("transcript");
+    static const FString AgentResponse          = TEXT("agent_response");
+    static const FString InterruptionEvent      = TEXT("interruption");
+    static const FString PingEvent              = TEXT("ping");
+    static const FString ClientToolCall         = TEXT("client_tool_call");
+    static const FString InternalTentativeAgent = TEXT("internal_tentative_agent_response");
+}
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Audio format exchanged with ElevenLabs
+// PCM 16-bit signed, 16000 Hz, mono, little-endian.
+// ─────────────────────────────────────────────────────────────────────────────
+namespace ElevenLabsAudio
+{
+    static constexpr int32 SampleRate    = 16000;
+    static constexpr int32 Channels      = 1;
+    static constexpr int32 BitsPerSample = 16;
+    // Chunk size sent per WebSocket frame: 100 ms of audio
+    static constexpr int32 ChunkSamples  = SampleRate / 10; // 1600 samples = 3200 bytes
+}
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Conversation metadata received on successful connection
+// ─────────────────────────────────────────────────────────────────────────────
+USTRUCT(BlueprintType)
+struct PS_AI_AGENT_ELEVENLABS_API FElevenLabsConversationInfo
+{
+    GENERATED_BODY()
+
+    /** Unique ID of this conversation session assigned by ElevenLabs. */
+    UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
+    FString ConversationID;
+
+    /** Agent ID that is responding. */
+    UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
+    FString AgentID;
+};
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Transcript segment
+// ─────────────────────────────────────────────────────────────────────────────
+USTRUCT(BlueprintType)
+struct PS_AI_AGENT_ELEVENLABS_API FElevenLabsTranscriptSegment
+{
+    GENERATED_BODY()
+
+    /** Transcribed text. */
+    UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
+    FString Text;
+
+    /** "user" or "agent". */
+    UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
+    FString Speaker;
+
+    /** Whether this is a final transcript or a tentative (in-progress) one. */
+    UPROPERTY(BlueprintReadOnly, Category = "ElevenLabs")
+    bool bIsFinal = false;
+};
diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsMicrophoneCaptureComponent.h b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsMicrophoneCaptureComponent.h
new file mode 100644
index 0000000..3f2d6f2
--- /dev/null
+++ b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsMicrophoneCaptureComponent.h
@@ -0,0 +1,73 @@
+// Copyright ASTERION. All Rights Reserved.
+
+#pragma once
+
+#include "CoreMinimal.h"
+#include "Components/ActorComponent.h"
+#include "AudioCapture.h"
+#include "ElevenLabsMicrophoneCaptureComponent.generated.h"
+
+// Delivers captured float PCM samples (16000 Hz mono, resampled from device rate).
+DECLARE_MULTICAST_DELEGATE_OneParam(FOnElevenLabsAudioCaptured, const TArray<float>& /*FloatPCM*/);
+
+/**
+ * Lightweight microphone capture component.
+ * Captures from the default audio input device, resamples to 16000 Hz mono,
+ * and delivers chunks via FOnElevenLabsAudioCaptured.
+ *
+ * Modelled after Convai's ConvaiAudioCaptureComponent but stripped to the
+ * minimal functionality needed for the ElevenLabs Conversational AI API.
+ */
+UCLASS(ClassGroup = "ElevenLabs", meta = (BlueprintSpawnableComponent),
+    DisplayName = "ElevenLabs Microphone Capture")
+class PS_AI_AGENT_ELEVENLABS_API UElevenLabsMicrophoneCaptureComponent : public UActorComponent
+{
+    GENERATED_BODY()
+
+public:
+    UElevenLabsMicrophoneCaptureComponent();
+
+    /** Volume multiplier applied to captured samples before forwarding. */
+    UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs|Microphone",
+        meta = (ClampMin = "0.0", ClampMax = "4.0"))
+    float VolumeMultiplier = 1.0f;
+
+    /**
+     * Delegate fired on the game thread each time a new chunk of PCM audio
+     * is captured. Samples are float32, resampled to 16000 Hz mono.
+     */
+    FOnElevenLabsAudioCaptured OnAudioCaptured;
+
+    /** Open the default capture device and begin streaming audio. */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void StartCapture();
+
+    /** Stop streaming and close the capture device. */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void StopCapture();
+
+    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
+    bool IsCapturing() const { return bCapturing; }
+
+    // ─────────────────────────────────────────────────────────────────────────
+    // UActorComponent overrides
+    // ─────────────────────────────────────────────────────────────────────────
+    virtual void EndPlay(const EEndPlayReason::Type EndPlayReason) override;
+
+private:
+    /** Called by the audio capture callback on a background thread. */
+    void OnAudioGenerate(const float* InAudio, int32 NumSamples,
+        int32 InNumChannels, int32 InSampleRate, double StreamTime, bool bOverflow);
+
+    /** Simple linear resample from InSampleRate to 16000 Hz. */
+    static TArray<float> ResampleTo16000(const float* InAudio, int32 NumSamples,
+        int32 InChannels, int32 InSampleRate);
+
+    Audio::FAudioCapture AudioCapture;
+    Audio::FAudioCaptureDeviceParams DeviceParams;
+    bool bCapturing = false;
+
+    // Device sample rate discovered on StartCapture
+    int32 DeviceSampleRate = 44100;
+    int32 DeviceChannels = 1;
+};
diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsWebSocketProxy.h b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsWebSocketProxy.h
new file mode 100644
index 0000000..2e86a21
--- /dev/null
+++ b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/ElevenLabsWebSocketProxy.h
@@ -0,0 +1,166 @@
+// Copyright ASTERION. All Rights Reserved.
+
+#pragma once
+
+#include "CoreMinimal.h"
+#include "UObject/NoExportTypes.h"
+#include "ElevenLabsDefinitions.h"
+#include "IWebSocket.h"
+#include "ElevenLabsWebSocketProxy.generated.h"
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Delegates (all Blueprint-assignable)
+// ─────────────────────────────────────────────────────────────────────────────
+
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsConnected,
+    const FElevenLabsConversationInfo&, ConversationInfo);
+
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_TwoParams(FOnElevenLabsDisconnected,
+    int32, StatusCode, const FString&, Reason);
+
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsError,
+    const FString&, ErrorMessage);
+
+/** Fired when a PCM audio chunk arrives from the agent. Raw bytes, 16-bit signed 16kHz mono. */
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsAudioReceived,
+    const TArray<uint8>&, PCMData);
+
+/** Fired for user or agent transcript segments. */
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsTranscript,
+    const FElevenLabsTranscriptSegment&, Segment);
+
+/** Fired with the final text response from the agent. */
+DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnElevenLabsAgentResponse,
+    const FString&, ResponseText);
+
+/** Fired when the agent is interrupted by new user speech. */
+DECLARE_DYNAMIC_MULTICAST_DELEGATE(FOnElevenLabsInterrupted);
+
+
+// ─────────────────────────────────────────────────────────────────────────────
+// WebSocket Proxy
+// Manages the lifecycle of a single ElevenLabs Conversational AI WebSocket session.
+// Instantiate via UElevenLabsConversationalAgentComponent (the component manages
+// one proxy at a time), or create manually through Blueprints.
+// ─────────────────────────────────────────────────────────────────────────────
+UCLASS(BlueprintType, Blueprintable)
+class PS_AI_AGENT_ELEVENLABS_API UElevenLabsWebSocketProxy : public UObject
+{
+    GENERATED_BODY()
+
+public:
+    // ── Events ────────────────────────────────────────────────────────────────
+
+    /** Called once the WebSocket handshake succeeds and the agent sends its initiation metadata. */
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnElevenLabsConnected OnConnected;
+
+    /** Called when the WebSocket closes (graceful or remote). */
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnElevenLabsDisconnected OnDisconnected;
+
+    /** Called on any connection or protocol error. */
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnElevenLabsError OnError;
+
+    /** Raw PCM audio coming from the agent — feed this into your audio component. */
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnElevenLabsAudioReceived OnAudioReceived;
+
+    /** User or agent transcript (may be tentative while the conversation is ongoing). */
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnElevenLabsTranscript OnTranscript;
+
+    /** Final text response from the agent (complements audio). */
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnElevenLabsAgentResponse OnAgentResponse;
+
+    /** The agent was interrupted by new user speech. */
+    UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
+    FOnElevenLabsInterrupted OnInterrupted;
+
+    // ── Lifecycle ─────────────────────────────────────────────────────────────
+
+    /**
+     * Open a WebSocket connection to ElevenLabs.
+     * Uses settings from Project Settings unless overridden by the parameters.
+     *
+     * @param AgentID ElevenLabs agent ID. Overrides the project-level default when non-empty.
+     * @param APIKey  API key. Overrides the project-level default when non-empty.
+     */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void Connect(const FString& AgentID = TEXT(""), const FString& APIKey = TEXT(""));
+
+    /**
+     * Gracefully close the WebSocket connection.
+     * OnDisconnected will fire after the server acknowledges.
+     */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void Disconnect();
+
+    /** Current connection state. */
+    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
+    EElevenLabsConnectionState GetConnectionState() const { return ConnectionState; }
+
+    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
+    bool IsConnected() const { return ConnectionState == EElevenLabsConnectionState::Connected; }
+
+    // ── Audio sending ─────────────────────────────────────────────────────────
+
+    /**
+     * Send a chunk of raw PCM audio to ElevenLabs.
+     * Audio must be 16-bit signed, 16000 Hz, mono, little-endian.
+     * The data is Base64-encoded and sent as a JSON message.
+     * Call this repeatedly while the microphone is capturing.
+     *
+     * @param PCMData Raw PCM bytes (16-bit LE, 16 kHz, mono).
+     */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void SendAudioChunk(const TArray<uint8>& PCMData);
+
+    // ── Turn control (only relevant in Client turn mode) ──────────────────────
+
+    /** Signal that the user has started speaking (Client turn mode). */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void SendUserTurnStart();
+
+    /** Signal that the user has finished speaking (Client turn mode). */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void SendUserTurnEnd();
+
+    /** Ask the agent to stop the current utterance. */
+    UFUNCTION(BlueprintCallable, Category = "ElevenLabs")
+    void SendInterrupt();
+
+    // ── Info ──────────────────────────────────────────────────────────────────
+
+    UFUNCTION(BlueprintPure, Category = "ElevenLabs")
+    FElevenLabsConversationInfo GetConversationInfo() const { return ConversationInfo; }
+
+    // ─────────────────────────────────────────────────────────────────────────
+    // Internal
+    // ─────────────────────────────────────────────────────────────────────────
+private:
+    void OnWsConnected();
+    void OnWsConnectionError(const FString& Error);
+    void OnWsClosed(int32 StatusCode, const FString& Reason, bool bWasClean);
+    void OnWsMessage(const FString& Message);
+    void OnWsBinaryMessage(const void* Data, SIZE_T Size, SIZE_T BytesRemaining);
+
+    void HandleConversationInitiation(const TSharedPtr<FJsonObject>& Payload);
+    void HandleAudioResponse(const TSharedPtr<FJsonObject>& Payload);
+    void HandleTranscript(const TSharedPtr<FJsonObject>& Payload);
+    void HandleAgentResponse(const TSharedPtr<FJsonObject>& Payload);
+    void HandleInterruption(const TSharedPtr<FJsonObject>& Payload);
+    void HandlePing(const TSharedPtr<FJsonObject>& Payload);
+
+    /** Build and send a JSON text frame to the server. */
+    void SendJsonMessage(const TSharedPtr<FJsonObject>& JsonObj);
+
+    /** Resolve the WebSocket URL from settings / parameters. */
+    FString BuildWebSocketURL(const FString& AgentID, const FString& APIKey) const;
+
+    TSharedPtr<IWebSocket> WebSocket;
+    EElevenLabsConnectionState ConnectionState = EElevenLabsConnectionState::Disconnected;
+    FElevenLabsConversationInfo ConversationInfo;
+};
diff --git a/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/PS_AI_Agent_ElevenLabs.h b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/PS_AI_Agent_ElevenLabs.h
new file mode 100644
index 0000000..18ba69f
--- /dev/null
+++ b/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/Source/PS_AI_Agent_ElevenLabs/Public/PS_AI_Agent_ElevenLabs.h
@@ -0,0 +1,99 @@
+// Copyright ASTERION. All Rights Reserved.
+
+#pragma once
+
+#include "CoreMinimal.h"
+#include "Modules/ModuleManager.h"
+#include "PS_AI_Agent_ElevenLabs.generated.h"
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Settings object – exposed in Project Settings → Plugins → ElevenLabs AI Agent
+// ─────────────────────────────────────────────────────────────────────────────
+UCLASS(config = Engine, defaultconfig)
+class PS_AI_AGENT_ELEVENLABS_API UElevenLabsSettings : public UObject
+{
+    GENERATED_BODY()
+
+public:
+    UElevenLabsSettings(const FObjectInitializer& ObjectInitializer)
+        : Super(ObjectInitializer)
+    {
+        API_Key = TEXT("");
+        AgentID = TEXT("");
+        bSignedURLMode = false;
+    }
+
+    /**
+     * ElevenLabs API key.
+     * Obtain from https://elevenlabs.io – used to authenticate WebSocket connections.
+     * Keep this secret; do not hard-code the key into a shipping build.
+     */
+    UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API")
+    FString API_Key;
+
+    /**
+     * The default ElevenLabs Conversational Agent ID to use when none is specified
+     * on the component. Create agents at https://elevenlabs.io/app/conversational-ai
+     */
+    UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API")
+    FString AgentID;
+
+    /**
+     * When true, the plugin fetches a signed WebSocket URL from your own backend
+     * before connecting, so the API key is never exposed in the client.
+     * Set SignedURLEndpoint to point to your server that returns the signed URL.
+     */
+    UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API|Security")
+    bool bSignedURLMode;
+
+    /**
+     * Your backend endpoint that returns a signed WebSocket URL for ElevenLabs.
+     * Only used when bSignedURLMode = true.
+     * Expected response body: { "signed_url": "wss://..." }
+     */
+    UPROPERTY(Config, EditAnywhere, Category = "ElevenLabs API|Security",
+        meta = (EditCondition = "bSignedURLMode"))
+    FString SignedURLEndpoint;
+
+    /**
+     * Override the ElevenLabs WebSocket base URL. Leave empty to use the default:
+     * wss://api.elevenlabs.io/v1/convai/conversation
+     */
+    UPROPERTY(Config, EditAnywhere, AdvancedDisplay, Category = "ElevenLabs API")
+    FString CustomWebSocketURL;
+
+    /** Log verbose WebSocket messages to the Output Log (useful during development). */
+    UPROPERTY(Config, EditAnywhere, AdvancedDisplay, Category = "ElevenLabs API")
+    bool bVerboseLogging = false;
+};
+
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Module
+// ─────────────────────────────────────────────────────────────────────────────
+class PS_AI_AGENT_ELEVENLABS_API FPS_AI_Agent_ElevenLabsModule : public IModuleInterface
+{
+public:
+    /** IModuleInterface implementation */
+    virtual void StartupModule() override;
+    virtual void ShutdownModule() override;
+
+    virtual bool IsGameModule() const override { return true; }
+
+    /** Singleton access */
+    static inline FPS_AI_Agent_ElevenLabsModule& Get()
+    {
+        return FModuleManager::LoadModuleChecked<FPS_AI_Agent_ElevenLabsModule>("PS_AI_Agent_ElevenLabs");
+    }
+
+    static inline bool IsAvailable()
+    {
+        return FModuleManager::Get().IsModuleLoaded("PS_AI_Agent_ElevenLabs");
+    }
+
+    /** Access the settings object at runtime */
+    UElevenLabsSettings* GetSettings() const;
+
+private:
+    UElevenLabsSettings* Settings = nullptr;
+};
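Note (not part of the diff): `SendAudioChunk` documents its framing as "Base64-encoded and sent as a JSON message". A minimal sketch of that framing in plain C++, with no Unreal types, is below. The Base64 routine is standard RFC 4648; the `user_audio_chunk` field name is an assumption about the ElevenLabs message schema and is not taken from this plugin's code, so verify it against the current API docs.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode bytes as standard Base64 (RFC 4648, '=' padded).
inline std::string Base64Encode(const std::vector<uint8_t>& Data)
{
    static const char Table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string Out;
    Out.reserve(((Data.size() + 2) / 3) * 4);
    size_t i = 0;
    // Full 3-byte groups -> 4 output characters each.
    for (; i + 3 <= Data.size(); i += 3)
    {
        const uint32_t n = (Data[i] << 16) | (Data[i + 1] << 8) | Data[i + 2];
        Out += Table[(n >> 18) & 63];
        Out += Table[(n >> 12) & 63];
        Out += Table[(n >> 6) & 63];
        Out += Table[n & 63];
    }
    // Trailing 1 or 2 bytes -> padded final group.
    if (i + 1 == Data.size())
    {
        const uint32_t n = Data[i] << 16;
        Out += Table[(n >> 18) & 63];
        Out += Table[(n >> 12) & 63];
        Out += "==";
    }
    else if (i + 2 == Data.size())
    {
        const uint32_t n = (Data[i] << 16) | (Data[i + 1] << 8);
        Out += Table[(n >> 18) & 63];
        Out += Table[(n >> 12) & 63];
        Out += Table[(n >> 6) & 63];
        Out += '=';
    }
    return Out;
}

// Build the JSON text frame carrying one PCM chunk (16-bit LE, 16 kHz, mono).
// "user_audio_chunk" is an assumed field name -- check the protocol docs.
inline std::string BuildAudioChunkFrame(const std::vector<uint8_t>& PCMData)
{
    return "{\"user_audio_chunk\":\"" + Base64Encode(PCMData) + "\"}";
}
```

Inside the plugin, the equivalent work would be done with `FBase64::Encode` and `FJsonObject`, and the resulting string handed to `IWebSocket::Send`.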