Compare commits


No commits in common. "9f28ed7457e7b59cc041e651730924bd798c173d" and "993a827c7bdda85d0cec0803fee55191835124ac" have entirely different histories.

8 changed files with 30 additions and 417 deletions

View File

@ -43,30 +43,20 @@
## Plugin Status
- **PS_AI_Agent_ElevenLabs**: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
- v1.5.0 — mic audio chunk size fixed: WASAPI 5ms callbacks accumulated to 100ms before sending
- v1.4.0 — push-to-talk fully fixed: bAutoStartListening now ignored in Client turn mode
- v1.1.0 — all 3 protocol bugs fixed (transcript fields, pong format, client turn mode)
- Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
- First-byte discrimination: `{` = JSON control message, else = raw PCM audio
- `SendTextMessage()` added to both WebSocketProxy and ConversationalAgentComponent
- `conversation_initiation_client_data` now sent immediately on WS connect (required for mic + latency)
## Audio Chunk Size — CRITICAL
- WASAPI fires mic callbacks every ~5ms → **158 bytes** at 16kHz 16-bit mono
- ElevenLabs VAD/STT requires **≥3200 bytes (100ms)** per chunk; smaller chunks are silently ignored (see the arithmetic after this list)
- Fix: `MicAccumulationBuffer` in component accumulates chunks; sends only when `>= MicChunkMinBytes` (3200)
- `StopListening()` flushes remainder so final partial chunk is never dropped before end-of-turn
- Connection confirmed working end-to-end; audio receive path functional
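For reference, the arithmetic behind both numbers (16kHz, 16-bit mono = 2 bytes per sample; a sketch using plain `int` in place of UE types):

```
constexpr int SampleRate     = 16000; // Hz
constexpr int BytesPerSample = 2;     // 16-bit mono
constexpr int Bytes5ms   = SampleRate * BytesPerSample * 5   / 1000; // 160 (WASAPI actually delivers ~158 = 79 samples ≈ 4.9ms)
constexpr int Bytes100ms = SampleRate * BytesPerSample * 100 / 1000; // 3200 = MicChunkMinBytes
```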
## ElevenLabs WebSocket Protocol Notes
- **ALL frames are binary** — bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text) — UE fires both for the same frame → double audio bug
- **ALL frames are binary** — `OnRawMessage` handles everything; `OnMessage` (text) never fires
- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio (see the sketch after this list)
- Fragment reassembly: accumulate into `BinaryFrameBuffer` until `BytesRemaining == 0`
- Pong: `{"type":"pong","event_id":N}` — `event_id` is **top-level**, NOT nested
- Transcript: type=`user_transcript`, key=`user_transcription_event`, field=`user_transcript`
- Client turn mode (`client_vad`): send `user_activity` **with every audio chunk** (not just once) — server needs continuous signal to know user is speaking; stopping chunks = silence detected = agent responds
- Client turn mode: `{"type":"user_activity"}` to signal speaking; no explicit end message
- Text input: `{"type":"user_message","text":"..."}` — agent replies with audio + text
- **MUST send `conversation_initiation_client_data` immediately after WS connect** — without it, server won't process client audio (mic appears dead)
- `conversation_initiation_client_data` payload: `conversation_config_override.agent.turn.mode`, `conversation_config_override.tts.optimize_streaming_latency`, `custom_llm_extra_body.enable_intermediate_response`
- `enable_intermediate_response: true` in `custom_llm_extra_body` reduces time-to-first-audio latency
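A minimal sketch of the receive-side routing described above (`HandleJsonMessage` and `EnqueuePCMAudio` are placeholder names, not the plugin's actual functions; the parameter list is UE's `IWebSocket::OnRawMessage` signature):

```
void OnWsBinaryMessage(const void* Data, SIZE_T Size, SIZE_T BytesRemaining)
{
    // Reassemble fragments until the frame is complete.
    BinaryFrameBuffer.Append(static_cast<const uint8*>(Data), (int32)Size);
    if (BytesRemaining > 0) return;

    // First-byte discrimination: '{' (0x7B) = JSON control message, else raw PCM.
    if (BinaryFrameBuffer.Num() > 0 && BinaryFrameBuffer[0] == '{')
    {
        const FUTF8ToTCHAR Converted((const ANSICHAR*)BinaryFrameBuffer.GetData(), BinaryFrameBuffer.Num());
        HandleJsonMessage(FString(Converted.Length(), Converted.Get()));
    }
    else
    {
        EnqueuePCMAudio(BinaryFrameBuffer); // 16-bit PCM agent audio
    }
    BinaryFrameBuffer.Reset();
}
```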
## API Keys / Secrets
- ElevenLabs API key is set in **Project Settings → Plugins → ElevenLabs AI Agent** in the Editor

View File

@ -189,160 +189,9 @@ Commit: `99017f4`
---
---
## Session 3 — 2026-02-19 (bug fixes from live testing)
### 16. Three Runtime Bugs Fixed (v1.2.0)
User reported after live testing:
1. **AI speaks twice** — every audio response played double
2. **Cannot speak** — mic capture didn't reach ElevenLabs
3. **Latency** — requested `enable_intermediate_response: true`
**Bug 1 Root Cause — Double Audio:**
UE's libwebsockets backend fires **both** `OnMessage()` (text callback) **and** `OnRawMessage()` (binary callback) for the same incoming frame.
We had bound both `WebSocket->OnMessage()` and `WebSocket->OnRawMessage()` in `Connect()`.
Result: every audio frame was decoded and enqueued twice → played twice.
Fix: **Remove `OnMessage` binding entirely.** `OnRawMessage` now handles all frames (JSON control messages peeked via first byte, raw PCM otherwise).
**Bug 2 Root Cause — Mic Silent:**
ElevenLabs requires a `conversation_initiation_client_data` message sent **immediately** after the WebSocket handshake completes. Without it, the server never enters a state where it will accept and process client audio chunks. This is a required session negotiation step, not optional.
Fix: Send `conversation_initiation_client_data` in `OnWsConnected()` before any other message.
**Bug 2 Secondary — Delegate Stacking:**
`StartListening()` called `Mic->OnAudioCaptured.AddUObject(this, ...)` without first removing existing bindings. If called more than once (e.g. after reconnect), delegates stack up and audio is sent multiple times per frame.
Fix: Add `Mic->OnAudioCaptured.RemoveAll(this)` before `AddUObject` in `StartListening()`.
**Bug 3 — Latency:**
Added `"enable_intermediate_response": true` inside `custom_llm_extra_body` of the `conversation_initiation_client_data` message. Also added `optimize_streaming_latency: 3` in `conversation_config_override.tts`.
**Files changed:**
- `ElevenLabsWebSocketProxy.cpp`:
- `Connect()`: removed `OnMessage` binding
- `OnWsConnected()`: now sends full `conversation_initiation_client_data` JSON
- `ElevenLabsConversationalAgentComponent.cpp`:
- `StartListening()`: added `RemoveAll` guard before delegate binding
---
---
## Session 4 — 2026-02-19 (mic still silent — push-to-talk deeper investigation)
### 17. Two More Bugs Found and Fixed (v1.3.0)
User confirmed Bug 1 (double audio) was fixed. Bug 2 (cannot speak) persisted.
**Analysis of log:**
- Blueprint is correct: T Pressed → StartListening, T Released → StopListening (proper push-to-talk)
- Mic opens and closes correctly — audio capture IS happening
- Server never responds to mic input → audio reaching ElevenLabs but being ignored
**Bug A — TurnMode mismatch in conversation_initiation_client_data:**
`OnWsConnected()` hardcoded `"mode": "server_vad"` in the init message regardless of the
component's `TurnMode` setting. User's Blueprint uses Client turn mode (push-to-talk),
so the server was configured for server_vad while the client sent client_vad audio signals.
Fix: Read `TurnMode` field on the proxy (set from the component before `Connect()`).
Translate `EElevenLabsTurnMode::Client``"client_vad"`, Server → `"server_vad"`.
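The translation itself is a one-liner (sketch):
```
const FString ModeString = (TurnMode == EElevenLabsTurnMode::Client)
    ? TEXT("client_vad")
    : TEXT("server_vad");
```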
**Bug B — user_activity never sent continuously:**
In client VAD mode, ElevenLabs requires `user_activity` to be sent **continuously**
alongside every audio chunk to keep the server's VAD aware the user is speaking.
`SendUserTurnStart()` sent it once on key press, but never again during speech.
Without continuous `user_activity`, the server treated the incoming audio as noise.
Fix: In `SendAudioChunk()`, automatically send `user_activity` before each audio chunk
when `TurnMode == Client`. This keeps the signal continuous for the full duration of speech.
When the user releases T, `StopListening()` stops the mic → audio stops → `user_activity`
stops → server detects silence and triggers the agent response.
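As described, the v1.3.0 change in `SendAudioChunk()` amounts to (sketch; `SendUserActivity()` stands in for whatever code sends `{"type":"user_activity"}`):
```
// Before base64-encoding and sending the PCM payload:
if (TurnMode == EElevenLabsTurnMode::Client)
{
    SendUserActivity(); // keeps the server's VAD aware the user is still speaking
}
```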
**Bug C — TurnMode not propagated to proxy:**
`UElevenLabsConversationalAgentComponent` never told the proxy what TurnMode to use.
Added `WebSocketProxy->TurnMode = TurnMode` before `Connect()` in `StartConversation()`.
**Files changed:**
- `ElevenLabsWebSocketProxy.h`: added `public TurnMode` field
- `ElevenLabsWebSocketProxy.cpp`:
- `OnWsConnected()`: use `TurnMode` to set correct mode string in init message
- `SendAudioChunk()`: auto-send `user_activity` before each chunk in Client mode
- `ElevenLabsConversationalAgentComponent.cpp`:
- `StartConversation()`: set `WebSocketProxy->TurnMode = TurnMode` before `Connect()`
---
---
## Session 5 — 2026-02-19 (still can't speak — bAutoStartListening conflict)
### 18. Root Cause Found and Fixed (v1.4.0)
Log analysis revealed the true root cause:
**Exact sequence:**
```
OnConnected → bAutoStartListening=true → StartListening() → bIsListening=true, mic opens
OnAgentStoppedSpeaking → Blueprint calls StartListening() → bIsListening guard → no-op (already open)
User presses T → StartListening() → bIsListening guard → no-op
User releases T → StopListening() → bIsListening=false, mic CLOSES
User presses T → StartListening() → NOW opens mic (was closed)
User releases T → StopListening() → mic closes — but ElevenLabs never got audio
```
**Root cause:** `bAutoStartListening = true` opens the mic on connect and sets `bIsListening = true`.
In Client/push-to-talk mode, every T-press hits the `bIsListening` guard and does nothing.
Every T-release closes the auto-started mic. The mic was never open during actual speech.
**Fix:** `HandleConnected()` now only calls `StartListening()` when `TurnMode == Server`.
In Client mode, `bAutoStartListening` is ignored — the user controls listening via T key.
**File changed:**
- `ElevenLabsConversationalAgentComponent.cpp`:
- `HandleConnected()`: guard `bAutoStartListening` with `TurnMode == Server` check
---
---
## Session 6 — 2026-02-19 (audio chunk size fix)
### 19. Mic Audio Chunk Accumulation (v1.5.0)
**Root cause (from diagnostic log in Session 5):**
Log showed hundreds of `SendAudioChunk: 158 bytes (TurnMode=Client)` lines with zero server responses.
- 158 bytes = 79 samples = ~5ms of audio at 16kHz 16-bit mono
- WASAPI (Windows Audio Session API) fires the `FAudioCapture` callback at its internal buffer period (~5ms)
- ElevenLabs requires a minimum chunk size for its VAD and STT to operate (~100ms / 3200 bytes)
- Tiny 5ms fragments arrived at the server but were silently ignored → agent never responded
**Fix applied:**
Added a `TArray<uint8> MicAccumulationBuffer` member to `UElevenLabsConversationalAgentComponent`.
`OnMicrophoneDataCaptured()` appends each callback's converted bytes and only calls `SendAudioChunk`
when `>= MicChunkMinBytes` (3200 bytes = 100ms) have accumulated.
`StopListening()` flushes any remaining bytes in the buffer before sending `SendUserTurnEnd()`,
so the last partial chunk of speech is never dropped.
`HandleDisconnected()` clears the buffer to prevent stale data on reconnect.
**Files changed:**
- `ElevenLabsConversationalAgentComponent.h`: added `MicAccumulationBuffer` + `MicChunkMinBytes = 3200`
- `ElevenLabsConversationalAgentComponent.cpp`:
- `OnMicrophoneDataCaptured()`: accumulate → send when threshold reached
- `StopListening()`: flush remainder before end-of-turn signal
- `HandleDisconnected()`: clear accumulation buffer
Commit: `91cf5b1`
---
## Next Steps (not done yet)
- [ ] Test v1.5.0 in Editor — verify push-to-talk mic works end-to-end (should be the final fix)
- [ ] Verify mic audio actually reaches ElevenLabs (enable Verbose Logging, test in Editor)
- [ ] Test `USoundWaveProcedural` underflow behaviour in practice (check for audio glitches)
- [ ] Test `SendTextMessage` end-to-end in Blueprint
- [ ] Add lip-sync support (future)

View File

@ -1,8 +0,0 @@
[FilterPlugin]
; This section lists additional files which will be packaged along with your plugin. Paths should be listed relative to the root plugin directory, and
; may include "...", "*", and "?" wildcards to match directories, files, and individual characters respectively.
;
; Examples:
; /README.txt
; /Extras/...
; /Binaries/ThirdParty/*.dll

View File

@ -86,10 +86,6 @@ void UElevenLabsConversationalAgentComponent::StartConversation()
&UElevenLabsConversationalAgentComponent::HandleInterrupted);
}
// Pass configuration to the proxy before connecting.
WebSocketProxy->TurnMode = TurnMode;
WebSocketProxy->bSpeculativeTurn = bSpeculativeTurn;
WebSocketProxy->Connect(AgentID);
}
@ -132,9 +128,6 @@ void UElevenLabsConversationalAgentComponent::StartListening()
Mic->RegisterComponent();
}
// Always remove existing binding first to prevent duplicate delegates stacking
// up if StartListening is called more than once without a matching StopListening.
Mic->OnAudioCaptured.RemoveAll(this);
Mic->OnAudioCaptured.AddUObject(this,
&UElevenLabsConversationalAgentComponent::OnMicrophoneDataCaptured);
Mic->StartCapture();
@ -154,15 +147,6 @@ void UElevenLabsConversationalAgentComponent::StopListening()
Mic->OnAudioCaptured.RemoveAll(this);
}
// Flush any partially-accumulated mic audio before signalling end-of-turn.
// This ensures the final words aren't discarded just because the last callback
// didn't push the buffer over the MicChunkMinBytes threshold.
if (MicAccumulationBuffer.Num() > 0 && WebSocketProxy && IsConnected())
{
WebSocketProxy->SendAudioChunk(MicAccumulationBuffer);
}
MicAccumulationBuffer.Reset();
if (WebSocketProxy && TurnMode == EElevenLabsTurnMode::Client)
{
WebSocketProxy->SendUserTurnEnd();
@ -209,12 +193,7 @@ void UElevenLabsConversationalAgentComponent::HandleConnected(const FElevenLabsC
UE_LOG(LogElevenLabsAgent, Log, TEXT("Agent connected. ConversationID=%s"), *Info.ConversationID);
OnAgentConnected.Broadcast(Info);
// In Client turn mode (push-to-talk), the user controls listening manually via
// StartListening()/StopListening(). Auto-starting would leave the mic open
// permanently and interfere with push-to-talk — the T-release StopListening()
// would close the mic that auto-start opened, leaving the user unable to speak.
// Only auto-start in Server VAD mode where the mic stays open the whole session.
if (bAutoStartListening && TurnMode == EElevenLabsTurnMode::Server)
if (bAutoStartListening)
{
StartListening();
}
@ -225,7 +204,6 @@ void UElevenLabsConversationalAgentComponent::HandleDisconnected(int32 StatusCod
UE_LOG(LogElevenLabsAgent, Log, TEXT("Agent disconnected. Code=%d Reason=%s"), StatusCode, *Reason);
bIsListening = false;
bAgentSpeaking = false;
MicAccumulationBuffer.Reset();
OnAgentDisconnected.Broadcast(StatusCode, Reason);
}
@ -242,18 +220,12 @@ void UElevenLabsConversationalAgentComponent::HandleAudioReceived(const TArray<u
void UElevenLabsConversationalAgentComponent::HandleTranscript(const FElevenLabsTranscriptSegment& Segment)
{
if (bEnableUserTranscript)
{
OnAgentTranscript.Broadcast(Segment);
}
OnAgentTranscript.Broadcast(Segment);
}
void UElevenLabsConversationalAgentComponent::HandleAgentResponse(const FString& ResponseText)
{
if (bEnableAgentTextResponse)
{
OnAgentTextResponse.Broadcast(ResponseText);
}
OnAgentTextResponse.Broadcast(ResponseText);
}
void UElevenLabsConversationalAgentComponent::HandleInterrupted()
@ -349,18 +321,8 @@ void UElevenLabsConversationalAgentComponent::OnMicrophoneDataCaptured(const TAr
{
if (!IsConnected() || !bIsListening) return;
// Convert this callback's samples to int16 bytes and accumulate.
// WASAPI fires every ~5ms (158 bytes at 16kHz). ElevenLabs needs ≥100ms
// (3200 bytes) per chunk for reliable VAD and STT. We hold bytes here
// until we have enough, then send the whole batch in one WebSocket frame.
TArray<uint8> PCMBytes = FloatPCMToInt16Bytes(FloatPCM);
MicAccumulationBuffer.Append(PCMBytes);
if (MicAccumulationBuffer.Num() >= MicChunkMinBytes)
{
WebSocketProxy->SendAudioChunk(MicAccumulationBuffer);
MicAccumulationBuffer.Reset();
}
WebSocketProxy->SendAudioChunk(PCMBytes);
}
TArray<uint8> UElevenLabsConversationalAgentComponent::FloatPCMToInt16Bytes(const TArray<float>& FloatPCM)
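(The hunk is cut off at this signature. For reference, a float → int16 PCM converter is typically a clamp-and-scale loop; a sketch, not necessarily the plugin's exact body:)

```
TArray<uint8> Out;
Out.Reserve(FloatPCM.Num() * sizeof(int16));
for (const float Sample : FloatPCM)
{
    const int16 Value = (int16)FMath::RoundToInt(FMath::Clamp(Sample, -1.0f, 1.0f) * 32767.0f);
    Out.Add((uint8)(Value & 0xFF));        // little-endian, matching "PCM int16 LE" in the send log
    Out.Add((uint8)((Value >> 8) & 0xFF));
}
return Out;
```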

View File

@ -119,22 +119,20 @@ void UElevenLabsMicrophoneCaptureComponent::OnAudioGenerate(
// Resampling
// ─────────────────────────────────────────────────────────────────────────────
TArray<float> UElevenLabsMicrophoneCaptureComponent::ResampleTo16000(
const float* InAudio, int32 NumFrames,
const float* InAudio, int32 NumSamples,
int32 InChannels, int32 InSampleRate)
{
const int32 TargetRate = ElevenLabsAudio::SampleRate; // 16000
// --- Step 1: Downmix to mono ---
// NOTE: NumFrames is the number of audio frames (not total samples).
// Each frame contains InChannels samples (e.g. 2 for stereo).
// The raw buffer has NumFrames * InChannels total float values.
TArray<float> Mono;
if (InChannels == 1)
{
Mono = TArray<float>(InAudio, NumFrames);
Mono = TArray<float>(InAudio, NumSamples);
}
else
{
const int32 NumFrames = NumSamples / InChannels;
Mono.Reserve(NumFrames);
for (int32 i = 0; i < NumFrames; i++)
{

View File

@ -72,11 +72,7 @@ void UElevenLabsWebSocketProxy::Connect(const FString& AgentIDOverride, const FS
WebSocket->OnConnected().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsConnected);
WebSocket->OnConnectionError().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsConnectionError);
WebSocket->OnClosed().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsClosed);
// NOTE: We bind ONLY OnRawMessage (binary frames), NOT OnMessage (text frames).
// UE's WebSocket implementation fires BOTH callbacks for the same frame when using
// the libwebsockets backend — binding both causes every audio packet to be decoded
// and played twice. OnRawMessage handles all frame types: raw binary audio AND
// text-framed JSON (detected by peeking first byte for '{').
WebSocket->OnMessage().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsMessage);
WebSocket->OnRawMessage().AddUObject(this, &UElevenLabsWebSocketProxy::OnWsBinaryMessage);
WebSocket->Connect();
@ -98,58 +94,36 @@ void UElevenLabsWebSocketProxy::SendAudioChunk(const TArray<uint8>& PCMData)
{
if (!IsConnected())
{
UE_LOG(LogElevenLabsWS, Warning, TEXT("SendAudioChunk: not connected (state=%d). Audio dropped."),
(int32)ConnectionState);
UE_LOG(LogElevenLabsWS, Warning, TEXT("SendAudioChunk: not connected."));
return;
}
if (PCMData.Num() == 0) return;
UE_LOG(LogElevenLabsWS, Log, TEXT("SendAudioChunk: %d bytes (PCM int16 LE @ 16kHz mono)"), PCMData.Num());
// Track when the last audio chunk was sent for latency measurement.
LastAudioChunkSentTime = FPlatformTime::Seconds();
// ElevenLabs expects: { "user_audio_chunk": "<base64 PCM>" }
// The server's VAD detects silence to determine end-of-turn.
// Do NOT send user_activity here — it resets the turn timeout timer
// and would prevent the server from taking the turn after the user stops speaking.
const FString Base64Audio = FBase64::Encode(PCMData.GetData(), PCMData.Num());
// Send as compact JSON (no pretty-printing) directly, bypassing SendJsonMessage
// to avoid the pretty-printed writer and to keep the payload minimal.
const FString AudioJson = FString::Printf(TEXT("{\"user_audio_chunk\":\"%s\"}"), *Base64Audio);
// Log first chunk fully for debugging
static int32 AudioChunksSent = 0;
AudioChunksSent++;
if (AudioChunksSent <= 2)
{
UE_LOG(LogElevenLabsWS, Log, TEXT(" Audio JSON (first 200 chars): %.200s"), *AudioJson);
}
if (WebSocket.IsValid() && WebSocket->IsConnected())
{
WebSocket->Send(AudioJson);
}
TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
Msg->SetStringField(ElevenLabsMessageType::AudioChunk, Base64Audio);
SendJsonMessage(Msg);
}
void UElevenLabsWebSocketProxy::SendUserTurnStart()
{
// No-op: the ElevenLabs API does not require a "start speaking" signal.
// The server's VAD detects speech from the audio chunks we send.
// user_activity is a keep-alive/timeout-reset message and should NOT be
// sent here — it would delay the agent's turn after the user stops.
UE_LOG(LogElevenLabsWS, Log, TEXT("User turn started (audio chunks will follow)."));
// In client turn mode, signal that the user is active/speaking.
// API message: { "type": "user_activity" }
if (!IsConnected()) return;
TSharedPtr<FJsonObject> Msg = MakeShareable(new FJsonObject());
Msg->SetStringField(TEXT("type"), ElevenLabsMessageType::UserActivity);
SendJsonMessage(Msg);
}
void UElevenLabsWebSocketProxy::SendUserTurnEnd()
{
// No explicit "end turn" message exists in the ElevenLabs API.
// The server detects end-of-speech via VAD when we stop sending audio chunks.
UserTurnEndTime = FPlatformTime::Seconds();
bWaitingForResponse = true;
bFirstAudioResponseLogged = false;
UE_LOG(LogElevenLabsWS, Log, TEXT("User turn ended — stopped sending audio chunks. Server VAD will detect silence."));
// In client turn mode, stopping user_activity signals end of user turn.
// The API uses user_activity for ongoing speech; simply stop sending it.
// No explicit end message is required — silence is detected server-side.
// We still log for debug visibility.
UE_LOG(LogElevenLabsWS, Log, TEXT("User turn ended (client mode) — stopped sending user_activity."));
}
void UElevenLabsWebSocketProxy::SendTextMessage(const FString& Text)
@ -181,79 +155,8 @@ void UElevenLabsWebSocketProxy::SendInterrupt()
// ─────────────────────────────────────────────────────────────────────────────
void UElevenLabsWebSocketProxy::OnWsConnected()
{
UE_LOG(LogElevenLabsWS, Log, TEXT("WebSocket connected. Sending conversation_initiation_client_data..."));
// State stays Connecting until we receive conversation_initiation_metadata from the server.
// ElevenLabs requires this message immediately after the WebSocket handshake to
// negotiate the session configuration. Without it, the server won't accept audio
// from the client (microphone stays silent from server perspective) and default
// settings are used (higher latency, no intermediate responses).
//
// Structure:
// {
// "type": "conversation_initiation_client_data",
// "conversation_config_override": {
// "agent": {
// "turn": { "turn_timeout": 3, "speculative_turn": true }
// },
// "tts": {
// "optimize_streaming_latency": 3
// }
// },
// "custom_llm_extra_body": {
// "enable_intermediate_response": true
// }
// }
// Configure turn-taking behaviour.
// The ElevenLabs API does NOT have a turn.mode field.
// Turn-taking is controlled by the server's VAD and the turn_* parameters.
// In push-to-talk (Client mode) the user controls the mic; the server still
// uses its VAD to detect the end of speech from the audio chunks it receives.
TSharedPtr<FJsonObject> TurnObj = MakeShareable(new FJsonObject());
// Lower turn_timeout so the agent responds faster after the user stops speaking.
// Default is 7s. In push-to-talk (Client mode), the user explicitly signals
// end-of-turn by releasing the key, so we can use a very short timeout (1s).
if (TurnMode == EElevenLabsTurnMode::Client)
{
TurnObj->SetNumberField(TEXT("turn_timeout"), 1);
}
// Speculative turn: start LLM generation during silence before the VAD is
// fully confident the user finished speaking. Reduces latency by 200-500ms.
if (bSpeculativeTurn)
{
TurnObj->SetBoolField(TEXT("speculative_turn"), true);
}
TSharedPtr<FJsonObject> AgentObj = MakeShareable(new FJsonObject());
AgentObj->SetObjectField(TEXT("turn"), TurnObj);
TSharedPtr<FJsonObject> TtsObj = MakeShareable(new FJsonObject());
TtsObj->SetNumberField(TEXT("optimize_streaming_latency"), 3);
TSharedPtr<FJsonObject> ConversationConfigOverride = MakeShareable(new FJsonObject());
ConversationConfigOverride->SetObjectField(TEXT("agent"), AgentObj);
ConversationConfigOverride->SetObjectField(TEXT("tts"), TtsObj);
// enable_intermediate_response reduces time-to-first-audio by allowing the agent
// to start speaking before it has finished generating the full response.
TSharedPtr<FJsonObject> CustomLlmExtraBody = MakeShareable(new FJsonObject());
CustomLlmExtraBody->SetBoolField(TEXT("enable_intermediate_response"), true);
TSharedPtr<FJsonObject> InitMsg = MakeShareable(new FJsonObject());
InitMsg->SetStringField(TEXT("type"), ElevenLabsMessageType::ConversationClientData);
InitMsg->SetObjectField(TEXT("conversation_config_override"), ConversationConfigOverride);
InitMsg->SetObjectField(TEXT("custom_llm_extra_body"), CustomLlmExtraBody);
// NOTE: We bypass SendJsonMessage() here intentionally.
// SendJsonMessage() guards on WebSocket->IsConnected(), but OnWsConnected fires
// during the handshake before IsConnected() returns true in some UE WS backends.
// We know the socket is open at this point — send directly.
FString InitJson;
TSharedRef<TJsonWriter<>> InitWriter = TJsonWriterFactory<>::Create(&InitJson);
FJsonSerializer::Serialize(InitMsg.ToSharedRef(), InitWriter);
UE_LOG(LogElevenLabsWS, Log, TEXT("Sending initiation: %s"), *InitJson);
WebSocket->Send(InitJson);
UE_LOG(LogElevenLabsWS, Log, TEXT("WebSocket connected. Waiting for conversation_initiation_metadata..."));
// State stays Connecting until we receive the initiation metadata from the server.
}
void UElevenLabsWebSocketProxy::OnWsConnectionError(const FString& Error)
@ -297,53 +200,20 @@ void UElevenLabsWebSocketProxy::OnWsMessage(const FString& Message)
return;
}
// Log every message type received from the server for debugging.
UE_LOG(LogElevenLabsWS, Log, TEXT("Received message type: %s"), *MsgType);
if (MsgType == ElevenLabsMessageType::ConversationInitiation)
{
HandleConversationInitiation(Root);
}
else if (MsgType == ElevenLabsMessageType::AudioResponse)
{
// Log time-to-first-audio: latency between end of user turn and first agent audio.
if (bWaitingForResponse && !bFirstAudioResponseLogged)
{
const double Now = FPlatformTime::Seconds();
const double LatencyFromTurnEnd = (Now - UserTurnEndTime) * 1000.0;
const double LatencyFromLastChunk = (Now - LastAudioChunkSentTime) * 1000.0;
UE_LOG(LogElevenLabsWS, Warning,
TEXT("[LATENCY] Time-to-first-audio: %.0f ms (from turn end), %.0f ms (from last chunk sent)"),
LatencyFromTurnEnd, LatencyFromLastChunk);
bFirstAudioResponseLogged = true;
}
HandleAudioResponse(Root);
}
else if (MsgType == ElevenLabsMessageType::UserTranscript)
{
// Log transcription latency.
if (bWaitingForResponse)
{
const double Now = FPlatformTime::Seconds();
const double LatencyFromTurnEnd = (Now - UserTurnEndTime) * 1000.0;
UE_LOG(LogElevenLabsWS, Warning,
TEXT("[LATENCY] User transcript received: %.0f ms after turn end"),
LatencyFromTurnEnd);
bWaitingForResponse = false;
}
HandleTranscript(Root);
}
else if (MsgType == ElevenLabsMessageType::AgentResponse)
{
// Log agent text response latency.
if (UserTurnEndTime > 0.0)
{
const double Now = FPlatformTime::Seconds();
const double LatencyFromTurnEnd = (Now - UserTurnEndTime) * 1000.0;
UE_LOG(LogElevenLabsWS, Warning,
TEXT("[LATENCY] Agent text response: %.0f ms after turn end"),
LatencyFromTurnEnd);
}
HandleAgentResponse(Root);
}
else if (MsgType == ElevenLabsMessageType::AgentResponseCorrection)

View File

@ -80,29 +80,6 @@ public:
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs")
bool bAutoStartListening = true;
/**
* Enable speculative turn: the LLM starts generating a response during
* silence before the VAD is fully confident the user has finished speaking.
* Reduces latency by 200-500ms but may occasionally produce premature responses.
*/
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs|Latency")
bool bSpeculativeTurn = true;
/**
* Forward user speech transcripts (user_transcript events) to the
* OnAgentTranscript delegate. Disable to reduce overhead if you don't
* need to display what the user said.
*/
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs|Events")
bool bEnableUserTranscript = true;
/**
* Forward agent text responses (agent_response events) to the
* OnAgentTextResponse delegate. Disable if you only need audio output.
*/
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "ElevenLabs|Events")
bool bEnableAgentTextResponse = true;
// ── Events ────────────────────────────────────────────────────────────────
UPROPERTY(BlueprintAssignable, Category = "ElevenLabs|Events")
@ -253,11 +230,4 @@ private:
// consider the agent done speaking.
int32 SilentTickCount = 0;
static constexpr int32 SilenceThresholdTicks = 30; // ~0.5s at 60fps
// ── Microphone accumulation ───────────────────────────────────────────────
// WASAPI fires callbacks every ~5ms (158 bytes at 16kHz 16-bit mono).
// ElevenLabs needs at least ~100ms (3200 bytes) per chunk for reliable VAD/STT.
// We accumulate here and only call SendAudioChunk once enough bytes are ready.
TArray<uint8> MicAccumulationBuffer;
static constexpr int32 MicChunkMinBytes = 3200; // 100ms @ 16kHz 16-bit mono
};

View File

@ -183,22 +183,4 @@ private:
// Accumulation buffer for multi-fragment binary WebSocket frames.
// ElevenLabs sends JSON as binary frames; large messages arrive in fragments.
TArray<uint8> BinaryFrameBuffer;
// ── Latency tracking ─────────────────────────────────────────────────────
// Timestamp of the last audio chunk sent (user speech).
double LastAudioChunkSentTime = 0.0;
// Timestamp when user turn ended (StopListening).
double UserTurnEndTime = 0.0;
// Whether we are waiting for the first response after user stopped speaking.
bool bWaitingForResponse = false;
// Whether we already logged the first audio response latency for this turn.
bool bFirstAudioResponseLogged = false;
public:
// Set by UElevenLabsConversationalAgentComponent before calling Connect().
// Controls turn_timeout in conversation_initiation_client_data.
EElevenLabsTurnMode TurnMode = EElevenLabsTurnMode::Server;
// Speculative turn: start LLM generation during silence before full turn confidence.
bool bSpeculativeTurn = true;
};