Compare commits

...

6 Commits

Author SHA1 Message Date
2169c58cd7 Update memory: latency HUD status + future server-side metrics TODO
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 18:47:11 +01:00
5fad6376bc Fix latency HUD: anchor all metrics to agent_response_started
user_transcript arrives AFTER agent_response_started in Server VAD mode
(the server detects end of speech via VAD, starts generating immediately,
and STT completes later). This caused Transcript>Gen to show stale values
(19s) and Total < Gen>Audio (impossible).

Now all metrics are anchored to GenerationStartTime (agent_response_started),
which is the closest client-side proxy for "user stopped speaking":

- Gen>Audio: generation start → first audio chunk (LLM + TTS)
- Pre-buffer: wait before playback
- Gen>Ear: generation start → playback starts (user-perceived)

Removed STTToGenMs, TotalMs, EndToEarMs, UserSpeechMs (all depended on
unreliable timestamps). Simpler, always correct, 3 clear metrics.
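The three surviving metrics reduce to simple differences against the single anchor. A minimal sketch outside UE (plain C++; the struct and parameter names mirror the commit's `GenerationStartTime` / `PreBufferStartTime` / `PlaybackStartTime` but are illustrative, not the component's actual signature):

```cpp
// Hypothetical stand-in for the component's per-turn latency snapshot (ms).
struct LatencySnapshot
{
    float GenToAudioMs = 0.0f; // generation start -> first audio chunk (LLM + TTS)
    float PreBufferMs  = 0.0f; // wait before playback
    float GenToEarMs   = 0.0f; // generation start -> playback starts (user-perceived)
};

// All metrics anchored to GenerationStartTime (agent_response_started),
// per the commit message. Timestamps are in seconds; zero means "not set".
LatencySnapshot ComputeLatencies(double GenerationStartTime,
                                 double FirstAudioTime,
                                 double PreBufferStartTime,
                                 double PlaybackStartTime)
{
    LatencySnapshot Out;
    if (GenerationStartTime > 0.0)
    {
        Out.GenToAudioMs = static_cast<float>((FirstAudioTime - GenerationStartTime) * 1000.0);
        Out.GenToEarMs   = static_cast<float>((PlaybackStartTime - GenerationStartTime) * 1000.0);
    }
    if (PreBufferStartTime > 0.0)
    {
        Out.PreBufferMs = static_cast<float>((PlaybackStartTime - PreBufferStartTime) * 1000.0);
    }
    return Out;
}
```

Because every metric shares one anchor, `GenToEarMs >= GenToAudioMs` holds by construction — the "Total < Gen>Audio" impossibility cannot recur.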

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 18:39:49 +01:00
0b190c3149 Rework latency HUD: base all measurements on first user_transcript
Previous approach used TurnEndTime (from StopListening) which was never
set in Server VAD mode. Now all latency measurements are anchored to
TurnStartTime, captured when the first user_transcript arrives from the
ElevenLabs server — the earliest client-side confirmation of user speech.

Timeline: [user speaks] → STT → user_transcript(=T0) → agent_response_started → audio → playback

Metrics shown:
- Transcript>Gen: T0 → generation start (LLM think time)
- Gen>Audio: generation start → first audio chunk (LLM + TTS)
- Total: T0 → first audio chunk (full pipeline)
- Pre-buffer: wait before playback
- End-to-Ear: T0 → playback starts (user-perceived)

Removed UserSpeechMs (unmeasurable without client-side VAD).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 18:32:59 +01:00
56b072c45e Fix UserSpeechMs growing indefinitely in Server VAD mode
TurnStartTime was only set in StartListening(), which is called once.
In Server VAD + interruption mode the mic stays open, so TurnStartTime
was never updated between turns. Now reset TurnStartTime when the agent
stops speaking (normal end + interruption), marking the start of the
next potential user turn.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 18:27:14 +01:00
a8dd5a022f Fix latency HUD showing --- in Server VAD mode
Move latency reset from StopListening to HandleAgentResponseStarted.
In Server VAD + interruption mode, StopListening is never called so
TurnEndTime stayed at 0 and all dependent metrics showed ---. Now
HandleAgentResponseStarted detects whether StopListening provided a
fresh TurnEndTime; if not (Server VAD), it uses Now as approximation.
Also fix DisplayTime from 0 to 1s to prevent HUD flicker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 18:22:28 +01:00
955c97e0dd Improve latency debug HUD: separate toggle, per-turn reset, multi-line display
- Add bDebugLatency property + CVar (ps.ai.ConvAgent.Debug.Latency)
  independent from bDebug to save HUD space
- Reset latencies to zero each turn (StopListening) instead of persisting
- Add UserSpeechMs and PreBufferMs to the latency struct
- Move latency captures outside bDebug guard (always measured)
- Replace single-line latency in DrawDebugHUD with dedicated DrawLatencyHUD()
  showing 6 metrics on separate lines with color coding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 18:06:39 +01:00
3 changed files with 143 additions and 75 deletions

View File

@@ -17,69 +17,70 @@
## Plugins
| Plugin | Path | Purpose |
|--------|------|---------|
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Has ElevenLabs voice type enum in `ConvaiDefinitions.h`. Used as architectural reference. |
| **PS_AI_Agent_ElevenLabs** | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_Agent_ElevenLabs/` | Our ElevenLabs Conversational AI integration. See `.claude/elevenlabs_plugin.md` for full details. |
| Convai (reference) | `<repo_root>/ConvAI/Convai/` | gRPC + protobuf streaming to Convai API. Used as architectural reference. |
| **PS_AI_ConvAgent** | `<repo_root>/Unreal/PS_AI_Agent/Plugins/PS_AI_ConvAgent/` | Main plugin — ElevenLabs Conversational AI, posture, gaze, lip sync, facial expressions. |
## User Preferences
- Plugin naming: `PS_AI_Agent_<Service>` (e.g. `PS_AI_Agent_ElevenLabs`)
- Plugin naming: `PS_AI_ConvAgent` (renamed from PS_AI_Agent_ElevenLabs)
- Save memory frequently during long sessions
- Goal: ElevenLabs Conversational AI integration — simpler than Convai, no gRPC
- Full original ask + intent: see `.claude/project_context.md`
- Git remote is a **private server** — no public exposure risk
- Full original ask + intent: see `.claude/project_context.md`
## Current Branch & Work
- **Branch**: `main`
- **Recent merges**: `feature/multi-player-shared-agent` merged to main
### Latency Debug HUD (just implemented)
- Separate `bDebugLatency` property + CVar `ps.ai.ConvAgent.Debug.Latency`
- All metrics anchored to `GenerationStartTime` (`agent_response_started` event)
- Metrics: Gen>Audio (LLM+TTS), Pre-buffer, Gen>Ear (user-perceived)
- Reset per turn in `HandleAgentResponseStarted()`
- `DrawLatencyHUD()` separate from `DrawDebugHUD()`
### Future: Server-Side Latency from ElevenLabs API
**TODO — high-value improvement parked for later:**
- `GET /v1/convai/conversations/{conversation_id}` returns:
- `conversation_turn_metrics` with `elapsed_time` per metric (STT, LLM, TTS breakdown!)
- `tool_latency_secs`, `step_latency_secs`, `rag_latency_secs`
- `time_in_call_secs` per message
- `ping` WS event has `ping_ms` (network round-trip) — could display on HUD
- `vad_score` WS event (0.0-1.0) — could detect real speech start client-side
- Docs: https://elevenlabs.io/docs/api-reference/conversations/get
### Multi-Player Shared Agent — Key Design
- **Old model**: exclusive lock (one player per agent via `NetConversatingPawn`)
- **New model**: shared array (`NetConnectedPawns`) + active speaker (`NetActiveSpeakerPawn`)
- Speaker arbitration: server-side with `SpeakerSwitchHysteresis` (0.3s) + `SpeakerIdleTimeout` (3.0s)
- In standalone (≤1 player): speaker arbitration bypassed, audio sent directly to WebSocket
- Internal mic (WASAPI thread): direct WebSocket send, no game-thread state access
- `GetCurrentBlendshapes()` thread-safe via `ThreadSafeBlendshapes` snapshot + `BlendshapeLock`
## Key UE5 Plugin Patterns
- Settings object: `UCLASS(config=Engine, defaultconfig)` inheriting `UObject`, registered via `ISettingsModule`
- Module startup: `NewObject<USettings>(..., RF_Standalone)` + `AddToRoot()`
- WebSocket: `FWebSocketsModule::Get().CreateWebSocket(URL, TEXT(""), Headers)`
- `WebSockets` is a **module** (Build.cs only) — NOT a plugin, don't put it in `.uplugin`
- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+, replaces deprecated `OpenCaptureStream`)
- `AudioCapture` IS a plugin — declare it in `.uplugin` Plugins array
- Callback type: `FOnAudioCaptureFunction` = `TFunction<void(const void*, int32, int32, int32, double, bool)>`
- Cast `const void*` to `const float*` inside — device sends float32 interleaved
- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow` delegate
- Audio capture callbacks arrive on a **background thread** — always marshal to game thread with `AsyncTask(ENamedThreads::GameThread, ...)`
- Audio capture: `Audio::FAudioCapture::OpenAudioCaptureStream()` (UE 5.3+)
- Callback arrives on **background thread** — marshal to game thread
- Procedural audio playback: `USoundWaveProcedural` + `OnSoundWaveProceduralUnderflow`
- Resample mic audio to **16000 Hz mono** before sending to ElevenLabs
- `TArray::RemoveAt(idx, count, EAllowShrinking::No)` — bool overload deprecated in UE 5.5
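The resample-to-16 kHz-mono rule above can be sketched as a nearest-sample decimator (plain C++, no UE types; function and parameter names are hypothetical, and production code should apply an anti-aliasing low-pass filter before decimating):

```cpp
#include <cstddef>
#include <vector>

// Downmix interleaved float32 audio to mono and decimate to 16000 Hz by
// nearest-sample picking. Illustrative only — no anti-aliasing filter.
std::vector<float> ResampleTo16kMono(const std::vector<float>& Interleaved,
                                     int NumChannels, int InSampleRate)
{
    const int OutRate = 16000;
    const std::size_t InFrames  = Interleaved.size() / NumChannels;
    // Integer arithmetic avoids truncation from 1/3-style ratios.
    const std::size_t OutFrames = InFrames * OutRate / InSampleRate;

    std::vector<float> Out;
    Out.reserve(OutFrames);
    for (std::size_t i = 0; i < OutFrames; ++i)
    {
        // Map the output frame index back to the nearest input frame.
        const std::size_t Src = i * InFrames / OutFrames;
        float Sum = 0.0f;
        for (int c = 0; c < NumChannels; ++c)
            Sum += Interleaved[Src * NumChannels + c];
        Out.push_back(Sum / NumChannels); // average channels -> mono
    }
    return Out;
}
```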
## Plugin Status
- **PS_AI_Agent_ElevenLabs**: compiles cleanly on UE 5.5 Win64 (verified 2026-02-19)
- v1.5.0 — mic audio chunk size fixed: WASAPI 5ms callbacks accumulated to 100ms before sending
- v1.4.0 — push-to-talk fully fixed: bAutoStartListening now ignored in Client turn mode
- Binary WS frame handling implemented (ElevenLabs sends ALL frames as binary, not text)
- First-byte discrimination: `{` = JSON control message, else = raw PCM audio
- `SendTextMessage()` added to both WebSocketProxy and ConversationalAgentComponent
- `conversation_initiation_client_data` now sent immediately on WS connect (required for mic + latency)
## Audio Chunk Size — CRITICAL
- WASAPI fires mic callbacks every ~5ms → **158 bytes** at 16kHz 16-bit mono
- ElevenLabs VAD/STT requires **≥3200 bytes (100ms)** per chunk; smaller chunks are silently ignored
- Fix: `MicAccumulationBuffer` in component accumulates chunks; sends only when `>= MicChunkMinBytes` (3200)
- `StopListening()` flushes remainder so final partial chunk is never dropped before end-of-turn
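The accumulation fix above can be sketched as a plain buffer that only releases chunks of at least 3200 bytes and flushes the remainder on stop (plain C++; `MicChunkMinBytes` matches the note, the class and callback names are hypothetical):

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Accumulates tiny WASAPI mic callbacks (~5 ms each at 16 kHz) and only
// forwards chunks of at least MicChunkMinBytes (100 ms), since smaller
// chunks are silently ignored by the server. Flush() sends the final
// partial chunk so end-of-turn audio is never dropped.
class FMicAccumulator
{
public:
    static constexpr int32_t MicChunkMinBytes = 3200; // 100 ms @ 16 kHz, 16-bit mono

    explicit FMicAccumulator(std::function<void(const std::vector<uint8_t>&)> InSend)
        : Send(std::move(InSend)) {}

    void Append(const uint8_t* Data, int32_t NumBytes)
    {
        Buffer.insert(Buffer.end(), Data, Data + NumBytes);
        if (static_cast<int32_t>(Buffer.size()) >= MicChunkMinBytes)
        {
            Send(Buffer);
            Buffer.clear();
        }
    }

    // Called from StopListening(): forward whatever remains.
    void Flush()
    {
        if (!Buffer.empty())
        {
            Send(Buffer);
            Buffer.clear();
        }
    }

private:
    std::function<void(const std::vector<uint8_t>&)> Send;
    std::vector<uint8_t> Buffer;
};
```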
## ElevenLabs WebSocket Protocol Notes
- **ALL frames are binary** — bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text) — UE fires both for same frame → double audio bug
- **ALL frames are binary** — bind ONLY `OnRawMessage`; NEVER bind `OnMessage` (text)
- Binary frame discrimination: peek byte[0] → `'{'` (0x7B) = JSON, else = raw PCM audio
- Fragment reassembly: accumulate into `BinaryFrameBuffer` until `BytesRemaining == 0`
- Pong: `{"type":"pong","event_id":N}`; `event_id` is **top-level**, NOT nested
- Transcript: type=`user_transcript`, key=`user_transcription_event`, field=`user_transcript`
- Client turn mode (`client_vad`): send `user_activity` **with every audio chunk** (not just once) — server needs continuous signal to know user is speaking; stopping chunks = silence detected = agent responds
- Text input: `{"type":"user_message","text":"..."}` — agent replies with audio + text
- **MUST send `conversation_initiation_client_data` immediately after WS connect** — without it, server won't process client audio (mic appears dead)
- `conversation_initiation_client_data` payload: `conversation_config_override.agent.turn.mode`, `conversation_config_override.tts.optimize_streaming_latency`, `custom_llm_extra_body.enable_intermediate_response`
- `enable_intermediate_response: true` in `custom_llm_extra_body` reduces time-to-first-audio latency
- `user_transcript` arrives AFTER `agent_response_started` in Server VAD mode
- **MUST send `conversation_initiation_client_data` immediately after WS connect**
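The peek-byte[0] discrimination rule above amounts to the following (plain C++ sketch; the enum and function names are illustrative, and the actual handler also does fragment reassembly first):

```cpp
#include <cstddef>
#include <cstdint>

enum class EFrameKind { Json, Pcm, Empty };

// ElevenLabs sends every WebSocket frame as binary; the first byte tells
// JSON control messages ('{' = 0x7B) apart from raw PCM audio.
EFrameKind ClassifyFrame(const uint8_t* Data, std::size_t Size)
{
    if (Size == 0) return EFrameKind::Empty;
    return (Data[0] == 0x7B) ? EFrameKind::Json : EFrameKind::Pcm;
}
```

This only works because valid PCM may start with any byte but JSON control messages always start with `{` — classify before parsing, never the other way around.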
## API Keys / Secrets
- ElevenLabs API key is set in **Project Settings → Plugins → ElevenLabs AI Agent** in the Editor
- UE saves it to `DefaultEngine.ini` under `[/Script/PS_AI_Agent_ElevenLabs.ElevenLabsSettings]`
- **The key is stripped from `DefaultEngine.ini` before every commit** — do not commit it
- Each developer sets the key locally; it does not go in git
- ElevenLabs API key: **Project Settings → Plugins → ElevenLabs AI Agent**
- Saved to `DefaultEngine.ini` — **stripped before every commit**
## Claude Memory Files in This Repo
| File | Contents |
|------|----------|
| `.claude/MEMORY.md` | This file — project structure, patterns, status |
| `.claude/elevenlabs_plugin.md` | Plugin file map, ElevenLabs WS protocol, design decisions |
| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL, Agent ID location) |
| `.claude/elevenlabs_api_reference.md` | Full ElevenLabs API reference (WS messages, REST, signed URL) |
| `.claude/project_context.md` | Original ask, intent, short/long-term goals |
| `.claude/session_log_2026-02-19.md` | Full session record: steps, commits, technical decisions, next steps |
| `.claude/session_log_2026-02-19.md` | Session record: steps, commits, technical decisions |
| `.claude/PS_AI_Agent_ElevenLabs_Documentation.md` | User-facing Markdown reference doc |

View File

@@ -26,6 +26,12 @@ static TAutoConsoleVariable<int32> CVarDebugElevenLabs(
TEXT("Debug HUD for ElevenLabs. -1=use property, 0=off, 1-3=verbosity."),
ECVF_Default);
static TAutoConsoleVariable<int32> CVarDebugLatency(
TEXT("ps.ai.ConvAgent.Debug.Latency"),
-1,
TEXT("Latency debug HUD. -1=use property, 0=off, 1=on."),
ECVF_Default);
// ─────────────────────────────────────────────────────────────────────────────
// Constructor
// ─────────────────────────────────────────────────────────────────────────────
@@ -160,9 +166,13 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::TickComponent(float DeltaTime, ELevel
AudioPlaybackComponent->Play();
}
PlaybackStartTime = FPlatformTime::Seconds();
if (bDebug && TurnEndTime > 0.0)
if (GenerationStartTime > 0.0)
{
LastLatencies.EndToEarMs = static_cast<float>((PlaybackStartTime - TurnEndTime) * 1000.0);
CurrentLatencies.GenToEarMs = static_cast<float>((PlaybackStartTime - GenerationStartTime) * 1000.0);
}
if (PreBufferStartTime > 0.0)
{
CurrentLatencies.PreBufferMs = static_cast<float>((PlaybackStartTime - PreBufferStartTime) * 1000.0);
}
OnAudioPlaybackStarted.Broadcast();
}
@@ -308,6 +318,14 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::TickComponent(float DeltaTime, ELevel
DrawDebugHUD();
}
}
{
const int32 CVarVal = CVarDebugLatency.GetValueOnGameThread();
const bool bShowLatency = (CVarVal >= 0) ? (CVarVal > 0) : bDebugLatency;
if (bShowLatency)
{
DrawLatencyHUD();
}
}
}
// ─────────────────────────────────────────────────────────────────────────────
@@ -576,6 +594,7 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::StopListening()
}
TurnEndTime = FPlatformTime::Seconds();
// Start the response timeout clock — but only when the server hasn't already started
// generating. When StopListening() is called from HandleAgentResponseStarted() as part
// of collision avoidance, bAgentGenerating is already true, meaning the server IS already
@@ -1057,14 +1076,26 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::HandleAgentResponseStarted()
}
const double Now = FPlatformTime::Seconds();
GenerationStartTime = Now;
if (bDebug && TurnEndTime > 0.0)
// --- Latency reset for this new response cycle ---
// In Server VAD mode, StopListening() is not called — the server detects
// end of user speech and immediately starts generating. If TurnEndTime was
// not set by StopListening since the last generation (i.e. it's stale or 0),
// use Now as the best client-side approximation.
const bool bFreshTurnEnd = (TurnEndTime > GenerationStartTime) && (GenerationStartTime > 0.0);
if (!bFreshTurnEnd)
{
LastLatencies.STTToGenMs = static_cast<float>((Now - TurnEndTime) * 1000.0);
TurnEndTime = Now;
}
// Reset all latency measurements — new response cycle starts here.
// All metrics are anchored to GenerationStartTime (= now), which is the closest
// client-side proxy for "user stopped speaking" in Server VAD mode.
CurrentLatencies = FDebugLatencies();
GenerationStartTime = Now;
const double T = Now - SessionStartTime;
const double LatencyFromTurnEnd = TurnEndTime > 0.0 ? Now - TurnEndTime : 0.0;
const double LatencyFromTurnEnd = Now - TurnEndTime;
if (bIsListening)
{
// In Server VAD + interruption mode, keep the mic open so the server can
@@ -1341,6 +1372,10 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::EnqueueAgentAudio(const TArray<uint8>
bQueueWasDry = false;
SilentTickCount = 0;
// Latency capture (always, for HUD display).
if (GenerationStartTime > 0.0)
CurrentLatencies.GenToAudioMs = static_cast<float>((AgentSpeakStart - GenerationStartTime) * 1000.0);
if (bDebug)
{
const double T = AgentSpeakStart - SessionStartTime;
@@ -1348,12 +1383,6 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::EnqueueAgentAudio(const TArray<uint8>
UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Log,
TEXT("[T+%.2fs] [Turn %d] Agent speaking — first audio chunk. (%.2fs after turn end)"),
T, LastClosedTurnIndex, LatencyFromTurnEnd);
// Update latency snapshot for HUD display.
if (TurnEndTime > 0.0)
LastLatencies.TotalMs = static_cast<float>((AgentSpeakStart - TurnEndTime) * 1000.0);
if (GenerationStartTime > 0.0)
LastLatencies.GenToAudioMs = static_cast<float>((AgentSpeakStart - GenerationStartTime) * 1000.0);
}
OnAgentStartedSpeaking.Broadcast();
@@ -1386,10 +1415,11 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::EnqueueAgentAudio(const TArray<uint8>
AudioPlaybackComponent->Play();
}
PlaybackStartTime = FPlatformTime::Seconds();
if (bDebug && TurnEndTime > 0.0)
if (GenerationStartTime > 0.0)
{
LastLatencies.EndToEarMs = static_cast<float>((PlaybackStartTime - TurnEndTime) * 1000.0);
CurrentLatencies.GenToEarMs = static_cast<float>((PlaybackStartTime - GenerationStartTime) * 1000.0);
}
// No pre-buffer in this path (immediate playback).
OnAudioPlaybackStarted.Broadcast();
}
}
@@ -1417,9 +1447,13 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::EnqueueAgentAudio(const TArray<uint8>
AudioPlaybackComponent->Play();
}
PlaybackStartTime = FPlatformTime::Seconds();
if (bDebug && TurnEndTime > 0.0)
if (GenerationStartTime > 0.0)
{
LastLatencies.EndToEarMs = static_cast<float>((PlaybackStartTime - TurnEndTime) * 1000.0);
CurrentLatencies.GenToEarMs = static_cast<float>((PlaybackStartTime - GenerationStartTime) * 1000.0);
}
if (PreBufferStartTime > 0.0)
{
CurrentLatencies.PreBufferMs = static_cast<float>((PlaybackStartTime - PreBufferStartTime) * 1000.0);
}
OnAudioPlaybackStarted.Broadcast();
}
@@ -2362,19 +2396,45 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::DrawDebugHUD() const
NetConnectedPawns.Num(), *SpeakerName));
}
// Latencies (from last completed turn)
if (LastLatencies.TotalMs > 0.0f)
{
GEngine->AddOnScreenDebugMessage(BaseKey + 8, DisplayTime, MainColor,
FString::Printf(TEXT(" Latency: total=%.0fms (stt>gen=%.0fms gen>audio=%.0fms) ear=%.0fms"),
LastLatencies.TotalMs, LastLatencies.STTToGenMs,
LastLatencies.GenToAudioMs, LastLatencies.EndToEarMs));
}
// Reconnection
GEngine->AddOnScreenDebugMessage(BaseKey + 9, DisplayTime,
GEngine->AddOnScreenDebugMessage(BaseKey + 8, DisplayTime,
bWantsReconnect ? FColor::Red : MainColor,
FString::Printf(TEXT(" Reconnect: %d/%d attempts%s"),
ReconnectAttemptCount, MaxReconnectAttempts,
bWantsReconnect ? TEXT(" (ACTIVE)") : TEXT("")));
}
void UPS_AI_ConvAgent_ElevenLabsComponent::DrawLatencyHUD() const
{
if (!GEngine) return;
// Separate BaseKey range so it never collides with DrawDebugHUD
const int32 BaseKey = 93700;
const float DisplayTime = 1.0f; // long enough to avoid flicker between ticks
const FColor TitleColor = FColor::Cyan;
const FColor ValueColor = FColor::White;
const FColor HighlightColor = FColor::Yellow;
// Helper: format a single metric — shows "---" when not yet captured this turn
auto Fmt = [](float Ms) -> FString
{
return (Ms > 0.0f) ? FString::Printf(TEXT("%.0f ms"), Ms) : FString(TEXT("---"));
};
// Title — all times measured from agent_response_started
GEngine->AddOnScreenDebugMessage(BaseKey, DisplayTime, TitleColor,
TEXT("=== Latency (from gen start) ==="));
// 1. Gen → Audio: generation start → first audio chunk (LLM + TTS)
GEngine->AddOnScreenDebugMessage(BaseKey + 1, DisplayTime, ValueColor,
FString::Printf(TEXT(" Gen>Audio: %s"), *Fmt(CurrentLatencies.GenToAudioMs)));
// 2. Pre-buffer wait before playback
GEngine->AddOnScreenDebugMessage(BaseKey + 2, DisplayTime, ValueColor,
FString::Printf(TEXT(" Pre-buffer: %s"), *Fmt(CurrentLatencies.PreBufferMs)));
// 3. Gen → Ear: generation start → playback starts (user-perceived total)
GEngine->AddOnScreenDebugMessage(BaseKey + 3, DisplayTime, HighlightColor,
FString::Printf(TEXT(" Gen>Ear: %s"), *Fmt(CurrentLatencies.GenToEarMs)));
}

View File

@@ -231,6 +231,11 @@ public:
meta = (ClampMin = "0", ClampMax = "3", EditCondition = "bDebug"))
int32 DebugVerbosity = 1;
/** Show a separate latency debug HUD with detailed per-turn timing breakdown.
* Independent from bDebug; can be enabled alone via CVar ps.ai.ConvAgent.Debug.Latency. */
UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "PS AI ConvAgent|Debug")
bool bDebugLatency = false;
// ── Events ────────────────────────────────────────────────────────────────
/** Fired when the WebSocket connection is established and the conversation session is ready. Provides the ConversationID and AgentID. */
@@ -635,16 +640,17 @@ private:
double GenerationStartTime = 0.0; // Set in HandleAgentResponseStarted — server starts generating.
double PlaybackStartTime = 0.0; // Set when audio playback actually starts (post pre-buffer).
// Last-turn latency snapshot (ms) — updated per turn, displayed on debug HUD.
// Persists between turns so the HUD always shows the most recent measurement.
// Current-turn latency measurements (ms). Reset in HandleAgentResponseStarted.
// All anchored to GenerationStartTime (agent_response_started event), which is
// the closest client-side proxy for "user stopped speaking" in Server VAD mode.
// Zero means "not yet measured this turn".
struct FDebugLatencies
{
float STTToGenMs = 0.0f; // TurnEnd → server starts generating
float GenToAudioMs = 0.0f; // Server generating → first audio chunk
float TotalMs = 0.0f; // TurnEnd → first audio chunk
float EndToEarMs = 0.0f; // TurnEnd → audio playback starts (user-perceived)
float GenToAudioMs = 0.0f; // agent_response_started → first audio chunk (LLM + TTS)
float PreBufferMs = 0.0f; // Pre-buffer wait before playback starts
float GenToEarMs = 0.0f; // agent_response_started → playback starts (user-perceived)
};
FDebugLatencies LastLatencies;
FDebugLatencies CurrentLatencies;
// Accumulates incoming PCM bytes until the audio component needs data.
// Uses a read offset instead of RemoveAt(0,N) to avoid O(n) memmove every
@@ -747,4 +753,5 @@ private:
/** Draw on-screen debug info (called from TickComponent when bDebug). */
void DrawDebugHUD() const;
void DrawLatencyHUD() const;
};