Reliability & perf: atomic flags, WebSocket reconnect, zero-alloc eye curves, audio queue read-offset

- Make bAgentGenerating, bWaitingForAgentResponse, bWaitingForResponse, bFirstAudioResponseLogged, bAgentResponseStartedFired (std::atomic<bool>) and LastInterruptEventId (std::atomic<int32>) for thread safety
- Add WebSocket auto-reconnection with exponential backoff (1s→30s cap, max 5 attempts), distinguishing intentional vs unexpected disconnects
- Add FillCurrentEyeCurves() zero-allocation method using FindOrAdd() to eliminate per-frame TMap heap allocation in the anim thread
- Replace AudioQueue RemoveAt(0, N) with a read-offset pattern — O(1) per underflow callback, with periodic compaction when the offset exceeds half the buffer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
8886e7a7a2
commit
35d217f6ec
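The AudioQueue bullet in the commit message describes a read-offset pattern. Here is a minimal sketch of that idea in portable C++ (std::vector standing in for TArray; the class and method names are illustrative, not the plugin's actual API):

```cpp
#include <cstddef>
#include <vector>

// Sketch of the read-offset pattern: instead of RemoveAt(0, N), which
// shifts the whole tail on every underflow callback, advance a read
// cursor and compact only occasionally.
class ReadOffsetQueue
{
public:
    void Push(const float* Data, std::size_t Count)
    {
        Buffer.insert(Buffer.end(), Data, Data + Count);
    }

    // O(1) bookkeeping per callback: copy out, bump the offset.
    std::size_t Pop(float* Out, std::size_t Count)
    {
        const std::size_t Avail = Buffer.size() - ReadOffset;
        const std::size_t N = Count < Avail ? Count : Avail;
        for (std::size_t i = 0; i < N; ++i)
            Out[i] = Buffer[ReadOffset + i];
        ReadOffset += N;

        // Periodic compaction: only when the dead region exceeds half
        // the buffer, erase it in one pass.
        if (ReadOffset > Buffer.size() / 2)
        {
            Buffer.erase(Buffer.begin(), Buffer.begin() + ReadOffset);
            ReadOffset = 0;
        }
        return N;
    }

    std::size_t Num() const { return Buffer.size() - ReadOffset; }

private:
    std::vector<float> Buffer;
    std::size_t ReadOffset = 0;
};
```

The erase cost is amortized: most Pop calls only bump an integer, and the occasional compaction touches at most half the live data.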
@ -1,359 +0,0 @@
Seems to get stuck less, but times out often: LogDebuggerCommands: Repeating last play command: Selected Viewport
LogPlayLevel: PlayLevel: No blueprints needed recompiling
LogPlayLevel: Creating play world package: /Game/UEDPIE_0_TestMap
LogPlayLevel: PIE: StaticDuplicateObject took: (0.005478s)
LogPlayLevel: PIE: Created PIE world by copying editor world from /Game/TestMap.TestMap to /Game/UEDPIE_0_TestMap.TestMap (0.005520s)
LogUObjectHash: Compacting FUObjectHashTables data took 0.64ms
LogChaosDD: Creating Chaos Debug Draw Scene for world TestMap
LogPlayLevel: PIE: World Init took: (0.001821s)
LogAudio: Display: Creating Audio Device: Id: 4, Scope: Unique, Realtime: True
LogAudioMixer: Display: Audio Mixer Platform Settings:
LogAudioMixer: Display: Sample Rate: 48000
LogAudioMixer: Display: Callback Buffer Frame Size Requested: 1024
LogAudioMixer: Display: Callback Buffer Frame Size To Use: 1024
LogAudioMixer: Display: Number of buffers to queue: 1
LogAudioMixer: Display: Max Channels (voices): 32
LogAudioMixer: Display: Number of Async Source Workers: 4
LogAudio: Display: AudioDevice MaxSources: 32
LogAudio: Display: Audio Spatialization Plugin: None (built-in).
LogAudio: Display: Audio Reverb Plugin: None (built-in).
LogAudio: Display: Audio Occlusion Plugin: None (built-in).
LogAudioMixer: Display: Initializing audio mixer using platform API: 'XAudio2'
LogAudioMixer: Display: Using Audio Hardware Device Speakers (Realtek(R) Audio)
LogAudioMixer: Display: Initializing Sound Submixes...
LogAudioMixer: Display: Creating Master Submix 'MasterSubmixDefault'
LogAudioMixer: Display: Creating Master Submix 'MasterReverbSubmixDefault'
LogAudioMixer: FMixerPlatformXAudio2::StartAudioStream() called. InstanceID=4
LogAudioMixer: Display: Output buffers initialized: Frames=1024, Channels=2, Samples=2048, InstanceID=4
LogAudioMixer: Display: Starting AudioMixerPlatformInterface::RunInternal(), InstanceID=4
LogAudioMixer: Display: FMixerPlatformXAudio2::SubmitBuffer() called for the first time. InstanceID=4
LogInit: FAudioDevice initialized with ID 4.
LogAudio: Display: Audio Device (ID: 4) registered with world 'TestMap'.
LogAudioMixer: Initializing Audio Bus Subsystem for audio device with ID 4
LogLoad: Game class is 'GameModeBase'
LogWorld: Bringing World /Game/UEDPIE_0_TestMap.TestMap up for play (max tick rate 60) at 2026.02.20-17.38.08
LogWorld: Bringing up level for play took: 0.000919
LogOnline: OSS: Created online subsystem instance for: :Context_3
LogElevenLabsWS: Connecting to ElevenLabs: wss://api.elevenlabs.io/v1/convai/conversation?agent_id=agent_5301kc1qkq49fn2av43nrbsar65k
PIE: Server logged in
PIE: Play in editor total start time 0,075 seconds.
LogElevenLabsWS: WebSocket connected. Sending conversation_initiation_client_data...
LogElevenLabsWS: Sending initiation: {
  "type": "conversation_initiation_client_data",
  "conversation_config_override":
  {
    "agent":
    {
      "turn":
      {
        "turn_timeout": 1
      }
    },
    "tts":
    {
      "optimize_streaming_latency": 3
    }
  },
  "custom_llm_extra_body":
  {
    "enable_intermediate_response": true
  }
}
LogElevenLabsWS: Received message type: conversation_initiation_metadata
LogElevenLabsWS: [T+0.00s] Conversation initiated. ID=conv_2901khxymx7xed88dq8tw35b2pfk
LogElevenLabsAgent: [T+0.00s] Agent connected. ConversationID=conv_2901khxymx7xed88dq8tw35b2pfk
LogBlueprintUserMessages: [test_AI_Actor_C_2] Connected
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsWS: [T+2.18s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+2.18s] [Turn 1] Mic opened — user speaking.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+4.60s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+4.60s] [Turn 1] Mic closed — user spoke 2.42s. Waiting for server response (timeout 10s)...
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: [T+5.66s] Agent started generating (1061 ms after turn end — includes VAD silence timeout + LLM start).
LogElevenLabsAgent: [T+5.66s] [Turn 1] Agent generating. (1.06s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Agent Start Generating
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Warning: [T+7.95s] [LATENCY] First audio: 3346 ms after turn end (3346 ms after last chunk)
LogElevenLabsAgent: [T+7.95s] [Turn 1] Agent speaking — first audio chunk. (3.35s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Speaking
LogElevenLabsWS: Received message type: user_transcript
LogElevenLabsWS: Warning: [T+7.95s] [LATENCY] User transcript: 3346 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Transcript : Hello, how are you?
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Received message type: agent_response
LogElevenLabsWS: Warning: [T+9.60s] [LATENCY] Agent text response: 4994 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Text Response : Hello! I'm doing well, thank you for asking. How can I assist you today?
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Speaking
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsWS: [T+18.84s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+18.84s] [Turn 2] Mic opened — user speaking.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+21.67s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+21.67s] [Turn 2] Mic closed — user spoke 2.83s. Waiting for server response (timeout 10s)...
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: [T+22.45s] Agent started generating (777 ms after turn end — includes VAD silence timeout + LLM start).
LogElevenLabsAgent: [T+22.45s] [Turn 2] Agent generating. (0.78s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Agent Start Generating
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Warning: [T+23.85s] [LATENCY] First audio: 2180 ms after turn end (2180 ms after last chunk)
LogElevenLabsAgent: [T+23.85s] [Turn 2] Agent speaking — first audio chunk. (2.18s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Speaking
LogElevenLabsWS: Received message type: user_transcript
LogElevenLabsWS: Warning: [T+23.90s] [LATENCY] User transcript: 2227 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Transcript : Nothing special, just talking.
LogElevenLabsAgent: Warning: [Turn 2] Agent silence hard-timeout (2s) without agent_response — declaring agent stopped.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Speaking
LogElevenLabsWS: Received message type: audio
LogElevenLabsAgent: [T+28.55s] [Turn 2] Agent speaking — first audio chunk. (6.88s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Speaking
LogElevenLabsWS: Received message type: agent_response
LogElevenLabsWS: Warning: [T+28.58s] [LATENCY] Agent text response: 6910 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Text Response : That's perfectly fine! I'm here to chat if you have any questions or just want to talk. What's on your mind?
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Speaking
LogEOSSDK: LogEOS: Updating Product SDK Config, Time: 347.575653
LogEOSSDK: LogEOS: SDK Config Product Update Request Completed - No Change
LogEOSSDK: LogEOS: ScheduleNextSDKConfigDataUpdate - Time: 347.774902, Update Interval: 340.359497
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsWS: [T+41.83s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+41.83s] [Turn 3] Mic opened — user speaking.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+44.86s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+44.86s] [Turn 3] Mic closed — user spoke 3.03s. Waiting for server response (timeout 10s)...
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: [T+45.20s] Agent started generating (343 ms after turn end — includes VAD silence timeout + LLM start).
LogElevenLabsAgent: [T+45.20s] [Turn 3] Agent generating. (0.34s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Agent Start Generating
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Warning: [T+48.22s] [LATENCY] First audio: 3361 ms after turn end (3361 ms after last chunk)
LogElevenLabsAgent: [T+48.22s] [Turn 3] Agent speaking — first audio chunk. (3.36s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Speaking
LogElevenLabsWS: Received message type: user_transcript
LogElevenLabsWS: Warning: [T+48.22s] [LATENCY] User transcript: 3361 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Transcript : I wanna... Hi agent!
LogElevenLabsWS: Received message type: agent_response
LogElevenLabsWS: Warning: [T+48.25s] [LATENCY] Agent text response: 3393 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Text Response : Hello! It's good to hear from you. How can I help you today?
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Speaking
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsWS: [T+57.55s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+57.55s] [Turn 4] Mic opened — user speaking.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+60.33s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+60.33s] [Turn 4] Mic closed — user spoke 2.78s. Waiting for server response (timeout 10s)...
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: [T+60.39s] Agent started generating (61 ms after turn end — includes VAD silence timeout + LLM start).
LogElevenLabsAgent: [T+60.39s] [Turn 4] Agent generating. (0.06s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Agent Start Generating
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Warning: [T+62.44s] [LATENCY] First audio: 2111 ms after turn end (2111 ms after last chunk)
LogElevenLabsAgent: [T+62.44s] [Turn 4] Agent speaking — first audio chunk. (2.11s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Speaking
LogElevenLabsWS: Received message type: user_transcript
LogElevenLabsWS: Warning: [T+62.44s] [LATENCY] User transcript: 2112 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Transcript : Do you speak French?
LogElevenLabsAgent: Warning: [Turn 4] Agent silence hard-timeout (2s) without agent_response — declaring agent stopped.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Speaking
LogElevenLabsWS: Received message type: audio
LogElevenLabsAgent: [T+66.24s] [Turn 4] Agent speaking — first audio chunk. (5.91s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Speaking
LogElevenLabsWS: Received message type: agent_response
LogElevenLabsWS: Warning: [T+66.27s] [LATENCY] Agent text response: 5944 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Text Response : Yes, I can communicate in French. Would you like to ask me something in French or perhaps practice a bit?
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Speaking
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsWS: [T+74.14s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+74.14s] [Turn 5] Mic opened — user speaking.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+76.88s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+76.88s] [Turn 5] Mic closed — user spoke 2.75s. Waiting for server response (timeout 10s)...
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: [T+77.28s] Agent started generating (393 ms after turn end — includes VAD silence timeout + LLM start).
LogElevenLabsAgent: [T+77.28s] [Turn 5] Agent generating. (0.39s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Agent Start Generating
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Warning: [T+78.31s] [LATENCY] First audio: 1428 ms after turn end (1428 ms after last chunk)
LogElevenLabsAgent: [T+78.31s] [Turn 5] Agent speaking — first audio chunk. (1.43s after turn end)
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Speaking
LogElevenLabsWS: Received message type: user_transcript
LogElevenLabsWS: Warning: [T+78.34s] [LATENCY] User transcript: 1460 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Transcript : Yes, tell me something in French.
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Received message type: audio
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsAgent: StartListening: interrupting agent (speaking) to allow user to speak.
LogElevenLabsWS: Sending interrupt — ignoring incoming content until server acks.
LogElevenLabsAgent: [T+84.24s] [Turn 5] Agent stopped speaking (spoke 5.93s, full turn round-trip 7.36s).
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Speaking
LogElevenLabsWS: [T+84.24s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+84.24s] [Turn 6] Mic opened — user speaking.
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Received message type: agent_response
LogElevenLabsWS: Warning: [T+84.43s] [LATENCY] Agent text response: 7549 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Text Response : Bien sûr! Je peux vous dire: "Bonjour, comment allez-vous aujourd'hui?" This means "Hello, how are you today?"
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+88.56s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+88.56s] [Turn 6] Mic closed — user spoke 4.32s. Waiting for server response (timeout 10s)...
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Warning: [T+92.37s] [LATENCY] First audio: 3811 ms after turn end (3811 ms after last chunk)
LogElevenLabsWS: Received message type: user_transcript
LogElevenLabsWS: Warning: [T+92.37s] [LATENCY] User transcript: 3811 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Transcript : Mais c'est super! Tu parles très bien!
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Received message type: agent_response
LogElevenLabsWS: Warning: [T+97.23s] [LATENCY] Agent text response: 8677 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Text Response : Merci beaucoup! Je suis ravie que vous le pensiez. Y a-t-il autre chose que vous aimeriez savoir ou discuter en français?
LogElevenLabsAgent: Warning: [T+98.56s] [Turn 6] Response timeout — server did not start generating after 10.0s. Firing OnAgentResponseTimeout.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsWS: [T+110.60s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+110.60s] [Turn 7] Mic opened — user speaking.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+113.82s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+113.82s] [Turn 7] Mic closed — user spoke 3.22s. Waiting for server response (timeout 10s)...
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Warning: [T+119.37s] [LATENCY] First audio: 5546 ms after turn end (5546 ms after last chunk)
LogElevenLabsWS: Received message type: user_transcript
LogElevenLabsWS: Warning: [T+119.37s] [LATENCY] User transcript: 5546 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Transcript : And do you speak German?
LogElevenLabsWS: Received message type: agent_response
LogElevenLabsWS: Warning: [T+119.40s] [LATENCY] Agent text response: 5577 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Text Response : Yes, I can also communicate in German. Would you like me to say something in German for you?
LogElevenLabsAgent: Warning: [T+123.83s] [Turn 7] Response timeout — server did not start generating after 10.0s. Firing OnAgentResponseTimeout.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsWS: [T+131.53s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+131.53s] [Turn 8] Mic opened — user speaking.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+134.42s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+134.42s] [Turn 8] Mic closed — user spoke 2.88s. Waiting for server response (timeout 10s)...
LogElevenLabsAgent: Warning: [T+144.42s] [Turn 8] Response timeout — server did not start generating after 10.0s. Firing OnAgentResponseTimeout.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsWS: [T+148.56s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+148.56s] [Turn 9] Mic opened — user speaking.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+150.35s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+150.35s] [Turn 9] Mic closed — user spoke 1.79s. Waiting for server response (timeout 10s)...
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Warning: [T+151.56s] [LATENCY] First audio: 1211 ms after turn end (1211 ms after last chunk)
LogElevenLabsWS: Received message type: user_transcript
LogElevenLabsWS: Warning: [T+151.59s] [LATENCY] User transcript: 1245 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Transcript : Hello.
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Received message type: agent_response
LogElevenLabsWS: Warning: [T+152.87s] [LATENCY] Agent text response: 2526 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Text Response : Hello! How can I help you today?
LogElevenLabsAgent: Warning: [T+160.36s] [Turn 9] Response timeout — server did not start generating after 10.0s. Firing OnAgentResponseTimeout.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Start Listening
LogElevenLabsWS: [T+162.59s] User turn started — mic open, audio chunks will follow.
LogAudioCaptureCore: Display: WasapiCapture AudioFormat SampeRate: 48000, BitDepth: 32-Bit Floating Point
LogElevenLabsMic: Capture device: Microphone Array (Intel® Smart Sound Technology for Digital Microphones) | Rate=48000 | Channels=2
LogElevenLabsMic: Audio capture started.
LogElevenLabsAgent: [T+162.59s] [Turn 10] Mic opened — user speaking.
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Stop Listening
LogElevenLabsMic: Audio capture stopped.
LogElevenLabsWS: [T+165.69s] User turn ended — server VAD silence detection started (turn_timeout=1s).
LogElevenLabsAgent: [T+165.69s] [Turn 10] Mic closed — user spoke 3.10s. Waiting for server response (timeout 10s)...
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: agent_chat_response_part
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Warning: [T+168.47s] [LATENCY] First audio: 2778 ms after turn end (2778 ms after last chunk)
LogElevenLabsWS: Received message type: user_transcript
LogElevenLabsWS: Warning: [T+168.50s] [LATENCY] User transcript: 2810 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Transcript : Why are you so slow?
LogElevenLabsAgent: Warning: [T+175.70s] [Turn 10] Response timeout — server did not start generating after 10.0s. Firing OnAgentResponseTimeout.
LogElevenLabsWS: Received message type: audio
LogElevenLabsWS: Received message type: agent_response
LogElevenLabsWS: Warning: [T+175.85s] [LATENCY] Agent text response: 10161 ms after turn end
LogBlueprintUserMessages: [test_AI_Actor_C_2] [AI Agent] Text Response : I apologize if my response seemed slow. I'm designed to be as quick as possible while still being helpful. Is there something specific I can do to improve your experience?
LogSlate: Updating window title bar state: overlay mode, drag disabled, window buttons hidden, title bar hidden
LogWorld: BeginTearingDown for /Game/UEDPIE_0_TestMap
Excellent logs; I can see two distinct bugs:
Bug A: bIgnoreIncomingContent is never cleared. After the interrupt in Turn 5, the server never sends an interruption_event (or we never receive it). The flag stays true, so all incoming content for Turns 6, 7, 8, 9 and 10 is silently ignored, which triggers the response timeout on every turn. Evidence: the WebSocketProxy's [LATENCY] First audio lines still appear (the audio arrives), but the AgentComponent's "Agent speaking — first audio chunk" never does (the audio is discarded).
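A defensive fix for Bug A can be sketched as a small state gate. Everything below except bIgnoreIncomingContent is an assumed name, and the 2-second ack fallback is an illustrative choice; the point is to never rely solely on the server's interruption_event to clear the flag:

```cpp
#include <chrono>

// Sketch: arm a deadline when the interrupt is sent, and clear
// bIgnoreIncomingContent once the deadline passes or a new user turn
// starts, even if interruption_event never arrives.
// InterruptAckTimeout and the method names are illustrative.
struct FInterruptGate
{
    using Clock = std::chrono::steady_clock;

    bool bIgnoreIncomingContent = false;
    Clock::time_point IgnoreDeadline{};
    static constexpr std::chrono::seconds InterruptAckTimeout{2};

    void OnInterruptSent()
    {
        bIgnoreIncomingContent = true;
        IgnoreDeadline = Clock::now() + InterruptAckTimeout;
    }

    // Normal path: the server acknowledged the interrupt.
    void OnInterruptionEventReceived() { bIgnoreIncomingContent = false; }

    // Second safety net: a new user turn always resets the gate.
    void OnUserTurnStarted() { bIgnoreIncomingContent = false; }

    bool ShouldDropIncomingContent()
    {
        // Fallback: the ack never came; stop dropping content.
        if (bIgnoreIncomingContent && Clock::now() >= IgnoreDeadline)
            bIgnoreIncomingContent = false;
        return bIgnoreIncomingContent;
    }
};
```

With this shape, a lost interruption_event degrades into a short window of dropped audio instead of a permanently dead conversation.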
Bug B: HardSilenceTimeoutTicks = 120 (2s) is too short. The agent answered in two audio batches separated by a 4.7s gap (Turn 2: T+23.85s, then T+28.55s). The 2s timeout fired during the gap, producing a false "Stop Speaking"; the Blueprint then opened the mic, the second batch arrived, and "Start Speaking" fired again.
Fixes:
@@ -323,7 +323,7 @@ void FAnimNode_PS_AI_ConvAgent_Posture::Update_AnyThread(const FAnimationUpdateC
 		// Instead, keep the last valid cached values as a hold-over.
 		if (PostureComponent.IsValid())
 		{
-			CachedEyeCurves = PostureComponent->GetCurrentEyeCurves();
+			PostureComponent->FillCurrentEyeCurves(CachedEyeCurves);
 			CachedHeadRotation = PostureComponent->GetCurrentHeadRotation();
 			CachedHeadCompensation = PostureComponent->GetHeadAnimationCompensation();
 			CachedEyeCompensation = PostureComponent->GetEyeAnimationCompensation();
@@ -151,6 +151,35 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::TickComponent(float DeltaTime, ELevel
 		}
 	}
 
+	// ── Reconnection ────────────────────────────────────────────────────────
+	if (bWantsReconnect && FPlatformTime::Seconds() >= NextReconnectTime)
+	{
+		++ReconnectAttemptCount;
+		if (ReconnectAttemptCount > MaxReconnectAttempts)
+		{
+			// Give up — full disconnect cleanup.
+			bWantsReconnect = false;
+			UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Error,
+				TEXT("Reconnection failed after %d attempts — giving up."), MaxReconnectAttempts);
+			bNetIsConversing = false;
+			ApplyConversationPosture();
+			NetConversatingPlayer = nullptr;
+			NetConversatingPawn = nullptr;
+			OnAgentDisconnected.Broadcast(1006, TEXT("Reconnection failed"));
+		}
+		else
+		{
+			UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Log,
+				TEXT("Reconnecting... attempt %d/%d"), ReconnectAttemptCount, MaxReconnectAttempts);
+			// Ensure proxy is in a clean state for reconnection.
+			if (WebSocketProxy)
+			{
+				WebSocketProxy->Disconnect();
+			}
+			StartConversation_Internal();
+		}
+	}
+
 	// Silence detection.
 	// ISSUE-8: broadcast OnAgentStoppedSpeaking OUTSIDE AudioQueueLock.
 	// OnProceduralUnderflow (audio thread) also acquires AudioQueueLock — if we broadcast
@@ -162,7 +191,7 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::TickComponent(float DeltaTime, ELevel
 	if (bAgentSpeaking)
 	{
 		FScopeLock Lock(&AudioQueueLock);
-		if (AudioQueue.Num() == 0)
+		if (AudioQueue.Num() - AudioQueueReadOffset == 0)
 		{
 			SilentTickCount++;
 
@@ -290,6 +319,10 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::EndConversation()
 {
 	if (GetOwnerRole() == ROLE_Authority)
 	{
+		// Cancel any pending reconnection.
+		bWantsReconnect = false;
+		ReconnectAttemptCount = 0;
+
 		StopListening();
 		// ISSUE-4: StopListening() may set bWaitingForAgentResponse=true (normal turn end path).
 		// Cancel it immediately — there is no response coming because we are ending the session.
@@ -299,6 +332,7 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::EndConversation()
 
 	if (WebSocketProxy)
 	{
+		bIntentionalDisconnect = true;
 		WebSocketProxy->Disconnect();
 		WebSocketProxy = nullptr;
 	}
@@ -706,6 +740,15 @@ const FPS_AI_ConvAgent_ConversationInfo_ElevenLabs& UPS_AI_ConvAgent_ElevenLabsC
 // ─────────────────────────────────────────────────────────────────────────────
 void UPS_AI_ConvAgent_ElevenLabsComponent::HandleConnected(const FPS_AI_ConvAgent_ConversationInfo_ElevenLabs& Info)
 {
+	// Reset reconnection state on successful connection.
+	if (bWantsReconnect)
+	{
+		UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Log,
+			TEXT("Reconnection successful after %d attempt(s)."), ReconnectAttemptCount);
+	}
+	bWantsReconnect = false;
+	ReconnectAttemptCount = 0;
+
 	SessionStartTime = FPlatformTime::Seconds();
 	TurnIndex = 0;
 	LastClosedTurnIndex = 0;
@@ -739,6 +782,17 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::HandleDisconnected(int32 StatusCode,
 {
 	UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Log, TEXT("Agent disconnected. Code=%d Reason=%s"), StatusCode, *Reason);
 
+	// During reconnection, a failed attempt also triggers HandleDisconnected.
+	// Just schedule the next attempt without full cleanup.
+	if (bWantsReconnect)
+	{
+		const double Delay = FMath::Min(FMath::Pow(2.0, static_cast<double>(ReconnectAttemptCount)), MaxReconnectDelay);
+		NextReconnectTime = FPlatformTime::Seconds() + Delay;
+		UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Log,
+			TEXT("Reconnection attempt failed — retrying in %.0fs..."), Delay);
+		return;
+	}
+
 	// ISSUE-13: stop audio playback and clear the queue if the WebSocket drops while the
 	// agent is speaking. Without this the audio component kept playing buffered PCM after
 	// disconnect. StopAgentAudio also fires OnAgentStoppedSpeaking if bAgentSpeaking was true,
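The backoff schedule computed in HandleDisconnected above can be checked in isolation. This standalone sketch mirrors the `FMath::Min(FMath::Pow(2.0, n), MaxReconnectDelay)` expression with plain `<cmath>`; `ComputeReconnectDelay` is an illustrative name, the component inlines the expression rather than calling a helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Exponential backoff: 2^attempt seconds, capped. With MaxDelay = 30.0 the
// schedule for attempts 0, 1, 2, ... is 1, 2, 4, 8, 16, 30, 30, ...
double ComputeReconnectDelay(int AttemptCount, double MaxDelay = 30.0)
{
	return std::min(std::pow(2.0, static_cast<double>(AttemptCount)), MaxDelay);
}
```

Capping the delay keeps a long outage from pushing retries arbitrarily far apart while the small initial delays still recover quickly from a transient drop.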
@@ -757,6 +811,26 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::HandleDisconnected(int32 StatusCode,
 		FScopeLock Lock(&MicSendLock);
 		MicAccumulationBuffer.Reset();
 	}
+
+	// ── Auto-reconnect on unexpected disconnect ─────────────────────────────
+	// StatusCode 1000 = normal close (user called EndConversation).
+	// bIntentionalDisconnect covers EndConversation and ServerReleaseConversation.
+	if (!bIntentionalDisconnect && StatusCode != 1000
+		&& MaxReconnectAttempts > 0 && GetOwnerRole() == ROLE_Authority)
+	{
+		bWantsReconnect = true;
+		ReconnectAttemptCount = 0;
+		const double Delay = 1.0; // First attempt after 1 second.
+		NextReconnectTime = FPlatformTime::Seconds() + Delay;
+		UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Warning,
+			TEXT("Unexpected disconnect — will attempt reconnection in %.0fs (max %d attempts)."),
+			Delay, MaxReconnectAttempts);
+		OnAgentError.Broadcast(TEXT("Connection lost — reconnecting..."));
+		// Keep bNetIsConversing / NetConversatingPawn so the NPC stays occupied.
+		return;
+	}
+
+	bIntentionalDisconnect = false;
 	// Reset replicated state on disconnect.
 	if (GetOwnerRole() == ROLE_Authority)
 	{
@@ -772,6 +846,18 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::HandleDisconnected(int32 StatusCode,
 void UPS_AI_ConvAgent_ElevenLabsComponent::HandleError(const FString& ErrorMessage)
 {
 	UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Error, TEXT("Agent error: %s"), *ErrorMessage);
+
+	// During reconnection, a connection error should schedule the next attempt
+	// (OnWsConnectionError fires but OnWsClosed may not follow).
+	if (bWantsReconnect)
+	{
+		const double Delay = FMath::Min(FMath::Pow(2.0, static_cast<double>(ReconnectAttemptCount)), MaxReconnectDelay);
+		NextReconnectTime = FPlatformTime::Seconds() + Delay;
+		UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Log,
+			TEXT("Reconnection error — retrying in %.0fs..."), Delay);
+		return;
+	}
+
 	OnAgentError.Broadcast(ErrorMessage);
 }
 
@@ -783,7 +869,7 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::HandleAudioReceived(const TArray<uint
 	int32 QueueBefore;
 	{
 		FScopeLock Lock(&AudioQueueLock);
-		QueueBefore = AudioQueue.Num() / sizeof(int16);
+		QueueBefore = (AudioQueue.Num() - AudioQueueReadOffset) / sizeof(int16);
 	}
 	if (bDebug && DebugVerbosity >= 2)
 	{
@@ -1107,13 +1193,28 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::OnProceduralUnderflow(
 		return;
 	}
 
-	if (AudioQueue.Num() > 0)
+	const int32 Available = AudioQueue.Num() - AudioQueueReadOffset;
+	if (Available > 0)
 	{
 		const int32 BytesRequired = SamplesRequired * sizeof(int16);
-		const int32 BytesToPush = FMath::Min(AudioQueue.Num(), BytesRequired);
+		const int32 BytesToPush = FMath::Min(Available, BytesRequired);
 
-		InProceduralWave->QueueAudio(AudioQueue.GetData(), BytesToPush);
-		AudioQueue.RemoveAt(0, BytesToPush, EAllowShrinking::No);
+		InProceduralWave->QueueAudio(AudioQueue.GetData() + AudioQueueReadOffset, BytesToPush);
+		AudioQueueReadOffset += BytesToPush;
 
+		// Compact when read offset exceeds half the buffer to prevent unbounded growth.
+		// This replaces the old RemoveAt(0,N) per-callback O(n) memmove with a periodic
+		// compaction that happens much less frequently (~every 0.5-1s instead of 60Hz).
+		if (AudioQueueReadOffset > AudioQueue.Num() / 2)
+		{
+			const int32 Remaining = AudioQueue.Num() - AudioQueueReadOffset;
+			if (Remaining > 0)
+			{
+				FMemory::Memmove(AudioQueue.GetData(), AudioQueue.GetData() + AudioQueueReadOffset, Remaining);
+			}
+			AudioQueue.SetNum(Remaining, EAllowShrinking::No);
+			AudioQueueReadOffset = 0;
+		}
+
 		// Log when queue recovers (new data arrived after being dry)
 		if (bQueueWasDry)
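The read-offset pattern replacing `RemoveAt(0, N)` can be modeled outside Unreal with plain STL containers. This is a sketch: `FByteQueue` is an illustrative name, and the component stores the same state as `TArray<uint8>` plus `int32 AudioQueueReadOffset` behind a critical section. Consuming bytes is O(1) because it only advances an offset; the memmove compaction runs only once the dead prefix exceeds half the buffer, so it happens rarely instead of on every underflow callback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Standalone model of the AudioQueue read-offset pattern.
struct FByteQueue
{
	std::vector<unsigned char> Buffer;
	std::size_t ReadOffset = 0;

	void Push(const unsigned char* Data, std::size_t Len)
	{
		Buffer.insert(Buffer.end(), Data, Data + Len);
	}

	std::size_t Available() const { return Buffer.size() - ReadOffset; }

	// Copies up to MaxLen bytes into Out and returns the count actually read.
	std::size_t Pop(unsigned char* Out, std::size_t MaxLen)
	{
		const std::size_t N = std::min(Available(), MaxLen);
		if (N == 0) { return 0; }
		std::memcpy(Out, Buffer.data() + ReadOffset, N);
		ReadOffset += N;        // O(1) consume — no element shifting
		if (ReadOffset > Buffer.size() / 2) // periodic compaction
		{
			const std::size_t Remaining = Buffer.size() - ReadOffset;
			std::memmove(Buffer.data(), Buffer.data() + ReadOffset, Remaining);
			Buffer.resize(Remaining);
			ReadOffset = 0;
		}
		return N;
	}
};
```

The half-buffer threshold bounds wasted memory at roughly 2x the live data while amortizing the memmove cost across many pops.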
@@ -1124,7 +1225,7 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::OnProceduralUnderflow(
 			const double T = FPlatformTime::Seconds() - SessionStartTime;
 			UE_LOG(LogPS_AI_ConvAgent_ElevenLabs, Log,
 				TEXT("[T+%.2fs] [Turn %d] AudioQueue recovered — feeding real data again (%d bytes remaining)."),
-				T, LastClosedTurnIndex, AudioQueue.Num());
+				T, LastClosedTurnIndex, AudioQueue.Num() - AudioQueueReadOffset);
 		}
 	}
 }
@@ -1269,6 +1370,7 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::StopAgentAudio()
 	{
 		FScopeLock Lock(&AudioQueueLock);
 		AudioQueue.Empty();
+		AudioQueueReadOffset = 0;
 
 	bAgentGenerating = false; // Always clear — covers interruptions during generation phase.
 	bAgentResponseReceived = false; // Reset — next response will re-confirm when done.
@@ -1536,12 +1638,17 @@ void UPS_AI_ConvAgent_ElevenLabsComponent::ServerRequestConversation_Implementat
 
 void UPS_AI_ConvAgent_ElevenLabsComponent::ServerReleaseConversation_Implementation()
 {
+	// Cancel any pending reconnection.
+	bWantsReconnect = false;
+	ReconnectAttemptCount = 0;
+
 	StopListening();
 	bWaitingForAgentResponse = false;
 	StopAgentAudio();
 
 	if (WebSocketProxy)
 	{
+		bIntentionalDisconnect = true;
 		WebSocketProxy->Disconnect();
 		WebSocketProxy = nullptr;
 	}
@@ -472,7 +472,7 @@ void UPS_AI_ConvAgent_WebSocket_ElevenLabsProxy::HandleAudioResponse(const TShar
 	(*AudioEvent)->TryGetNumberField(TEXT("event_id"), EventId);
 	if (EventId > 0 && EventId <= LastInterruptEventId)
 	{
-		UE_LOG(LogPS_AI_ConvAgent_WS_ElevenLabs, Verbose, TEXT("Discarding audio event_id=%d (interrupted at %d)."), EventId, LastInterruptEventId);
+		UE_LOG(LogPS_AI_ConvAgent_WS_ElevenLabs, Verbose, TEXT("Discarding audio event_id=%d (interrupted at %d)."), EventId, LastInterruptEventId.load());
 		return;
 	}
 
@@ -175,6 +175,14 @@ public:
 		ToolTip = "Seconds to wait for a server response after the user stops speaking.\nFires OnAgentResponseTimeout if exceeded. Normal latency is 0.1-0.8s.\nSet to 0 to disable. Default: 10s."))
 	float ResponseTimeoutSeconds = 10.0f;
 
+	/** Maximum auto-reconnection attempts on unexpected WebSocket disconnect.
+	 * Uses exponential backoff (1s, 2s, 4s, ... up to 30s).
+	 * Set to 0 to disable auto-reconnection. */
+	UPROPERTY(EditAnywhere, BlueprintReadWrite, Category = "PS AI ConvAgent|ElevenLabs",
+		meta = (ClampMin = "0",
+			ToolTip = "Auto-reconnect attempts on unexpected disconnect.\n0 = disabled. Uses exponential backoff."))
+	int32 MaxReconnectAttempts = 5;
+
 	// ── Multi-agent / external mic ───────────────────────────────────────────
 
 	/** When true, StartListening/StopListening manage the turn state but do NOT
@@ -537,10 +545,11 @@ private:
 	std::atomic<bool> bAgentSpeaking{false};
 	// True from the first agent_chat_response_part until the first audio chunk arrives.
 	// Used to block StartListening() while the server is processing the previous turn.
-	bool bAgentGenerating = false;
+	// Atomic: defensive — currently game-thread only, but documents thread-safety contract.
+	std::atomic<bool> bAgentGenerating{false};
 	// True between StopListening() and HandleAgentResponseStarted() (first chat response part).
 	// Used to fire OnAgentResponseTimeout if the server takes too long to start generating.
-	bool bWaitingForAgentResponse = false;
+	std::atomic<bool> bWaitingForAgentResponse{false};
 
 	// ── Turn tracking ─────────────────────────────────────────────────────────
 	// Monotonically increasing counter. Incremented each time StartListening() successfully
@@ -561,7 +570,11 @@ private:
 	double AgentSpeakStart = 0.0; // Set in EnqueueAgentAudio (first chunk) — when audio begins.
 
 	// Accumulates incoming PCM bytes until the audio component needs data.
+	// Uses a read offset instead of RemoveAt(0,N) to avoid O(n) memmove every
+	// underflow callback (~60Hz). Compacted periodically when read offset
+	// exceeds half the buffer.
 	TArray<uint8> AudioQueue;
+	int32 AudioQueueReadOffset = 0;
 	FCriticalSection AudioQueueLock;
 
 	// Pre-buffer state: delay playback start to absorb TTS inter-chunk gaps.
@@ -611,6 +624,15 @@ private:
 	 * Formula: bytes = SampleRate * (ms / 1000) * BytesPerSample = 16000 * ms / 1000 * 2 = 32 * ms */
 	int32 GetMicChunkMinBytes() const { return MicChunkDurationMs * 32; }
 
+	// ── Reconnection ────────────────────────────────────────────────────────
+	// Exponential backoff reconnection when the WebSocket drops unexpectedly.
+	// Only active on the Authority (server/standalone) where the WebSocket lives.
+	bool bWantsReconnect = false;
+	bool bIntentionalDisconnect = false;
+	int32 ReconnectAttemptCount = 0;
+	double NextReconnectTime = 0.0;
+	static constexpr double MaxReconnectDelay = 30.0;
+
 	// ── Opus codec (network audio compression) ───────────────────────────────
 	TSharedPtr<IVoiceEncoder> OpusEncoder; // All: server encodes agent audio, clients encode mic audio
 	TSharedPtr<IVoiceDecoder> OpusDecoder; // All: clients decode agent audio, server decodes mic audio
@@ -232,7 +232,8 @@ public:
 	// ── Getters (read by AnimNode) ───────────────────────────────────────────
 
 	/** Get current eye gaze curves (8 ARKit eye look curves).
-	 * Returns a COPY scaled by activation alpha — safe to call from any thread. */
+	 * Returns a COPY scaled by activation alpha — safe to call from any thread.
+	 * NOTE: Prefer FillCurrentEyeCurves() for hot paths to avoid per-frame allocation. */
 	UFUNCTION(BlueprintCallable, Category = "PS AI ConvAgent|Posture")
 	TMap<FName, float> GetCurrentEyeCurves() const
 	{
@@ -244,6 +245,24 @@ public:
 		return Scaled;
 	}
 
+	/** Fill an existing TMap with current eye gaze curves (zero-alloc after first call).
+	 * Reuses the caller's existing hash table — no heap allocation when keys match.
+	 * Thread-safe (takes PostureDataLock internally). */
+	void FillCurrentEyeCurves(TMap<FName, float>& OutCurves) const
+	{
+		FScopeLock Lock(&PostureDataLock);
+		if (CurrentActiveAlpha < 0.001f)
+		{
+			for (auto& Pair : OutCurves) { Pair.Value = 0.0f; }
+			return;
+		}
+		const float Alpha = (CurrentActiveAlpha >= 0.999f) ? 1.0f : CurrentActiveAlpha;
+		for (const auto& Pair : CurrentEyeCurves)
+		{
+			OutCurves.FindOrAdd(Pair.Key) = Pair.Value * Alpha;
+		}
+	}
+
 	/** Get current head rotation offset (applied by AnimNode as FQuat to avoid
 	 * Euler round-trip that reintroduces parasitic tilt on diagonals).
 	 * Thread-safe copy, blended by activation alpha. */
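The zero-allocation intent of FillCurrentEyeCurves() can be shown with a plain-STL analogue. This is a sketch: `FillScaledCurves` and `FCurveMap` are illustrative names, `std::map` stands in for `TMap`, and `operator[]` plays the role of `FindOrAdd`. The key point is that the caller owns the output map; after the first call its key set is stable, so later calls only overwrite existing values and allocate nothing on the hot path.

```cpp
#include <cassert>
#include <map>
#include <string>

using FCurveMap = std::map<std::string, float>;

// Fill a caller-owned map with scaled curve values instead of returning a
// fresh map. Mirrors the structure of FillCurrentEyeCurves() above.
void FillScaledCurves(const FCurveMap& Source, float Alpha, FCurveMap& Out)
{
	if (Alpha < 0.001f)
	{
		for (auto& Pair : Out) { Pair.second = 0.0f; } // zero in place, keep keys
		return;
	}
	const float Scale = (Alpha >= 0.999f) ? 1.0f : Alpha;
	for (const auto& Pair : Source)
	{
		Out[Pair.first] = Pair.second * Scale; // reuses the node after first call
	}
}
```

Returning a `TMap` by value, as GetCurrentEyeCurves() does, forces a fresh hash-table allocation on every animation-thread tick; filling a persistent member avoids that entirely.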
@@ -6,6 +6,7 @@
 #include "UObject/NoExportTypes.h"
 #include "PS_AI_ConvAgent_Definitions.h"
 #include "IWebSocket.h"
+#include <atomic>
 #include "PS_AI_ConvAgent_WebSocket_ElevenLabsProxy.generated.h"
 
 // ─────────────────────────────────────────────────────────────────────────────
@@ -239,12 +240,13 @@ private:
 	// Timestamp when user turn ended (StopListening).
 	double UserTurnEndTime = 0.0;
 	// Whether we are waiting for the first response after user stopped speaking.
-	bool bWaitingForResponse = false;
+	// Atomic: defensive — documents thread-safety contract.
+	std::atomic<bool> bWaitingForResponse{false};
 	// Whether we already logged the first audio response latency for this turn.
-	bool bFirstAudioResponseLogged = false;
+	std::atomic<bool> bFirstAudioResponseLogged{false};
 	// Whether OnAgentResponseStarted has already been fired for the current turn.
 	// Reset at turn start so only the first agent_chat_response_part fires the event.
-	bool bAgentResponseStartedFired = false;
+	std::atomic<bool> bAgentResponseStartedFired{false};
 
 	// Timestamp when the conversation was initiated (conversation_initiation_metadata received).
 	// Used to compute [T+Xs] session-relative timestamps in all log messages.
@@ -255,7 +257,8 @@ private:
 	// Audio events whose event_id <= LastInterruptEventId belong to the cancelled
 	// generation and must be discarded. Only AUDIO is filtered — transcripts,
 	// agent_response, agent_chat_response_part etc. are always processed.
-	int32 LastInterruptEventId = 0;
+	// Atomic: defensive — documents thread-safety contract.
+	std::atomic<int32> LastInterruptEventId{0};
 
 public:
 	// Set by UPS_AI_ConvAgent_ElevenLabsComponent before calling Connect().
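The bool-to-atomic conversions in these hunks all follow one pattern that can be demonstrated free-standing. This sketch uses illustrative function names around a flag named after one of the members: a plain `bool` written on one thread (game thread) and read on another (WebSocket or audio thread) is a data race under the C++ memory model, even when each access compiles to a single instruction. `std::atomic<bool>` makes every load and store well-defined without taking a mutex; the explicit acquire/release orders here are a choice for the sketch, while plain assignment on an atomic defaults to the stronger seq_cst.

```cpp
#include <atomic>
#include <cassert>

// One-writer-per-event flag shared across threads.
std::atomic<bool> bWaitingForAgentResponse{false};

void OnUserTurnEnded()      { bWaitingForAgentResponse.store(true, std::memory_order_release); }
void OnAgentResponsePart()  { bWaitingForAgentResponse.store(false, std::memory_order_release); }
bool IsWaitingForResponse() { return bWaitingForAgentResponse.load(std::memory_order_acquire); }
```

For a lone flag this is enough; when a flag guards other shared data (like the AudioQueue), the code above still pairs it with a critical section.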