j.foucher/PS_AI_Agent

j.foucher f23acc8c1c Add C++ sample

2026-02-21 20:48:10 +01:00


ElevenLabs Conversational AI - C++ Implementation

A C++ implementation of the ElevenLabs Conversational AI client.

Features

Real-time Audio Processing: Full-duplex audio streaming with low-latency playback
WebSocket Integration: Secure WSS connection to ElevenLabs Conversational AI platform
Cross-platform Audio: PortAudio-based implementation supporting Windows, macOS, and Linux
Echo Suppression: Built-in acoustic feedback prevention
Modern C++: Clean, maintainable C++17 codebase with proper RAII and exception handling
Flexible Architecture: Modular design allowing easy customization and extension

Architecture

graph TB
    subgraph "User Interface"
        A[main.cpp] --> B[Conversation]
    end
    
    subgraph "Core Components"
        B --> C[DefaultAudioInterface]
        B --> D[WebSocket Client]
        C --> E[PortAudio]
        D --> F[Boost.Beast + OpenSSL]
    end
    
    subgraph "ElevenLabs Platform"
        F --> G[WSS API Endpoint]
        G --> H[Conversational AI Agent]
    end
    
    subgraph "Audio Flow"
        I[Microphone] --> C
        C --> J[Base64 Encoding]
        J --> D
        D --> K[Audio Events]
        K --> L[Base64 Decoding]
        L --> C
        C --> M[Speakers]
    end
    
    subgraph "Message Types"
        N[user_audio_chunk]
        O[agent_response]
        P[user_transcript]
        Q[audio_event]
        R[ping/pong]
    end
    
    style B fill:#e1f5fe
    style C fill:#f3e5f5
    style D fill:#e8f5e8
    style H fill:#fff3e0

Quick Start

Prerequisites

C++17 compatible compiler: GCC 11+, Clang 14+, or MSVC 2022+
CMake 3.14 or higher
Dependencies (install via package manager):

macOS (Homebrew)

brew install boost openssl portaudio nlohmann-json cmake pkg-config

Ubuntu/Debian

sudo apt update
sudo apt install build-essential cmake pkg-config
sudo apt install libboost-system-dev libboost-thread-dev
sudo apt install libssl-dev portaudio19-dev nlohmann-json3-dev

Windows (vcpkg)

vcpkg install boost-system boost-thread openssl portaudio nlohmann-json

Building

# Clone the repository
git clone https://github.com/Jitendra2603/elevenlabs-convai-cpp.git
cd elevenlabs-convai-cpp

# Build the project
mkdir build && cd build
cmake ..
cmake --build . --config Release

Running

# Set your agent ID (get this from ElevenLabs dashboard)
export AGENT_ID="your-agent-id-here"

# Run the demo
./convai_cpp

The application will:

Connect to your ElevenLabs Conversational AI agent
Start capturing audio from your default microphone
Stream audio to the agent and play responses through speakers
Display conversation transcripts in the terminal
Continue until you press Enter to quit

Usage Examples

Basic Conversation

export AGENT_ID="agent_"
./convai_cpp
# Speak into your microphone and hear the AI agent respond

Configuration

Audio Settings

The audio interface uses fixed settings chosen for low-latency, real-time streaming:

Sample Rate: 16 kHz
Format: 16-bit PCM mono
Input Buffer: 250ms (4000 frames)
Output Buffer: 62.5ms (1000 frames)
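
The buffer durations above follow directly from the frame counts and the sample rate. A minimal sketch of that arithmetic, using hypothetical constant and function names that are not taken from the project's sources:

```cpp
#include <cstddef>

// Values copied from the settings listed above; the names are illustrative.
constexpr int kSampleRateHz = 16000;
constexpr int kInputFramesPerBuffer = 4000;
constexpr int kOutputFramesPerBuffer = 1000;
constexpr int kBytesPerSample = 2;  // 16-bit PCM
constexpr int kChannels = 1;        // mono

// Duration of a buffer in milliseconds: frames / sample rate * 1000.
constexpr double bufferMillis(int frames, int sampleRateHz) {
    return 1000.0 * frames / sampleRateHz;
}

// Size of a buffer in bytes: frames * channels * bytes per sample.
constexpr std::size_t bufferBytes(int frames) {
    return static_cast<std::size_t>(frames) * kChannels * kBytesPerSample;
}

// Compile-time checks against the figures quoted above.
static_assert(bufferMillis(kInputFramesPerBuffer, kSampleRateHz) == 250.0);
static_assert(bufferMillis(kOutputFramesPerBuffer, kSampleRateHz) == 62.5);
static_assert(bufferBytes(kInputFramesPerBuffer) == 8000);
```

Under these settings one input buffer carries 8000 bytes of raw PCM before base64 encoding.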

WebSocket Connection

Endpoint: wss://api.elevenlabs.io/v1/convai/conversation
Protocol: WebSocket Secure (WSS) with TLS 1.2+
Authentication: Optional (required for private agents)
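
As a rough illustration of how the connection URL might be assembled, the sketch below reads `AGENT_ID` from the environment and appends it to the endpoint. Passing the ID as an `agent_id` query parameter is an assumption here, and the function names are hypothetical; check the ElevenLabs API documentation and `Conversation.cpp` for the actual behaviour.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Endpoint as listed above.
const std::string kEndpoint = "wss://api.elevenlabs.io/v1/convai/conversation";

// Returns the AGENT_ID environment variable, or an empty optional
// when it is unset or empty.
std::optional<std::string> agentIdFromEnv() {
    const char* value = std::getenv("AGENT_ID");
    if (value == nullptr || *value == '\0') {
        return std::nullopt;
    }
    return std::string(value);
}

// Assumption: the agent is selected with an "agent_id" query parameter.
// The ID is appended verbatim, so it must not need URL escaping.
std::string conversationUrl(const std::string& agentId) {
    return kEndpoint + "?agent_id=" + agentId;
}
```

A caller would typically stop with a clear error message when `agentIdFromEnv()` returns an empty optional rather than attempt a connection.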

Project Structure

elevenlabs-convai-cpp/
├── CMakeLists.txt              # Build configuration
├── README.md                   # This file
├── LICENSE                     # MIT license
├── CONTRIBUTING.md             # Contribution guidelines
├── .gitignore                  # Git ignore rules
├── include/                    # Header files
│   ├── AudioInterface.hpp      # Abstract audio interface
│   ├── DefaultAudioInterface.hpp # PortAudio implementation
│   └── Conversation.hpp        # Main conversation handler
└── src/                        # Source files
    ├── main.cpp                # Demo application
    ├── Conversation.cpp        # WebSocket and message handling
    └── DefaultAudioInterface.cpp # Audio I/O implementation

Technical Details

Audio Processing Pipeline

Capture: PortAudio captures 16-bit PCM audio at 16 kHz
Encoding: Raw audio is base64-encoded for WebSocket transmission
Streaming: Audio chunks sent as user_audio_chunk messages
Reception: Server sends audio_event messages with agent responses
Decoding: Base64 audio data decoded back to PCM
Playback: Audio queued and played through PortAudio output stream
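
The encoding and decoding steps of the pipeline can be sketched with a small standalone base64 codec. This is an illustrative version, not the project's code: the function names are hypothetical, and serializing samples as little-endian bytes is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

constexpr std::string_view kBase64Alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Serialize 16-bit samples as little-endian bytes (byte order is assumed).
std::vector<std::uint8_t> pcmToBytes(const std::vector<std::int16_t>& samples) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(samples.size() * 2);
    for (std::int16_t sample : samples) {
        const auto u = static_cast<std::uint16_t>(sample);
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>(u >> 8));
    }
    return bytes;
}

// Standard base64 with '=' padding: every 3 input bytes become 4 characters.
std::string base64Encode(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    out.reserve(((bytes.size() + 2) / 3) * 4);
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        const std::size_t remaining = bytes.size() - i;
        std::uint32_t n = static_cast<std::uint32_t>(bytes[i]) << 16;
        if (remaining > 1) n |= static_cast<std::uint32_t>(bytes[i + 1]) << 8;
        if (remaining > 2) n |= bytes[i + 2];
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += remaining > 1 ? kBase64Alphabet[(n >> 6) & 63] : '=';
        out += remaining > 2 ? kBase64Alphabet[n & 63] : '=';
    }
    return out;
}

// Decodes until the first '=' and skips characters outside the alphabet.
std::vector<std::uint8_t> base64Decode(std::string_view text) {
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;
    int bits = 0;
    for (char c : text) {
        if (c == '=') break;  // padding marks the end of the data
        const std::size_t value = kBase64Alphabet.find(c);
        if (value == std::string_view::npos) continue;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

On the sending side, `base64Encode(pcmToBytes(samples))` would produce the text payload for a `user_audio_chunk` message; on the receiving side, `base64Decode` recovers the PCM bytes for playback.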

Echo Suppression

The implementation includes a simple echo suppression mechanism that gates the microphone instead of performing acoustic echo cancellation:

Microphone input is suppressed during agent speech playback
Prevents acoustic feedback loops that cause the agent to respond to itself
Uses atomic flags for thread-safe coordination between input/output
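
The coordination described above could look roughly like the following. `EchoGate` and its methods are hypothetical names used for illustration; the real logic lives in `DefaultAudioInterface.cpp` and may differ.

```cpp
#include <atomic>

// Shared between the playback and capture paths, which typically run
// on different threads (for example, separate PortAudio callbacks).
class EchoGate {
public:
    // Called by the playback side when agent audio starts or stops.
    void setAgentSpeaking(bool speaking) {
        agentSpeaking_.store(speaking, std::memory_order_release);
    }

    // Called by the capture side for each microphone buffer.
    // Returns true when the buffer should be forwarded to the agent.
    bool shouldSendMicrophoneAudio() const {
        return !agentSpeaking_.load(std::memory_order_acquire);
    }

private:
    std::atomic<bool> agentSpeaking_{false};
};
```

Because the flag is a single atomic boolean, neither audio path needs to take a lock, which matters in real-time callbacks where blocking can cause dropouts. The trade-off of this kind of gating is that the user cannot interrupt the agent while it is speaking.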

WebSocket Message Handling

Supported message types:

conversation_initiation_client_data - Session initialization
user_audio_chunk - Microphone audio data
audio_event - Agent speech audio
agent_response - Agent text responses
user_transcript - Speech-to-text results
ping/pong - Connection keepalive
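
One way to route incoming messages by type is a small handler table. The sketch below is an assumption about structure, not a copy of `Conversation.cpp`: `MessageDispatcher` is a hypothetical class, and the payload is passed as an opaque string so the example stays independent of any JSON library.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

class MessageDispatcher {
public:
    using Handler = std::function<void(const std::string& payload)>;

    // Registers (or replaces) the handler for one message type.
    void on(const std::string& type, Handler handler) {
        handlers_[type] = std::move(handler);
    }

    // Invokes the handler registered for `type`.
    // Returns false when the type is unknown so the caller can log it.
    bool dispatch(const std::string& type, const std::string& payload) const {
        const auto it = handlers_.find(type);
        if (it == handlers_.end()) {
            return false;
        }
        it->second(payload);
        return true;
    }

private:
    std::unordered_map<std::string, Handler> handlers_;
};
```

In a real client the type string would be read from the parsed JSON message, and handlers for `audio_event`, `agent_response`, `user_transcript`, and `ping` would be registered once at startup.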

License

This project is licensed under the MIT License - see the LICENSE file for details.
