Voice Agents Latency Benchmark
Compare latency performance across different voice agent platforms. Lower latency means faster, more natural conversations.
Latency measurements represent the complete round-trip time from when a user stops speaking until they hear the agent's response begin. This includes audio processing at both ends, turn detection, STT, LLM, TTS, network transmission delays and any delay introduced by each platform.
These tests used the basic test suite, without tool calling or complex scenarios.
For other test suites, regions, the full report, or custom tests, please reach out to contact@livetok.io.
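As a rough illustration of the round-trip definition above, the sketch below computes per-turn latency from two timestamps (end of user speech, start of agent audio) and summarizes a run. The function names and the timestamps are hypothetical, and the use of the sample standard deviation is an assumption; the benchmark does not state how its figures were aggregated.

```python
from statistics import mean, stdev


def turn_latency_ms(user_speech_end_s: float, agent_audio_start_s: float) -> float:
    """Round-trip latency for one turn, in milliseconds.

    Both arguments are timestamps in seconds on the same clock:
    when the user stopped speaking, and when the agent's reply became audible.
    """
    if agent_audio_start_s < user_speech_end_s:
        raise ValueError("agent audio cannot start before the user stops speaking")
    return (agent_audio_start_s - user_speech_end_s) * 1000.0


def summarize(latencies_ms: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of the latencies in ms."""
    return mean(latencies_ms), stdev(latencies_ms)


# Hypothetical timestamps for three turns, not taken from the benchmark.
turns = [(10.00, 11.25), (20.00, 21.00), (30.00, 31.50)]
samples = [turn_latency_ms(end, start) for end, start in turns]
avg, sd = summarize(samples)
print(f"latency: {avg:.0f}ms ±{sd:.0f}ms over {len(samples)} turns")
```

Because the timestamps are taken at the user's end, everything in between (turn detection, STT, LLM, TTS, network, and platform overhead) is included without having to be measured separately.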
[Bar chart: latency (ms) by platform, axis from 0 to 2500 ms, for VAPI, Retell AI, Pipecat, LiveKit, OpenAI RealTime, and LiveTok Gemini]
| Platform | STT | LLM | TTS | Latency | Std Dev |
|---|---|---|---|---|---|
| VAPI | Deepgram | OpenAI | ElevenLabs | 1561ms | ±618ms |
| Retell AI | Deepgram | OpenAI | ElevenLabs | 985ms | ±135ms |
| Pipecat | Deepgram | OpenAI | ElevenLabs | 1611ms | ±300ms |
| LiveKit | Deepgram | OpenAI | ElevenLabs | 1612ms | ±208ms |
| OpenAI RealTime | N/A | OpenAI | N/A | 1295ms | ±312ms |
| LiveTok Gemini | N/A | Gemini | N/A | 1379ms | ±65ms |
Configuration: Deepgram Nova-2 (STT), GPT-4o-mini (LLM), and ElevenLabs Flash v2.5 (TTS) for the platforms with separate STT, LLM, and TTS stages; native audio speech-to-speech models for OpenAI RealTime and LiveTok Gemini.
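To compare consistency as well as speed, one option is to rank the platforms by latency and express each standard deviation relative to its latency (the coefficient of variation). The sketch below uses the figures from the table; treating the Latency column as a mean is an assumption, since the page does not say which statistic it reports.

```python
# (latency_ms, std_dev_ms) per platform, copied from the table above.
results = {
    "VAPI": (1561, 618),
    "Retell AI": (985, 135),
    "Pipecat": (1611, 300),
    "LiveKit": (1612, 208),
    "OpenAI RealTime": (1295, 312),
    "LiveTok Gemini": (1379, 65),
}


def coefficient_of_variation(latency_ms: float, std_dev_ms: float) -> float:
    """Standard deviation as a fraction of latency (assumes latency is a mean)."""
    return std_dev_ms / latency_ms


# Fastest first.
ranked = sorted(results.items(), key=lambda item: item[1][0])

for name, (latency, sd) in ranked:
    cv = coefficient_of_variation(latency, sd)
    print(f"{name:<16} {latency:>5}ms  ±{sd:>3}ms  CV={cv:.1%}")

# Platform whose latency varies least relative to its own latency.
most_consistent = min(results, key=lambda n: coefficient_of_variation(*results[n]))
print("Most consistent:", most_consistent)
```

On these figures the fastest platform and the most consistent one are not the same, which is why the Std Dev column is worth reading alongside Latency.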