Voice Agents Latency Benchmark

Compare latency performance across different voice agent platforms. Lower latency means faster, more natural conversations.

Latency measurements represent the complete round-trip time from the moment a user stops speaking until they hear the agent's response begin. This includes audio processing at both ends, turn detection, speech-to-text (STT), LLM inference, text-to-speech (TTS), network transmission delays, and any additional delay introduced by each platform.
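As a rough illustration of the definition above, the sketch below computes per-turn latency from two timestamps and summarizes a run. The timestamps, the `round_trip_latency_ms` helper, and the use of the sample standard deviation are assumptions made for illustration; they are not taken from the benchmark's actual tooling.

```python
from statistics import mean, stdev


def round_trip_latency_ms(user_speech_end_ms: int, agent_audio_start_ms: int) -> int:
    """Time from the end of user speech to the first audible agent audio, in ms."""
    if agent_audio_start_ms < user_speech_end_ms:
        raise ValueError("agent audio cannot start before the user stops speaking")
    return agent_audio_start_ms - user_speech_end_ms


# Hypothetical (user_speech_end_ms, agent_audio_start_ms) pairs for three turns.
turns = [(2100, 3050), (7400, 8500), (12000, 13000)]

latencies = [round_trip_latency_ms(end, start) for end, start in turns]
avg_ms = mean(latencies)
spread_ms = stdev(latencies)  # sample standard deviation (an assumption)

print(f"latency: {avg_ms:.0f}ms ±{spread_ms:.0f}ms over {len(latencies)} turns")
```

Both timestamps would need to be captured on the caller's side for the result to include network and audio processing delays at both ends, as the definition requires.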

These results come from the basic test suite, which does not include tool calling or complex scenarios.

For other test suites, other regions, the full report, or custom tests, please reach out to contact@livetok.io.

[Bar chart: latency (ms) per platform, axis from 0 to 2500 ms. Platforms shown: VAPI, Retell AI, Pipecat, LiveKit, OpenAI RealTime, LiveTok Gemini.]

Platform	STT	LLM	TTS	Latency	Std Dev
VAPI	Deepgram	OpenAI	ElevenLabs	1561ms	±618ms
Retell AI	Deepgram	OpenAI	ElevenLabs	985ms	±135ms
Pipecat	Deepgram	OpenAI	ElevenLabs	1611ms	±300ms
LiveKit	Deepgram	OpenAI	ElevenLabs	1612ms	±208ms
OpenAI RealTime	N/A	OpenAI	N/A	1295ms	±312ms
LiveTok Gemini	N/A	Gemini	N/A	1379ms	±65ms

Configuration: Deepgram Nova-2 (STT), GPT-4o-mini (LLM), and ElevenLabs Flash v2.5 (TTS) for the pipeline-based platforms; native audio models for the OpenAI and Gemini speech-to-speech platforms.
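One way to read the table is to rank platforms both by latency and by how consistent that latency is. The sketch below does this using the figures from the table; the coefficient of variation (standard deviation divided by latency) is a metric added here for illustration, not one reported by the benchmark, and it assumes the Latency column is an average over the test run.

```python
# (platform, latency_ms, std_dev_ms) copied from the results table.
RESULTS = [
    ("VAPI", 1561, 618),
    ("Retell AI", 985, 135),
    ("Pipecat", 1611, 300),
    ("LiveKit", 1612, 208),
    ("OpenAI RealTime", 1295, 312),
    ("LiveTok Gemini", 1379, 65),
]


def coefficient_of_variation(latency_ms: float, std_dev_ms: float) -> float:
    """Spread relative to latency; lower means more consistent responses."""
    return std_dev_ms / latency_ms


# Fastest first.
by_latency = sorted(RESULTS, key=lambda r: r[1])
# Most consistent first.
by_consistency = sorted(RESULTS, key=lambda r: coefficient_of_variation(r[1], r[2]))

for name, latency_ms, std_dev_ms in by_latency:
    cv = coefficient_of_variation(latency_ms, std_dev_ms)
    print(f"{name:<16} {latency_ms:>5}ms  ±{std_dev_ms:>3}ms  CV={cv:.1%}")
```

With these figures, Retell AI has the lowest latency, while LiveTok Gemini has the smallest spread relative to its latency and VAPI the largest.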