Realtime Translation API
Add live speech-to-speech translation to any application in minutes. Stream audio in, receive translated audio and transcripts back — all over a single WebSocket connection.
Why Realtime Translation API?
Built for applications that need instant, continuous translation. One persistent connection handles the entire session — no polling, no delays, no complexity.
Ultra-Low Latency
Streaming audio is translated and returned in real time. No request-response round trips — results arrive as the speaker talks.
Audio In, Audio Out
Send raw microphone audio and receive translated speech back. Both input and output transcripts are also streamed so you can display captions live.
130+ Languages
Translate to any major language using standard BCP-47 language codes. Switch target languages between sessions with no SDK changes.
Secure by Default
Every connection is authenticated with your API key. Sessions are isolated per user, and your audio is never shared or stored.
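As a minimal sketch of the language-code handling mentioned under 130+ Languages, the helper below normalizes a BCP-47 tag with the standard Intl.getCanonicalLocales function before it is used as a target language. The name normalizeTargetLanguage is hypothetical. Whether the API accepts region subtags such as es-MX is an assumption, since the examples in this document only show two-letter codes.

```javascript
// Hypothetical helper: returns the canonical form of a BCP-47 tag,
// or null when the input is not a structurally valid tag.
function normalizeTargetLanguage(tag) {
  try {
    // Intl.getCanonicalLocales throws a RangeError for malformed tags.
    const [canonical] = Intl.getCanonicalLocales(tag);
    return canonical ?? null;
  } catch (err) {
    if (err instanceof RangeError) return null;
    throw err;
  }
}
```

A structurally valid tag is not necessarily a supported language, so the server remains the final authority on which codes it accepts.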
Connect & Authenticate
The Realtime Translation API runs over a dedicated Socket.IO namespace. Pass your API key as a query parameter when creating the connection — the server validates it before any session can begin.
wss://api.gpttranslator.co
└── namespace: /api/realtime-translator
import { io } from 'socket.io-client';

// Connect to the Realtime Translation namespace
const socket = io('https://api.gpttranslator.co/api/realtime-translator', {
  transports: ['websocket'],
  query: {
    apiKey: 'YOUR_API_KEY', // your GPT Translator API key
  },
});
Note: Obtain your API key from the GPT Translator dashboard under API Keys. Connections without a valid API key are rejected immediately.
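A rejected connection can be surfaced to the user rather than failing silently. The sketch below assumes the server refuses an invalid key either during the handshake, which socket.io-client reports as connect_error, or by closing the socket, which arrives as a disconnect with the reason io server disconnect. Which of the two this API uses is not stated here. The helper name watchConnection and its callbacks are hypothetical.

```javascript
// Hypothetical helper: report connection outcomes for a socket.io-client socket.
// `socket` only needs on(); the callbacks are supplied by the application.
function watchConnection(socket, { onConnected, onRejected }) {
  socket.on('connect', () => onConnected());

  // Emitted by socket.io-client when the handshake fails, for example
  // when server-side middleware refuses the connection.
  socket.on('connect_error', (err) => {
    onRejected(`Connection refused: ${err.message}`);
  });

  // 'io server disconnect' means the server closed the socket on purpose;
  // socket.io-client does not reconnect automatically in that case.
  socket.on('disconnect', (reason) => {
    if (reason === 'io server disconnect') {
      onRejected('Disconnected by the server');
    }
  });
}
```

The helper takes an already created socket, so it works the same way whether the connection is opened from a backend or from a test double.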
Never expose your API key in client-side code
The connection example above, which passes the API key from client code, is for illustration only. Embedding your API key directly in browser JavaScript exposes it to anyone who inspects your page source or network traffic, and anyone who obtains it can use your key and drain your token balance.
✅ Recommended: proxy through your own backend
Your server holds the API key securely and opens the Socket.IO connection on behalf of the browser. The browser connects to your server, never to the translation API directly.
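import { io } from 'socket.io-client';

// Your backend holds the key, so the browser never sees it
const socket = io('https://api.gpttranslator.co/api/realtime-translator', {
  transports: ['websocket'],
  query: { apiKey: process.env.GPT_TRANSLATOR_API_KEY },
});

// Proxy events between the browser client and the translation API.
// browserSocket is the browser's connection to your own server.
browserSocket.on('translation_start', (data) => socket.emit('translation_start', data));
browserSocket.on('audio_chunk', (chunk) => socket.emit('audio_chunk', chunk));
browserSocket.on('translation_stop', () => socket.emit('translation_stop'));
socket.on('output_audio', (data) => browserSocket.emit('output_audio', data));
socket.on('output_transcript', (data) => browserSocket.emit('output_transcript', data));
socket.on('input_transcript', (data) => browserSocket.emit('input_transcript', data));

// Forward status and error events too, so the browser knows when to start and stop
for (const event of ['session_initializing', 'translation_ready', 'ready_for_audio',
  'translation_closed', 'translation_error', 'insufficient_tokens']) {
  socket.on(event, (data) => browserSocket.emit(event, data));
}
Session Lifecycle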
Every translation session follows a simple five-step sequence. Understanding this order will help you build a robust integration.
1. Start a session (you emit)
Emit translation_start with the target language code. The server reserves your token balance and begins initializing the session.
2. Wait for readiness signals (you listen)
The server emits three status events in order: session_initializing, translation_ready, and finally ready_for_audio. Only start sending audio after ready_for_audio.
3. Stream audio chunks (you emit)
Continuously emit audio_chunk events with base64-encoded PCM16 audio captured from the microphone. Keep chunks small (around 100 ms each) for the lowest latency.
4. Receive translated output (you listen)
As translation happens, the server streams back output_audio (translated speech), output_transcript (translated text), and input_transcript (recognized source text), all as incremental deltas.
5. End the session (you emit)
Emit translation_stop to gracefully close the session. Token usage is finalized and a translation_closed event confirms the session has ended.
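The five steps above can be read as a small state machine. The sketch below tracks the session state from the documented events and only lets audio through after ready_for_audio. The function createSessionTracker and the state names are hypothetical; the event names and payload fields are the ones listed in this document.

```javascript
// Hypothetical state tracker driven by the documented server events.
// `socket` is any object with on() and emit(), such as a socket.io-client socket.
function createSessionTracker(socket) {
  let state = 'idle'; // idle → initializing → connected → ready → closed

  socket.on('session_initializing', () => { state = 'initializing'; });
  socket.on('translation_ready', () => { state = 'connected'; });
  socket.on('ready_for_audio', () => { state = 'ready'; });
  socket.on('translation_closed', () => { state = 'closed'; });
  socket.on('insufficient_tokens', () => { state = 'closed'; });

  return {
    getState: () => state,
    start(targetLanguage) {
      socket.emit('translation_start', { targetLanguage });
    },
    // Drops audio until the server has signalled ready_for_audio.
    sendAudio(audioData) {
      if (state !== 'ready') return false;
      socket.emit('audio_chunk', { audioData });
      return true;
    },
    stop() {
      socket.emit('translation_stop');
    },
  };
}
```

Gating sendAudio on the tracked state keeps early microphone data from being emitted before the session is active, which is the ordering mistake step 2 warns about.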
Events Reference
A complete reference of every event in the Realtime Translation API — what you send and what you receive.
Events You Emit
translation_start
Starts a new translation session. Pass the BCP-47 language code for the target language.
{
  "targetLanguage": "es" // BCP-47 language code (e.g. "fr", "de", "zh", "ja")
}
audio_chunk
Sends a chunk of raw microphone audio. The audioData field must be a base64-encoded PCM16 mono audio buffer sampled at 24 000 Hz.
{
  "audioData": "<base64-encoded PCM16 mono audio>"
  // Sample rate: 24 000 Hz
  // Encoding: 16-bit PCM, little-endian
  // Send continuously while the user speaks
}
translation_stop
Gracefully ends the session. The server finalizes billing and emits translation_closed.
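To make the audio_chunk format above concrete, the sketch below converts Float32 samples (the format Web Audio produces) to base64 PCM16 and back, and computes how many samples fit in a chunk of a given duration. At 24 000 Hz, 100 ms of mono PCM16 is 2 400 samples, or 4 800 bytes before base64 encoding. The function names are hypothetical, and the decoder assumes output_audio deltas use the same 16-bit little-endian layout as the input; their sample rate is not stated in this document.

```javascript
const SAMPLE_RATE = 24000; // Hz, as required for audio_chunk

// Number of samples in a chunk of the given duration.
function samplesPerChunk(durationMs) {
  return Math.round((SAMPLE_RATE * durationMs) / 1000);
}

// Float32 samples in [-1, 1] → base64-encoded 16-bit little-endian PCM.
function encodePcm16Base64(float32) {
  const bytes = new Uint8Array(float32.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < float32.length; i++) {
    const clamped = Math.max(-1, Math.min(1, float32[i]));
    // Scale asymmetrically so that -1 maps to -32768 and 1 maps to 32767.
    const sample = clamped < 0 ? clamped * 32768 : clamped * 32767;
    view.setInt16(i * 2, Math.round(sample), true); // true = little-endian
  }
  let binary = '';
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Base64-encoded PCM16 → Float32 samples in [-1, 1).
function decodePcm16Base64(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const view = new DataView(bytes.buffer);
  const float32 = new Float32Array(bytes.length >> 1);
  for (let i = 0; i < float32.length; i++) {
    float32[i] = view.getInt16(i * 2, true) / 32768;
  }
  return float32;
}
```

Writing through a DataView fixes the byte order explicitly instead of relying on the platform's native endianness. For comparison, the capture snippet later in this document uses a 4096-sample buffer, which at 24 000 Hz is roughly 171 ms per chunk rather than 100 ms.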
Events You Listen To
session_initializing (status)
Emitted immediately after translation_start. Indicates the session is being set up and is not yet ready for audio.
translation_ready (status)
Emitted once the server-side connection is established. Still waiting for final session activation.
ready_for_audio (status)
The session is fully active. Begin streaming audio_chunk events now.
output_audio (stream)
A chunk of translated speech audio. The delta is a base64-encoded PCM16 audio buffer. Buffer consecutive deltas and play them in order.
{
  "delta": "<base64-encoded PCM16 audio chunk>"
  // Decode and play through the output device
  // Arrives incrementally; buffer chunks for smooth playback
}
output_transcript (stream)
A fragment of the translated text. Append deltas together to build the full translated transcript in your UI.
{
  "delta": "Hola, ¿cómo estás?" // translated text fragment
}
input_transcript (stream)
A fragment of the recognized source speech. Append deltas together to build the source-language caption.
{
  "delta": "Hello, how are you?" // recognized source speech fragment
}
translation_closed (status)
Confirms the session has ended cleanly, either after translation_stop or when the server closes the session due to exhausted tokens.
translation_error (error)
An unexpected error occurred. The message field explains what went wrong. The session may still be active, so you can retry or emit translation_stop.
{
  "message": "Translation session error. Please try again."
}
insufficient_tokens (error)
Your token balance is exhausted. The session is automatically closed. Upgrade your plan to continue.
{
  "message": "You have reached the limit of words for translation...",
  "remainingTokens": 0
}
Complete Integration Example
A minimal JavaScript example showing how to connect, start a session, stream audio, handle all server events, and cleanly stop the session. The helpers startMicrophone, stopMicrophone, playAudioChunk, appendToTranscript, and showUpgradePrompt stand for your own application code.
import { io } from 'socket.io-client';

const socket = io('https://api.gpttranslator.co/api/realtime-translator', {
  transports: ['websocket'],
  query: { apiKey: 'YOUR_API_KEY' }, // illustration only; keep the key on your backend in production
});

// ─── Listen for server events ───────────────────────────────────────
socket.on('session_initializing', () => {
  console.log('Session is being prepared...');
});
socket.on('translation_ready', () => {
  console.log('Session connected. Waiting for activation...');
});
socket.on('ready_for_audio', () => {
  console.log('Ready! Start streaming audio chunks now.');
  startMicrophone(); // begin capturing and sending audio
});
socket.on('output_audio', ({ delta }) => {
  // delta is a base64-encoded PCM16 audio chunk.
  // atob returns a binary string; playAudioChunk must turn it into samples.
  playAudioChunk(atob(delta));
});
socket.on('output_transcript', ({ delta }) => {
  // Append translated text delta to your UI
  appendToTranscript('translated', delta);
});
socket.on('input_transcript', ({ delta }) => {
  // Append recognized source speech to your UI
  appendToTranscript('original', delta);
});
socket.on('translation_closed', () => {
  console.log('Session ended.');
  stopMicrophone();
});
socket.on('translation_error', ({ message }) => {
  console.error('Error:', message);
  stopMicrophone();
});
socket.on('insufficient_tokens', ({ message, remainingTokens }) => {
  console.warn('Out of tokens:', message, 'remaining:', remainingTokens);
  stopMicrophone();
  showUpgradePrompt();
});

// ─── Start a session ────────────────────────────────────────────────
socket.emit('translation_start', { targetLanguage: 'es' });

// ─── Stream audio chunks ────────────────────────────────────────────
// Call this from your microphone capture code for every encoded chunk
function onAudioData(pcm16Base64) {
  socket.emit('audio_chunk', { audioData: pcm16Base64 });
}

// ─── Stop the session ───────────────────────────────────────────────
function stopSession() {
  socket.emit('translation_stop');
}
How Billing Works
Realtime translation is billed by session duration — the time between the session being activated and translation_stop or token exhaustion. No per-character or per-word counting.
Per-Second Billing
Usage is measured in seconds from the moment your session is fully active until it ends. Partial seconds are rounded up.
Tokens Deducted Live
Tokens are deducted from your balance at the end of each session based on actual duration. Your available balance is checked before a session can start.
Automatic Cap
The session is terminated automatically when your token balance runs out. You receive an insufficient_tokens event before the session closes.
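The rounding rule can be illustrated with a short calculation. The sketch below derives billable seconds from the time the session became active and the time it ended, rounding partial seconds up as described above. The tokens-per-second rate is not given in this document, so tokensPerSecond is a placeholder parameter, and both function names are hypothetical.

```javascript
// Billable seconds between activation and end; partial seconds round up.
function billableSeconds(activatedAtMs, endedAtMs) {
  const elapsedMs = Math.max(0, endedAtMs - activatedAtMs);
  return Math.ceil(elapsedMs / 1000);
}

// tokensPerSecond is a placeholder: take the real rate from your plan or dashboard.
function estimateTokens(activatedAtMs, endedAtMs, tokensPerSecond) {
  return billableSeconds(activatedAtMs, endedAtMs) * tokensPerSecond;
}
```

A session that is active for 61.2 seconds counts as 62 billable seconds under this rule. Timestamps taken on the client only approximate what the server measures, so treat the result as an estimate.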
Estimated Token Consumption
Note: Token estimates are approximate. Actual consumption depends on audio noise levels and session activity. Check your dashboard for real-time usage.
Audio Format Requirements
The API expects a specific audio format for best accuracy and lowest latency. Use the code snippet below to capture and convert microphone audio in the browser.
Encoding: PCM 16-bit, little-endian
Sample rate: 24 000 Hz
Channels: 1 (mono)
Transport: Base64 string
// Capture PCM16 audio from the browser microphone.
// Run this inside an async function or an ES module, since it uses await.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });
const source = audioContext.createMediaStreamSource(stream);
// ScriptProcessorNode is deprecated in favor of AudioWorklet but keeps this example short.
// 4096 samples is about 171 ms per chunk at 24 000 Hz; 2048 (about 85 ms) lowers latency.
const processor = audioContext.createScriptProcessor(4096, 1, 1);
processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  // Convert Float32 → Int16 PCM
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    int16[i] = Math.max(-32768, Math.min(32767, float32[i] * 32768));
  }
  // Base64-encode and send
  const base64 = btoa(String.fromCharCode(...new Uint8Array(int16.buffer)));
  socket.emit('audio_chunk', { audioData: base64 });
};
source.connect(processor);
processor.connect(audioContext.destination);
Frequently Asked Questions
Common questions about integrating and using the Realtime Translation API.