Live & Streaming

Realtime Translation API

Add live speech-to-speech translation to any application in minutes. Stream audio in, receive translated audio and transcripts back — all over a single WebSocket connection.

Why Realtime Translation API?

Built for applications that need instant, continuous translation. One persistent connection handles the entire session — no polling, no delays, no complexity.

Ultra-Low Latency

Streaming audio is translated and returned in real time. No request-response round trips — results arrive as the speaker talks.

Audio In, Audio Out

Send raw microphone audio and receive translated speech back. Both input and output transcripts are also streamed so you can display captions live.

130+ Languages

Translate to any major language using standard BCP-47 language codes. Switch target languages between sessions with no SDK changes.

Secure by Default

Every connection is authenticated with your API key. Sessions are isolated per user, and your audio is never shared or stored.

Getting Started

Connect & Authenticate

The Realtime Translation API runs over a dedicated Socket.IO namespace. Pass your API key as a query parameter when creating the connection — the server validates it before any session can begin.

Endpoint
wss://api.gpttranslator.co
  └── namespace: /api/realtime-translator
JavaScript — Socket.IO
import { io } from 'socket.io-client';

// Connect to the Realtime Translation namespace
const socket = io('https://api.gpttranslator.co/api/realtime-translator', {
  transports: ['websocket'],
  query: {
    apiKey: 'YOUR_API_KEY'  // your GPT Translator API key
  },
});

Note: Obtain your API key from the GPT Translator dashboard under API Keys. Connections without a valid API key are rejected immediately.
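
How a rejected connection is reported is not specified here. The following is a minimal sketch, assuming a rejection reaches the client through Socket.IO's standard connect_error or disconnect events. The helper name watchConnection and its onUp/onDown callbacks are hypothetical, not part of the API.

```javascript
// Hypothetical helper: report connection state changes for a Socket.IO client socket.
// Only standard Socket.IO client events are used: 'connect', 'connect_error', 'disconnect'.
function watchConnection(socket, { onUp, onDown }) {
  socket.on('connect', () => onUp());

  // Fired when a connection attempt fails, for example when the server refuses it.
  socket.on('connect_error', (err) => onDown(`connect_error: ${err.message}`));

  // Fired when an established connection is closed; `reason` is a string.
  socket.on('disconnect', (reason) => onDown(`disconnect: ${reason}`));
}
```

Calling watchConnection(socket, { onUp, onDown }) right after io(...) lets the UI show a clear message instead of waiting for session events that will never arrive.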

Never expose your API key in client-side code

The code example above is for illustration only. Embedding your API key directly in browser JavaScript exposes it to anyone who inspects your page source or network traffic — anyone can then use your key and drain your token balance.

Recommended: proxy through your own backend

Your server holds the API key securely and opens the Socket.IO connection on behalf of the browser. The browser connects to your server, never to the translation API directly.

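Node.js backend (server-side)
import { io } from 'socket.io-client';

// Your backend holds the key; the browser never sees it
const socket = io('https://api.gpttranslator.co/api/realtime-translator', {
  transports: ['websocket'],
  query: { apiKey: process.env.GPT_TRANSLATOR_API_KEY },
});

// browserSocket is the connection your own server accepted from the browser,
// for example the socket passed to your Socket.IO server's 'connection' handler.

// Proxy events from the browser client to the translation API
browserSocket.on('translation_start', (data) => socket.emit('translation_start', data));
browserSocket.on('audio_chunk', (chunk) => socket.emit('audio_chunk', chunk));
browserSocket.on('translation_stop', () => socket.emit('translation_stop'));

// Proxy streamed output back to the browser
socket.on('output_audio', (data) => browserSocket.emit('output_audio', data));
socket.on('output_transcript', (data) => browserSocket.emit('output_transcript', data));
socket.on('input_transcript', (data) => browserSocket.emit('input_transcript', data));

// Relay status and error events too, so the browser knows when to start and stop sending audio
for (const event of ['session_initializing', 'translation_ready', 'ready_for_audio',
                     'translation_closed', 'translation_error', 'insufficient_tokens']) {
  socket.on(event, (data) => browserSocket.emit(event, data));
}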
Session Flow

Session Lifecycle

Every translation session follows a simple five-step sequence. Understanding this order will help you build a robust integration.

1. Start a session (you emit)

Emit translation_start with the target language code. The server reserves your token balance and begins initializing the session.

socket.emit('translation_start', { "targetLanguage": "es" });

2. Wait for readiness signals (you listen)

The server emits three status events in order: session_initializing, translation_ready, and finally ready_for_audio. Only start sending audio after ready_for_audio.

socket.on('session_initializing', callback);
socket.on('translation_ready', callback);
socket.on('ready_for_audio', callback);

3. Stream audio chunks (you emit)

Continuously emit audio_chunk events with base64-encoded PCM16 audio captured from the microphone. Keep chunks small (around 100 ms each) for the lowest latency.

socket.emit('audio_chunk', { "audioData": "<base64-pcm16-string>" });

4. Receive translated output (you listen)

As translation happens, the server streams back output_audio (translated speech), output_transcript (translated text), and input_transcript (recognized source text) — all as incremental deltas.

socket.on('output_audio', callback);
socket.on('output_transcript', callback);
socket.on('input_transcript', callback);

5. End the session (you emit)

Emit translation_stop to gracefully close the session. Token usage is finalized and a translation_closed event confirms the session has ended.

socket.emit('translation_stop');
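
The five steps above can be sketched as a small wrapper that refuses to send audio until ready_for_audio has arrived. This is an illustration under assumptions, not the official client: createSession, its state names, and the choice to drop early audio rather than queue it are all hypothetical.

```javascript
// Hypothetical wrapper around any Socket.IO-style socket (needs .on and .emit).
// It drops audio until the server has emitted ready_for_audio.
function createSession(socket, targetLanguage) {
  let state = 'idle'; // idle -> starting -> active -> stopping -> closed

  socket.on('ready_for_audio', () => {
    if (state === 'starting') state = 'active';
  });
  socket.on('translation_closed', () => {
    state = 'closed';
  });

  return {
    start() {
      state = 'starting';
      socket.emit('translation_start', { targetLanguage });
    },
    // Returns true if the chunk was sent, false if it was dropped.
    sendAudio(audioData) {
      if (state !== 'active') return false;
      socket.emit('audio_chunk', { audioData });
      return true;
    },
    stop() {
      if (state === 'starting' || state === 'active') {
        state = 'stopping';
        socket.emit('translation_stop');
      }
    },
    get state() {
      return state;
    },
  };
}
```

Dropping audio before activation keeps the sketch short. An application that cannot lose the first words might buffer those chunks and flush them once the state becomes active.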
API Reference

Events Reference

A complete reference of every event in the Realtime Translation API — what you send and what you receive.

Events You Emit

you emit
translation_start

Starts a new translation session. Pass the BCP-47 language code for the target language.

{
  "targetLanguage": "es"  // BCP-47 language code (e.g. "fr", "de", "zh", "ja")
}
audio_chunk

Sends a chunk of raw microphone audio. The audioData field must be a base64-encoded PCM16 mono audio buffer sampled at 24 000 Hz.

{
  "audioData": "<base64-encoded PCM16 mono audio>"
  // Sample rate: 24 000 Hz
  // Encoding: 16-bit PCM, little-endian
  // Send continuously while the user speaks
}
translation_stop

Gracefully ends the session. The server finalizes billing and emits translation_closed.
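
The audio_chunk format implies some simple arithmetic: 100 ms of 24 000 Hz mono PCM16 is 2 400 samples, or 4 800 bytes before base64 encoding. Here is a minimal sketch of cutting already-captured samples into chunks of that size. The names int16ToBase64 and toAudioChunks are illustrative, and the code assumes a little-endian host so that the bytes of an Int16Array are already in the required order.

```javascript
const SAMPLE_RATE = 24000; // Hz, from the audio_chunk description

// Encode the bytes behind an Int16Array as a base64 string.
// Assumes a little-endian host, so the in-memory byte order matches the wire format.
function int16ToBase64(samples) {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Split samples into base64 chunks of about `chunkMs` milliseconds each.
function toAudioChunks(samples, chunkMs = 100) {
  const samplesPerChunk = Math.round((SAMPLE_RATE * chunkMs) / 1000); // 2400 at 100 ms
  const chunks = [];
  for (let start = 0; start < samples.length; start += samplesPerChunk) {
    // subarray clamps at the end, so the last chunk may be shorter
    chunks.push(int16ToBase64(samples.subarray(start, start + samplesPerChunk)));
  }
  return chunks;
}
```

Each returned string can be sent as the audioData field of one audio_chunk event.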

Events You Listen To

you listen
session_initializing (status)

Emitted immediately after translation_start. Indicates the session is being set up — not yet ready for audio.

translation_ready (status)

Emitted once the server-side connection is established. Still waiting for final session activation.

ready_for_audio (status)

The session is fully active. Begin streaming audio_chunk events now.

output_audio (stream)

A chunk of translated speech audio. The delta is a base64-encoded PCM16 audio buffer. Buffer consecutive deltas and play them in order.

{
  "delta": "<base64-encoded PCM16 audio chunk>"
  // Decoded and played directly to the output speaker
  // Arrives incrementally — buffer chunks for smooth playback
}
output_transcript (stream)

A fragment of the translated text. Append deltas together to build the full translated transcript in your UI.

{
  "delta": "Hola, ¿cómo estás?"  // translated text fragment
}
input_transcript (stream)

A fragment of the recognized source speech. Append deltas together to build the source-language caption.

{
  "delta": "Hello, how are you?"  // recognized source speech fragment
}
translation_closed (status)

Confirms the session has ended cleanly — either after translation_stop or when the server closes the session due to exhausted tokens.

translation_error (error)

An unexpected error occurred. The message field explains what went wrong. The session may still be active, so you can retry or emit translation_stop.

{
  "message": "Translation session error. Please try again."
}
insufficient_tokens (error)

Your token balance is exhausted. The session is automatically closed. Upgrade your plan to continue.

{
  "message": "You have reached the limit of words for translation...",
  "remainingTokens": 0
}
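
Before an output_audio delta can be played, its base64 payload has to become audio samples. Below is a minimal sketch of that decoding step, assuming the output uses the same 16-bit little-endian layout as the input format (the reference above says only "PCM16"). decodePcm16Delta is an illustrative name, and the output sample rate is not stated here, so playback code has to supply it.

```javascript
// Decode a base64 PCM16 delta into Float32 samples in the range [-1, 1),
// which is the sample format Web Audio buffers use.
// Assumption: samples are 16-bit little-endian, like the input format.
function decodePcm16Delta(delta) {
  const binary = atob(delta); // one character per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }

  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(Math.floor(bytes.length / 2)); // a trailing odd byte is ignored
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return samples;
}
```

The resulting Float32Array can be copied into an AudioBuffer and scheduled after the previous chunk, so consecutive deltas play back to back.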
Code Example

Complete Integration Example

A minimal but complete JavaScript example showing how to connect, start a session, stream audio, handle all server events, and cleanly stop the session.

realtime-translator.js
import { io } from 'socket.io-client';

const socket = io('https://api.gpttranslator.co/api/realtime-translator', {
  transports: ['websocket'],
  query: { apiKey: 'YOUR_API_KEY' },
});

// ─── Listen for server events ───────────────────────────────────────

socket.on('session_initializing', () => {
  console.log('Session is being prepared...');
});

socket.on('translation_ready', () => {
  console.log('Session connected. Waiting for activation...');
});

socket.on('ready_for_audio', () => {
  console.log('Ready! Start streaming audio chunks now.');
  startMicrophone(); // begin capturing and sending audio
});

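socket.on('output_audio', ({ delta }) => {
  // delta is a base64-encoded PCM16 audio chunk. atob() returns the raw bytes as a
  // binary string, so playAudioChunk (your own helper) must convert them to samples.
  playAudioChunk(atob(delta));
});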

socket.on('output_transcript', ({ delta }) => {
  // Append translated text delta to your UI
  appendToTranscript('translated', delta);
});

socket.on('input_transcript', ({ delta }) => {
  // Append recognized source speech to your UI
  appendToTranscript('original', delta);
});

socket.on('translation_closed', () => {
  console.log('Session ended.');
  stopMicrophone();
});

socket.on('translation_error', ({ message }) => {
  console.error('Error:', message);
  stopMicrophone();
});

socket.on('insufficient_tokens', ({ message, remainingTokens }) => {
  console.warn('Out of tokens:', message, 'remaining:', remainingTokens);
  stopMicrophone();
  showUpgradePrompt();
});

// ─── Start a session ────────────────────────────────────────────────

socket.emit('translation_start', { targetLanguage: 'es' });

// ─── Stream audio chunks ────────────────────────────────────────────

function onAudioData(pcm16Base64) {
  socket.emit('audio_chunk', { audioData: pcm16Base64 });
}

// ─── Stop the session ───────────────────────────────────────────────

function stopSession() {
  socket.emit('translation_stop');
}
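
The example leaves helpers such as appendToTranscript for you to implement. One possible shape is an in-memory store that concatenates deltas, assuming each delta already carries its own spacing and punctuation (the events reference says only to append them). createTranscriptStore and its render callback are hypothetical names.

```javascript
// Hypothetical store behind appendToTranscript(); replace `render` with real UI updates.
function createTranscriptStore(render = () => {}) {
  const text = { original: '', translated: '' };
  const isKnown = (kind) => Object.prototype.hasOwnProperty.call(text, kind);

  return {
    append(kind, delta) {
      if (!isKnown(kind)) throw new Error(`Unknown transcript kind: ${kind}`);
      text[kind] += delta; // deltas are fragments, so no separator is inserted
      render(kind, text[kind]);
    },
    get(kind) {
      return text[kind];
    },
    reset() {
      text.original = '';
      text.translated = '';
    },
  };
}
```

With a store in place, appendToTranscript can be defined as (kind, delta) => store.append(kind, delta), and reset() can be called when a new session starts.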
Billing

How Billing Works

Realtime translation is billed by session duration — the time between the session being activated and translation_stop or token exhaustion. No per-character or per-word counting.

Per-Second Billing

Usage is measured in seconds from the moment your session is fully active until it ends. Partial seconds are rounded up.

Tokens Deducted Live

Tokens are deducted from your balance at the end of each session based on actual duration. Your available balance is checked before a session can start.

Automatic Cap

The session is automatically terminated when your remaining token balance would run out. You receive an insufficient_tokens event before the session closes.

Estimated Token Consumption

Session Duration    Approx. Tokens Used    Notes
1 minute            ~3 000 tokens          Good for quick spot checks
5 minutes           ~15 000 tokens         Short meeting or interview
30 minutes          ~90 000 tokens         Extended conversation
1 hour              ~180 000 tokens        Conference or lecture

Note: Token estimates are approximate. Actual consumption depends on audio noise levels and session activity. Check your dashboard for real-time usage.
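
For budgeting, the estimates above work out to roughly 50 tokens per second (3 000 per minute). The sketch below applies that figure together with the round-up rule for partial seconds. The rate is an approximation inferred from the table, not an official price, and the function names are illustrative.

```javascript
// Assumption: about 3 000 tokens per minute, taken from the estimates above.
const APPROX_TOKENS_PER_SECOND = 3000 / 60;

// Estimate tokens for a session; partial seconds are rounded up.
function estimateTokens(durationMs) {
  const billedSeconds = Math.ceil(durationMs / 1000);
  return billedSeconds * APPROX_TOKENS_PER_SECOND;
}

// Rough upper bound, in whole seconds, on how long a given balance could last.
function estimateMaxSeconds(tokenBalance) {
  return Math.floor(tokenBalance / APPROX_TOKENS_PER_SECOND);
}
```

Under this assumption a 1.5-second session is billed as 2 seconds, and a balance of 15 000 tokens covers about five minutes.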

Audio Guide

Audio Format Requirements

The API expects a specific audio format for best accuracy and lowest latency. Use the code snippet below to capture and convert microphone audio in the browser.

Format: PCM 16-bit, little-endian
Sample Rate: 24 000 Hz
Channels: 1 (mono)
Transfer Encoding: Base64 string
Browser microphone capture
// Capture PCM16 audio from the browser microphone
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });
const source = audioContext.createMediaStreamSource(stream);
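// 2048 frames at 24 000 Hz is about 85 ms per chunk, close to the ~100 ms recommended for low latency
const processor = audioContext.createScriptProcessor(2048, 1, 1);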

processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);

  // Convert Float32 → Int16 PCM
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    int16[i] = Math.max(-32768, Math.min(32767, float32[i] * 32768));
  }

  // Base64-encode and send
  const base64 = btoa(String.fromCharCode(...new Uint8Array(int16.buffer)));
  socket.emit('audio_chunk', { audioData: base64 });
};

source.connect(processor);
processor.connect(audioContext.destination);
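
The capture snippet requests a 24 000 Hz AudioContext. If the context ends up running at a different rate (audioContext.sampleRate reports the actual value), the samples need resampling before they are converted to PCM16. The following is a minimal sketch using linear interpolation, which is a simple stand-in and not a technique this API prescribes. It applies no anti-aliasing filter, so a proper resampler is preferable for production audio. resampleLinear is an illustrative name.

```javascript
// Linear-interpolation resampler, for example 48 000 Hz -> 24 000 Hz.
// No low-pass filtering is applied, so some aliasing is possible when downsampling.
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate) return Float32Array.from(input);

  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

Inside onaudioprocess this would be called as resampleLinear(float32, audioContext.sampleRate, 24000) before the Float32 to Int16 conversion.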

Frequently Asked Questions

Common questions about integrating and using the Realtime Translation API.