Translation pipeline
The backend separates ingestion, translation, validation, and reconstruction so each file format can keep its structure while the translation engine focuses on meaning, terminology, and consistency.
Uploaded files are validated, parsed, and converted into structured translatable units before model calls begin.
- File type and MIME checks reject unsupported uploads before work enters the queue.
- Documents are extracted into text nodes, sheets, slides, segments, or structured JSON depending on format.
- Glossary terms and ignored words are mapped to safe identifiers so brand names, placeholders, and protected phrases survive translation.
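The glossary mapping step above could look something like the following sketch. The function names, the `__TERM_n__` token format, and the example phrases are illustrative assumptions, not the backend's actual identifiers:

```typescript
// Sketch: mask protected terms with opaque placeholders so they survive
// translation unchanged, then restore them afterwards. Names are illustrative.

type TermMap = Map<string, string>;

function protectTerms(text: string, terms: string[]): { masked: string; map: TermMap } {
  const map: TermMap = new Map();
  let masked = text;
  terms.forEach((term, i) => {
    const token = `__TERM_${i}__`; // safe identifier a model is unlikely to alter
    map.set(token, term);
    // escape regex metacharacters in the term before global replacement
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    masked = masked.replace(new RegExp(escaped, "g"), token);
  });
  return { masked, map };
}

function restoreTerms(translated: string, map: TermMap): string {
  let out = translated;
  for (const [token, term] of map) {
    out = out.split(token).join(term);
  }
  return out;
}

// Round trip: brand names are masked before translation and restored after.
const { masked, map } = protectTerms("Acme Cloud syncs with Acme Drive.", [
  "Acme Cloud",
  "Acme Drive",
]);
const restored = restoreTerms(masked, map);
```

The same restore step runs again during reconstruction, so placeholders that leak through a model call can still be recovered.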
The system routes prepared content through selected LLMs and cloud translation providers with context-aware prompts.
- OpenAI, Claude, Gemini, Grok, Mistral, Google Cloud, and AWS translation services are supported across the backend.
- Content is chunked around model token limits while preserving surrounding context where the format allows it.
- Redis-backed Bull workers process document translation jobs asynchronously so long-running files do not block the API.
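The chunking step can be sketched as a greedy pack of paragraphs under a token budget. The ~4-characters-per-token estimate and the function name are assumptions for illustration; a real implementation would use the target model's tokenizer:

```typescript
// Sketch: greedily pack paragraphs into chunks that fit a token budget,
// keeping whole paragraphs together so surrounding context is preserved.
function chunkByTokens(paragraphs: string[], maxTokens: number): string[][] {
  const estTokens = (s: string) => Math.ceil(s.length / 4); // rough heuristic
  const chunks: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const p of paragraphs) {
    const cost = estTokens(p);
    if (current.length > 0 && used + cost > maxTokens) {
      chunks.push(current); // budget exceeded: close the current chunk
      current = [];
      used = 0;
    }
    current.push(p);
    used += cost;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Five 40-char paragraphs (~10 tokens each) under a 25-token budget
// pack into chunks of 2, 2, and 1 paragraphs.
const paras = Array.from({ length: 5 }, () => "a".repeat(40));
const chunks = chunkByTokens(paras, 25);
```

Each resulting chunk becomes one unit of work for a Bull job, so a large document translates as many small, retryable model calls.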
After translation, output is validated against the expected document structure, and incomplete or malformed model responses are corrected before moving on.
- Progress is tracked through pre-processing, translation, and post-processing phases.
- Invalid JSON, broken XML, missing document nodes, or other structural gaps trigger repair-and-retry logic.
- Translation history records status, word counts, file size, model choice, and processing percentage.
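The repair-and-retry logic above might be structured like this minimal sketch, where `generate` stands in for a model call and valid JSON is the structural check; the retry budget and error handling are illustrative assumptions:

```typescript
// Sketch: re-request translation when the model returns structurally
// invalid output. JSON.parse acts as the structural validator here.
function parseWithRetry(generate: () => string, maxAttempts = 3): unknown {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = generate();
    try {
      return JSON.parse(raw); // structural check: output must be valid JSON
    } catch (err) {
      lastError = err; // malformed output: loop and try again
    }
  }
  throw new Error(`still malformed after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Simulated model that returns truncated JSON once, then a valid response.
let calls = 0;
const flaky = () => (++calls === 1 ? '{"text": "Hallo' : '{"text": "Hallo Welt"}');
const result = parseWithRetry(flaky) as { text: string };
```

In the real pipeline the same pattern would apply per format, with XML or document-node validators in place of `JSON.parse`.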
Validated translations are placed back into the original document structure, stored, and exposed only through authorized flows.
- Protected terms and glossary mappings are restored before the translated file is finalized.
- Translated documents are reconstructed into their target file format and uploaded to secure S3 storage.
- Access checks, user ownership, organization context, and download flows keep translation results scoped to the right account.
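The ownership and organization scoping could reduce to a check like the one below. The record and requester shapes are assumptions for illustration, not the backend's actual types:

```typescript
// Sketch: a translation result is downloadable only by its owner or by a
// member of the owning organization. Field names are illustrative.
interface TranslationRecord {
  ownerId: string;
  orgId?: string; // absent for personal (non-organization) translations
}

interface Requester {
  userId: string;
  orgIds: string[];
}

function canDownload(rec: TranslationRecord, req: Requester): boolean {
  if (rec.ownerId === req.userId) return true; // owner always allowed
  // otherwise require membership in the owning organization, if any
  return rec.orgId !== undefined && req.orgIds.includes(rec.orgId);
}

const record: TranslationRecord = { ownerId: "u1", orgId: "org-a" };
const owner: Requester = { userId: "u1", orgIds: [] };
const teammate: Requester = { userId: "u2", orgIds: ["org-a"] };
const outsider: Requester = { userId: "u3", orgIds: ["org-b"] };
```

A check like this would gate the S3 download flow, typically by deciding whether to issue a short-lived signed URL for the stored file.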