Audio Translation

A real-time audio translation app with a FastAPI backend and a Vite + React UI. It auto-detects the spoken language during transcription, then translates the text into English or Arabic. To support more of the 150+ languages that Gemma can handle, add new language options to the UI.

Repository layout

backend/: FastAPI service for transcription + translation
frontend/: React UI

Demo assets

Recording example:

video_test_the_system.mp4

Upload example (Hindi audio):

upload_audio_system.mp4

Sample Hindi audio file you can try: what_do_you_do_in_hindi.wav

Prerequisites

Python 3.11.9 recommended (3.10+ supported). Python 3.13 may fail because pydub depends on the audioop module, which was removed from the standard library in 3.13.
Node.js 18+
ffmpeg installed (used by pydub to convert audio)
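
The prerequisites above can be checked with a short script before installing anything. This is a minimal sketch using only the standard library; the function name `check_prerequisites` is illustrative and not part of the repository, and it only checks that `node` is on PATH, not its version.

```python
import shutil
import sys


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    major, minor = (version or sys.version_info)[:2]
    problems = []
    if (major, minor) < (3, 10):
        problems.append("Python 3.10+ is required")
    elif (major, minor) >= (3, 13):
        # audioop was removed from the standard library in Python 3.13,
        # and pydub imports it.
        problems.append("Python 3.13+ may fail due to pydub/audioop")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if which("node") is None:
        problems.append("node not found on PATH (Node.js 18+ is needed)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The `version` and `which` parameters exist only so the check can be exercised without changing the real environment.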

Hardware Requirements

Recommended: 2× GPUs (24GB VRAM each or higher)
Designed for large models and multi-GPU workloads

If your hardware does not meet these requirements, you can still try the project with smaller, lightweight models by changing the model settings in .env. Performance and accuracy may be reduced.

Configuration (.env)

Use a single env file at the repo root for both Docker Compose and the backend.

cp .env.example .env
# Edit .env with your HUGGING_FACE_HUB_TOKEN, models, ports, and base URLs
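
Before starting the services, it can help to confirm that .env is filled in. The sketch below is an assumption about how such a check might look, not the backend's actual config loader. It handles only plain KEY=VALUE lines, and the helper names `read_env` and `missing_keys` are hypothetical. Only HUGGING_FACE_HUB_TOKEN is checked by default because it is the one key named above; see .env.example for the rest.

```python
from pathlib import Path


def read_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=("HUGGING_FACE_HUB_TOKEN",)):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # run from the repo root
    if not env_path.exists():
        print("No .env found; run: cp .env.example .env")
    else:
        for key in missing_keys(read_env(env_path.read_text())):
            print("Missing or empty:", key)
```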

Inference servers (Docker Compose)

The dev compose file builds a vllm-audio image and runs Whisper + Gemma.

docker compose --env-file .env -f backend/dev_env/docker-compose.yaml build whisper
docker compose --env-file .env -f backend/dev_env/docker-compose.yaml up -d

Notes:

Requires Linux host networking (network_mode: host).
Use WHISPER_CUDA_VISIBLE_DEVICES in .env to select a GPU.

Stop the services:

docker compose --env-file .env -f backend/dev_env/docker-compose.yaml stop

First time only: you must run the build whisper command before up.

Backend setup (uv) (recommended)

Uses backend/pyproject.toml and backend/uv.lock.

cd backend
uv sync --python 3.11.9
# Ensure .env exists at the repo root
uv run uvicorn app.main:app --reload --port 9100

Backend setup (venv)

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Required configuration: ensure .env exists at the repo root
uvicorn app.main:app --reload --port 9100

Backend setup (conda)

conda create -n audio-translation python=3.11.9
conda activate audio-translation
cd backend
pip install -r requirements.txt
# Ensure .env exists at the repo root
uvicorn app.main:app --reload --port 9100

Frontend setup

cd frontend
npm install
export VITE_API_URL="http://localhost:9100"
npm run dev

How it works

The UI records or uploads audio and sends it to the backend.
The backend transcribes audio with Whisper and streams the translation from Gemma.
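
The two-step flow above (transcribe once, then stream the translation) can be sketched as a generator. This is an illustration of the pattern only, not the repository's code: `translate_audio`, `fake_transcribe`, and `fake_translate_stream` are hypothetical names, and the fake functions stand in for the real Whisper and Gemma calls with hard-coded output.

```python
from typing import Callable, Iterable, Iterator, Tuple


def translate_audio(
    audio: bytes,
    transcribe: Callable[[bytes], Tuple[str, str]],
    translate_stream: Callable[[str, str, str], Iterable[str]],
    target_language: str = "English",
) -> Iterator[str]:
    """Transcribe once, then yield translated chunks as they arrive."""
    # Step 1: speech-to-text; the transcriber also reports the detected language.
    text, source_language = transcribe(audio)
    # Step 2: forward each translated chunk immediately so the UI can render it.
    for chunk in translate_stream(text, source_language, target_language):
        yield chunk


# Stand-ins for the Whisper and Gemma calls, only for demonstration.
def fake_transcribe(audio: bytes) -> Tuple[str, str]:
    return "aap kya karte hain", "Hindi"


def fake_translate_stream(text: str, source: str, target: str) -> Iterable[str]:
    yield from ["What ", "do ", "you ", "do?"]


if __name__ == "__main__":
    for piece in translate_audio(b"...", fake_transcribe, fake_translate_stream):
        print(piece, end="", flush=True)
    print()
```

Passing the transcriber and translator in as callables keeps the flow independent of any particular inference server, so the same function can be driven by stubs in tests.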

If you run into any problems, feel free to reach out or open an issue in the repo.
