Real-time audio translation app with a FastAPI backend and a Vite + React UI. It auto-detects the spoken language during transcription, then translates the text to English or Arabic. You can extend the UI to support any of the 150+ languages Gemma can handle by adding new language options.
- `backend/`: FastAPI service for transcription + translation
- `frontend/`: React UI
Recording example:
video_test_the_system.mp4
Upload example (Hindi audio):
upload_audio_system.mp4
Sample Hindi audio file you can try: what_do_you_do_in_hindi.wav
- Python 3.11.9 recommended (3.10+ supported). Python 3.13 may fail because the `audioop` module (used by `pydub`) was removed from the standard library.
- Node.js 18+
- `ffmpeg` installed (used by `pydub` to convert audio)
- Recommended: 2× GPUs (24GB VRAM each or higher)
- Designed for large models and multi-GPU workloads
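A quick sanity check that the required tools are installed and on your PATH:

```bash
python --version      # want 3.10+, ideally 3.11.9
node --version        # want 18+
ffmpeg -version | head -n 1
```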
If your hardware does not meet these requirements, you can still try the project with smaller / lightweight models by pointing the model settings in `.env` at lighter checkpoints (see the example `.env` sketch below), although performance and accuracy may be reduced.
Use a single env file at the repo root for both Docker Compose and the backend.
```bash
cp .env.example .env
# Edit .env with your HUGGING_FACE_HUB_TOKEN, models, ports, and base URLs
```

The dev compose file builds a `vllm-audio` image and runs Whisper + Gemma.
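For orientation, a typical `.env` might look like the sketch below. Only `HUGGING_FACE_HUB_TOKEN` and `WHISPER_CUDA_VISIBLE_DEVICES` are named elsewhere in this README; the model, port, and base-URL keys are illustrative, so use the key names that `.env.example` actually defines.

```bash
# Your Hugging Face token (named in this README)
HUGGING_FACE_HUB_TOKEN=hf_xxx
# GPU used by the Whisper container (named in this README)
WHISPER_CUDA_VISIBLE_DEVICES=0

# The keys below are illustrative -- check .env.example for the real names.
# Point them at smaller checkpoints if your GPUs are limited.
WHISPER_MODEL=openai/whisper-large-v3
GEMMA_MODEL=google/gemma-3-27b-it
BACKEND_PORT=9100
```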
```bash
docker compose --env-file .env -f backend/dev_env/docker-compose.yaml build whisper
docker compose --env-file .env -f backend/dev_env/docker-compose.yaml up -d
```

Notes:
- Requires Linux host networking (`network_mode: host`).
- Use `WHISPER_CUDA_VISIBLE_DEVICES` in `.env` to select a GPU.
Stop the services:

```bash
docker compose --env-file .env -f backend/dev_env/docker-compose.yaml stop
```

First time only: run the `build whisper` command before `up`.
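Not part of the documented workflow, but a quick way to confirm the model servers started is to list the Compose services and tail their logs:

```bash
docker compose --env-file .env -f backend/dev_env/docker-compose.yaml ps
docker compose --env-file .env -f backend/dev_env/docker-compose.yaml logs -f whisper
```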
Uses `backend/pyproject.toml` and `backend/uv.lock`.

```bash
cd backend
uv sync --python 3.11.9
# Ensure .env exists at the repo root
uv run uvicorn app.main:app --reload --port 9100
```
Alternatively, with a plain virtual environment and pip:

```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Required configuration
# Ensure .env exists at the repo root
uvicorn app.main:app --reload --port 9100
```
Or with conda:

```bash
conda create -n audio-translation python=3.11.9
conda activate audio-translation
cd backend
pip install -r requirements.txt
# Ensure .env exists at the repo root
uvicorn app.main:app --reload --port 9100
```
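Whichever setup you use, the backend listens on port 9100. Assuming the default FastAPI docs routes are left enabled in `app.main`, a quick smoke test is to fetch the OpenAPI schema:

```bash
# Assumes FastAPI's default /openapi.json route is enabled
curl -s http://localhost:9100/openapi.json | head -c 200
```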
Frontend (Vite + React):

```bash
cd frontend
npm install
export VITE_API_URL="http://localhost:9100"
npm run dev
```

- The UI records or uploads audio and sends it to the backend.
- The backend transcribes audio with Whisper and streams the translation from Gemma.
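For a sense of the flow, a direct request to the backend looks roughly like the sketch below. The endpoint path and form-field names are placeholders (they are not documented here), so check the FastAPI routes under `backend/app` for the real ones; the sample file is the Hindi clip mentioned above.

```bash
# Hypothetical route and field names -- see backend/app for the actual ones.
# -N turns off curl's output buffering so the streamed translation prints as it arrives.
curl -N \
  -F "file=@what_do_you_do_in_hindi.wav" \
  -F "target_language=en" \
  "http://localhost:9100/translate-audio"
```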
If you run into any problems, feel free to reach out or open an issue in the repo.