Local browser-based transcription app for Apple Silicon Macs. It runs a FastAPI backend with a small vanilla frontend and uses mlx-whisper for on-device speech-to-text.
- Runs locally on Apple Silicon.
- Browser-based microphone recording and audio file upload.
- Local transcription with configurable language and MLX Whisper model size.
- Transcript metadata in the UI, including language, model, and detected duration.
- Copy-to-clipboard and save-as-`.txt` actions for completed transcripts.
- Configurable host, port, cache directory, upload limits, and browser status strings.
- Automatic per-model download and reuse through a local cache.
- Remembers the last selected language and model in the browser.
- Simple JSON API for programmatic use.
- macOS on Apple Silicon
- Python 3.12+
- uv

Install dependencies and start the server:

```bash
uv sync
uv run uvicorn app.main:app --reload
```

Open http://127.0.0.1:8000 in your browser.
In the browser UI, choose a language and model, then either record in the browser or upload an audio file. The app shows recording and processing state, clears old output when a new recording starts, and displays language, model, and duration metadata with the result. Finished transcripts can be copied to the clipboard or saved locally as plain .txt files.
Runtime settings live in `config.yaml`.

Key options:

- `server.host` and `server.port` control where the app listens.
- `logging.level` controls structured application logging.
- `transcription.cache_dir` stores downloaded Whisper model files.
- `transcription.max_upload_size_mb` limits upload size.
- `transcription.upload_chunk_size_mb` controls the temp-file streaming chunk size while uploads are staged.
- `transcription.default_upload_filename` defines the fallback filename used for browser uploads.
- `transcription.supported_languages` defines the language picker and backend validation.
- `transcription.supported_model_sizes` defines the available MLX Whisper models.
- `ui.*` tunes browser-side status text, timer labels, and button copy without editing JavaScript.
The default configuration includes German and English, a 100 MB upload limit, and tiny, base, small, medium, and large model options.
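Pulling the keys and defaults above together, a `config.yaml` might look like the sketch below. The languages, upload limit, and model list mirror the stated defaults; the values for `host`, `port`, `logging.level`, `cache_dir`, the chunk size, the fallback filename, and the `ui` key names are illustrative assumptions.

```yaml
server:
  host: 127.0.0.1            # assumption: matches the URL shown in setup
  port: 8000
logging:
  level: INFO                # assumption
transcription:
  cache_dir: .cache/models   # assumption: any writable directory works
  max_upload_size_mb: 100    # stated default
  upload_chunk_size_mb: 4    # assumption
  default_upload_filename: recording.webm   # assumption
  supported_languages: [de, en]             # stated default: German and English
  supported_model_sizes: [tiny, base, small, medium, large]
ui:
  status_processing: "Transcribing..."      # assumption: ui.* key names vary
```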
Model files are downloaded the first time a model size is used and then reused from the configured cache directory. Switching to a model that is not cached yet triggers a one-time download into a per-model directory under transcription.cache_dir.
The backend exposes one transcription endpoint:
POST /api/transcriptions
Multipart form fields:
- `audio`: uploaded audio file
- `language`: configured language code such as `de` or `en`
- `model_size`: configured model id such as `small` or `large`
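The multipart fields above can be assembled with the standard library alone. This is a sketch, not the app's own client code: the field names come from the list above, while the URL, filename, and placeholder bytes are assumptions.

```python
import io
import urllib.request
import uuid

# Assumption: server running on the default host/port from setup.
API_URL = "http://127.0.0.1:8000/api/transcriptions"

def build_multipart(fields: dict[str, str], file_field: str,
                    filename: str, file_bytes: bytes) -> tuple[str, bytes]:
    """Assemble a multipart/form-data body by hand (stdlib only)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    buf.write(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
    )
    buf.write(file_bytes + f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

boundary, body = build_multipart(
    {"language": "de", "model_size": "small"},
    "audio", "clip.wav", b"\x00\x01",  # placeholder bytes; use real audio
)
request = urllib.request.Request(
    API_URL, data=body,
    headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
)
# urllib.request.urlopen(request) sends it once the server is running
```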
Successful responses return JSON with:
- `transcript`
- `language`
- `model_size`
- `duration_seconds`
Error responses use a consistent shape:
```json
{
  "error": {
    "code": "unsupported_language",
    "message": "Choose German or English."
  }
}
```

The API returns 400 for invalid requests or unsupported choices, 413 for oversized uploads, and 500 if transcription fails.
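A client can branch on these two documented shapes by checking the status code. This helper is a sketch (the function name is ours, not part of the app):

```python
import json

def parse_transcription_response(status: int, body: str):
    """Interpret a /api/transcriptions response using the documented shapes.

    Returns (True, payload) on HTTP 200, where payload carries transcript,
    language, model_size, and duration_seconds; otherwise (False, error)
    with the nested error object's code and message.
    """
    payload = json.loads(body)
    if status == 200:
        return True, payload
    return False, payload["error"]

# The documented 400 error shape:
ok, err = parse_transcription_response(
    400,
    '{"error": {"code": "unsupported_language",'
    ' "message": "Choose German or English."}}',
)
# ok is False; err["code"] is "unsupported_language"
```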
Install dependencies:

```bash
uv sync
```

Run checks:

```bash
uv run ruff format .
uv run ruff check .
uv run pytest -v
```

- `app/`: FastAPI app, API routes, services, and static frontend assets
- `tests/`: API and service tests
- `config.yaml`: runtime configuration
This project is currently focused on a local, minimal transcription workflow.
MIT License
