Ephes Blog

🎙️ Introducing podcast-transcript: Audio Transcription Made Simple

Nov. 23, 2024, Jochen

Hey folks! I recently built a little command-line tool called podcast-transcript that turns audio into text. While it started as a podcast transcription project during the PyDDF autumn sprint, it works great for any speech audio. The coolest part? It can transcribe a 2-hour podcast in about 90 seconds!

Quick Start 🚀

pip install podcast-transcript  # or use pipx or uvx
transcribe https://d2mmy4gxasde9x.cloudfront.net/cast_audio/pp_53.mp3

Why Groq?

After trying different approaches, I landed on using the Groq API for transcription. Here's why:

It's blazing fast
Getting an API key is free, and so is API usage (within reasonable limits: 8 hours of audio per day and 2 hours per hour)
The Whisper large-v3 model handles multiple languages well (especially noticeable for German content)

Technical Bits

The tool handles some interesting challenges under the hood:

Automatically resamples audio to 16 kHz mono before upload (if you don't do that yourself, Groq does it after the upload anyway)
Splits files larger than 25 MB into chunks and stitches the transcripts back together
Uses httpx to call the API directly and get detailed JSON responses (an approach inspired by Simon Willison’s)
Outputs in multiple formats: DOTe JSON, Podlove JSON, WebVTT, and plain text
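Here is a minimal sketch of the resampling step. It assumes ffmpeg is installed and on PATH, which the post doesn't say, and the function names are hypothetical rather than taken from podcast-transcript's code.

```python
import subprocess


def resample_command(src: str, dst: str, sample_rate: int = 16_000) -> list[str]:
    """Build an ffmpeg command that converts src to mono audio at sample_rate Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # one audio channel (mono)
        dst,
    ]


def resample(src: str, dst: str) -> None:
    """Run ffmpeg (assumed to be installed); raises CalledProcessError on failure."""
    subprocess.run(resample_command(src, dst), check=True, capture_output=True)
```

For a sense of scale: if the output were uncompressed 16-bit PCM WAV (another assumption), one second of 16 kHz mono audio takes 16,000 × 2 = 32,000 bytes, so a 25 MB upload holds only about 13 minutes, and a 2-hour episode would still need to be split.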
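The splitting and stitching can be sketched like this, assuming a roughly constant bitrate so that cutting the audio into equal time ranges also splits it evenly by size. `Segment`, `plan_chunks`, and `stitch` are hypothetical names for illustration, and the 25 MB limit is read here as 25 MiB.

```python
import math
from dataclasses import dataclass

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB


@dataclass
class Segment:
    start: float  # seconds from the start of the audio it was transcribed from
    end: float
    text: str


def plan_chunks(file_size: int, duration: float,
                max_bytes: int = MAX_UPLOAD_BYTES) -> list[tuple[float, float]]:
    """Split [0, duration] into equal time ranges, one per max_bytes of file size."""
    n = max(1, math.ceil(file_size / max_bytes))
    length = duration / n
    # The last chunk ends exactly at duration to avoid floating-point drift.
    return [(i * length, duration if i == n - 1 else (i + 1) * length) for i in range(n)]


def stitch(chunk_results: list[tuple[float, list[Segment]]]) -> list[Segment]:
    """Shift each chunk's segment times by the chunk's start offset and concatenate."""
    merged: list[Segment] = []
    for offset, segments in chunk_results:
        merged.extend(Segment(s.start + offset, s.end + offset, s.text) for s in segments)
    return merged
```

Offsetting each chunk's timestamps by its start time keeps the stitched transcript aligned with the original file. A real implementation would probably leave some headroom below the limit, since container headers add bytes, and might cut at pauses instead of fixed times so words aren't split across chunks.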
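A direct call to Groq's OpenAI-compatible transcription endpoint with httpx might look like the sketch below. The endpoint URL, the model name `whisper-large-v3`, and `response_format="verbose_json"` (which includes segment-level timestamps) come from Groq's public API documentation rather than from this post; the helper names are hypothetical, and httpx is imported inside the function only so the form-building helper can be used without it installed.

```python
from pathlib import Path
from typing import Optional

# Groq's OpenAI-compatible transcription endpoint
GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def transcription_form(model: str = "whisper-large-v3",
                       language: Optional[str] = None) -> dict:
    """Form fields sent alongside the audio file (hypothetical helper)."""
    form = {"model": model, "response_format": "verbose_json"}
    if language:
        form["language"] = language  # e.g. "de" to hint German audio
    return form


def transcribe_chunk(path: Path, api_key: str, language: Optional[str] = None) -> dict:
    """Upload one audio file and return the parsed JSON response."""
    import httpx  # lazy import: only needed when actually calling the API

    with path.open("rb") as audio:
        response = httpx.post(
            GROQ_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            data=transcription_form(language=language),
            files={"file": (path.name, audio)},
            timeout=300.0,  # long uploads can exceed httpx's 5-second default
        )
    response.raise_for_status()
    return response.json()
```

Calling it as `transcribe_chunk(Path("pp_53.mp3"), api_key, language="de")` would return the raw JSON, whose segments could then be stitched and converted into the output formats.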
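As an example of one of the output formats, here's a minimal sketch that turns `(start, end, text)` segments into WebVTT. The tuple shape and function names are assumptions for illustration, not podcast-transcript's actual internals.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments) -> str:
    """Render an iterable of (start, end, text) tuples as a WebVTT document."""
    lines = ["WEBVTT", ""]  # required header, followed by a blank line
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids rounding surprises like a timestamp ending in `.1000`.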

Future Plans

I'm planning to add support for local transcription using the OpenAI Whisper model. While Whisper v2 works well enough for English content, v3 shows notable improvements for other languages (especially German). I initially skipped local processing because of the PyTorch dependency, but it's on the roadmap! I also plan to add multitrack support for handling audio files with separate speaker tracks.

The code is open source and contributions are welcome. Let me know if you try it out!
