🎙️ Introducing podcast-transcript: Audio Transcription Made Simple
Hey folks! I recently built a little command-line tool called podcast-transcript that turns audio into text. While it started as a podcast transcription project during the PyDDF autumn sprint, it works great for any speech audio. The coolest part? It can transcribe a 2-hour podcast in about 90 seconds!
Quick Start 🚀
pip install podcast-transcript # or use pipx or uvx
transcribe https://d2mmy4gxasde9x.cloudfront.net/cast_audio/pp_53.mp3
Why Groq?
After trying different approaches, I landed on using the Groq API for transcription. Here's why:
- It's blazing fast
- Getting an API key is free, and so is API usage (within reasonable limits: 8 hours of audio per day and 2 hours of audio per hour)
- The Whisper large-v3 model handles multiple languages well (especially noticeable for German content)
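To put those free-tier numbers in perspective: a single 2-hour episode uses the entire hourly allowance, and the daily cap covers four such episodes. Below is a hypothetical client-side budget check, not part of podcast-transcript. It assumes both limits are rolling windows measured in seconds of audio; how Groq actually meters them may differ.

```python
from collections import deque

HOUR = 3600
DAY = 24 * HOUR


class AudioBudget:
    """Track submitted audio seconds against hourly and daily caps.

    Assumption: both caps behave like rolling windows over audio duration.
    """

    def __init__(self, per_hour: float = 2 * HOUR, per_day: float = 8 * HOUR) -> None:
        self.per_hour = per_hour
        self.per_day = per_day
        self._log: deque[tuple[float, float]] = deque()  # (submit time, audio seconds)

    def _used(self, now: float, window: float) -> float:
        # Sum the audio submitted within the last `window` seconds.
        return sum(secs for ts, secs in self._log if now - ts < window)

    def can_submit(self, audio_seconds: float, now: float) -> bool:
        # Forget submissions that have aged out of the daily window.
        while self._log and now - self._log[0][0] >= DAY:
            self._log.popleft()
        return (self._used(now, HOUR) + audio_seconds <= self.per_hour
                and self._used(now, DAY) + audio_seconds <= self.per_day)

    def record(self, audio_seconds: float, now: float) -> None:
        self._log.append((now, audio_seconds))
```

Under these assumptions, four 2-hour episodes submitted an hour apart exhaust the daily allowance. A fifth is refused until the first one ages out of the 24-hour window.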
Technical Bits
The tool handles some interesting challenges under the hood:
- Automatically resamples audio to 16 kHz mono before uploading (Groq downsamples to that format on its side anyway, so doing it locally just means a smaller upload)
- Splits files larger than 25MB into chunks and stitches the transcripts back together
- Calls the API directly with httpx to get detailed JSON responses, an approach inspired by Simon Willison
- Outputs in multiple formats: DOTe JSON, Podlove JSON, WebVTT, and plain text
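The 16 kHz mono resampling step from the list above can be done with ffmpeg. This is a minimal sketch, assuming ffmpeg is installed and on PATH. The helper names are hypothetical, and the post doesn't say which tool or output codec podcast-transcript uses internally.

```python
import subprocess
from pathlib import Path


def build_resample_cmd(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg command that converts src to 16 kHz mono audio at dst."""
    return [
        "ffmpeg",
        "-y",              # overwrite dst if it already exists
        "-i", str(src),
        "-map", "0:a",     # keep only audio streams (drops embedded cover art etc.)
        "-ar", "16000",    # resample to 16 kHz
        "-ac", "1",        # downmix to mono
        str(dst),          # codec is inferred from the extension, e.g. .flac or .mp3
    ]


def resample(src: Path, dst: Path) -> Path:
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_resample_cmd(src, dst), check=True, capture_output=True)
    return dst
```

Writing to `.flac` keeps the downsampled audio lossless. Choosing `.mp3` trades a little quality for an even smaller upload.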
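Splitting files over 25 MB and stitching the transcripts back together comes down to two steps. First, pick time ranges small enough to upload. Second, shift each chunk's segment timestamps by that chunk's start time. Here is a minimal sketch with hypothetical names. It assumes a roughly constant bitrate (so size scales with duration), treats 25 MB as 25 MiB, applies an arbitrary 10% safety margin, and leaves out the actual audio cutting.

```python
import math
from dataclasses import dataclass

# Assumption: the 25 MB upload limit is treated as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


def plan_chunks(file_size: int, duration: float,
                max_bytes: int = MAX_UPLOAD_BYTES,
                safety: float = 0.9) -> list[tuple[float, float]]:
    """Split [0, duration] into equal time ranges whose estimated size stays
    under safety * max_bytes, assuming bytes grow linearly with playing time."""
    n = max(1, math.ceil(file_size / (max_bytes * safety)))
    step = duration / n
    return [(i * step, min((i + 1) * step, duration)) for i in range(n)]


def stitch(chunks: list[tuple[float, list[Segment]]]) -> list[Segment]:
    """Merge per-chunk segments into one timeline.

    Each entry is (chunk start offset in seconds, segments timed relative to that chunk).
    """
    return [
        Segment(seg.start + offset, seg.end + offset, seg.text)
        for offset, segments in chunks
        for seg in segments
    ]
```

Cutting at fixed times can split a word across two chunks. Overlapping the chunks slightly or cutting at silences avoids that, at the cost of deduplicating text when stitching.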
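WebVTT is the simplest of the listed output formats to illustrate. A file starts with a `WEBVTT` header, followed by cues made of a `start --> end` line with `HH:MM:SS.mmm` timestamps and the cue text, separated by blank lines. Here is a minimal sketch, assuming segments arrive as (start_seconds, end_seconds, text) triples. It is not podcast-transcript's own writer.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) triples as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)
```

Cue text must not contain the string `-->`, and this sketch does not escape it.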
Future Plans
I'm planning to add support for local transcription using the OpenAI Whisper model. While Whisper v2 works well enough for English content, v3 shows notable improvements for other languages (especially German). I initially skipped local processing because of the PyTorch dependency, but it's on the roadmap! I also plan to add multitrack support for handling audio files with separate speaker tracks.
The code is open source and contributions are welcome. Let me know if you try it out!