ASR Training Data Collector: a GUI for gathering speech recognition training data
A GUI tool for collecting audio training data for ASR fine-tuning, with LLM-generated prompts and Hugging Face integration.
I was working on fine-tuning an ASR model and hit a surprisingly underserved niche: there's no good GUI for the basic task of generating text prompts, recording yourself reading them, and saving the audio-text pairs in a format suitable for training. So I built one with Claude Code.
The workflow
The ASR Training Data Collector handles three things: generating source truth text according to specific parameters (including domain-specific vocabulary), recording the matching audio from your microphone, and preserving the text-audio mapping in JSONL format. The text generation uses GPT-4o Mini via OpenRouter to create prompts that take about 20-30 seconds to read at an average speaking rate — the sweet spot for ASR training clips.
Gamification and motivation
Let's be honest: recording yourself reading text prompts over and over is only mildly mind-numbing. So I added a statistics panel that shows your total gathered training time and sample counts in real time. It's a small thing, but having that number tick up gives you some motivation that the effort will be worth it.
Multi-dataset support
I originally built this for one project, then immediately needed it for another. Rather than recreating the tool each time, I added dataset profile support so you can manage multiple projects. Each profile stores its own directory path, Git remote URL, and custom audio categories. It auto-detects Hugging Face and GitHub remotes, and includes built-in sync to push your collected data to Hugging Face datasets.
The philosophy
There are fancier approaches to ASR training data collection, like using existing speech-to-text output as training data. But I believe the best results come from not trying to get too clever. This tool is for those cases where you don't need a massive volume of data, but you want what you have to be clean and properly formatted. Simple is good.
This is exactly the kind of ultra-specific tool that vibe coding is perfect for. Check it out: ASR-Training-Data-Collector on GitHub
danielrosehill/ASR-Training-Data-Collector View on GitHub