Daniel Rosehill Hey, It Works!
ASR Training Data Collector: a GUI for gathering speech recognition training data
Daniel Rosehill

A GUI tool for collecting audio training data for ASR fine-tuning, with LLM-generated prompts and Hugging Face integration.

I was working on fine-tuning an ASR model and hit a surprisingly underserved niche: there's no good GUI for the basic task of generating text prompts, recording yourself reading them, and saving the audio-text pairs in a format suitable for training. So I built one with Claude Code.

The workflow

The ASR Training Data Collector handles three things: generating ground-truth text according to specific parameters (including domain-specific vocabulary), recording the matching audio from your microphone, and preserving the text-audio mapping in JSONL format. Text generation uses GPT-4o Mini via OpenRouter to create prompts that take about 20 to 30 seconds to read at an average speaking rate, which is the sweet spot for ASR training clips.
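The 20 to 30 second target translates into a word budget for the generated prompt, and each finished recording becomes one line in a JSONL file. Here is a minimal sketch of both steps. It assumes a reading rate of about 150 words per minute and a hypothetical record layout (`audio_filepath`, `text`, `duration`). The tool's actual field names and rate may differ.

```python
import json
from pathlib import Path

# Assumed average reading rate (hypothetical; adjust to your own pace).
WORDS_PER_MINUTE = 150


def word_budget(min_seconds: float = 20, max_seconds: float = 30,
                wpm: float = WORDS_PER_MINUTE) -> tuple[int, int]:
    """Return the (min, max) word count that fits the target clip length."""
    words_per_second = wpm / 60
    return round(min_seconds * words_per_second), round(max_seconds * words_per_second)


def append_pair(manifest: Path, audio_path: str, text: str,
                duration: float) -> None:
    """Append one audio-text pair as a single JSON line (hypothetical schema)."""
    record = {"audio_filepath": audio_path, "text": text, "duration": duration}
    # Opening in append mode leaves previously written samples untouched.
    with manifest.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

With these assumptions, `word_budget()` returns `(50, 75)`, a range that could be passed to the LLM as a length constraint. Writing one JSON object per line means adding a sample never requires rewriting the rest of the file.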

Gamification and motivation

Let's be honest: recording yourself reading text prompts over and over is only mildly mind-numbing. So I added a statistics panel that shows your total gathered training time and sample count in real time. It's a small thing, but watching that number tick up is a motivating sign that the effort will be worth it.
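Totals like the ones on that panel can be derived directly from the recordings. The sketch below assumes samples are saved as WAV files in a single directory, which is an assumption about the storage layout rather than a description of the tool. It reads each file's header with Python's standard `wave` module. The function names are hypothetical.

```python
import wave
from pathlib import Path


def wav_seconds(path: Path) -> float:
    """Duration of one WAV file, computed from its frame count and sample rate."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def collection_stats(audio_dir: Path) -> tuple[int, float]:
    """Return (sample_count, total_seconds) for all .wav files in audio_dir."""
    files = sorted(audio_dir.glob("*.wav"))
    return len(files), sum((wav_seconds(p) for p in files), 0.0)


def format_duration(seconds: float) -> str:
    """Format a total as H:MM:SS for display."""
    whole = int(round(seconds))
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

Measuring from the audio headers, rather than keeping a separate running counter, means the numbers stay correct even if files are added or deleted outside the app.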

Multi-dataset support

I originally built this for one project, then immediately needed it for another. Rather than recreating the tool each time, I added dataset profile support so you can manage multiple projects. Each profile stores its own directory path, Git remote URL, and custom audio categories. It auto-detects Hugging Face and GitHub remotes, and includes built-in sync to push your collected data to Hugging Face datasets.
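A dataset profile bundles a directory, a Git remote, and a set of categories. One way to model that, and to classify the remote by hostname, looks like the following. The class, field, and function names are hypothetical, and the tool itself may detect remotes differently.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

HF_HOSTS = {"huggingface.co", "hf.co"}
GITHUB_HOSTS = {"github.com"}


def detect_remote(url: str) -> str:
    """Classify a Git remote URL as 'huggingface', 'github', 'other', or 'none'."""
    url = url.strip()
    if not url:
        return "none"
    if url.startswith("git@"):
        # SCP-style SSH remote, e.g. git@github.com:user/repo.git
        host = url[len("git@"):].split(":", 1)[0]
    else:
        # HTTPS or ssh:// remote; urlparse extracts the hostname.
        host = urlparse(url).hostname or ""
    host = host.lower().removeprefix("www.")
    if host in HF_HOSTS:
        return "huggingface"
    if host in GITHUB_HOSTS:
        return "github"
    return "other"


@dataclass
class DatasetProfile:
    """One collection project (hypothetical structure)."""
    name: str
    directory: str
    remote_url: str = ""
    categories: list[str] = field(default_factory=list)

    @property
    def remote_kind(self) -> str:
        return detect_remote(self.remote_url)
```

Keying sync behavior off `remote_kind` would let a profile pointing at a Hugging Face dataset repo offer a push-to-Hub action, while a GitHub profile uses an ordinary Git push.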

The philosophy

There are fancier approaches to ASR training data collection, like using existing speech-to-text output as training data. But I believe the best results come from not trying to get too clever. This tool is for those cases where you don't need a massive volume of data, but you want what you have to be clean and properly formatted. Simple is good.

This is exactly the kind of ultra-specific tool that vibe coding is perfect for. Check it out: danielrosehill/ASR-Training-Data-Collector on GitHub.