Hey, It Works!
An index of Linux-friendly voice technology tools
· Daniel Rosehill

A curated index of 100+ voice technology tools accessible to Linux desktop users, from real-time dictation to dev frameworks.

For years, being a Linux desktop user who wanted voice technology meant being mostly out of luck. Dragon NaturallySpeaking? Windows only. Apple Dictation? macOS only. Google's built-in voice typing? ChromeOS or Android. The major dictation providers ignored Linux entirely because of its small desktop market share, and the open-source alternatives were, frankly, terrible. Then in September 2022, OpenAI open-sourced Whisper, and the entire landscape transformed almost overnight.

We went from maybe a handful of usable options to an explosion of projects --- which, in true Linux fashion, created its own problem. There are now so many splintered efforts across the ecosystem that keeping track of what's available has become a genuine challenge. I created this repository to gather, categorize, and organize the rapidly growing list of voice technology tools that are accessible to Linux desktop users. As of the last update, the index contains over 120 projects.

What counts as "Linux accessible"

I set a deliberately broad inclusion bar. For a tool to make the index, it needs to offer at least one of: a native Linux GUI application, a command-line interface, a self-hosted web UI, or SaaS with browser access. The focus is on speech-to-text and transcription tools rather than text-to-speech (which is a vibrant but separate ecosystem). I'm specifically interested in the post-Whisper era of modern ASR tools, not legacy projects from before open-source speech recognition became viable.

How the index is organized

Given the sheer number of projects, I've organized them along several dimensions. The most important distinction is between real-time transcription (voice typing --- tools for live dictation, which is what I personally use most) and asynchronous transcription (upload audio, get text back). Beyond that, I track Wayland support (critical for modern Linux desktops), whether projects are GPU-optimized or CPU-centric, and the STT deployment pattern (local inference, cloud API, or hybrid).

Projects also fall into three functional categories that I find useful. Pure ASR tools focus solely on transcription. ASR + rewrite tools combine speech recognition with LLM post-processing to refine transcripts into polished formats like emails or notes. ASR + action tools translate speech into actions: voice-to-code, voice-to-MCP commands, computer use agents, and similar.
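The three categories can be thought of as stages in a pipeline that each class of tool stops at earlier or later. The sketch below is entirely hypothetical stub code (no function here comes from any project in the index); it only illustrates where the categories differ:

```python
def transcribe(audio: bytes) -> str:
    """Pure ASR: audio in, raw transcript out (stubbed here)."""
    return "send an email to alice about the meeting"

def rewrite(transcript: str) -> str:
    """ASR + rewrite: an LLM pass that polishes the raw transcript (stubbed)."""
    return transcript.capitalize() + "."

def act(transcript: str) -> str:
    """ASR + action: map the transcript to a command instead of typed text."""
    if transcript.startswith("send an email"):
        return "compose_email()"
    return "type_text()"

raw = transcribe(b"...")        # pure ASR stops here
polished = rewrite(raw)         # "Send an email to alice about the meeting."
command = act(raw)              # "compose_email()"
```

A pure ASR tool emits `raw` directly; the other two categories spend extra compute after transcription, which is why they tend to have different latency and dependency profiles.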

The heavy hitters

Some of the most popular projects in the index (by GitHub stars) include NVIDIA's NeMo enterprise ASR toolkit, SpeechBrain's PyTorch speech processing library, Buzz (an offline transcription GUI available as Flatpak and Snap), faster-whisper (which achieves up to 4x speedups over the original Whisper implementation via CTranslate2), Silero VAD (enterprise-grade voice activity detection), and Vosk (an offline STT API supporting 20+ languages). For real-time dictation specifically, RealtimeSTT and WhisperLive from Collabora are both excellent.

The Wayland challenge

One of the most important sections in the index is the Wayland-compatible tools list. If you're running a modern Linux desktop --- GNOME on Wayland, KDE Plasma on Wayland (which is my setup), Hyprland, Sway, or niri --- X11-based input-simulation tools such as xdotool don't work. Voice typing tools need a way to simulate keyboard input, and on Wayland that requires either ydotool (and its daemon) or kernel-level virtual keyboard implementations. I've specifically flagged projects with explicit Wayland support because finding this information scattered across individual project READMEs is painful.
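To make the ydotool pattern concrete, here is a minimal, hypothetical helper (the function name is mine, not from any project in the index) showing the command a dictation tool might build to inject transcribed text, assuming ydotool is installed:

```python
def build_type_command(text: str, wayland: bool = True) -> list[str]:
    """Return the argv a voice-typing tool could use to inject text.

    Hypothetical sketch: ydotool on Wayland (its ydotoold daemon must be
    running with permission to open /dev/uinput), xdotool on X11.
    """
    tool = "ydotool" if wayland else "xdotool"
    return [tool, "type", text]

# A dictation tool would hand this to subprocess.run() after transcribing:
build_type_command("hello from dictation")
# → ['ydotool', 'type', 'hello from dictation']
```

The /dev/uinput permission requirement is the part that trips most people up: ydotoold typically needs a udev rule or root privileges, which is exactly the kind of setup detail the flagged projects document.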

Notable Wayland-compatible projects include hyprvoice and hyprwhspr (for Hyprland), freespeak, wayland-voice-dictation, whisper-wayland, niri-transcribe, and several others. This is one of the most actively growing categories as more Linux users move to Wayland compositors.

The GPU question

A practical reality that the index tries to surface: for local STT inference, GPU acceleration is often limited to NVIDIA/CUDA. Having an NVIDIA GPU makes life significantly easier when running local Whisper models. AMD GPU users (like me, with my Radeon RX 7700 XT) can use Vulkan-accelerated whisper.cpp for inference, but the broader PyTorch/HuggingFace ecosystem still assumes CUDA. This is slowly improving with ROCm support, but it's worth knowing going in.

My personal setup and workflow

I use voice technology daily for dictation, and my setup is deliberately simple. I have a $5 USB button from AliExpress (genuinely one of my best purchases from that site) that I use as a push-to-talk trigger. Press the button, speak, release the button, text appears. After a few minutes of voice typing, the need for a dedicated hardware trigger becomes immediately apparent --- keyboard shortcuts work but feel clunky. The underlying transcription can be local or cloud-based depending on the task, and I've tried dozens of the tools in this index to find what works best for my workflow.
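The push-to-talk flow above is simple enough to sketch as a small state machine. This is a hypothetical illustration (the class and its wiring are mine, not from any tool in the index), assuming the button's press/release events arrive from something like python-evdev and audio chunks come from a microphone capture loop:

```python
class PushToTalk:
    """Minimal push-to-talk state machine: press starts capture,
    release hands the buffered audio to a transcriber callback."""

    def __init__(self, transcriber):
        self.transcriber = transcriber  # e.g. a local Whisper call
        self.buffer = []
        self.recording = False

    def press(self):
        """Button down: start a fresh capture."""
        self.recording = True
        self.buffer = []

    def feed(self, chunk: bytes):
        """Called by the audio capture loop; ignored while idle."""
        if self.recording:
            self.buffer.append(chunk)

    def release(self):
        """Button up: stop capturing and transcribe what we heard."""
        self.recording = False
        return self.transcriber(b"".join(self.buffer))

# Hypothetical usage with a stub transcriber:
ptt = PushToTalk(lambda audio: f"<{len(audio)} bytes transcribed>")
ptt.press()
ptt.feed(b"\x00" * 1600)   # pretend 100 ms of 16 kHz mono audio
text = ptt.release()        # "<1600 bytes transcribed>"
```

The transcriber callback is where local versus cloud becomes a one-line swap, which is part of why this pattern shows up in so many of the indexed tools.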

Understanding the Whisper ecosystem

Something that confuses newcomers is that "Whisper" isn't one thing. Because OpenAI open-sourced the model, the ecosystem has fragmented into variants: the original Whisper (maintained by OpenAI), faster-whisper (CTranslate2-based, significantly faster), CrisperWhisper, whisper.cpp (C++ port for CPU/Vulkan inference), and various wrappers and integrations. The index tries to surface which Whisper variant each tool uses, because it affects performance, GPU requirements, and output quality.

Beyond Whisper entirely, there are also non-Whisper local ASR options like Vosk, NVIDIA's Parakeet models via NeMo, and FunASR from Alibaba/ModelScope. And a growing number of cloud STT integrations now support Deepgram as an alternative to OpenAI's Whisper API.

Contributing and keeping up

The voice tech space on Linux is moving fast. New projects appear regularly, existing projects add Wayland support or switch STT backends, and the overall quality keeps improving. I try to keep the index updated, and I welcome contributions. The repo also includes a projects-by-stars page with live GitHub star counts so you can see what's gaining traction, and a starting-points guide for newcomers who want a recommended path rather than a catalog to browse.

If you're a Linux user interested in voice technology, I hope this index saves you the discovery time it took me to build it. Check out the full list: Linux-Friendly-Voice-Tech on GitHub.

danielrosehill/Linux-Friendly-Voice-Tech ★ 5

A list of voice technology resources with Linux support, primarily STT and ASR, as well as dev frameworks.

Python · Updated Mar 2026
Tags: linux, speech-to-text, stt, voice-control, voice-control-desktop