The team behind OnlineTools4Free — building free, private browser tools.
Published Mar 15, 2026 · 8 min read · Reviewed by OnlineTools4Free
Text to Speech: Web Speech API & Accessibility
What Is Text to Speech?
Text to Speech (TTS) is the technology that converts written text into spoken audio. Modern TTS systems have moved far beyond the robotic voices of the past. Today's engines produce natural-sounding speech with appropriate intonation, pacing, and emphasis that closely resembles human reading.
TTS has three primary use cases: accessibility for people who cannot read visual text, convenience for consuming content hands-free (audiobooks, navigation, notifications), and language learning where hearing correct pronunciation is essential.
Every major operating system and browser now includes built-in TTS capabilities, making it accessible to web developers without external APIs or paid services. Try it instantly with our Text to Speech tool.
The Web Speech API
The Web Speech API is a browser-native JavaScript API that provides both speech recognition (speech-to-text) and speech synthesis (text-to-speech). The synthesis part is what concerns us here.
Basic usage takes just two lines of code: create a SpeechSynthesisUtterance object with your text, then pass it to speechSynthesis.speak() to hear it aloud. No API keys and no dependencies are needed. With local voices, the speech is generated on the user's device using voices installed on the operating system. Some browsers also offer online voices, which send the text to a remote service.
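As a minimal sketch of that basic call, the helper below wraps the two steps in a function. The name speak and its extra parameters are assumptions made for illustration: synth and Utterance simply default to the browser globals, and exist as parameters only so the function can be exercised outside a browser.

```javascript
// Minimal sketch: speak a string with the browser's speech synthesis.
// `speak` is a hypothetical helper name, not part of the Web Speech API.
function speak(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  if (!synth || !Utterance) {
    return null; // Speech synthesis is not available in this environment.
  }
  const utterance = new Utterance(text); // step 1: wrap the text
  synth.speak(utterance);                // step 2: add it to the speech queue
  return utterance;
}

// In a browser, call it from a user gesture such as a click handler:
// speak("Hello from the Web Speech API");
```

Returning null when the API is missing lets calling code decide whether to hide its speech controls instead of throwing.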
Controlling the Voice
You can customize the speech with several properties:
- voice: Select from available voices using speechSynthesis.getVoices(). Each device offers different voices depending on the OS and installed language packs.
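- rate: Speaking speed, from 0.1 (very slow) to 10 (extremely fast). The default, 1, is the voice's normal speaking rate. Many voices only sound usable within a much narrower range, so test extreme values before exposing them.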
- pitch: Voice pitch, from 0 to 2. Default is 1. Lower values sound deeper; higher values sound higher-pitched.
- volume: From 0 (silent) to 1 (full volume).
- lang: Language code like en-US, fr-FR, or de-DE. If you do not set a voice explicitly, the browser picks a voice that matches this language when one is available.
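The properties above can be applied in one place. The following is a sketch, not a definitive implementation: configureUtterance and clamp are hypothetical helper names, and the clamping simply enforces the ranges listed above so that out-of-range slider values never reach the utterance.

```javascript
// Keep a number inside [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Hypothetical helper: copy user settings onto an utterance,
// limiting each value to the range described in the list above.
function configureUtterance(utterance, options = {}) {
  const { voice, lang, rate = 1, pitch = 1, volume = 1 } = options;
  if (voice) utterance.voice = voice; // a SpeechSynthesisVoice from getVoices()
  if (lang) utterance.lang = lang;    // e.g. "en-US"
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.volume = clamp(volume, 0, 1);
  return utterance;
}

// Browser usage:
// const u = configureUtterance(new SpeechSynthesisUtterance("Bonjour"), {
//   lang: "fr-FR",
//   rate: 0.9,
// });
// speechSynthesis.speak(u);
```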
Browser Support and Voice Availability
Speech synthesis is supported in all major modern browsers: Chrome, Edge, Safari, and Firefox. (Speech recognition, the other half of the API, has narrower support.) However, there are important differences in implementation:
- Chrome: Offers both local voices and Google's online voices. Online voices sound significantly more natural but require an internet connection. Chrome also has a long-standing quirk where long utterances may stop mid-sentence; splitting the text into shorter chunks works around this.
- Safari: Uses Apple's built-in voices, which are high quality on macOS and iOS. The Siri voices are particularly natural-sounding. Because Apple controls both the browser and the voices, behavior tends to be consistent across Apple devices.
- Firefox: Uses the operating system's speech synthesis engine. On Windows this means Microsoft voices; on macOS, Apple voices; on Linux, Speech Dispatcher, which commonly uses eSpeak. Voice quality depends entirely on the OS.
- Edge: Shares Chrome's engine (Chromium) and also offers Microsoft's neural voices, which are among the most natural-sounding voices available in any browser.
The available voices vary by device. A Windows machine might offer 20+ Microsoft voices across many languages, while a basic Linux install might only have a single English voice. Always provide a voice selection UI so users can choose the best available option on their system.
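Because the voice list differs per device, it helps to choose a sensible default before the user picks one. The sketch below assumes a hypothetical pickVoice helper that takes the array returned by speechSynthesis.getVoices(). It prefers an exact language match, then any voice in the same base language, then the system default. Replacing an underscore with a hyphen is a defensive assumption for platforms that report tags like en_US.

```javascript
// Hypothetical helper: choose a starting voice for a language tag.
// `voices` is expected to look like the result of speechSynthesis.getVoices().
function pickVoice(voices, lang) {
  const wanted = lang.toLowerCase();
  const base = wanted.split("-")[0];
  // Normalize "en_US" style tags to "en-us" before comparing.
  const tagOf = (voice) => voice.lang.toLowerCase().replace("_", "-");
  return (
    voices.find((voice) => tagOf(voice) === wanted) ||              // exact match
    voices.find((voice) => tagOf(voice).split("-")[0] === base) ||  // same language
    voices.find((voice) => voice.default) ||                        // system default
    voices[0] ||
    null
  );
}

// Browser usage:
// const voice = pickVoice(speechSynthesis.getVoices(), "en-GB");
```

The same array can fill a select element (one option per voice.name) so users can override the automatic choice.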
Accessibility Benefits of TTS
Text to Speech is a critical accessibility tool for several groups of users:
- Visually impaired users: TTS is the backbone of screen readers like JAWS, NVDA, and VoiceOver. These tools read the entire interface aloud, including navigation, forms, and content.
- Dyslexia: People with dyslexia often comprehend text better when they can hear it spoken while following along visually. Synchronized highlighting (where the current word is highlighted as it is spoken) is particularly helpful.
- Cognitive disabilities: Users with attention or processing difficulties can benefit from having text read aloud, because for some people listening takes less cognitive effort than reading.
- Language learners: Hearing correct pronunciation while reading helps learners connect written and spoken forms of a language. TTS with adjustable speed is especially useful for beginners.
- Situational impairments: Drivers, cooks, or anyone with their hands or eyes occupied can consume content through TTS that they could not read in that moment.
When building a website or application, adding a Read Aloud button using the Web Speech API is a low-effort, high-impact accessibility improvement. It costs nothing, requires no backend, and benefits a wide range of users.
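A Read Aloud button can be wired up in a few lines. This is a sketch under stated assumptions: attachReadAloud is a hypothetical helper, the element ids in the usage comment are made up, and the synth and Utterance parameters default to the browser globals so the logic can also run outside a browser. The button is hidden when speech synthesis is unavailable, and speech starts only when the user clicks.

```javascript
// Hypothetical helper: make `button` read the text returned by `getText`.
function attachReadAloud(
  button,
  getText,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  if (!synth || !Utterance) {
    button.hidden = true; // No speech synthesis here: do not show a dead control.
    return false;
  }
  button.addEventListener("click", () => {
    synth.cancel(); // Drop anything still queued from an earlier click.
    synth.speak(new Utterance(getText()));
  });
  return true;
}

// Browser usage (the ids "read-aloud" and "article" are assumptions):
// attachReadAloud(
//   document.getElementById("read-aloud"),
//   () => document.getElementById("article").textContent
// );
```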
Implementation Tips for Developers
If you are adding TTS to a web application, these practical tips will save you debugging time:
- Load voices asynchronously: Voices load in the background, so on Chrome getVoices() may return an empty array on the first call. Listen for the voiceschanged event and populate your voice list when it fires.
- Chunk long text: Chrome can cut off an utterance after roughly 15 seconds of continuous speech, an issue reported mainly with its online voices. Split your text into sentence-length utterances and queue them. This also allows you to track progress through the document.
- Handle pause and resume: The API supports speechSynthesis.pause() and speechSynthesis.resume(). Always provide pause/resume controls in your UI.
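- Cancel before re-speaking: Call speechSynthesis.cancel() before starting a new read-aloud request, for example when the user clicks the button again or selects different text. Otherwise, the new utterances queue up behind the old ones and the user hears a backlog of previous requests. Do not cancel between the queued chunks of a single request.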
- Respect user preferences: Some users may have already configured their system screen reader. Avoid auto-playing TTS on page load, as it can conflict with their existing setup. Let the user trigger speech with a button.
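- Provide visual feedback: Highlight the text being spoken so users can follow along. Use the boundary event on the utterance to track word-by-word progress. Not every voice fires boundary events, so treat word highlighting as an enhancement and fall back to highlighting the current sentence.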
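Several of the tips above fit together in one small module. The sketch below is illustrative rather than definitive: loadVoices, splitIntoSentences, and speakChunks are hypothetical helper names, the 2000 ms fallback timeout is an arbitrary choice, and the sentence splitter is deliberately naive. The synth and Utterance options default to the browser globals.

```javascript
// Resolve with the voice list, waiting for `voiceschanged` if it starts empty.
function loadVoices(synth = globalThis.speechSynthesis, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const initial = synth.getVoices();
    if (initial.length > 0) {
      resolve(initial);
      return;
    }
    // Guard in case an implementation lacks addEventListener on the synth object.
    const canListen = typeof synth.addEventListener === "function";
    let timer = null;
    const finish = () => {
      clearTimeout(timer);
      if (canListen) synth.removeEventListener("voiceschanged", finish);
      resolve(synth.getVoices()); // May still be empty if no voices exist.
    };
    timer = setTimeout(finish, timeoutMs);
    if (canListen) synth.addEventListener("voiceschanged", finish);
  });
}

// Naive splitter: breaks after ., ! or ?. It also splits decimals and
// abbreviations, which is acceptable for a sketch.
function splitIntoSentences(text) {
  const matches = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  return matches.map((part) => part.trim()).filter((part) => part.length > 0);
}

// Cancel any earlier request, then queue one utterance per sentence.
function speakChunks(text, options = {}) {
  const {
    synth = globalThis.speechSynthesis,
    Utterance = globalThis.SpeechSynthesisUtterance,
    onChunkStart = () => {}, // called with (chunkIndex, totalChunks)
    onWord = () => {},       // called with (chunkIndex, charIndex within the chunk)
  } = options;
  const chunks = splitIntoSentences(text);
  synth.cancel(); // Once per request, not between chunks.
  chunks.forEach((chunk, index) => {
    const utterance = new Utterance(chunk);
    utterance.onstart = () => onChunkStart(index, chunks.length);
    utterance.onboundary = (event) => {
      if (event.name === "word") onWord(index, event.charIndex);
    };
    synth.speak(utterance); // speak() appends to the synthesis queue.
  });
  return chunks;
}
```

In a page, onChunkStart can highlight the current sentence and onWord can refine that to the current word when the voice reports boundaries. Pause and resume controls can call speechSynthesis.pause() and speechSynthesis.resume() directly.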
Test text-to-speech output with our Text to Speech tool to hear how different voices and settings sound before implementing them in your application.
Beyond the Browser: Cloud TTS Services
When browser-native TTS is not sufficient (for example, when you need consistent voice quality across all devices or need to generate audio files), cloud TTS services offer higher-quality output:
- Google Cloud Text-to-Speech: Offers WaveNet and Neural2 voices that sound remarkably human. Supports SSML for fine-grained control over pronunciation, pauses, and emphasis.
- Amazon Polly: AWS service with neural and standard voices in dozens of languages. Outputs MP3, OGG, or PCM audio files. Good for generating audiobook-style content.
- Microsoft Azure Cognitive Services: Neural voices with emotional styles (cheerful, sad, angry). Useful for conversational AI and interactive voice applications.
- ElevenLabs: Specializes in voice cloning and ultra-realistic speech. Can replicate specific voices from audio samples. Popular for content creators and podcasters.
These services require API keys and typically charge per character or per minute of generated audio. For most websites, the browser's built-in Web Speech API is sufficient and free. Reserve cloud services for applications where voice quality is a core product feature.
The OnlineTools4Free Team
We are a small team of developers and designers building free, privacy-first browser tools. Every tool on this platform runs entirely in your browser — your files never leave your device.
