Text to Speech in the Browser: Using the Web Speech API

Build text-to-speech features with zero dependencies using the native browser API


TL;DR: Your browser can speak out loud using the Web Speech Synthesis API -- no API keys, no third-party services, no cost. Just create a SpeechSynthesisUtterance, call speechSynthesis.speak(), and you are off. You can control voice, speed, pitch, and even highlight words as they are spoken. Works on Chrome, Firefox, Safari, and Edge.

Here is something that blows people's minds the first time they hear it (pun intended): your browser has a built-in text-to-speech engine. No API keys to manage. No third-party services to pay for. No npm packages to install. Just a few lines of JavaScript, and your web page starts talking.

Whether you want to build an accessibility feature, a language learning tool, or just make your website read your blog posts aloud (because who does not want a personal narrator?), the Web Speech Synthesis API has you covered.

Your First Talking Web Page: Two Lines of Code

I am not exaggerating. The simplest possible text-to-speech implementation is literally this:

const utterance = new SpeechSynthesisUtterance("Hello, world!");
speechSynthesis.speak(utterance);

Open your browser's developer console right now, paste that in, and your computer will say "Hello, world!" out loud. Go ahead, try it. I will wait.

🤯 Your face the first time your browser starts talking back to you

The SpeechSynthesisUtterance (yes, that is a real class name -- welcome to web APIs) represents a speech request. Think of it as a little note you hand to the browser saying "please read this out loud." And speechSynthesis.speak() is the browser saying "sure thing."

Tuning the Voice: Speed, Pitch, and Volume

The utterance object has knobs you can turn before hitting play:

const utterance = new SpeechSynthesisUtterance("Welcome to CodeToolsPro.");

utterance.rate = 1.2;    // Speed: 0.1 (glacial) to 10 (chipmunk)
utterance.pitch = 1.0;   // Pitch: 0 (Barry White) to 2 (helium balloon)
utterance.volume = 0.8;  // Volume: 0 (mute) to 1 (full blast)
utterance.lang = "en-US"; // Language tag

speechSynthesis.speak(utterance);

Set rate to 10 and pitch to 2 for the "caffeinated chipmunk reading the news" experience. Not recommended for production. Highly recommended for entertaining yourself at 2 AM.
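
If those values come from sliders or user input, it can help to clamp them before assigning. Here is a minimal sketch; clampSpeechSettings and LIMITS are hypothetical names of my own, and the ranges are the ones listed in the comments above.

```javascript
// Hypothetical helper: keep settings inside the ranges listed above.
// Anything missing or non-numeric falls back to the default of 1.
const LIMITS = {
  rate: { min: 0.1, max: 10, fallback: 1 },
  pitch: { min: 0, max: 2, fallback: 1 },
  volume: { min: 0, max: 1, fallback: 1 },
};

function clampSpeechSettings(settings = {}) {
  const result = {};
  for (const [key, { min, max, fallback }] of Object.entries(LIMITS)) {
    // parseFloat accepts numbers and slider strings like "0.5"
    const value = parseFloat(settings[key]);
    result[key] = Number.isFinite(value)
      ? Math.min(max, Math.max(min, value))
      : fallback;
  }
  return result;
}
```

In the browser you could apply the result with Object.assign(utterance, clampSpeechSettings({ rate: 50, volume: "0.5" })), which would set rate to 10, pitch to 1, and volume to 0.5.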

Picking a Voice (They Are All Different)

The voices you get depend on both the browser and the operating system, and they vary wildly by platform. macOS has "Samantha" and "Alex." Windows has "David" and "Zira." Chrome throws in extra Google voices. Here is how to see what you have got:

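function getVoices(timeoutMs = 2000) {
  return new Promise(resolve => {
    const voices = speechSynthesis.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    // Chrome loads voices asynchronously (because of course it does)
    speechSynthesis.onvoiceschanged = () => resolve(speechSynthesis.getVoices());
    // Safety net so the promise cannot hang if the event never fires.
    // The 2-second default is an arbitrary choice; a second resolve() is a no-op.
    setTimeout(() => resolve(speechSynthesis.getVoices()), timeoutMs);
  });
}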

getVoices().then(voices => {
  voices.forEach(voice => {
    console.log(`${voice.name} (${voice.lang}) ${voice.localService ? '[local]' : '[remote]'}`);
  });
});

Heads up: In Chrome, getVoices() often returns an empty array the first time you call it because voices load asynchronously. This is the number one gotcha with this API. Always use the onvoiceschanged event as a fallback.

To use a specific voice, just find it and assign it:

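// Top-level await works in the console and in ES modules;
// elsewhere, wrap this in an async function.
const voices = await getVoices();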
const voice = voices.find(v => v.name === "Google UK English Female");

const utterance = new SpeechSynthesisUtterance("Hello from the UK!");
if (voice) utterance.voice = voice;
speechSynthesis.speak(utterance);
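
Matching on an exact name only works where that voice exists, so matching on language is often more portable. Below is a rough sketch of one way to do it. pickVoice is a hypothetical helper, and its preference order (exact tag, then same base language, then the default or a local voice) is my assumption, not something the API prescribes. It also normalizes underscores, since some platforms have been reported to return tags like "en_US".

```javascript
// Hypothetical helper: choose a voice for a language tag such as "en-GB".
// `voices` is the array returned by speechSynthesis.getVoices().
function pickVoice(voices, lang) {
  const normalize = tag => String(tag).toLowerCase().replace(/_/g, '-');
  const wanted = normalize(lang);
  const base = wanted.split('-')[0];

  const exact = voices.filter(v => normalize(v.lang) === wanted);
  const sameLanguage = voices.filter(v => normalize(v.lang).split('-')[0] === base);
  const candidates = exact.length > 0 ? exact : sameLanguage;

  // Prefer the voice flagged as default, then any local voice, then whatever is left.
  return candidates.find(v => v.default)
    || candidates.find(v => v.localService)
    || candidates[0]
    || null;
}
```

Usage would look like const voice = pickVoice(await getVoices(), "en-GB"); followed by the same if (voice) utterance.voice = voice; check as above.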

Pause, Resume, and Stop

Just like a music player, you get full playback controls:

speechSynthesis.pause();   // Pause
speechSynthesis.resume();  // Resume
speechSynthesis.cancel();  // Stop everything

// Check what's happening
speechSynthesis.speaking;  // Currently talking?
speechSynthesis.paused;    // On pause?
speechSynthesis.pending;   // Queued up and waiting?
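
One detail worth knowing: speaking stays true while playback is paused, so check paused first when deriving a single status. Here is a small sketch that folds the three flags into one label. playbackState and the label strings are hypothetical names, not part of the API.

```javascript
// Hypothetical helper: collapse the three flags into one status string.
// Accepts speechSynthesis or any object with the same three properties.
function playbackState(synth) {
  const busy = synth.speaking || synth.pending;
  // Check paused first: `speaking` remains true while playback is paused.
  if (synth.paused && busy) return 'paused';
  if (synth.speaking) return 'speaking';
  if (synth.pending) return 'queued';
  return 'idle';
}
```

In the browser, playbackState(speechSynthesis) gives you one value to drive a button label or an icon.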

The Cool Part: Word-by-Word Highlighting

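The API fires events as it speaks, including a boundary event at word and sentence boundaries. This means you can highlight words in your UI as they are spoken -- like karaoke, but for blog posts. Not every voice fires boundary events (remote voices in particular may skip them), so treat highlighting as a progressive enhancement: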

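utterance.onboundary = (event) => {
  if (event.name !== "word") return;
  const start = event.charIndex;
  // charLength is not reported everywhere; fall back to the next whitespace
  let end = start + event.charLength;
  if (!(end > start)) {
    const gap = utterance.text.slice(start).search(/\s/);
    end = gap === -1 ? utterance.text.length : start + gap;
  }
  const word = utterance.text.substring(start, end);
  console.log("Speaking:", word);
  // Highlight this word in your UI
};

🎤 Karaoke mode for your documentation. Your tech lead will either love it or fire you.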
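
For the "highlight this word in your UI" step, one simple approach is to re-render the text with the current word wrapped in a mark element. The sketch below assumes you are happy to replace the container's innerHTML on every boundary event. highlightWord and escapeHtml are hypothetical helpers, and the whitespace fallback covers browsers that do not report charLength.

```javascript
// Escape text so it is safe to place inside innerHTML.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, ch => entities[ch]);
}

// Hypothetical helper: return HTML with the word at charIndex wrapped in <mark>.
function highlightWord(text, charIndex, charLength) {
  const start = Math.max(0, Math.min(charIndex, text.length));
  let end;
  if (Number.isFinite(charLength) && charLength > 0) {
    end = start + charLength;
  } else {
    // No usable length: assume the word runs to the next whitespace.
    const gap = text.slice(start).search(/\s/);
    end = gap === -1 ? text.length : start + gap;
  }
  end = Math.min(end, text.length);
  return escapeHtml(text.slice(0, start))
    + '<mark>' + escapeHtml(text.slice(start, end)) + '</mark>'
    + escapeHtml(text.slice(end));
}
```

Inside the boundary handler you could then write something like container.innerHTML = highlightWord(utterance.text, event.charIndex, event.charLength), where container is whichever element displays the text.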

Building a "Read Aloud" Button

Here is a compact implementation you can drop into any article or blog as a starting point:

class ReadAloud {
  constructor(textElement, buttonElement) {
    this.textEl = textElement;
    this.btn = buttonElement;
    this.isPlaying = false;
    this.btn.addEventListener('click', () => this.toggle());
  }

  toggle() {
    if (this.isPlaying) {
      speechSynthesis.cancel();
      this.isPlaying = false;
      this.btn.textContent = "Read Aloud";
    } else {
      this.speak();
    }
  }

  speak() {
    const text = this.textEl.innerText.replace(/\s+/g, ' ').trim();
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.rate = 1.1;

    utterance.onstart = () => {
      this.isPlaying = true;
      this.btn.textContent = "Stop Reading";
    };
    // Reset on normal completion and on failure or interruption alike
    utterance.onend = utterance.onerror = () => {
      this.isPlaying = false;
      this.btn.textContent = "Read Aloud";
    };

    speechSynthesis.speak(utterance);
  }
}

The Gotchas You Should Know About

Chrome's 15-second bug: Chrome has a known issue where speech stops after about 15 seconds of continuous playback. The workaround is to split long text into sentences and queue them one by one:

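function speakLongText(text) {
  // The trailing * (not +) keeps a final sentence that lacks closing punctuation
  const sentences = text.match(/[^.!?]+[.!?]*/g) || [text];
  sentences
    .map(sentence => sentence.trim())
    .filter(Boolean) // skip whitespace-only fragments
    .forEach(sentence => {
      speechSynthesis.speak(new SpeechSynthesisUtterance(sentence));
    });
}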
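
If one utterance per sentence feels too choppy, a variation is to group sentences into chunks under a length limit and queue those instead. This is a sketch under assumptions: chunkText is a hypothetical helper, the 200-character default is a guess to tune by ear, and the naive sentence split will also break on decimals and abbreviations.

```javascript
// Hypothetical helper: group sentences into chunks of at most maxLength characters.
// A single sentence longer than maxLength is kept whole rather than cut mid-word.
function chunkText(text, maxLength = 200) {
  const sentences = (text.match(/[^.!?]+[.!?]*/g) || [text])
    .map(sentence => sentence.trim())
    .filter(Boolean);

  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    // +1 accounts for the space that joins two sentences
    if (current && current.length + 1 + sentence.length > maxLength) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? current + ' ' + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

To use it, queue each chunk the same way as before: chunkText(text).forEach(chunk => speechSynthesis.speak(new SpeechSynthesisUtterance(chunk))).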

User interaction required: Browsers generally block speech that starts on page load, before the user has interacted with the page. Trigger it from a click or tap handler instead. This is intentional -- nobody wants a website yelling at them unexpectedly.

Voice quality varies a lot. Local (offline) voices tend to sound robotic. Google's remote voices in Chrome sound much more natural but need internet. On macOS, the Siri voices are excellent.

No file export. The API only plays through speakers -- you cannot capture the audio as a file. For that, you would need a server-side service like Google Cloud TTS or Amazon Polly.

Always check for browser support before using the API: if ('speechSynthesis' in window) { ... }. The good news is that support is excellent across Chrome, Firefox, Safari, Edge, and mobile browsers.
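
A slightly stricter version of that check also confirms that the utterance constructor exists. Here it is as a small function that takes the global object as a parameter, which makes it easy to test. supportsSpeech is a hypothetical name.

```javascript
// Hypothetical helper: true only if both halves of the API are present.
function supportsSpeech(globalObject) {
  return globalObject !== null
    && typeof globalObject === 'object'
    && 'speechSynthesis' in globalObject
    && typeof globalObject.SpeechSynthesisUtterance === 'function';
}
```

In a page you would call supportsSpeech(window) and hide the "Read Aloud" button when it returns false.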

Try It Yourself

Type or paste any text into our Text to Speech tool to hear it read aloud. Choose from all available voices on your device, adjust speed and pitch, and test different languages.
