Reading Text Aloud with AivisSpeech from a Web Page

Table of contents

The purpose
Prerequisites
Environment
Start AivisSpeech-Engine (server)
Acquiring SpeakerID
Code
Result
Reference

The purpose

In an earlier article, I read text aloud using SpeechSynthesis, a standard JavaScript API built into the browser, but the pronunciation was not very good.

This time, I will do the reading with AivisSpeech, which has clearer pronunciation.

Prerequisites

This article uses a local web application as an example and assumes the page is served from http://localhost/. Other origins cannot be used, and neither can pages opened via file://, because of the engine's CORS policy. It seems possible to allow access from other PCs by changing settings, but I have not tried it.

Environment

AivisSpeech Engine version 1.0.0

Start AivisSpeech-Engine (server)

If you have not installed AivisSpeech yet, install it first by following the official installation instructions.

Execute the following command from the command line to start the server:

"C:\Program Files\AivisSpeech\AivisSpeech-Engine\run.exe"

If you performed a per-user installation, the path will be as follows:

\AppData\Local\Programs\AivisSpeech\AivisSpeech-Engine\run.exe

The first launch takes some time because the model needs to be downloaded. (Even if you have used AivisSpeech before, the download occurs the first time you launch AivisSpeech-Engine directly.)

[2025/01/27 09:02:52] INFO:     Started server process [19536]
[2025/01/27 09:02:52] INFO:     Waiting for application startup.
[2025/01/27 09:02:52] INFO:     Application startup complete.
[2025/01/27 09:02:52] INFO:     Uvicorn running on http://localhost:10101 (Press CTRL+C to quit)

Once the above logs are displayed, open http://localhost:10101 in a browser like Chrome.

If it looks like the image below, the server is running correctly.

Acquiring SpeakerID

Access the following URL:

http://localhost:10101/speakers

Tick the browser's "Pretty print" checkbox to make the JSON easier to read. Note the id of the style you want to use (be careful: it is the id of the style, not speaker_uuid). Here, we will use 888753760, which is Anneli's Normal style.
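
The same lookup can be done from code. The following is a minimal sketch, assuming the /speakers response is an array of speakers, each with a name, a speaker_uuid, and a styles array of { name, id } objects (the VOICEVOX-compatible shape). listStyles and sampleSpeakers are hypothetical names used for illustration, and the speaker_uuid in the sample is a placeholder, not a real value.

```javascript
// Flatten a /speakers response into one row per style.
// Assumption: each speaker looks like
//   { name, speaker_uuid, styles: [{ name, id }, ...] }
function listStyles(speakers) {
  return speakers.flatMap((speaker) =>
    speaker.styles.map((style) => ({
      speaker: speaker.name,
      style: style.name,
      id: style.id, // this is the value to pass as ?speaker=...
    }))
  );
}

// Hypothetical sample data in the assumed shape (not real engine output).
const sampleSpeakers = [
  {
    name: "Anneli",
    speaker_uuid: "00000000-0000-0000-0000-000000000000", // placeholder
    styles: [{ name: "Normal", id: 888753760 }],
  },
];

console.log(listStyles(sampleSpeakers));

// Against a running engine, from a page served on http://localhost/:
//   const res = await fetch("http://localhost:10101/speakers");
//   console.table(listStyles(await res.json()));
```

Printing one row per style makes it harder to confuse the style id with speaker_uuid, since only the id appears in the output.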

Code

Now, let's create the HTML and JavaScript. Requests to AivisSpeech-Engine from a page opened via file:// will fail, so you need to serve the page from a local server, for example with Node.js or LiveServer.

For instructions on how to use LiveServer, refer to the following:

index.html

<!doctype html>
<html lang="en">

<head>
  <meta charset="UTF-8" />
  <title>TEST</title>
</head>

<body>
  <input type="text" id="text" class="txt" value="読み上げます" required><br><br>
  <input type="button" value="読み上げ" id="execute"><br><br>
  <script type="module" src="/src/main.js"></script>
</body>

</html>

The HTML contains a text box for the text to read aloud (pre-filled with 読み上げます, "I will read this aloud") and a button (読み上げ, "Read aloud") that starts the reading.

main.js (in the code, replace 888753760 with the id you noted for your speaker style)

// Read the contents of the text box aloud when the button is clicked.
document.getElementById("execute").onclick = function () {
  readText(document.getElementById("text").value);
};

function readText(text) {
  // Step 1: request the audio query (JSON) for the text.
  // The text is part of the query string, so it must be URL-encoded.
  const xhr = new XMLHttpRequest();
  const url = 'http://localhost:10101/audio_query?speaker=888753760&text=' + encodeURIComponent(text);
  xhr.open("POST", url, false); // synchronous request, to keep the example short
  xhr.send();
  if (xhr.status !== 200) {
    console.error("audio_query failed: " + xhr.status);
    return;
  }
  const res_str = xhr.responseText;

  // Step 2: send the audio query to /synthesis and receive the WAV data.
  const xhr_synth = new XMLHttpRequest();
  const url_synth = 'http://localhost:10101/synthesis?speaker=888753760';
  xhr_synth.open("POST", url_synth);
  xhr_synth.setRequestHeader("Content-Type", "application/json");
  xhr_synth.responseType = "arraybuffer";
  xhr_synth.onreadystatechange = async () => {
    if (xhr_synth.readyState === XMLHttpRequest.DONE && xhr_synth.status === 200) {
      // Step 3: decode the WAV data and play it with the Web Audio API.
      const context = new AudioContext();
      const audioBuffer = await (new Promise((res, rej) => {
        context.decodeAudioData(xhr_synth.response, res, rej);
      }));
      const source = context.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(context.destination);
      source.start(0);
    }
  };
  xhr_synth.send(res_str);
}

Here’s a rough outline of the process:

First, send the text to be read aloud to the server with a POST request to http://localhost:10101/audio_query. The response is a JSON audio query.

  const xhr = new XMLHttpRequest();
  const url = 'http://localhost:10101/audio_query?speaker=888753760&text=' + encodeURIComponent(text);
  xhr.open("POST", url, false);
  xhr.send();
  const res_str = xhr.responseText;
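
Because the text travels in the query string, characters such as & or # would otherwise cut the text short or be read as extra parameters. As a minimal sketch, the URL can also be assembled with the standard URL and URLSearchParams APIs, which encode the values automatically. ENGINE_URL and buildAudioQueryUrl are illustrative names, not part of the engine.

```javascript
// Build the /audio_query URL with a safely encoded query string.
// ENGINE_URL and buildAudioQueryUrl are illustrative names.
const ENGINE_URL = "http://localhost:10101";

function buildAudioQueryUrl(text, speakerId) {
  const url = new URL("/audio_query", ENGINE_URL);
  url.searchParams.set("speaker", String(speakerId));
  url.searchParams.set("text", text); // encoded automatically
  return url.toString();
}

// Text containing "&" and "#" stays inside the text parameter.
console.log(buildAudioQueryUrl("A&B #1", 888753760));
```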

Next, POST the JSON obtained from that request to http://localhost:10101/synthesis to generate the WAV data.

  const xhr_synth = new XMLHttpRequest();
  const url_synth = 'http://localhost:10101/synthesis?speaker=888753760';
  xhr_synth.open("POST", url_synth);
  xhr_synth.setRequestHeader("Content-Type", "application/json");
  xhr_synth.responseType = "arraybuffer";
  xhr_synth.send(res_str);
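
Since the audio query is plain JSON, it can be adjusted before it is sent to /synthesis. The following is a sketch only: it assumes the audio query contains a numeric speedScale field, as in the VOICEVOX-compatible API, so check the JSON your engine actually returns. withSpeed and sampleQuery are hypothetical names.

```javascript
// Return a copy of the audio query JSON string with a different speed.
// Assumption: the audio query has a numeric "speedScale" field
// (as in the VOICEVOX-compatible API).
function withSpeed(audioQueryJson, speedScale) {
  const query = JSON.parse(audioQueryJson);
  query.speedScale = speedScale;
  return JSON.stringify(query);
}

// Hypothetical, heavily shortened audio query used only for illustration.
const sampleQuery = JSON.stringify({ speedScale: 1.0, accent_phrases: [] });
console.log(withSpeed(sampleQuery, 1.2));

// In readText(), this would be used as:
//   xhr_synth.send(withSpeed(res_str, 1.2));
```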

Once the WAV data has been received, decode and play it with the following code (it runs inside the async onreadystatechange handler, which is why await can be used):

  const context = new AudioContext();
  const audioBuffer = await (new Promise((res, rej) => {
    context.decodeAudioData(xhr_synth.response, res, rej);
  }));
  const source = context.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(context.destination);
  source.start(0);
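
The same three steps can also be written with fetch and async/await, which avoids the synchronous XMLHttpRequest that blocks the page while waiting for the server. This is a minimal sketch rather than the code used in this article: ENGINE, synthesize, and play are illustrative names, and the fetchImpl parameter exists only so the function can be exercised without a running engine.

```javascript
// Fetch-based sketch of the same flow: audio_query, then synthesis, then playback.
const ENGINE = "http://localhost:10101";

async function synthesize(text, speakerId, fetchImpl = globalThis.fetch) {
  // Step 1: POST /audio_query to get the audio query JSON.
  const queryUrl =
    `${ENGINE}/audio_query?speaker=${speakerId}&text=${encodeURIComponent(text)}`;
  const queryRes = await fetchImpl(queryUrl, { method: "POST" });
  if (!queryRes.ok) throw new Error(`audio_query failed: ${queryRes.status}`);
  const audioQuery = await queryRes.text();

  // Step 2: POST that JSON to /synthesis to get the WAV bytes.
  const synthRes = await fetchImpl(`${ENGINE}/synthesis?speaker=${speakerId}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: audioQuery,
  });
  if (!synthRes.ok) throw new Error(`synthesis failed: ${synthRes.status}`);
  return synthRes.arrayBuffer();
}

// Step 3 (browser only): decode the WAV bytes and play them.
async function play(wavBuffer) {
  const context = new AudioContext();
  const audioBuffer = await context.decodeAudioData(wavBuffer);
  const source = context.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(context.destination);
  source.start(0);
}

// Usage in main.js (browser):
//   document.getElementById("execute").onclick = async () => {
//     const text = document.getElementById("text").value;
//     await play(await synthesize(text, 888753760));
//   };
```

Keeping the network part (synthesize) separate from playback (play) also means a failed request surfaces as a rejected promise instead of silently producing no sound.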

Result

I successfully performed text-to-speech using AivisSpeech.

Reference

GitHub - Aivis-Project/AivisSpeech-Engine: AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
