The purpose
On the following page, text was read aloud using SpeechSynthesis, a basic JavaScript function, but the pronunciation was not very good.
Therefore, I will perform the read-aloud using AIVoiceSpeech, which has clearer pronunciation.
Prerequisites
This time, we will use a local web application as an example (assuming access from http://localhost/. File:// cannot be used either). This is due to a CORS Policy issue. Although it seems possible to allow access from other PCs by changing settings, I have not tried it.
Environment
AivisSpeech Engine version 1.0.0
Start AivisSpeech-Engine(server)
If you have not installed AIVoiceSpeech, please refer to the following page for installation instructions.
Execute the following command from the command line to start the server:
C:\Program Files\AivisSpeech\AivisSpeech-Engine¥run.exeIf you performed a per-user installation, the path will be as follows:
\AppData\Local\Programs\AivisSpeech\AivisSpeech-Engine\run.exeInitially, it will take some time because the model needs to be downloaded. (Even if you’ve used AIVoiceSpeech before, the download will occur if this is your first time directly launching AIVoiceSpeech-Engine.)
[2025/01/27 09:02:52] INFO: Started server process [19536]
[2025/01/27 09:02:52] INFO: Waiting for application startup.
[2025/01/27 09:02:52] INFO: Application startup complete.
[2025/01/27 09:02:52] INFO: Uvicorn running on http://localhost:10101 (Press CTRL+C to quit)Once the above logs are displayed, open http://localhost:10101 in a browser like Chrome.
If it looks like the image below, the server is running correctly.

Acquiring SpeakerID
Access the following URL:
http://localhost:10101/speakersCheck “Pretty Print” for better readability. Note down the id of the settings you want to use (be careful, it’s not speaker_uuid). Here, we will use 888753760 for Anneli’s Normal setting.

Code
Now, let’s create the HTML and JavaScript. However, requesting AIVoiceSpeech-Engine from File:// will fail, so you’ll need to start a local server using Node.js or LiveServer.
For instructions on how to use LiveServer, refer to the following:
index.html
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>TEST</title>
</head>
<body>
<input type="text" id="text" class="txt" value="読み上げます" required><br><br>
<input type="button" value="読み上げ" id="execute"><br><br>
<script type="module" src="/src/main.js"></script>
</body>
</html>In the HTML, I created a text box for the text to be read aloud and a button to initiate the reading.
main.js(In the code, you’ll replace 888753760 with the value you obtained for SpeakerID.)
document.getElementById("execute").onclick = function (event) {
readText(document.getElementById("text").value)
};
function readText(text) {
const xhr = new XMLHttpRequest();
const url = 'http://localhost:10101/audio_query?speaker=888753760&text=' + text;
xhr.open("POST", url, false);
xhr.send();
const res_str = xhr.responseText;
const xhr_synth = new XMLHttpRequest();
const url_synth = 'http://localhost:10101/synthesis?speaker=888753760';
xhr_synth.open("POST", url_synth);
xhr_synth.setRequestHeader("Content-Type", "application/json");
xhr_synth.responseType = "arraybuffer";
xhr_synth.onreadystatechange = async () => {
if (xhr_synth.readyState === XMLHttpRequest.DONE && xhr_synth.status === 200) {
const context = new AudioContext();
const audioBuffer = await (new Promise((res, rej) => {
context.decodeAudioData(xhr_synth.response, res, rej);
}));
const source = context.createBufferSource();
source.buffer = audioBuffer;
source.connect(context.destination);
source.start(0);
}
}
xhr_synth.send(res_str);
}
Here’s a rough outline of the process:
You’ll send the information to be read aloud to the server via a POST request to http://localhost:10101/audio_query.
const xhr = new XMLHttpRequest();
const url = 'http://localhost:10101/audio_query?speaker=888753760&text=' + text;
xhr.open("POST", url, false);
xhr.send();
const res_str = xhr.responseText;Using the JSON obtained from the above request, you will create a WAV file with a POST request to http://localhost:10101/synthesis.
const xhr_synth = new XMLHttpRequest();
const url_synth = 'http://localhost:10101/synthesis?speaker=888753760';
xhr_synth.open("POST", url_synth);
xhr_synth.setRequestHeader("Content-Type", "application/json");
xhr_synth.responseType = "arraybuffer";
xhr_synth.send(res_str);Once the WAV download is complete, use the following code to play the WAV file:
const context = new AudioContext();
const audioBuffer = await (new Promise((res, rej) => {
context.decodeAudioData(xhr_synth.response, res, rej);
}));
const source = context.createBufferSource();
source.buffer = audioBuffer;
source.connect(context.destination);
source.start(0);
}Result
successfully performed text-to-speech using AIVoiceSpeech.


comment