Purpose
On the following page, text was read aloud using SpeechSynthesis, a standard browser API (part of the Web Speech API), but the pronunciation was not very good.
Therefore, I will read the text aloud using AivisSpeech instead, which has clearer pronunciation.
Prerequisites
This time, we will use a local web application as an example, assuming the page is accessed from http://localhost/. Opening the page via file:// does not work. This is due to the CORS policy. It seems possible to allow access from other PCs by changing the settings, but I have not tried it.
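To make this prerequisite concrete, here is a minimal sketch of a check a page could run before calling the engine. The function name `isServedFromLocalhost` is hypothetical, and the rule it encodes (http scheme, host `localhost`) only mirrors the assumption stated above; it is not taken from the engine's documentation.

```javascript
// Hypothetical helper: returns true when a page URL matches the setup assumed
// in this article (served over http from localhost, not opened via file://).
function isServedFromLocalhost(pageUrl) {
  const url = new URL(pageUrl);
  return url.protocol === "http:" && url.hostname === "localhost";
}

// In a browser you would pass window.location.href instead of a literal.
console.log(isServedFromLocalhost("http://localhost:5500/index.html")); // true
console.log(isServedFromLocalhost("file:///C:/work/index.html")); // false
```

The port and file path in the two sample URLs are placeholders chosen for illustration.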
Environment
AivisSpeech Engine version 1.0.0
Start AivisSpeech-Engine (server)
If you have not installed AivisSpeech, please refer to the following page for installation instructions.
Execute the following command from the command line to start the server:
"C:\Program Files\AivisSpeech\AivisSpeech-Engine\run.exe"
If you performed a per-user installation, the path will be as follows:
\AppData\Local\Programs\AivisSpeech\AivisSpeech-Engine\run.exe
The first launch takes some time because the model needs to be downloaded. (Even if you have used AivisSpeech before, the download occurs the first time you launch AivisSpeech-Engine directly.)
[2025/01/27 09:02:52] INFO: Started server process [19536]
[2025/01/27 09:02:52] INFO: Waiting for application startup.
[2025/01/27 09:02:52] INFO: Application startup complete.
[2025/01/27 09:02:52] INFO: Uvicorn running on http://localhost:10101 (Press CTRL+C to quit)
Once the above logs are displayed, open http://localhost:10101 in a browser such as Chrome.
If it looks like the image below, the server is running correctly.
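The same check can be done from a script instead of by eye. The following is a minimal sketch, assuming that any successful (2xx) response from the root URL means the engine is up. The helper name `isEngineReachable` and its injectable `fetchImpl` parameter are my own additions for illustration.

```javascript
// Hypothetical helper: resolves to true if something answers at baseUrl with a
// 2xx status. fetchImpl is injectable so the function can be tried without a server.
async function isEngineReachable(baseUrl = "http://localhost:10101", fetchImpl = fetch) {
  try {
    const response = await fetchImpl(baseUrl);
    return response.ok; // true for 2xx status codes
  } catch (error) {
    // fetch rejects on network errors, e.g. when nothing listens on the port.
    return false;
  }
}
```

Calling `isEngineReachable().then(console.log)` from a page served on localhost would log whether the engine responded.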

Acquiring SpeakerID
Access the following URL:
http://localhost:10101/speakers
Check “Pretty Print” for better readability. Note down the id of the setting you want to use (be careful, it is not speaker_uuid). Here, we will use 888753760 for Anneli’s Normal setting.
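If you would rather look up the id in code than read it off the page, something like the following could work. This is a sketch under an assumption: that each entry in the /speakers response has the shape `{ name, speaker_uuid, styles: [{ name, id }] }`, as in VOICEVOX-compatible APIs. The helper `findStyleId` and the sample data are hypothetical, and the style names your engine returns may differ (for example, they may be in Japanese).

```javascript
// Hypothetical helper: look up a style id in the parsed /speakers response.
// Assumes each entry looks like { name, speaker_uuid, styles: [{ name, id }] }.
function findStyleId(speakers, speakerName, styleName) {
  const speaker = speakers.find((s) => s.name === speakerName);
  if (!speaker) return null;
  const style = speaker.styles.find((st) => st.name === styleName);
  return style ? style.id : null;
}

// Hand-written sample in the assumed shape (not real engine output).
const sampleSpeakers = [
  {
    name: "Anneli",
    speaker_uuid: "00000000-0000-0000-0000-000000000000", // placeholder value
    styles: [{ name: "Normal", id: 888753760 }],
  },
];

console.log(findStyleId(sampleSpeakers, "Anneli", "Normal")); // 888753760
```

Note that the function returns the style's `id`, not `speaker_uuid`, which matches the warning above.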

Code
Now, let’s create the HTML and JavaScript. However, requests to AivisSpeech-Engine from a page opened via file:// will fail, so you will need to start a local server using Node.js or LiveServer.
For instructions on how to use LiveServer, refer to the following:
index.html
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>TEST</title>
</head>
<body>
<input type="text" id="text" class="txt" value="読み上げます" required><br><br>
<input type="button" value="読み上げ" id="execute"><br><br>
<script type="module" src="/src/main.js"></script>
</body>
</html>
In the HTML, I created a text box for the text to be read aloud and a button to initiate the reading.
main.js (in the code, replace 888753760 with the speaker ID you obtained.)
// Read the contents of the text box aloud when the button is clicked.
document.getElementById("execute").onclick = function (event) {
  readText(document.getElementById("text").value);
};

function readText(text) {
  // Step 1: request the synthesis query (JSON) for the given text.
  // encodeURIComponent keeps characters such as & or # from breaking the URL.
  const xhr = new XMLHttpRequest();
  const url = 'http://localhost:10101/audio_query?speaker=888753760&text=' + encodeURIComponent(text);
  xhr.open("POST", url, false); // false = synchronous; blocks until the response arrives
  xhr.send();
  const res_str = xhr.responseText;

  // Step 2: send the query JSON to /synthesis and receive the WAV data.
  const xhr_synth = new XMLHttpRequest();
  const url_synth = 'http://localhost:10101/synthesis?speaker=888753760';
  xhr_synth.open("POST", url_synth);
  xhr_synth.setRequestHeader("Content-Type", "application/json");
  xhr_synth.responseType = "arraybuffer";
  xhr_synth.onreadystatechange = async () => {
    if (xhr_synth.readyState === XMLHttpRequest.DONE && xhr_synth.status === 200) {
      // Step 3: decode the WAV data and play it through the Web Audio API.
      const context = new AudioContext();
      const audioBuffer = await (new Promise((res, rej) => {
        context.decodeAudioData(xhr_synth.response, res, rej);
      }));
      const source = context.createBufferSource();
      source.buffer = audioBuffer;
      source.connect(context.destination);
      source.start(0);
    }
  };
  xhr_synth.send(res_str);
}
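The same two-step flow can also be written with fetch and async/await, which avoids the synchronous request and makes it easier to report errors. This is a minimal sketch, not the method used in this article: the names `ENGINE_URL` and `synthesizeSpeech` and the injectable `fetchImpl` parameter are hypothetical, while the two endpoints and their parameters are the ones used in main.js.

```javascript
const ENGINE_URL = "http://localhost:10101";

// Hypothetical helper: runs both requests and resolves to the WAV bytes.
// fetchImpl is injectable so the flow can be tried without a running engine.
async function synthesizeSpeech(text, speakerId, fetchImpl = fetch) {
  // Step 1: POST /audio_query; URLSearchParams takes care of URL encoding.
  const params = new URLSearchParams({ speaker: String(speakerId), text });
  const queryRes = await fetchImpl(`${ENGINE_URL}/audio_query?${params}`, { method: "POST" });
  if (!queryRes.ok) throw new Error(`audio_query failed: ${queryRes.status}`);
  const queryJson = await queryRes.text();

  // Step 2: POST the query JSON to /synthesis and read the WAV as an ArrayBuffer.
  const synthRes = await fetchImpl(`${ENGINE_URL}/synthesis?speaker=${speakerId}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: queryJson,
  });
  if (!synthRes.ok) throw new Error(`synthesis failed: ${synthRes.status}`);
  return synthRes.arrayBuffer();
}
```

In the page, the button handler could call `synthesizeSpeech(text, 888753760)` and pass the resulting ArrayBuffer to `decodeAudioData` for playback.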
Here’s a rough outline of the process:
First, send the text to be read aloud to the server via a POST request to http://localhost:10101/audio_query.
const xhr = new XMLHttpRequest();
const url = 'http://localhost:10101/audio_query?speaker=888753760&text=' + encodeURIComponent(text);
xhr.open("POST", url, false);
xhr.send();
const res_str = xhr.responseText;
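One detail worth illustrating here: because the text travels in the query string, characters such as & or # change the meaning of the URL unless they are percent-encoded. The sketch below compares an encoded and an unencoded URL; the helper name `buildAudioQueryUrl` and the sample text are hypothetical.

```javascript
// Hypothetical helper: builds the /audio_query URL with the text safely encoded.
function buildAudioQueryUrl(speakerId, text) {
  return "http://localhost:10101/audio_query?speaker=" + speakerId +
    "&text=" + encodeURIComponent(text);
}

const encodedUrl = buildAudioQueryUrl(888753760, "Q&A #1");
console.log(encodedUrl);
// http://localhost:10101/audio_query?speaker=888753760&text=Q%26A%20%231

// Parsing the URL back shows what arrives as the text parameter.
console.log(new URL(encodedUrl).searchParams.get("text")); // Q&A #1

// Without encoding, & starts a new parameter and # starts the fragment.
const rawUrl = "http://localhost:10101/audio_query?speaker=888753760&text=" + "Q&A #1";
console.log(new URL(rawUrl).searchParams.get("text")); // Q
```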
Using the JSON obtained from the above request, generate the WAV data with a POST request to http://localhost:10101/synthesis.
const xhr_synth = new XMLHttpRequest();
const url_synth = 'http://localhost:10101/synthesis?speaker=888753760';
xhr_synth.open("POST", url_synth);
xhr_synth.setRequestHeader("Content-Type", "application/json");
xhr_synth.responseType = "arraybuffer";
xhr_synth.send(res_str);
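Since the body sent to /synthesis is just the JSON text returned by /audio_query, it can be parsed and adjusted before it is sent. The following is a sketch under an assumption I have not verified against this engine version: that the query object contains a `speedScale` field, as in VOICEVOX-compatible APIs. The helper `withSpeedScale` and the sample JSON are hypothetical.

```javascript
// Hypothetical helper: returns a new query JSON string with speedScale replaced.
// Assumes the /audio_query response is a JSON object that has a speedScale field.
function withSpeedScale(queryJson, speedScale) {
  const query = JSON.parse(queryJson);
  return JSON.stringify({ ...query, speedScale });
}

// Hand-written stand-in for a real response (real queries have many more fields).
const sampleQuery = '{"speedScale":1,"volumeScale":1}';
console.log(withSpeedScale(sampleQuery, 1.2));
// {"speedScale":1.2,"volumeScale":1}
```

With this, `xhr_synth.send(withSpeedScale(res_str, 1.2))` would send the adjusted query instead of the original one.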
Once the WAV download is complete, use the following code to decode and play the audio:
const context = new AudioContext();
const audioBuffer = await (new Promise((res, rej) => {
context.decodeAudioData(xhr_synth.response, res, rej);
}));
const source = context.createBufferSource();
source.buffer = audioBuffer;
source.connect(context.destination);
source.start(0);
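The playback step can be wrapped in a small function. In current browsers `decodeAudioData` also returns a Promise, so the manual Promise wrapper is optional. This is a minimal sketch; the function name `playWav` is hypothetical, and the context is passed in as a parameter so the function does not depend on a global `AudioContext`.

```javascript
// Hypothetical helper: decodes WAV bytes and starts playback on the given context.
async function playWav(context, wavBytes) {
  // Promise-based form of decodeAudioData; no callback wrapper needed.
  const audioBuffer = await context.decodeAudioData(wavBytes);
  const source = context.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(context.destination);
  source.start(0); // start playing immediately
  return source;
}
```

In the page this could be called as `playWav(new AudioContext(), xhr_synth.response)` once the request has finished.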
Result
I successfully performed text-to-speech using AivisSpeech.