TTS TypeScript Quickstart Guide

Step-by-step guide for integrating the TTS API using Hume’s TypeScript SDK.

This guide shows how to get started with Hume’s Text-to-Speech capabilities using Hume’s TypeScript SDK. It demonstrates:

  1. Converting text to speech with a new voice.
  2. Saving a voice to your voice library for future use.
  3. Giving “acting instructions” to modulate the voice.
  4. Generating multiple variations of the same text at once.
  5. Providing context to maintain consistency across multiple generations.

The complete code for the example in this guide is available on GitHub.

Environment Setup

Create a new TypeScript project and install the required packages:

$ npm init -y
$ npm install hume dotenv
$ npm install --save-dev typescript @types/node
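The guide's code uses default imports (e.g. import dotenv from "dotenv") and a for await loop, so your TypeScript configuration needs esModuleInterop and a reasonably modern target. A minimal tsconfig.json along these lines should work (this exact configuration is an assumption, not part of the guide; adjust as needed):

tsconfig.json

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "CommonJS",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true
  }
}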

Authenticating the HumeClient

You must authenticate to use the Hume TTS API. Your API key can be retrieved from the Hume AI platform.

This example uses dotenv. Place your API key in a .env file at the root of your project.

.env

HUME_API_KEY=your_api_key_here

Then create a new file index.ts and use your API key to instantiate the HumeClient.

import { HumeClient } from "hume"
import dotenv from "dotenv"

dotenv.config()

const hume = new HumeClient({
  apiKey: process.env.HUME_API_KEY!
})

Helper functions

Define helper functions for writing generated audio to files in a temporary directory and for playing audio:

import fs from "fs/promises"
import path from "path"
import * as os from "os"
import * as child_process from "child_process"

const outputDir = path.join(os.tmpdir(), `hume-audio-${Date.now()}`)

// Decode base64 audio and write it to a .wav file in the output directory.
const writeResultToFile = async (base64EncodedAudio: string, filename: string) => {
  const filePath = path.join(outputDir, `${filename}.wav`)
  await fs.writeFile(filePath, Buffer.from(base64EncodedAudio, "base64"))
  console.log('Wrote', filePath)
}

// Start an ffplay process that plays whatever audio is piped to its stdin.
const startAudioPlayer = () => {
  const proc = child_process.spawn('ffplay', ['-nodisp', '-autoexit', '-infbuf', '-i', '-'], {
    detached: true,
    stdio: ['pipe', 'ignore', 'ignore'],
  })

  proc.on('error', (err) => {
    if ((err as any).code === 'ENOENT') {
      console.error('ffplay not found. Please install ffmpeg to play audio.')
    }
  })

  return {
    sendAudio: (audio: string) => {
      const buffer = Buffer.from(audio, "base64")
      proc.stdin.write(buffer)
    },
    stop: () => {
      proc.stdin.end()
      proc.unref()
    }
  }
}

const main = async () => {
  await fs.mkdir(outputDir)
  console.log('Writing to', outputDir)

  // All the code examples in the remainder of the guide
  // belong within this main function.
}

main().then(() => console.log('Done')).catch(console.error)

Calling Text-to-Speech

To use Hume TTS, call hume.tts.synthesizeJson with a list of utterances. Inside each utterance, put the text to speak, and optionally provide a description of how the voice speaking the text should sound. If you don’t provide a description, Hume will examine the text and attempt to determine an appropriate voice.

The base64-encoded bytes of an audio file with your speech will be present at .generations[0].audio in the returned object. By default, there will only be a single variation in the .generations array, and the audio will be in wav format.

The .generations[0].generationId field will contain an ID you can use to refer to this specific generation of speech in future requests.
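For orientation, the parts of the synthesizeJson response used in this guide can be sketched roughly as follows. This is an illustrative shape only, not the SDK's actual type definitions (which are more complete):

// Illustrative sketch of the response fields used in this guide.
interface SpeechResult {
  generations: Array<{
    audio: string        // base64-encoded audio (WAV by default)
    generationId: string // ID for referring to this generation later
  }>
}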

const speech1 = await hume.tts.synthesizeJson({
  utterances: [{
    description: "A refined, British aristocrat",
    text: "Take an arrow from the quiver."
  }]
})
await writeResultToFile(speech1.generations[0].audio, "speech1_0")

Saving voices

Use hume.tts.voices.create to save the voice of a generated piece of audio to your voice library for future use. The generated name is kept in a variable so that later requests can reference the saved voice:

const voiceName = `aristocrat-${Date.now()}`
await hume.tts.voices.create({
  name: voiceName,
  generationId: speech1.generations[0].generationId,
})

Continuity

Inside an utterance, specify the name or ID of a voice to generate more speech from that voice.

To generate speech that is meant to follow previously generated speech, specify context with the generationId of that speech.

You can specify a number up to 5 in numGenerations to generate multiple variations of the same speech at the same time.

const speech2 = await hume.tts.synthesizeJson({
  utterances: [{
    voice: { name: voiceName },
    text: "Now take a bow."
  }],
  context: {
    generationId: speech1.generations[0].generationId
  },
  numGenerations: 2,
})
await writeResultToFile(speech2.generations[0].audio, "speech2_0")
await writeResultToFile(speech2.generations[1].audio, "speech2_1")
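The two writeResultToFile calls above index each variation by hand. When numGenerations varies, the same thing can be written as a loop over the generations array:

// Write every returned variation, however many were generated.
for (const [i, generation] of speech2.generations.entries()) {
  await writeResultToFile(generation.audio, `speech2_${i}`)
}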

Acting Instructions

If you specify both voice and description, the description field behaves as “acting instructions”: the speech keeps the character of the specified voice, modulated to match the description.

const speech3 = await hume.tts.synthesizeJson({
  utterances: [{
    voice: { name: voiceName },
    description: "Murmured softly, with a heavy dose of sarcasm and contempt",
    text: "Does he even know how to use that thing?"
  }],
  context: {
    generationId: speech2.generations[0].generationId
  },
  numGenerations: 1
})
await writeResultToFile(speech3.generations[0].audio, "speech3_0")

Streaming speech

You can stream utterances using the synthesizeJsonStreaming method. This allows you to process audio chunks as they become available rather than waiting for the entire speech generation to complete.

By default, each audio chunk returned by synthesizeJsonStreaming is a whole mp3 file, complete with headers. However, our audio player helper (backed by ffplay) expects a single continuous audio stream. We pass stripHeaders: true so that concatenating the chunks yields one valid audio file.

If you intend to play the generated audio in real time, set instantMode: true. This allows audio to begin playing almost immediately, without compromising quality, but incurs a 10% higher cost to your account.

const audioPlayer = startAudioPlayer()
for await (const snippet of await hume.tts.synthesizeJsonStreaming({
  context: {
    generationId: speech3.generations[0].generationId,
  },
  utterances: [
    { text: "He's drawn the bow..." },
    { text: "he's fired the arrow..." },
    { text: "I can't believe it! A perfect bullseye!" }
  ],
  stripHeaders: true,
  // instantMode: true
})) {
  audioPlayer.sendAudio(snippet.audio)
}
audioPlayer.stop()
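If you would rather keep the streamed audio than play it, the same stripHeaders behavior means you can simply concatenate the chunks and write them to disk. A minimal sketch, with an illustrative utterance and filename (streaming returns mp3 audio by default, so the concatenated chunks form one mp3 file):

const chunks: Buffer[] = []
for await (const snippet of await hume.tts.synthesizeJsonStreaming({
  utterances: [{ voice: { name: voiceName }, text: "And the crowd goes wild!" }],
  stripHeaders: true,
})) {
  chunks.push(Buffer.from(snippet.audio, "base64"))
}
// Concatenate the header-stripped chunks into a single playable file.
await fs.writeFile(path.join(outputDir, "speech_streamed.mp3"), Buffer.concat(chunks))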

Running the Example

To run the example:

$ npx ts-node index.ts
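Alternatively, you can compile first and run the emitted JavaScript (assuming a tsconfig.json like the one sketched in Environment Setup):

$ npx tsc
$ node index.js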

Note: The audio playback functionality requires FFmpeg to be installed on your system.