Text-to-speech (JSON)

Synthesizes one or more input texts into speech using the specified voice. If no voice is provided, a novel voice will be generated dynamically. Optionally, additional context can be included to influence the speech’s style and prosody.

The response includes the base64-encoded audio and metadata in JSON format.
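Because the audio arrives as base64 text inside JSON, a client has to decode it before playing or saving it. The sketch below shows one way to do that with the Python standard library. The field names `generations` and `audio` follow the sample response on this page; the helper name `decode_generation_audio` and the stand-in payload are illustrative assumptions, not part of the API.

```python
import base64
import json


def decode_generation_audio(response_body: str) -> list:
    """Return the raw audio bytes of each generation in a JSON response body."""
    payload = json.loads(response_body)
    # Each generation carries its audio as a base64 string in "audio".
    return [base64.b64decode(gen["audio"]) for gen in payload["generations"]]


# Stand-in response body: the "audio" value encodes placeholder bytes,
# not real MP3 data (assumption for illustration only).
sample_body = json.dumps(
    {"generations": [{"audio": base64.b64encode(b"fake-mp3-bytes").decode("ascii")}]}
)
audio_clips = decode_generation_audio(sample_body)
```

Each element of `audio_clips` could then be written to a file opened in binary mode, with an extension that matches the requested format.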

Headers

X-Hume-Api-Key string Required

Query parameters

access_token string Optional

Access token used for authenticating the client. If not provided, an api_key must be provided to authenticate.

The access token is generated using both an API key and a Secret key, which provides an additional layer of security compared to using just an API key.

For more details, refer to the Authentication Strategies Guide.
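The two authentication options described above can be sketched as a small helper that returns either the header or the query parameter. The header name `X-Hume-Api-Key` and the parameter name `access_token` come from this page; the function name `build_auth` and the choice to prefer the access token when both are supplied are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_auth(api_key=None, access_token=None):
    """Return (headers, query_string) for one of the two authentication options."""
    if access_token:
        # Access tokens travel as a query parameter.
        return {}, urlencode({"access_token": access_token})
    if api_key:
        # API keys travel in the X-Hume-Api-Key header.
        return {"X-Hume-Api-Key": api_key}, ""
    raise ValueError("Provide either an access token or an API key")
```

How the access token itself is obtained is outside this sketch; the Authentication Strategies Guide covers that.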

Request

This endpoint expects an object.

utterances list of objects Required

A list of Utterances to be converted to speech output.

An Utterance is a unit of input for Octave. It includes the input text, an optional description that serves as the prompt for how the speech should be delivered, an optional voice specification, and two additional delivery controls: speed and trailing_silence.

context object Optional

Utterances to use as context for generating consistent speech style and prosody across multiple requests. These will not be converted to speech output.

format object Optional

Specifies the output audio file format.

num_generations integer Optional >=1 <=5 Defaults to 1

Number of generations of the audio to produce.
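A client may want to catch an out-of-range `num_generations` before sending the request. The following is a minimal sketch of building the request object as a plain dictionary with that check applied. The keys `utterances` and `num_generations` and the 1 to 5 range come from this page; the helper name `build_request` is hypothetical.

```python
def build_request(utterances, num_generations=1):
    """Build a request body, enforcing the documented num_generations range."""
    if not 1 <= num_generations <= 5:
        raise ValueError("num_generations must be between 1 and 5")
    return {
        "utterances": list(utterances),
        "num_generations": num_generations,
    }


# Two generations of the same single utterance.
request_body = build_request([{"text": "Hello there."}], num_generations=2)
```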

split_utterances boolean Optional Defaults to true

Controls how audio output is segmented in the response.

When enabled (true), input utterances are automatically split into natural-sounding speech segments.
When disabled (false), the response maintains a strict one-to-one mapping between input utterances and output snippets.

This setting affects how the snippets array is structured in the response, which matters for applications that need to track the relationship between input text and generated audio segments. When setting this to false, avoid including utterances with long text, as this can result in distorted output.
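To track which audio segments came from which input, an application can group snippets by their `utterance_index`. The sketch below assumes the nested `snippets` layout and the `text` and `utterance_index` fields shown in the sample response on this page. The helper name and the two-segment split in the sample data are hypothetical, chosen only to show what a split utterance might look like.

```python
from collections import defaultdict


def snippets_by_utterance(generation):
    """Map each input utterance index to the texts of its output snippets."""
    grouped = defaultdict(list)
    # "snippets" is a list of lists, so walk both levels.
    for group in generation["snippets"]:
        for snippet in group:
            grouped[snippet["utterance_index"]].append(snippet["text"])
    return dict(grouped)


# Hypothetical generation in which one utterance was split into two snippets.
sample_generation = {
    "snippets": [
        [
            {"text": "Beauty is no quality in things themselves:", "utterance_index": 0},
            {"text": "It exists merely in the mind which contemplates them.", "utterance_index": 0},
        ]
    ]
}
mapping = snippets_by_utterance(sample_generation)
```

With split_utterances set to false, each index would be expected to map to exactly one snippet.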

strip_headers boolean Optional Defaults to false

If enabled, the audio for all the chunks of a generation, once concatenated, constitutes a single audio file. If disabled, each chunk’s audio is its own audio file, with its own headers (if applicable).
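The concatenation described above could look like the following sketch. It assumes each chunk's audio arrives as a base64 string, as the audio in the JSON response does, and that the chunks are already in playback order; the helper name `join_chunks` is hypothetical.

```python
import base64


def join_chunks(encoded_chunks):
    """Decode each base64 chunk and concatenate the raw bytes in order."""
    return b"".join(base64.b64decode(chunk) for chunk in encoded_chunks)


# Placeholder chunks standing in for real audio data.
chunks = [
    base64.b64encode(b"abc").decode("ascii"),
    base64.b64encode(b"de").decode("ascii"),
]
combined = join_chunks(chunks)
```

Each chunk is decoded separately before joining. Concatenating the base64 strings first would not be safe in general, because padding characters can appear at the end of any individual chunk.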

instant_mode boolean Optional Defaults to true

Enables ultra-low latency streaming, significantly reducing the time until the first audio chunk is received. Recommended for real-time applications requiring immediate audio playback. For further details, see our documentation on instant mode.

A voice must be specified when instant mode is enabled. Dynamic voice generation is not supported with this mode.
Instant mode is only supported for streaming endpoints (e.g., /v0/tts/stream/json, /v0/tts/stream/file).
Ensure only a single generation is requested (num_generations must be 1 or omitted).
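The voice and generation-count constraints listed above can be checked on the client before a request is sent. This is a minimal sketch under stated assumptions: the request is a plain dictionary, the voice specification lives under a `voice` key on each utterance (that key name is an assumption), and every utterance needs its own voice. If the service lets later utterances inherit a voice, the first check should be relaxed.

```python
def instant_mode_problems(request):
    """Return a list of instant mode constraint violations for a request dict."""
    problems = []
    # instant_mode defaults to true when omitted.
    if not request.get("instant_mode", True):
        return problems
    # Assumption: the voice specification is stored under a "voice" key.
    if any("voice" not in utt for utt in request.get("utterances", [])):
        problems.append("a voice must be specified when instant mode is enabled")
    if request.get("num_generations", 1) != 1:
        problems.append("num_generations must be 1 or omitted")
    return problems
```

An empty list means neither of these two constraints is violated; it says nothing about whether the chosen endpoint supports instant mode.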

Response

Successful Response

generations list of objects

request_id string or null

A unique ID associated with this request, used for tracking and troubleshooting. Include this ID when contacting support.

from hume import HumeClient
from hume.tts import FormatMp3, PostedContextWithUtterances, PostedUtterance

client = HumeClient(
    api_key="YOUR_API_KEY",
)
client.tts.synthesize_json(
    utterances=[
        PostedUtterance(
            text="Beauty is no quality in things themselves: It exists merely in the mind which contemplates them.",
            description="Middle-aged masculine voice with a clear, rhythmic Scots lilt, rounded vowels, and a warm, steady tone with an articulate, academic quality.",
        )
    ],
    context=PostedContextWithUtterances(
        utterances=[
            PostedUtterance(
                text="How can people see beauty so differently?",
                description="A curious student with a clear and respectful tone, seeking clarification on Hume's ideas with a straightforward question.",
            )
        ],
    ),
    format=FormatMp3(),
    num_generations=1,
)

{
  "generations": [
    {
      "audio": "//PExAA0DDYRvkpNfhv3JI5JZ...etc.",
      "duration": 7.44225,
      "encoding": {
        "format": "mp3",
        "sample_rate": 48000
      },
      "file_size": 120192,
      "generation_id": "795c949a-1510-4a80-9646-7d0863b023ab",
      "snippets": [
        [
          {
            "audio": "//PExAA0DDYRvkpNfhv3JI5JZ...etc.",
            "generation_id": "795c949a-1510-4a80-9646-7d0863b023ab",
            "id": "37b1b1b1-1b1b-1b1b-1b1b-1b1b1b1b1b1b",
            "text": "Beauty is no quality in things themselves: It exists merely in the mind which contemplates them.",
            "utterance_index": 0
          }
        ]
      ]
    }
  ],
  "request_id": "66e01f90-4501-4aa0-bbaf-74f45dc15aa725906"
}
