Text-to-speech (JSON)

POST

Synthesizes one or more input texts into speech using the specified voice. If no voice is provided, a novel voice will be generated dynamically. Optionally, additional context can be included to influence the speech’s style and prosody.

The response includes the base64-encoded audio and metadata in JSON format.

Request

This endpoint expects an object.
utterances (list of objects, required)

Utterances to be converted to speech output.

context (object, optional)

Utterances to use as context for generating consistent speech style and prosody across multiple requests. These will not be converted to speech output.

format (object, optional)

Specifies the output audio file format.

num_generations (integer, optional, >=1, <=5, defaults to 1)

Number of audio generations to produce for the request.
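As an illustration of how the request fields fit together, the following minimal Python sketch assembles a request body. Only the top-level names (utterances, context, num_generations) and the 1 to 5 range come from this page. The "text" key inside each utterance, the nested "utterances" list inside context, and the helper name build_tts_request are assumptions made for the example; check them against the full field reference before use. The format field is left out because its inner structure is not described here.

```python
import json


def build_tts_request(texts, context_texts=None, num_generations=1):
    """Assemble a request body from the documented top-level fields."""
    if not texts:
        # utterances is marked Required; treating an empty list as invalid
        # is an assumption of this sketch.
        raise ValueError("at least one utterance is required")
    if not 1 <= num_generations <= 5:
        # Documented constraint: >=1 and <=5, default 1.
        raise ValueError("num_generations must be between 1 and 5")

    body = {
        # ASSUMPTION: each utterance object carries its input under "text".
        "utterances": [{"text": t} for t in texts],
        "num_generations": num_generations,
    }
    if context_texts:
        # ASSUMPTION: context wraps its own list of utterance objects.
        body["context"] = {"utterances": [{"text": t} for t in context_texts]}
    return body


payload = build_tts_request(
    ["Hello there."],
    context_texts=["Speak calmly."],
    num_generations=2,
)
print(json.dumps(payload, indent=2))
```

Validating num_generations on the client side avoids a round trip for a value the endpoint is documented to reject.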

Response

Successful Response

generations (list of objects)
request_id (string, optional)

A unique ID associated with this request for tracking and troubleshooting. Use this ID when contacting support for troubleshooting assistance.
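Because the audio arrives base64-encoded inside the JSON response, a client has to decode it before saving or playing it. The sketch below shows one way to do that in Python with the standard base64 module. The "audio" key on each generation object and the helper name decode_generations are assumptions; this page only states that generations is a list of objects and that the audio is base64-encoded. The sample response is fabricated purely to exercise the code.

```python
import base64


def decode_generations(response):
    """Return (request_id, list of raw audio bytes) from a parsed response."""
    # request_id is documented as optional, so it may be absent.
    request_id = response.get("request_id")
    # ASSUMPTION: each generation stores its base64 audio under "audio".
    clips = [base64.b64decode(g["audio"]) for g in response["generations"]]
    return request_id, clips


# Made-up response, used only to demonstrate the decoding step.
sample = {
    "request_id": "example-id",
    "generations": [
        {"audio": base64.b64encode(b"fake audio bytes").decode("ascii")},
    ],
}
request_id, clips = decode_generations(sample)
print(request_id, len(clips))
```

Keeping request_id alongside the decoded audio makes it easy to quote when contacting support.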

Errors
