Text-to-speech (TTS)
Introduction to Hume’s TTS API, including its features, usage limits, and key concepts for integration.
OCTAVE TTS is the first text-to-speech system built on LLM intelligence. Unlike conventional TTS that merely “reads” words, OCTAVE is a “speech-language model” that understands what words mean in context, unlocking a new level of expressiveness and nuance.
At its core, OCTAVE is a state-of-the-art large language model (LLM) that Hume AI trained to understand and synthesize speech. This speech-language model can predict the tune, rhythm, and timbre of speech, knowing when to whisper secrets, shout triumphantly, or calmly explain a fact. This combined approach lets OCTAVE interpret plot twists, emotional cues, and character traits within a script or prompt, then transform that understanding into lifelike speech.
Features
Key capabilities
- Context-aware expression: Because OCTAVE’s LLM recognizes nuanced meanings, it adapts pitch, tempo, and emphasis to match each word’s emotional intent.
- Any voice you can imagine: From describing a “patient, empathetic counselor” to requesting a “dramatic medieval knight,” OCTAVE instantly creates a fitting voice.
- Expression control through instruction following: Want a sentence spoken in a particular way with the right emphasis? OCTAVE can emulate any emotions or styles you describe, from “righteous indignation” to “hurried whispering.”
- Long-form versatility: Perfect for audiobooks, podcasts, or voiceover work, OCTAVE preserves emotional consistency across chapters or scene changes, even when characters shift from joy to despair.
Developer tools
- REST API: A RESTful API that enables text-to-speech (TTS) integration with OCTAVE. Use this API to synthesize speech, customize voice parameters, and create and store reusable voice profiles.
- TypeScript SDK: A strongly-typed library that streamlines OCTAVE TTS integration in TypeScript and JavaScript applications.
- Python SDK: A wrapper for OCTAVE’s TTS services that simplifies voice synthesis in Python applications.
- CLI: A command-line tool that allows direct interaction with OCTAVE’s TTS API, ideal for testing, automation, and rapid prototyping. See the CLI quickstart guide.
- Open source examples: Example repositories provide a starting point for developers and demonstrate OCTAVE’s capabilities.
Using text-to-speech
The TTS API provides a RESTful interface for generating expressive speech from text. You send text to synthesize along with optional voice specifications and descriptions, and receive audio in your chosen format.
All requests to the API require authentication. See the authentication documentation for the supported strategies.
Basic speech synthesis
At its simplest, you can generate speech by sending text with an optional voice description. The description helps shape the voice’s characteristics and expression.
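As a rough illustration of a basic request, the Python sketch below builds a JSON body from text and an optional description. The field names `utterances`, `text`, and `description` are assumptions made for this example and are not confirmed by this page. The `post_json` helper is hypothetical and takes the endpoint URL and authentication headers as arguments, since neither is specified here.

```python
import json
import urllib.request


def build_tts_request(text, description=None):
    """Build a JSON-serializable body for one synthesis request.

    Field names ("utterances", "text", "description") are assumed
    for illustration; check the API reference for the real schema.
    """
    utterance = {"text": text}
    if description is not None:
        utterance["description"] = description
    return {"utterances": [utterance]}


def post_json(url, body, headers):
    """Hypothetical helper: POST a JSON body and return the parsed JSON reply.

    The caller supplies the endpoint URL and the authentication headers.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_tts_request(
    "Welcome back. Let's pick up where we left off.",
    description="A calm, patient narrator with a warm tone",
)
print(json.dumps(body, indent=2))
```

Keeping the body builder separate from the network call makes the payload easy to inspect and test before any request is sent.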
Using saved voices
If you have saved voices in your library, reference them by name or ID instead of having the model generate a new voice. You can still provide a description to adjust how the voice performs the text.
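A minimal sketch of referencing a saved voice might look like the following. The `voice` object and its `name` and `id` keys are assumptions for illustration, as is the choice to require exactly one of the two identifiers.

```python
def utterance_with_saved_voice(text, voice_name=None, voice_id=None, description=None):
    """Build one utterance that points at a saved voice.

    The "voice", "name", and "id" keys are assumed field names.
    Requiring exactly one identifier is a design choice of this sketch.
    """
    if (voice_name is None) == (voice_id is None):
        raise ValueError("Provide exactly one of voice_name or voice_id")
    voice = {"name": voice_name} if voice_name is not None else {"id": voice_id}
    utterance = {"text": text, "voice": voice}
    if description is not None:
        # The description adjusts delivery; the voice itself is already chosen.
        utterance["description"] = description
    return utterance


example = utterance_with_saved_voice(
    "The results are in, and they are better than we hoped.",
    voice_name="my-narrator",  # hypothetical saved voice name
    description="Quietly excited, almost whispering",
)
print(example)
```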
Advanced options
Speech consistency
For longer content or multiple requests, use the context parameter to maintain consistent speech style. You can provide previous utterances or a generation ID as context.
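The two ways of supplying context could be sketched as below. The `context` parameter and the generation ID come from the text above. The nested key names `generation_id` and `utterances`, and the rule that only one form is sent at a time, are assumptions of this example.

```python
def with_context(body, generation_id=None, previous_utterances=None):
    """Return a copy of a request body with a context attached.

    Exactly one of generation_id or previous_utterances must be given.
    The nested key names are assumed for illustration.
    """
    if (generation_id is None) == (previous_utterances is None):
        raise ValueError("Provide exactly one of generation_id or previous_utterances")
    if generation_id is not None:
        context = {"generation_id": generation_id}
    else:
        context = {"utterances": list(previous_utterances)}
    # Build a new dict so the caller's original body is left unchanged.
    return {**body, "context": context}


chapter_two = {"utterances": [{"text": "Chapter two. The storm had passed."}]}
continued = with_context(chapter_two, generation_id="gen-from-chapter-one")  # hypothetical ID
print(continued)
```

Reusing the generation ID from an earlier response is the lighter option when the previous audio was produced by the same API. Passing previous utterances fits cases where only the earlier text is available.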
Multiple generations
Request up to 5 variations of synthesized speech by setting num_generations. This is useful when you want to explore different interpretations of your voice description.
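A small sketch of setting `num_generations` with a client-side check against the documented maximum of 5 follows. Placing `num_generations` at the top level of the request body is an assumption of this example.

```python
MAX_GENERATIONS = 5  # documented upper bound per request


def with_num_generations(body, count):
    """Return a copy of a request body asking for `count` variations."""
    if isinstance(count, bool) or not isinstance(count, int):
        raise TypeError("count must be an integer")
    if not 1 <= count <= MAX_GENERATIONS:
        raise ValueError(f"count must be between 1 and {MAX_GENERATIONS}")
    # Top-level placement of num_generations is assumed for illustration.
    return {**body, "num_generations": count}


base = {"utterances": [{"text": "Once upon a time.", "description": "A gentle storyteller"}]}
three_takes = with_num_generations(base, 3)
print(three_takes["num_generations"])
```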
Response format
The API returns a JSON response. Each response includes:
- Base64-encoded audio in your specified format (MP3, WAV, or PCM).
- A unique generation_id for saving voices or maintaining consistency.
- Audio metadata including duration, file size, and audio encoding.
- A list of segmented utterances (segments) divided into natural-sounding units.
- A request_id for tracking and debugging.
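To make the response handling concrete, the sketch below decodes the Base64 audio from a parsed response and writes one file per generation. The `generation_id` field comes from the list above. The `generations` list and the `audio` key are assumed names for this example, and the file extension is chosen by the caller to match the requested format.

```python
import base64
from pathlib import Path


def save_generations(response, out_dir, extension="mp3"):
    """Decode each generation's Base64 audio and write it to out_dir.

    Assumes response["generations"] is a list of objects carrying
    "generation_id" and "audio" (Base64 text); "generations" and
    "audio" are hypothetical key names.
    Returns the list of written file paths.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for generation in response["generations"]:
        audio_bytes = base64.b64decode(generation["audio"])
        target = out_path / f"{generation['generation_id']}.{extension}"
        target.write_bytes(audio_bytes)
        written.append(target)
    return written
```

Naming each file after its generation_id keeps the audio traceable to the ID you would later reuse for saving a voice or supplying context.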
Quickstart
Our quickstart guides help you integrate OCTAVE TTS from TypeScript, Python, or the command line. Each guide walks you through API integration and demonstrates text-to-speech synthesis, helping you get up and running quickly.
- Integrate OCTAVE TTS into web and Node.js applications using our TypeScript SDK.
- Use our Python SDK to integrate OCTAVE TTS into your Python applications.
- Get started synthesizing text-to-speech with our command-line tool.
API limits
- Request rate limit: 50 requests per minute
- Maximum text length: 5,000 characters
- Maximum description length: 1,000 characters
- Maximum generations per request: 5
- Supported audio formats: MP3, WAV, PCM
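The limits above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the text limit applies to each piece of text submitted and that lengths are counted as Python string lengths. The API may count characters differently, and the lowercase format strings are a choice of this example.

```python
MAX_TEXT_CHARS = 5_000
MAX_DESCRIPTION_CHARS = 1_000
MAX_GENERATIONS = 5
SUPPORTED_FORMATS = {"mp3", "wav", "pcm"}
REQUESTS_PER_MINUTE = 50

# Spacing requests evenly at this interval stays within the rate limit.
MIN_SECONDS_BETWEEN_REQUESTS = 60 / REQUESTS_PER_MINUTE


def check_limits(text, description=None, num_generations=1, audio_format="mp3"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters (max {MAX_TEXT_CHARS})")
    if description is not None and len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description has {len(description)} characters (max {MAX_DESCRIPTION_CHARS})"
        )
    if not 1 <= num_generations <= MAX_GENERATIONS:
        problems.append(f"num_generations must be between 1 and {MAX_GENERATIONS}")
    if audio_format.lower() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported audio format: {audio_format}")
    return problems


print(check_limits("Hello there.", description="Cheerful", num_generations=2))
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.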