Agora

Guide to integrating Hume TTS with Agora's Conversational AI Engine.

Agora is a real-time communication and conversational AI platform. With Agora’s API, developers can build AI voice agents with any LLM and integrate with Hume’s expressive text-to-speech API for high-quality voice synthesis.

Hume’s expressive TTS can be integrated into your Agora agents to deliver natural, emotionally aware speech in conversational AI. This guide covers setup instructions, integration patterns, and configuration best practices for using Hume TTS with Agora.

Wanna get right to the code? See our complete Agora example project on GitHub.

Authentication

To use Hume TTS with Agora, you’ll need both Hume and Agora credentials. Follow these steps to obtain your credentials and set up environment variables.

1. Get your Hume API key

To get your Hume API key, sign in to the Hume Platform and follow the Getting your API key guide.

2. Get your Agora credentials

Sign up for an Agora account and create a project in the Agora Console. Copy the following credentials from your project dashboard: App ID, App Certificate, Customer ID, and Customer Secret.

3. Configure environment variables

Create a .env.local file in your Next.js project and define the required environment variables:

.env.local
# Agora Configuration
NEXT_PUBLIC_AGORA_APP_ID=
NEXT_PUBLIC_AGORA_APP_CERTIFICATE=
NEXT_PUBLIC_AGORA_CUSTOMER_ID=
NEXT_PUBLIC_AGORA_CUSTOMER_SECRET=
NEXT_PUBLIC_AGORA_CONVO_AI_BASE_URL=https://api.agora.io/api/conversational-ai-agent/v2/projects/
NEXT_PUBLIC_AGENT_UID=
# LLM Configuration
NEXT_PUBLIC_LLM_URL=https://api.openai.com/v1/chat/completions
NEXT_PUBLIC_LLM_MODEL=gpt-4
NEXT_PUBLIC_LLM_API_KEY=
# TTS Configuration
NEXT_PUBLIC_TTS_VENDOR=hume
# Hume Configuration
NEXT_PUBLIC_HUME_API_KEY=
NEXT_PUBLIC_HUME_VOICE_ID=
# Modalities Configuration
NEXT_PUBLIC_INPUT_MODALITIES=text,audio
NEXT_PUBLIC_OUTPUT_MODALITIES=text,audio
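
Before starting an agent, it can help to fail fast when a credential is missing. The following is a minimal sketch, not part of the example project: `findMissingEnvVars` is a hypothetical helper, and the list of "required" names is an assumption (it covers the variables left blank in the listing above).

```typescript
// Variables left blank in the .env.local listing above.
// Treating exactly these as required is an assumption of this sketch.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_AGORA_APP_ID",
  "NEXT_PUBLIC_AGORA_APP_CERTIFICATE",
  "NEXT_PUBLIC_AGORA_CUSTOMER_ID",
  "NEXT_PUBLIC_AGORA_CUSTOMER_SECRET",
  "NEXT_PUBLIC_AGENT_UID",
  "NEXT_PUBLIC_LLM_API_KEY",
  "NEXT_PUBLIC_HUME_API_KEY",
  "NEXT_PUBLIC_HUME_VOICE_ID",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables
// that are unset or contain only whitespace, in listing order.
function findMissingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Next.js project you would typically call `findMissingEnvVars(process.env)` once at startup and throw or log if the returned array is not empty.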

Usage

Agora’s Conversational AI Engine enables you to build voice AI agents with any LLM by orchestrating the complete speech-to-speech pipeline: automatic speech recognition (ASR) converts user speech to text, your chosen LLM processes the text and generates a response, and Hume TTS synthesizes the LLM’s text output into natural, expressive speech.

Building a Conversational AI Agent

The Conversational AI Engine handles the entire voice interaction flow, allowing you to focus on configuring your LLM and TTS provider. When using Hume TTS, the Agora engine manages audio streaming and interruption handling.

Integration workflow:

  1. Configure your LLM: Connect any LLM provider (OpenAI, Azure OpenAI, Google Gemini, Anthropic Claude, or a custom model) to generate responses to user speech.

  2. Set Hume as your TTS provider: Configure Hume TTS in your Agora agent to synthesize the LLM’s text responses into natural, emotionally aware speech.

  3. Select a voice: Choose from Hume’s extensive Voice Library or use a custom voice you’ve created for consistent agent personality.

  4. Deploy your agent: Agora’s engine handles real-time audio streaming and interruption detection, and maintains the conversation flow between the user and your AI agent.
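
The workflow above could be wired together roughly as follows. This is a sketch under stated assumptions rather than the example project's code: `buildStartAgentRequest` and `AgentSettings` are hypothetical names, and the `/join` path, the Basic authentication header, and every body field outside the `tts` block are assumptions that should be checked against Agora's REST API reference. A real request will likely need additional fields (for example, an RTC token) that are omitted here.

```typescript
// Hypothetical settings object; each field maps to a variable from .env.local.
interface AgentSettings {
  baseUrl: string; // NEXT_PUBLIC_AGORA_CONVO_AI_BASE_URL, ending in "/projects/"
  appId: string;
  customerId: string;
  customerSecret: string;
  agentUid: string;
  llmUrl: string;
  llmModel: string;
  llmApiKey: string;
  humeApiKey: string;
  humeVoiceId: string;
}

interface StartAgentRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Assembles the HTTP request without sending it, so it can be inspected or tested.
function buildStartAgentRequest(
  s: AgentSettings,
  channel: string,
  userUid: string,
): StartAgentRequest {
  // Assumption: the REST API accepts HTTP Basic auth with customer ID and secret.
  const credentials = btoa(`${s.customerId}:${s.customerSecret}`);
  const payload = {
    name: `agent-${channel}`,
    properties: {
      channel,
      agent_rtc_uid: s.agentUid,
      remote_rtc_uids: [userUid],
      // Step 1: the LLM that generates responses (field names assumed).
      llm: { url: s.llmUrl, api_key: s.llmApiKey, params: { model: s.llmModel } },
      // Steps 2 and 3: Hume as the TTS vendor, with the selected voice.
      tts: {
        vendor: "hume",
        params: {
          key: s.humeApiKey,
          voice_id: s.humeVoiceId,
          trailing_silence: 0.35,
          speed: 1,
        },
      },
    },
  };
  return {
    url: `${s.baseUrl}${s.appId}/join`, // "/join" is an assumed path
    headers: {
      Authorization: `Basic ${credentials}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}
```

The result can be passed to `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Building the request in a server-side route keeps the customer secret and API keys out of the browser.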

Configuration example:

For a complete Next.js implementation with Agora and Hume TTS, see our Agora example project.

Sample Configuration
"tts": {
  "vendor": "hume",
  "params": {
    "key": "<HUME_API_KEY>",
    "voice_id": process.env.NEXT_PUBLIC_HUME_VOICE_ID,
    "trailing_silence": 0.35,
    "speed": 1,
  }
}
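
If you build this block in code, a small typed helper can catch obvious mistakes before the request is sent. The sketch below is illustrative only: `buildHumeTtsConfig` is a hypothetical name, the defaults mirror the sample configuration above, and the validation rules (positive `speed`, non-negative `trailing_silence`) are conservative assumptions, not documented limits.

```typescript
interface HumeTtsParams {
  key: string;
  voice_id: string;
  trailing_silence: number; // seconds of silence appended after speech
  speed: number; // 1 means normal speaking rate
}

interface TtsConfig {
  vendor: "hume";
  params: HumeTtsParams;
}

// Hypothetical helper: builds the "tts" block with the sample defaults
// and rejects values that are clearly invalid.
function buildHumeTtsConfig(
  key: string,
  voiceId: string,
  overrides: Partial<Pick<HumeTtsParams, "trailing_silence" | "speed">> = {},
): TtsConfig {
  if (key.trim() === "" || voiceId.trim() === "") {
    throw new Error("Hume API key and voice ID must be non-empty");
  }
  const params: HumeTtsParams = {
    key,
    voice_id: voiceId,
    trailing_silence: 0.35,
    speed: 1,
    ...overrides,
  };
  // Assumed sanity checks; consult the Hume API reference for the real ranges.
  if (!Number.isFinite(params.speed) || params.speed <= 0) {
    throw new Error("speed must be a positive number");
  }
  if (!Number.isFinite(params.trailing_silence) || params.trailing_silence < 0) {
    throw new Error("trailing_silence must be zero or greater");
  }
  return { vendor: "hume", params };
}
```

For example, `buildHumeTtsConfig(apiKey, voiceId, { speed: 1.1 })` returns the same shape as the sample configuration with only the speaking rate changed.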

Best Practices

When building conversational AI agents with Agora and Hume TTS, consider the following:

  • Voice selection: Choose a voice from Hume’s Voice Library that matches your agent’s personality, or create a custom voice for brand consistency.

  • LLM prompt engineering: Design your LLM prompts to work well with voice interactions: keep responses concise and natural for spoken delivery.

  • Interruption handling: Agora’s Conversational AI Engine handles interruptions automatically, so users can cut in while the agent is mid-response and the conversation feels more natural.
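
As an illustration of the prompt engineering advice above, a system message for a voice agent might look like the following. This is one possible sketch: `buildVoiceSystemMessage` is a hypothetical helper, the wording of the prompt is only a starting point, and how system messages are passed to your LLM depends on your agent configuration.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: builds a system message that steers the LLM
// toward short replies that sound natural when spoken aloud.
function buildVoiceSystemMessage(agentPersona: string, maxSentences: number): ChatMessage {
  return {
    role: "system",
    content: [
      `You are ${agentPersona}, speaking with the user over a live voice call.`,
      `Reply in at most ${maxSentences} short sentences.`,
      "Use plain spoken language: no markdown, bullet points, emoji, or URLs.",
      "Spell out numbers and abbreviations the way they should be pronounced.",
    ].join(" "),
  };
}
```

Keeping formatting instructions in the system message means the text sent to Hume TTS is already suitable for speech, without post-processing.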

Constraints

  • Audio format compatibility: Hume TTS outputs audio at a 48 kHz sample rate. Agora supports various sample rates; if your Agora configuration requires a different rate, make sure the audio is resampled.

  • One utterance per request: Each Hume TTS API request processes a single utterance. Split multi-utterance text into separate requests for granular control.
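
Both constraints can be handled with small helpers if you process text or audio yourself. The sketch below is a simplified illustration, and both function names are hypothetical. `resampleLinear` uses plain linear interpolation with no low-pass filter, so downsampling with it can introduce aliasing; a production pipeline would normally use a dedicated resampler. `splitIntoUtterances` uses a naive punctuation rule and will mis-split abbreviations such as "Dr. Smith".

```typescript
// Resamples mono PCM samples by linear interpolation (no anti-aliasing filter).
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outputLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Float32Array(outputLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outputLength; i++) {
    const position = i * step;
    const left = Math.floor(position);
    const right = Math.min(left + 1, input.length - 1);
    const fraction = position - left;
    output[i] = input[left] * (1 - fraction) + input[right] * fraction;
  }
  return output;
}

// Splits text after ".", "!" or "?" when the punctuation is followed by
// whitespace or the end of the text, so decimals like "3.5" stay intact.
function splitIntoUtterances(text: string): string[] {
  const matches = text.match(/(?:[^.!?]|[.!?](?!\s|$))+[.!?]*/g) ?? [];
  return matches
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}
```

For example, `resampleLinear(samples, 48000, 16000)` keeps roughly every third sample, and each string returned by `splitIntoUtterances` can be sent as its own TTS request.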

Resources