EVI Python Quickstart

A quickstart guide for integrating the Empathic Voice Interface (EVI) with Python.

In this guide, you’ll learn how to integrate EVI into your Python applications using Hume’s Python SDK.

  1. Environment setup: Download package and system dependencies to run EVI.
  2. Import statements and helpers: Import needed symbols and define helper functions.
  3. Authentication: Use your API credentials to authenticate your EVI application.
  4. Connection: Set up a secure WebSocket connection to interact with EVI.
  5. Handling incoming messages: Process messages and queue audio for playback.
  6. Audio input: Capture audio data from an input device and send to EVI.

Hume’s Python SDK supports EVI on Python versions 3.9, 3.10, and 3.11 on macOS and Linux. The full specification can be found in the Python SDK’s readme.

Environment setup

This guide uses the Hume Python SDK package hume with the [microphone] package extra. You can install it with uv (recommended), poetry, or pip. It also uses the python-dotenv package to load environment variables from a .env file.

$ uv init
$ uv add "hume[microphone]" python-dotenv
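If you prefer pip, the equivalent installation is shown below (the quotes keep your shell from interpreting the square brackets in the extra):

Installation with pip
$ pip install "hume[microphone]" python-dotenv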

System dependencies

The Hume Python SDK uses the sounddevice library for audio recording and playback, which requires the PortAudio C library to be installed on your system. On macOS and Windows, PortAudio is typically included with the sounddevice package, so no additional installation is required. On Linux, however, you will need to install PortAudio manually using the appropriate package for your distribution.

Installation on Debian-based Linux systems
$ sudo apt-get --yes update
$ sudo apt-get --yes install libasound2-dev libportaudio2
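To confirm that PortAudio is visible to Python, you can query it through sounddevice after installing your project dependencies; if the command prints a version instead of raising an error, the system setup is complete:

Verify PortAudio is available
$ uv run python -c "import sounddevice; print(sounddevice.get_portaudio_version())"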

Import statements and helpers

First, we import needed symbols from the Python standard library and the Hume SDK, and define some helpers that are useful for printing readable output to the terminal.

quickstart.py
import asyncio
import base64
import datetime
import os
from dotenv import load_dotenv
from hume import MicrophoneInterface, Stream
from hume.client import AsyncHumeClient
from hume.empathic_voice.chat.socket_client import ChatConnectOptions
from hume.empathic_voice.chat.types import SubscribeEvent

def extract_top_n_emotions(emotion_scores: dict, n: int) -> dict:
    sorted_emotions = sorted(emotion_scores.items(), key=lambda item: item[1], reverse=True)
    top_n_emotions = {emotion: score for emotion, score in sorted_emotions[:n]}
    return top_n_emotions

def print_emotions(emotion_scores: dict) -> None:
    print(' | '.join([f"{emotion} ({score:.2f})" for emotion, score in emotion_scores.items()]))

def log(text: str) -> None:
    now = datetime.datetime.now(tz=datetime.timezone.utc).strftime("%H:%M:%S")
    print(f"[{now}] {text}")

Authentication

Log into your Hume AI Account and obtain an API key. Store it as HUME_API_KEY inside your project’s .env file.

Read HUME_API_KEY and use it to instantiate the AsyncHumeClient class. This is the main entry point provided by the Hume Python SDK.

You can specify EVI’s voice and behavior for a chat by Creating a Configuration through the API or the Hume platform web interface. Set HUME_CONFIG_ID in .env or as an environment variable and read it.
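For reference, a minimal .env file for this guide contains both values (placeholders shown; substitute your own credentials):

.env
HUME_API_KEY=<your-api-key>
HUME_CONFIG_ID=<your-config-id>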

quickstart.py
async def main() -> None:
    load_dotenv()
    HUME_API_KEY = os.getenv("HUME_API_KEY")
    HUME_CONFIG_ID = os.getenv("HUME_CONFIG_ID")
    client = AsyncHumeClient(api_key=HUME_API_KEY)
    ...
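Because os.getenv returns None when a variable is missing, you may want to fail fast with a clear message before constructing the client. This optional check is not part of the SDK, just a small guard inside main:

Python
# Optional: fail early if the API key was not loaded from .env
if HUME_API_KEY is None:
    raise RuntimeError("HUME_API_KEY is not set. Add it to your project's .env file.")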

Connection

To connect to an EVI chat, use the client.empathic_voice.chat.connect_with_callbacks method provided by the AsyncHumeClient. When connecting to the chat, you specify the EVI config in a ChatConnectOptions object. EVI chats are event-based, so you also supply on_open, on_message, on_close, and on_error callback functions to define what your application does in response to the events that occur during the chat.

quickstart.py
async def main() -> None:
    ...
    async def on_message(message: SubscribeEvent):
        # (Completed in later steps)
        ...
    async with client.empathic_voice.chat.connect_with_callbacks(
        options=ChatConnectOptions(config_id=HUME_CONFIG_ID),
        on_open=lambda: print("WebSocket connection opened."),
        on_message=on_message,
        on_close=lambda: print("WebSocket connection closed."),
        on_error=lambda err: print(f"Error: {err}")
    ) as socket:
        # (Completed in later steps)
        ...
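The inline lambdas above are the simplest option. If you prefer timestamped output, you can instead pass named functions with the same signatures that reuse the log helper defined earlier (for example, on_open=on_open and on_error=on_error):

Python
def on_open() -> None:
    log("WebSocket connection opened.")

def on_close() -> None:
    log("WebSocket connection closed.")

def on_error(err) -> None:
    log(f"Error: {err}")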

Handling incoming messages

After you successfully connect to an EVI chat, messages will be passed to your on_message handler. These are described by the Hume SDK’s SubscribeEvent type.

Audio segments for playback arrive on messages of the audio_output type. The Hume SDK provides a Stream type that is suitable for queuing audio segments for playback. You should instantiate a single Stream instance to act as your playback queue.

quickstart.py
async def main() -> None:
    ...
    stream = Stream.new()
    async def on_message(message: SubscribeEvent):
        if message.type == "chat_metadata":
            log(
                f"<{message.type}> Chat ID: {message.chat_id}, Chat Group ID: {message.chat_group_id}"
            )
        elif message.type == "user_message" or message.type == "assistant_message":
            log(f"{message.message.role}: {message.message.content}")
            print_emotions(
                extract_top_n_emotions(dict(message.models.prosody and message.models.prosody.scores or {}), 3)
            )
        elif message.type == "audio_output":
            await stream.put(
                base64.b64decode(message.data.encode("utf-8"))
            )
        elif message.type == "error":
            raise RuntimeError(
                f"Received error message from Hume websocket ({message.code}): {message.message}"
            )
        else:
            log(f"<{message.type}>")
    ...

Audio input

The Hume SDK provides a MicrophoneInterface class that handles both:

  • Sending recorded audio through the WebSocket to EVI
  • Playing back queued audio from the Stream you pass as its byte_stream

To use it, pass the chat socket provided by connect_with_callbacks to MicrophoneInterface.start:

quickstart.py
async def main() -> None:
    ...
    stream = Stream.new()
    ...
    async with client.empathic_voice.chat.connect_with_callbacks(
        ...
    ) as socket:
        await MicrophoneInterface.start(
            socket,
            allow_user_interrupt=False,
            byte_stream=stream
        )

Specify a microphone device

MicrophoneInterface.start will attempt to use the system’s default audio input device. To use a specific audio input device, pass it via the optional device parameter of MicrophoneInterface.start.

To view a list of available audio devices, run the following command:

List available audio devices
$ python -c "import sounddevice; print(sounddevice.query_devices())"
> # outputs something like
>    0 DELL U2720QM, Core Audio (0 in, 2 out)
>    1 iPhone 15 Pro Max Microphone, Core Audio (1 in, 0 out)
> >  2 Studio Display Microphone, Core Audio (1 in, 0 out)
>    3 Studio Display Speakers, Core Audio (0 in, 8 out)
>    4 MacBook Pro Microphone, Core Audio (1 in, 0 out)
> <  5 MacBook Pro Speakers, Core Audio (0 in, 2 out)
>    6 Pro Tools Audio Bridge 16, Core Audio (16 in, 16 out)
>    7 Pro Tools Audio Bridge 2-A, Core Audio (2 in, 2 out)
>    8 Pro Tools Audio Bridge 2-B, Core Audio (2 in, 2 out)
>    9 Pro Tools Audio Bridge 32, Core Audio (32 in, 32 out)
>   10 Pro Tools Audio Bridge 64, Core Audio (64 in, 64 out)
>   11 Pro Tools Audio Bridge 6, Core Audio (6 in, 6 out)
>   12 Apowersoft Audio Device, Core Audio (2 in, 2 out)
>   13 ZoomAudioDevice, Core Audio (2 in, 2 out)

If the MacBook Pro Microphone is the desired device, pass device=4 to MicrophoneInterface.start. For example:

Python
# Specify device 4 in MicrophoneInterface
await MicrophoneInterface.start(
    socket,
    device=4,
    allow_user_interrupt=False,
    byte_stream=stream
)

For troubleshooting faulty device detection, particularly on systems using ALSA (the Advanced Linux Sound Architecture), the device may also be specified directly using the sounddevice library:

Setting default sounddevice library device
# Directly import the sounddevice library
import sounddevice as sd

# Set the default device prior to scheduling audio input task
sd.default.device = 4
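sounddevice also allows setting separate input and output defaults by assigning a pair. Using the device indices from the example listing above, that might look like:

Python
import sounddevice as sd

# (input device, output device) indices from the example listing
sd.default.device = (4, 5)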

Interruption

The allow_user_interrupt parameter of MicrophoneInterface.start controls whether the user can send a message while the assistant is speaking:

Allowing an interrupt
# Specify allowing interruption
await MicrophoneInterface.start(
    socket,
    allow_user_interrupt=True,
    byte_stream=stream
)
  • allow_user_interrupt=True: Allows the user to send microphone input even when the assistant is speaking. This enables more fluid, overlapping conversation.
  • allow_user_interrupt=False: Prevents the user from sending microphone input while the assistant is speaking, ensuring that the user does not interrupt the assistant. This is useful in scenarios where clear, uninterrupted communication is important.

Put it all together

Finally, add the following code at the end of your script to run the main function:

quickstart.py
if __name__ == "__main__":
    asyncio.run(main())
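You typically end a chat from the terminal with Ctrl+C, which surfaces as a KeyboardInterrupt. If you prefer a quieter exit, one optional variation of the entry point is:

Python
if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("Chat ended.")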

View the complete quickstart.py code on GitHub

Next steps

Congratulations! You’ve successfully implemented a real-time conversational application using Hume’s Empathic Voice Interface (EVI).

Next, consider exploring more of EVI’s capabilities to enhance your application. For further details and practical examples, explore the API Reference and our Hume API Examples on GitHub.