EVI Python Quickstart Guide

A quickstart guide for implementing the Empathic Voice Interface (EVI) with Python.

This guide provides detailed instructions for integrating EVI into your Python projects using Hume’s Python SDK. It is divided into seven key components:

  1. Environment setup: Download package and system dependencies to run EVI.
  2. Dependency imports: Import all necessary dependencies into your script.
  3. Defining a WebSocketHandler class: Create a class to manage the WebSocket connection.
  4. Authentication: Use your API credentials to authenticate your EVI application.
  5. Connecting to EVI: Set up a secure WebSocket connection to interact with EVI.
  6. Handling audio: Capture audio data from an input device, and play audio produced by EVI.
  7. Asynchronous event loop: Initiate and manage an asynchronous event loop that handles simultaneous, real-time execution of message processing and audio playback.

To see a full implementation within a terminal application, visit our API examples repository on GitHub: evi-python-example

Hume’s Python SDK supports EVI using Python versions 3.9, 3.10, and 3.11 on macOS and Linux platforms. The full specification can be found on the Python SDK GitHub page.
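To confirm that your interpreter is one of the supported versions, you can check it from the command line:

Checking your Python version
$ python --version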

1. Environment setup

Before starting the project, it is essential to set up the development environment.

Creating a virtual environment (optional)

Setting up a virtual environment is a best practice to isolate your project’s dependencies from your global Python installation, avoiding potential conflicts.

You can create a virtual environment using either Python’s built-in venv module or the conda environment manager. See instructions for both below:

  1. Create the virtual environment.

Note that when you create a virtual environment using Python’s built-in venv tool, the virtual environment will use the same Python version as the global Python installation that you used to create it.

Creating the virtual environment with venv
$ python -m venv evi-env

  2. Activate the virtual environment using the appropriate command for your system platform.

Activating the virtual environment with venv
$ source evi-env/bin/activate

The code above demonstrates virtual environment activation on a POSIX platform with a bash/zsh shell. Visit the venv documentation to learn more about using venv on your platform.
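Creating and activating the environment with conda follows the same two steps. The Python version is pinned here only as an example, since the SDK supports 3.9 through 3.11:

Creating and activating the virtual environment with conda
$ conda create --name evi-env python=3.11
$ conda activate evi-env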

Package dependencies

There are two package dependencies for using EVI:

  1. Hume Python SDK (required)

The hume[microphone] package contains the Hume Python SDK. This guide uses the SDK’s EVI WebSocket and message handling infrastructure, as well as its asynchronous programming and audio utilities.

Installing the Hume Python SDK package
$ pip install "hume[microphone]"
  2. Environment variables (recommended)

The python-dotenv package contains the logic for using environment variables to store and load sensitive variables such as API credentials from a .env file.

Installing the environment variable package
$ pip install python-dotenv

In the sample code snippets below, the API key, Secret key, and EVI config id have been saved to environment variables.

While not strictly required, using environment variables is considered best practice because it keeps sensitive information like API keys and configuration settings separate from your codebase. This not only enhances security but also makes your application more flexible and easier to manage across different environments.
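For reference, a .env file for this guide might look like the following, with placeholders substituted for your actual credentials (these variable names match the os.getenv calls used later in this guide):

Example .env file
HUME_API_KEY="<YOUR_API_KEY>"
HUME_SECRET_KEY="<YOUR_SECRET_KEY>"
HUME_CONFIG_ID="<YOUR_CONFIG_ID>"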

System dependencies

For audio playback and processing, additional system-level dependencies are required. Below are download instructions for each supported operating system:

To ensure audio playback functionality, macOS users will need to install ffmpeg, a powerful multimedia framework that handles audio and video processing.

A common way to install ffmpeg on macOS is by using a package manager such as Homebrew. To do so, follow these steps:

  1. Install Homebrew onto your system according to the instructions on the Homebrew website.

  2. Once Homebrew is installed, you can install ffmpeg with brew:

Installing ffmpeg with Homebrew
$ brew install ffmpeg

If you prefer not to use Homebrew, you can download a pre-built ffmpeg binary from the ffmpeg website or use other package managers like MacPorts.
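On Linux, ffmpeg is generally available through your distribution’s package manager. For example, on Debian or Ubuntu:

Installing ffmpeg with apt
$ sudo apt install ffmpeg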

2. Dependency imports

The following import statements are used in the example project to handle asynchronous operations, environment variables, audio processing, and communication with the Hume API:

Imports
import asyncio
import base64
import datetime
import os
from dotenv import load_dotenv
from hume.client import AsyncHumeClient
from hume.empathic_voice.chat.socket_client import ChatConnectOptions, ChatWebsocketConnection
from hume.empathic_voice.chat.types import SubscribeEvent
from hume.empathic_voice.types import UserInput
from hume.core.api_error import ApiError
from hume import MicrophoneInterface, Stream
3. Defining a WebSocketHandler class

Next, we define a WebSocketHandler class to encapsulate WebSocket functionality in one organized component. The handler allows us to implement application-specific behavior upon the socket opening, closing, receiving messages, and handling errors. It also manages the continuous audio stream from a microphone.

By using a class, you can maintain the WebSocket connection and audio stream state in one place, making it simpler to manage both real-time communication and audio processing.

Below are the key methods:

  • __init__(): Initializes the handler, setting up placeholders for the WebSocket connection.
  • set_socket(socket: ChatWebsocketConnection): Associates the WebSocket connection with the handler.
  • on_open(): Called when the WebSocket connection is established, enabling any necessary initialization.
  • on_message(data: SubscribeEvent): Handles incoming messages from the WebSocket, processing different types of messages.
  • on_close(): Invoked when the WebSocket connection is closed, allowing for cleanup operations.
  • on_error(error: Exception): Manages errors that occur during WebSocket communication, providing basic error logging.
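Putting these methods together, a condensed sketch of the handler is shown below, using the imports from the previous section. The message-type handling is illustrative rather than exhaustive, and the specific fields read from each SubscribeEvent variant (chat_id, message.content, data, and so on) are abbreviated from the full implementation in the evi-python-example repository.

WebSocketHandler sketch
class WebSocketHandler:
    """Encapsulates the EVI WebSocket connection and the audio byte stream."""

    def __init__(self):
        # Placeholder for the WebSocket connection, assigned in set_socket()
        self.socket = None
        # Asynchronous byte stream that buffers audio received from EVI for playback
        self.byte_strs = Stream.new()

    def set_socket(self, socket: ChatWebsocketConnection):
        # Associate the open WebSocket connection with the handler
        self.socket = socket

    async def on_open(self):
        print("WebSocket connection opened.")

    async def on_message(self, message: SubscribeEvent):
        # Process incoming messages according to their type
        if message.type == "chat_metadata":
            print(f"Chat ID: {message.chat_id}")
        elif message.type in ("user_message", "assistant_message"):
            print(f"{message.message.role}: {message.message.content}")
        elif message.type == "audio_output":
            # Decode the base64-encoded audio and queue it for playback
            await self.byte_strs.put(base64.b64decode(message.data.encode("utf-8")))
        elif message.type == "error":
            raise ApiError(body=f"Error ({message.code}): {message.message}")

    async def on_close(self):
        print("WebSocket connection closed.")

    async def on_error(self, error: Exception):
        # Basic error logging
        print(f"Error: {error}")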
4. Authentication

In order to establish an authenticated connection, we instantiate the Hume client with our API key and include our Secret key in the query parameters passed into the WebSocket connection.

You can obtain your API credentials by logging into the Hume Platform and visiting the API keys page.

Authenticating EVI
async def main() -> None:
    # Retrieve any environment variables stored in the .env file
    load_dotenv()

    # Retrieve the API key, Secret key, and EVI config id from the environment variables
    HUME_API_KEY = os.getenv("HUME_API_KEY")
    HUME_SECRET_KEY = os.getenv("HUME_SECRET_KEY")
    HUME_CONFIG_ID = os.getenv("HUME_CONFIG_ID")

    # Initialize the asynchronous client, authenticating with your API key
    client = AsyncHumeClient(api_key=HUME_API_KEY)

    # Define options for the WebSocket connection, such as an EVI config id and a secret key for token authentication
    options = ChatConnectOptions(config_id=HUME_CONFIG_ID, secret_key=HUME_SECRET_KEY)

    # ...
5. Connecting to EVI

With the Hume client instantiated with our credentials, we can now establish an authenticated WebSocket connection with EVI and pass in our handlers.

Connecting to EVI
async def main() -> None:
    # ...
    # Define options for the WebSocket connection, such as an EVI config id and a secret key for token authentication
    options = ChatConnectOptions(config_id=HUME_CONFIG_ID, secret_key=HUME_SECRET_KEY)

    # Instantiate the WebSocketHandler
    websocket_handler = WebSocketHandler()

    # Open the WebSocket connection with the configuration options and the handler's functions
    async with client.empathic_voice.chat.connect_with_callbacks(
        options=options,
        on_open=websocket_handler.on_open,
        on_message=websocket_handler.on_message,
        on_close=websocket_handler.on_close,
        on_error=websocket_handler.on_error
    ) as socket:

        # Set the socket instance in the handler
        websocket_handler.set_socket(socket)
        # ...
6. Handling audio

The MicrophoneInterface class captures audio input from the user’s device and streams it over the WebSocket connection.

Audio playback occurs when the WebSocketHandler receives audio data over the WebSocket connection in its asynchronous byte stream from an audio_output message.

In this example, byte_strs is a stream of audio data that the WebSocket connection populates.

Capturing and sending audio to EVI
async def main() -> None:
    # Open the WebSocket connection with the configuration options and the handler's functions
    async with client.empathic_voice.chat.connect_with_callbacks(...) as socket:
        # Set the socket instance in the handler
        websocket_handler.set_socket(socket)

        # Create an asynchronous task to continuously detect and process input from the microphone, as well as play audio
        microphone_task = asyncio.create_task(
            MicrophoneInterface.start(
                socket,
                byte_stream=websocket_handler.byte_strs
            )
        )

        # Await the microphone task
        await microphone_task
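The UserInput type imported earlier is not used by the microphone flow above, but it allows sending text to EVI over the same socket. A minimal sketch, assuming the socket variable from the async with block:

Sending a text message to EVI (sketch)
# Construct a text-based user input message
user_input_message = UserInput(text="Hello, EVI!")

# Send the message over the open WebSocket connection
await socket.send_user_input(user_input_message)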

Specifying a microphone device

You can specify your microphone device using the device parameter in the MicrophoneInterface object’s start method.

To view a list of available audio devices, run the following command:

List available audio devices
$ python -c "import sounddevice; print(sounddevice.query_devices())"

Below is an example output:

Example audio device list
  0 DELL U2720QM, Core Audio (0 in, 2 out)
  1 iPhone 15 Pro Max Microphone, Core Audio (1 in, 0 out)
  2 Studio Display Microphone, Core Audio (1 in, 0 out)
  3 Studio Display Speakers, Core Audio (0 in, 8 out)
> 4 MacBook Pro Microphone, Core Audio (1 in, 0 out)
< 5 MacBook Pro Speakers, Core Audio (0 in, 2 out)
  6 Pro Tools Audio Bridge 16, Core Audio (16 in, 16 out)
  7 Pro Tools Audio Bridge 2-A, Core Audio (2 in, 2 out)
  8 Pro Tools Audio Bridge 2-B, Core Audio (2 in, 2 out)
  9 Pro Tools Audio Bridge 32, Core Audio (32 in, 32 out)
 10 Pro Tools Audio Bridge 64, Core Audio (64 in, 64 out)
 11 Pro Tools Audio Bridge 6, Core Audio (6 in, 6 out)
 12 Apowersoft Audio Device, Core Audio (2 in, 2 out)
 13 ZoomAudioDevice, Core Audio (2 in, 2 out)

If the MacBook Pro Microphone is the desired device, specify device 4 in the MicrophoneInterface object’s start method. For example:

Python
# Specify device 4 in MicrophoneInterface
MicrophoneInterface.start(
    socket,
    device=4,
    allow_user_interrupt=True,
    byte_stream=websocket_handler.byte_strs
)

For troubleshooting faulty device detection, particularly on systems using ALSA (the Advanced Linux Sound Architecture), the device may also be specified directly using the sounddevice library:

Setting default sounddevice library device
# Directly import the sounddevice library
import sounddevice as sd

# Set the default device prior to scheduling the audio input task
sd.default.device = 4

Allowing interruption

The allow_user_interrupt parameter in the MicrophoneInterface object’s start method allows control over whether the user can send a message while the assistant is speaking:

Allowing an interrupt
# Specify allowing interruption
MicrophoneInterface.start(
    socket,
    allow_user_interrupt=True,
    byte_stream=websocket_handler.byte_strs
)
  • allow_user_interrupt=True: Allows the user to send microphone input even when the assistant is speaking. This enables more fluid, overlapping conversation.
  • allow_user_interrupt=False: Prevents the user from sending microphone input while the assistant is speaking, ensuring that the user does not interrupt the assistant. This is useful in scenarios where clear, uninterrupted communication is important.
7. Asynchronous event loop

Finally, initialize, execute, and manage the lifecycle of the asynchronous event loop, making sure that the main() coroutine runs effectively and that the application shuts down cleanly after the coroutine finishes executing.

Initialize the async event loop in global scope
asyncio.run(main())
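If the script may also be imported as a module, it is idiomatic to guard this call so the event loop only starts when the file is executed directly:

Guarding the entry point
if __name__ == "__main__":
    asyncio.run(main())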