EVI Python Quickstart Guide
A quickstart guide for implementing the Empathic Voice Interface (EVI) with Python.
This guide provides detailed instructions for integrating EVI into your Python projects using Hume’s Python SDK. It is divided into seven key components:
- Environment setup: Download package and system dependencies to run EVI.
- Dependency imports: Import all necessary dependencies into your script.
- Defining a WebSocketHandler class: Create a class to manage the WebSocket connection.
- Authentication: Use your API credentials to authenticate your EVI application.
- Connecting to EVI: Set up a secure WebSocket connection to interact with EVI.
- Handling audio: Capture audio data from an input device, and play audio produced by EVI.
- Asynchronous event loop: Initiate and manage an asynchronous event loop that handles simultaneous, real-time execution of message processing and audio playback.
To see a full implementation within a terminal application, visit our API examples repository on GitHub: evi-python-example
Hume’s Python SDK supports EVI using Python versions 3.9, 3.10, and 3.11 on macOS and Linux platforms. The full specification can be found on the Python SDK GitHub page.
Environment setup
Before starting the project, it is essential to set up the development environment.
Creating a virtual environment (optional)
Setting up a virtual environment is a best practice to isolate your project’s dependencies from your global Python installation, avoiding potential conflicts.
You can create a virtual environment using either Python’s built-in `venv` module or the `conda` environment manager. See instructions for both below:
- Create the virtual environment, e.g. `python -m venv <env-name>` with venv, or `conda create --name <env-name> python=3.11` with conda.
Note that when you create a virtual environment using Python’s built-in `venv` tool, the virtual environment will use the same Python version as the global Python installation that you used to create it.
- Activate the virtual environment using the appropriate command for your platform, e.g. `source <env-name>/bin/activate` on a POSIX platform with a bash/zsh shell, or `conda activate <env-name>` for conda environments. Visit the venv documentation to learn more about using `venv` on your platform.
Package dependencies
There are two package dependencies for using EVI:
- Hume Python SDK (required)
The `hume[microphone]` package contains the Hume Python SDK and its microphone extras; install it with `pip install "hume[microphone]"`. This guide uses EVI’s WebSocket and message handling infrastructure as well as various asynchronous programming and audio utilities.
- Environment variables (recommended)
The `python-dotenv` package (installable with `pip install python-dotenv`) contains the logic for using environment variables to store and load sensitive values such as API credentials from a `.env` file.
In sample code snippets below, the API key, Secret key, and an EVI configuration id have been saved to environment variables.
While not strictly required, using environment variables is considered best practice because it keeps sensitive information like API keys and configuration settings separate from your codebase. This not only enhances security but also makes your application more flexible and easier to manage across different environments.
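For example, the credentials could be loaded at the top of your script as follows. The variable names here are illustrative; use whichever names you define in your own `.env` file:

```python
import os
from dotenv import load_dotenv

# Read variables from a local .env file into the process environment
load_dotenv()

# Illustrative variable names; match them to the entries in your .env file
HUME_API_KEY = os.getenv("HUME_API_KEY")
HUME_SECRET_KEY = os.getenv("HUME_SECRET_KEY")
HUME_CONFIG_ID = os.getenv("HUME_CONFIG_ID")
```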
System dependencies
For audio playback and processing, additional system-level dependencies are required. Below are download instructions for each supported operating system:
macOS
To ensure audio playback functionality, macOS users will need to install `ffmpeg`, a powerful multimedia framework that handles audio and video processing.
A common way to install `ffmpeg` on macOS is by using a package manager such as Homebrew. To do so, follow these steps:
- Install Homebrew onto your system according to the instructions on the Homebrew website.
- Once Homebrew is installed, you can install `ffmpeg` with `brew install ffmpeg`.
If you prefer not to use Homebrew, you can download a pre-built `ffmpeg` binary from the ffmpeg website or use other package managers like MacPorts.
Dependency imports
The following import statements are used in the example project to handle asynchronous operations, environment variables, audio processing, and communication with the Hume API:
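A representative set of imports is sketched below. The Hume module paths follow the public evi-python-example project; treat them as an assumption and verify them against the SDK version you have installed:

```python
# Standard-library modules for async execution, audio decoding,
# timestamps, and environment variables
import asyncio
import base64
import datetime
import os

# Loads API credentials from a .env file
from dotenv import load_dotenv

# Hume SDK pieces: async client, EVI WebSocket options and connection,
# message types, and microphone/audio-stream utilities
# (module paths may differ between SDK versions)
from hume.client import AsyncHumeClient
from hume.empathic_voice.chat.socket_client import ChatConnectOptions, ChatWebsocketConnection
from hume.empathic_voice.chat.types import SubscribeEvent
from hume.core.api_error import ApiError
from hume import MicrophoneInterface, Stream
```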
Defining a WebSocketHandler class
Next, we define a `WebSocketHandler` class to encapsulate WebSocket functionality in one organized component. The handler allows us to implement application-specific behavior upon the socket opening, closing, receiving messages, and handling errors. It also manages the continuous audio stream from a microphone.
By using a class, you can maintain the WebSocket connection and audio stream state in one place, making it simpler to manage both real-time communication and audio processing.
Below are the key methods:
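The condensed sketch below illustrates one way to structure the handler. It mirrors the shape of the evi-python-example project, but the message fields and callback signatures are assumptions that may differ between SDK versions:

```python
class WebSocketHandler:
    """Manages the EVI WebSocket connection and the audio playback stream."""

    def __init__(self):
        self.socket = None
        # Asynchronous byte stream that buffers audio received from EVI
        self.byte_strs = Stream.new()

    def set_socket(self, socket):
        # Keep a reference to the open socket so other methods can send messages
        self.socket = socket

    async def on_open(self):
        print("WebSocket connection opened.")

    async def on_message(self, message):
        # Route incoming messages by type; audio_output messages carry
        # base64-encoded audio that is queued for playback
        if message.type == "audio_output":
            audio_bytes = base64.b64decode(message.data.encode("utf-8"))
            await self.byte_strs.put(audio_bytes)
        elif message.type in ("user_message", "assistant_message"):
            print(f"{message.message.role}: {message.message.content}")
        elif message.type == "error":
            raise RuntimeError(f"EVI error ({message.code}): {message.message}")

    async def on_close(self):
        print("WebSocket connection closed.")

    async def on_error(self, error):
        print(f"Error: {error}")
```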
Authentication
In order to establish an authenticated connection, we instantiate the Hume client with our API key and include our Secret key in the query parameters passed into the WebSocket connection.
You can obtain your API credentials by logging into the Hume Platform and visiting the API keys page.
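As a sketch, using the environment variables loaded earlier (the `ChatConnectOptions` name follows the evi-python-example project and is an assumption to verify against your SDK version):

```python
# Instantiate the asynchronous Hume client with your API key
client = AsyncHumeClient(api_key=HUME_API_KEY)

# Bundle the Secret key (and optional EVI configuration ID) into the options
# supplied when the WebSocket connection is opened
options = ChatConnectOptions(config_id=HUME_CONFIG_ID, secret_key=HUME_SECRET_KEY)
```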
Connecting to EVI
With the Hume client instantiated with our credentials, we can now establish an authenticated WebSocket connection with EVI and pass in our handlers.
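One possible shape for this step is shown below, following the pattern in evi-python-example; the `connect_with_callbacks` method name is taken from that project and should be checked against your installed SDK:

```python
async def main() -> None:
    # Create the handler that will receive the WebSocket lifecycle callbacks
    websocket_handler = WebSocketHandler()

    # Open an authenticated WebSocket connection to EVI and register callbacks
    async with client.empathic_voice.chat.connect_with_callbacks(
        options=options,
        on_open=websocket_handler.on_open,
        on_message=websocket_handler.on_message,
        on_close=websocket_handler.on_close,
        on_error=websocket_handler.on_error,
    ) as socket:
        # Give the handler a reference to the open socket
        websocket_handler.set_socket(socket)
        # ... start the microphone interface here (see "Handling audio")

asyncio.run(main())
```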
Handling audio
The `MicrophoneInterface` class captures audio input from the user’s device and streams it over the WebSocket connection.
Audio playback occurs when the `WebSocketHandler` receives audio data over the WebSocket connection in its asynchronous byte stream from an `audio_output` message.
In this example, `byte_strs` is a stream of audio data that the WebSocket connection populates.
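Inside the `async with` block from the previous step, wiring these together might look like the following sketch; the `byte_stream` keyword follows evi-python-example and is an assumption to confirm against your SDK version:

```python
# Stream microphone audio to EVI as a background task, handing the
# handler's byte stream to the SDK so received audio can be played back
microphone_task = asyncio.create_task(
    MicrophoneInterface.start(
        socket,
        byte_stream=websocket_handler.byte_strs,
    )
)

# Keep the connection open until the microphone task completes
await microphone_task
```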
Specifying a microphone device
You can specify your microphone device using the `device` parameter in the `MicrophoneInterface` object’s `start` method.
To view a list of available audio devices, run the following command:
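One way to list devices is with the sounddevice library, which this guide also uses below for direct device selection (install it with pip if it is not already present):

```python
# Print every audio device visible to the sounddevice library,
# along with its index
import sounddevice

print(sounddevice.query_devices())
```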
Each device in the output is listed alongside its index. If, for example, the MacBook Pro Microphone is the desired device and appears as device 4, specify device 4 in the Microphone context. For example:
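A sketch of passing the index, using the `device` parameter named in this guide (the exact keyword may vary between SDK versions):

```python
# Capture audio from device index 4 (the MacBook Pro Microphone in this example)
microphone_task = asyncio.create_task(
    MicrophoneInterface.start(
        socket,
        device=4,
        byte_stream=websocket_handler.byte_strs,
    )
)
```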
For troubleshooting faulty device detection (particularly on systems using ALSA, the Advanced Linux Sound Architecture), the device may also be specified directly using the `sounddevice` library:
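For instance, the library’s documented `default.device` attribute can be set before the microphone stream is opened:

```python
import sounddevice

# Set the default audio device by index (a device name string also works)
# before MicrophoneInterface.start opens the microphone stream
sounddevice.default.device = 4
```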
Allowing interruption
The `allow_interrupt` parameter in the `MicrophoneInterface` class allows control over whether the user can send a message while the assistant is speaking (see the sketch after this list):
- `allow_interrupt=True`: Allows the user to send microphone input even when the assistant is speaking. This enables more fluid, overlapping conversation.
- `allow_interrupt=False`: Prevents the user from sending microphone input while the assistant is speaking, ensuring that the user does not interrupt the assistant. This is useful in scenarios where clear, uninterrupted communication is important.
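A minimal sketch of disabling interruption, using the keyword as written in this guide (it may be spelled differently in some SDK versions, so verify it against your installation):

```python
# Block microphone input while EVI is speaking
microphone_task = asyncio.create_task(
    MicrophoneInterface.start(
        socket,
        allow_interrupt=False,
        byte_stream=websocket_handler.byte_strs,
    )
)
```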