EVI TypeScript Quickstart
A quickstart guide for implementing the Empathic Voice Interface (EVI) with TypeScript.
This tutorial provides step-by-step instructions for implementing EVI using Hume’s TypeScript SDK, and is broken down into five sections:
- Authentication: Instantiate the Hume client using your API credentials.
- Connecting to EVI: Initialize a WebSocket connection to interact with EVI.
- Capturing & recording audio: Capture and prepare audio input to stream over the WebSocket.
- Audio playback: Play back the EVI’s audio output to the user.
- Interruption: Client-side management of user interruptions during the chat.
This guide references our TypeScript Quickstart example project. To see the full implementation, visit our API examples repository on GitHub: evi-typescript-quickstart.
Authenticate
To establish an authenticated connection, first instantiate the Hume client with your API credentials. Visit our Getting your API keys page for details on how to obtain your credentials.
This example uses direct API key authentication for simplicity. For production browser environments, implement the Token authentication strategy instead to prevent exposing your API key in client-side code.
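Below is a minimal sketch of instantiating the client with the TypeScript SDK; the environment variable name is a placeholder for your own credential storage.

```typescript
import { HumeClient } from "hume";

// Instantiate the Hume client with your API key.
// HUME_API_KEY is a placeholder environment variable name.
const client = new HumeClient({
  apiKey: process.env.HUME_API_KEY!,
});
```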
Connect
With the Hume client instantiated, you can now establish an authenticated WebSocket connection with EVI and assign WebSocket event handlers. For now, you can include placeholder event handlers and update them in later steps.
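A connection with placeholder handlers might look like the sketch below. It assumes the SDK's `empathicVoice.chat.connect` method and an optional EVI config ID supplied through a hypothetical `HUME_CONFIG_ID` environment variable.

```typescript
// Open an authenticated WebSocket connection to EVI.
// The configId is optional; HUME_CONFIG_ID is a placeholder.
const socket = await client.empathicVoice.chat.connect({
  configId: process.env.HUME_CONFIG_ID,
});

// Placeholder event handlers to flesh out in later steps.
socket.on("open", () => console.log("WebSocket connection opened"));
socket.on("message", (message) => {
  // Handle incoming messages (transcripts, audio output, etc.)
});
socket.on("error", (error) => console.error("WebSocket error:", error));
socket.on("close", () => console.log("WebSocket connection closed"));
```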
Audio input
Next, we'll go over capturing and streaming audio input over the WebSocket. First, handle user permissions to access the microphone. Next, use the Media Stream API to access the audio stream, and the MediaRecorder API to capture and base64 encode the audio chunks. Finally, stream the audio input by sending each chunk over the WebSocket as audio_input messages using the SDK's sendAudioInput method.
Accepted audio formats include: mp3, wav, aac, ogg, flac, webm, avr, cdda, cvs/vms, aiff, au, amr, mp2, mp4, ac3, avi, wmv, mpeg, ircam.
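A browser-oriented sketch of this capture flow is shown below. The `audio/webm` MIME type, the 100 ms time slice, and the inline base64 encoding are assumptions you may need to adapt to your target browsers.

```typescript
// Request microphone access from the user.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

// Record the stream; webm is one of the accepted formats.
const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });

recorder.ondataavailable = async (event) => {
  if (event.data.size === 0) return;

  // Base64 encode the audio chunk before sending it over the WebSocket.
  const buffer = await event.data.arrayBuffer();
  const base64 = btoa(String.fromCharCode(...new Uint8Array(buffer)));

  // Stream the chunk to EVI as an audio_input message.
  socket.sendAudioInput({ data: base64 });
};

// Emit a chunk roughly every 100ms.
recorder.start(100);
```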
Audio output
EVI responds with multiple message types over the WebSocket:
- user_message: This message encapsulates the transcription of the audio input. Additionally, it includes expression measurement predictions related to the speaker's vocal prosody.
- assistant_message: EVI dispatches an AssistantMessage for every sentence within the response. This message not only relays the content of the response but also features predictions regarding the expressive qualities of the generated audio response.
- audio_output: An AudioOutput message accompanies each AssistantMessage. This contains the actual audio (binary) response corresponding to an AssistantMessage.
- assistant_end: EVI delivers an AssistantEnd message as the final piece of communication, signifying the conclusion of the response to the audio input.
To play the audio output from the response, define your logic for converting the received binary to a Blob, and create an HTMLAudioElement to play the audio.
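One way to do this is sketched below, assuming the audio arrives as a base64-encoded string in the message's data field; the helper names and the default MIME type are illustrative.

```typescript
// Convert a base64-encoded audio chunk into a Blob.
// The MIME type is an assumption; match it to your configured output format.
function convertBase64ToBlob(base64: string, contentType = "audio/wav"): Blob {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: contentType });
}
```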
Then, update the client's message event handler to invoke the playback logic when audio is received. To manage playback of incoming audio, you can implement a basic queue and play the audio back sequentially.
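A minimal sketch of such a queue is below. It reuses the hypothetical convertBase64ToBlob helper above and assumes audio_output messages carry base64 audio in a data field.

```typescript
const audioQueue: Blob[] = [];
let isPlaying = false;
let currentAudio: HTMLAudioElement | null = null;

// Play the next clip in the queue, chaining playback sequentially.
function playNextInQueue(): void {
  if (audioQueue.length === 0) {
    isPlaying = false;
    return;
  }
  isPlaying = true;
  const blob = audioQueue.shift()!;
  currentAudio = new Audio(URL.createObjectURL(blob));
  currentAudio.onended = playNextInQueue;
  currentAudio.play();
}

socket.on("message", (message) => {
  if (message.type === "audio_output") {
    // Queue each audio chunk and start playback if idle.
    audioQueue.push(convertBase64ToBlob(message.data));
    if (!isPlaying) playNextInQueue();
  }
});
```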
Interrupt
Interruptibility is a distinguishing feature of EVI. If you send an audio input through the WebSocket while receiving response messages for a previous audio input, the response to the previous audio input will stop. Additionally, the interface will send back a user_interruption message, and begin responding to the new audio input.
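On the client, this typically means stopping playback and clearing the queue when a user_interruption message arrives. The sketch below builds on the hypothetical queue above.

```typescript
// Stop the current clip and discard any queued audio when the user interrupts.
function stopAudioPlayback(): void {
  if (currentAudio) {
    currentAudio.pause();
    currentAudio = null;
  }
  audioQueue.length = 0;
  isPlaying = false;
}

// In the message event handler, react to the user_interruption message:
//   if (message.type === "user_interruption") stopAudioPlayback();
```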