EVI TypeScript Quickstart

A quickstart guide for implementing the Empathic Voice Interface (EVI) with TypeScript.

This tutorial provides step-by-step instructions for implementing EVI using Hume’s TypeScript SDK, and is broken down into five sections:

  1. Authentication: Instantiate the Hume client using your API credentials.
  2. Connecting to EVI: Initialize a WebSocket connection to interact with EVI.
  3. Capturing & recording audio: Capture and prepare audio input to stream over the WebSocket.
  4. Audio playback: Play back the EVI’s audio output to the user.
  5. Interruption: Client-side management of user interruptions during the chat.

This guide references our TypeScript Quickstart example project. To see the full implementation, visit our API examples repository on GitHub: evi-typescript-quickstart.

1. Authenticate

To establish an authenticated connection, first instantiate the Hume client with your API credentials. Visit our Getting your API keys page for details on how to obtain your credentials.

This example uses direct API key authentication for simplicity. For production browser environments, implement the Token authentication strategy instead to prevent exposing your API key in client-side code.
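As a sketch of what the token strategy involves: a server-side companion exchanges your API key and secret key for a short-lived access token via an OAuth2 client-credentials request. The endpoint URL and payload below follow Hume's documented token flow, but verify both against the current docs before relying on them.

```typescript
// Hypothetical server-side helper: build the request that exchanges API
// credentials for a short-lived access token. Endpoint and payload are
// assumptions based on Hume's documented OAuth2 client-credentials flow.
interface TokenRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildTokenRequest(apiKey: string, secretKey: string): TokenRequest {
  // Basic auth: base64("apiKey:secretKey")
  const credentials = Buffer.from(`${apiKey}:${secretKey}`).toString('base64');
  return {
    url: 'https://api.hume.ai/oauth2-cc/token',
    method: 'POST',
    headers: {
      Authorization: `Basic ${credentials}`,
      'Content-Type': 'application/x-www-form-urlencoded',
    },
    body: 'grant_type=client_credentials',
  };
}
```

The browser then receives only the resulting access token, never the API key or secret key.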

```typescript
import { Hume, HumeClient } from 'hume';

// Instantiate the Hume client and authenticate
const client = new HumeClient({
  apiKey: import.meta.env.HUME_API_KEY,
});
```

2. Connect

With the Hume client instantiated with your credentials, you can now establish an authenticated WebSocket connection with EVI and assign WebSocket event handlers. For now, you can assign placeholder event handlers and update them in later steps.

```typescript
import { Hume, HumeClient } from 'hume';

// Instantiate the Hume client and authenticate
const client = new HumeClient({
  apiKey: import.meta.env.HUME_API_KEY,
});

// Connect to EVI
const socket = await client.empathicVoice.chat.connect({
  configId: import.meta.env.HUME_CONFIG_ID,
});

// Define event handlers and assign them to the WebSocket
socket.on('open', handleWebSocketOpenEvent);
socket.on('message', handleWebSocketMessageEvent);
socket.on('error', handleWebSocketErrorEvent);
socket.on('close', handleWebSocketCloseEvent);
```

3. Audio input

Next, we'll capture and stream audio input over the WebSocket. First, request the user's permission to access the microphone. Next, use the MediaStream API to access the audio stream and the MediaRecorder API to capture and base64 encode audio chunks. Finally, stream the audio input by sending each chunk over the WebSocket as an audio_input message using the SDK's sendAudioInput method.

```typescript
import {
  convertBlobToBase64,
  ensureSingleValidAudioTrack,
  getAudioStream,
  getBrowserSupportedMimeType,
  MimeType,
} from 'hume';

/**--- Audio Recording State ---*/
let recorder: MediaRecorder | null = null;
let audioStream: MediaStream | null = null;
const mimeTypeResult = getBrowserSupportedMimeType();
const mimeType: MimeType = mimeTypeResult.success
  ? mimeTypeResult.mimeType
  : MimeType.WEBM;

// Base64 encode each recorded chunk and send it over the WebSocket
async function handleAudioDataAvailable(event: BlobEvent): Promise<void> {
  const data = await convertBlobToBase64(event.data);
  socket.sendAudioInput({ data });
}

// Define a function for capturing audio
async function startAudioCapture(): Promise<void> {
  try {
    audioStream = await getAudioStream();
    // Validate the stream
    ensureSingleValidAudioTrack(audioStream);

    recorder = new MediaRecorder(audioStream, { mimeType });
    recorder.ondataavailable = handleAudioDataAvailable;
    recorder.onerror = (event) => {
      console.error('MediaRecorder error:', event);
    };
    recorder.start(50);
  } catch (error) {
    console.error('Failed to initialize or start audio capture:', error);
    throw error;
  }
}

// Define a WebSocket open event handler to capture audio
async function handleWebSocketOpen(): Promise<void> {
  console.log('WebSocket connection opened.');
  try {
    await startAudioCapture();
  } catch (error) {
    console.error('Failed to capture audio:', error);
    alert('Failed to access microphone. Disconnecting.');
    if (
      socket &&
      socket.readyState !== WebSocket.CLOSING &&
      socket.readyState !== WebSocket.CLOSED
    ) {
      socket.close();
    }
  }
}
```

Accepted audio formats include: mp3, wav, aac, ogg, flac, webm, avr, cdda, cvs/vms, aiff, au, amr, mp2, mp4, ac3, avi, wmv, mpeg, ircam.
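Under the hood, each dataavailable event delivers a Blob of raw audio bytes that must be base64 encoded before it can travel as the `data` field of an audio_input message. A minimal standalone sketch of that encoding step (using Node's Buffer here in place of the browser FileReader that the SDK's convertBlobToBase64 helper wraps):

```typescript
// Standalone sketch: base64-encode a chunk of raw audio bytes, producing the
// `data` field of an audio_input message. In the browser, the SDK's
// convertBlobToBase64 helper performs this step for you.
function encodeAudioChunk(
  chunk: Uint8Array
): { type: 'audio_input'; data: string } {
  const data = Buffer.from(chunk).toString('base64');
  return { type: 'audio_input', data };
}

// Example: three bytes of (fake) audio
const message = encodeAudioChunk(new Uint8Array([1, 2, 3]));
```

The resulting object mirrors the shape of the message the SDK sends when you call sendAudioInput.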

4. Audio output

EVI responds with multiple message types over the WebSocket:

  1. user_message: contains the transcription of your audio input, along with expression measurement predictions for the speaker’s vocal prosody.
  2. assistant_message: EVI sends an AssistantMessage for every sentence in its response, carrying both the text of the response and predictions about the expressive qualities of the generated audio.
  3. audio_output: an AudioOutput message accompanies each AssistantMessage, containing the actual audio (binary) for that sentence.
  4. assistant_end: the final message, signaling the conclusion of EVI’s response to the audio input.
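These message types can be modeled as a discriminated union on the `type` field, which is what makes the switch-based handlers in the following steps type-safe. A simplified sketch (the field names beyond `type` and `data` are illustrative; the SDK's real `Hume.empathicVoice.SubscribeEvent` types carry many more fields):

```typescript
// Simplified model of EVI's subscribe-side messages, for illustration only.
type SubscribeEventSketch =
  | { type: 'user_message'; text: string }
  | { type: 'assistant_message'; text: string }
  | { type: 'audio_output'; data: string } // base64-encoded audio
  | { type: 'assistant_end' };

// Switching on `type` narrows the union, exposing each variant's fields
function describe(message: SubscribeEventSketch): string {
  switch (message.type) {
    case 'user_message':
      return `user said: ${message.text}`;
    case 'assistant_message':
      return `assistant replied: ${message.text}`;
    case 'audio_output':
      return `audio chunk (${message.data.length} base64 chars)`;
    case 'assistant_end':
      return 'response complete';
  }
}
```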

To play the audio output, define logic for converting the received base64-encoded audio into a Blob and playing it through an HTMLAudioElement.

Then update the client’s message event handler to invoke that playback logic whenever an audio_output message arrives. To keep chunks in order, implement a basic queue and play them back sequentially.

```typescript
import { Hume, convertBase64ToBlob } from 'hume';

/**--- Audio Playback State ---*/
const audioQueue: Blob[] = [];
let currentAudio: HTMLAudioElement | null = null;
let isPlaying = false;

// Play the audio within the playback queue, converting each
// Blob into a playable HTMLAudioElement
function playNextAudioChunk(): void {
  // Don't play if already playing or the queue is empty
  if (isPlaying || audioQueue.length === 0) return;

  isPlaying = true;
  const audioBlob = audioQueue.shift();

  if (!audioBlob) {
    isPlaying = false;
    return;
  }
  const audioUrl = URL.createObjectURL(audioBlob);
  currentAudio = new Audio(audioUrl);
  currentAudio.play();
  currentAudio.onended = () => {
    URL.revokeObjectURL(audioUrl);
    currentAudio = null;
    isPlaying = false;
    // Recursively play the next chunk if the queue is not empty
    playNextAudioChunk();
  };
}

// Define a WebSocket message event handler to play audio output
function handleWebSocketMessage(
  message: Hume.empathicVoice.SubscribeEvent
): void {
  switch (message.type) {
    case 'audio_output': {
      // Decode and queue audio for playback
      const audioBlob = convertBase64ToBlob(message.data, mimeType);
      audioQueue.push(audioBlob);
      // Attempt to play immediately if not already playing
      playNextAudioChunk();
      break;
    }
  }
}
```

5. Interrupt

Interruptibility is a distinguishing feature of EVI. If you send audio input over the WebSocket while EVI is still responding to a previous input, EVI stops that response, sends back a user_interruption message, and begins responding to the new input.

```typescript
// Function for stopping the audio and clearing the queue
function stopAudioPlayback(): void {
  if (currentAudio) {
    currentAudio.pause();
    console.log('Audio playback paused.');
    if (currentAudio.src && currentAudio.src.startsWith('blob:')) {
      // Revoke the URL if paused mid-play
      URL.revokeObjectURL(currentAudio.src);
    }
    currentAudio = null;
  }
  audioQueue.length = 0; // Clear the queue
  isPlaying = false;     // Reset playback state
}

// Update the WebSocket message event handler to handle interruption
function handleWebSocketMessage(
  message: Hume.empathicVoice.SubscribeEvent
): void {
  switch (message.type) {
    case 'user_message':
      // Stop playback if the user starts speaking
      stopAudioPlayback();
      break;
    case 'audio_output': {
      // Decode and queue audio for playback
      const audioBlob = convertBase64ToBlob(message.data, mimeType);
      audioQueue.push(audioBlob);
      // Attempt to play immediately if not already playing
      playNextAudioChunk();
      break;
    }
    case 'user_interruption':
      // Stop playback immediately when the user interrupts
      console.log('User interruption detected.');
      stopAudioPlayback();
      break;
  }
}
```
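To see the interruption flow end to end, the handler's effect on the playback queue can be exercised against a pure model (a standalone sketch for illustration; the real handler also pauses the current HTMLAudioElement and revokes its object URL):

```typescript
// Pure model of the playback queue, driven by the same message types the
// handler above switches on. Standalone sketch for illustration only.
type EviMessage =
  | { type: 'user_message' }
  | { type: 'audio_output'; data: string }
  | { type: 'user_interruption' };

function reduceQueue(queue: string[], message: EviMessage): string[] {
  switch (message.type) {
    case 'user_message':
    case 'user_interruption':
      return []; // stop playback: drop everything queued
    case 'audio_output':
      return [...queue, message.data]; // enqueue the chunk
  }
}

// Two chunks arrive, then the user interrupts: the queue is flushed
let queue: string[] = [];
queue = reduceQueue(queue, { type: 'audio_output', data: 'chunk1' });
queue = reduceQueue(queue, { type: 'audio_output', data: 'chunk2' });
queue = reduceQueue(queue, { type: 'user_interruption' });
```

After the user_interruption message, the queue is empty and the next audio_output starts a fresh response.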