EVI TypeScript Quickstart

A quickstart guide for implementing the Empathic Voice Interface (EVI) with TypeScript.

This tutorial provides step-by-step instructions for implementing EVI using Hume’s TypeScript SDK, and is structured into five sections:

  1. Authentication: Instantiate the Hume client using your API credentials.
  2. Connection: Initialize a WebSocket connection to interact with EVI.
  3. Audio capture: Capture and stream audio input.
  4. Audio playback: Play back EVI’s audio output.
  5. Interruption: Handle user interruptions client-side.

This guide primarily targets web browser implementations, leveraging browser-native APIs like MediaStream, MediaRecorder, and the Web Audio API (via the SDK’s EVIWebAudioPlayer). For non-browser environments (e.g., Node.js), audio capture and playback implementation will vary based on your runtime context.
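
For example, in Node.js there is no MediaRecorder, so you might send prerecorded or separately captured audio through the same sendAudioInput method instead. The sketch below is illustrative only: the file path is hypothetical, and it assumes audio in one of the accepted formats listed in step 3.

Node.js audio input sketch (illustrative)
import { readFileSync } from "node:fs";
import type { ChatSocket } from "hume/api/resources/empathicVoice/resources/chat";

// Illustrative only: stream a prerecorded file, since MediaRecorder is browser-only.
// "./sample.wav" is a hypothetical path; any accepted audio format works.
function sendFileAudio(socket: ChatSocket, path = "./sample.wav"): void {
  const audio = readFileSync(path);
  socket.sendAudioInput({ data: audio.toString("base64") });
}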

Want to jump straight into the code?

Check out the evi-typescript-quickstart example on GitHub. It covers everything in this guide and demonstrates how to extract the transcript and expression measures to display the Chat in your UI.

1. Authentication

To establish an authenticated connection, first instantiate the Hume client with your API credentials. Visit our Getting your API keys page for details on how to obtain your credentials.

This example uses direct API key authentication for simplicity. For production browser environments, implement the Token authentication strategy instead to prevent exposing your API key to the client.
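
As a sketch of the token approach, assuming the SDK's fetchAccessToken helper and the client's accessToken option: mint a short-lived token on your server, hand it to the browser, and construct the client with it so your API and secret keys never leave the server.

Token authentication sketch (server mints token, browser uses it)
import { fetchAccessToken, HumeClient } from "hume";

// Server-side: exchange API credentials for a short-lived access token.
const accessToken = await fetchAccessToken({
  apiKey: String(process.env.HUME_API_KEY),
  secretKey: String(process.env.HUME_SECRET_KEY),
});

// Browser-side: construct the client with the token instead of the API key.
const client = new HumeClient({ accessToken });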

Initialize Hume client with credentials
import { Hume, HumeClient } from 'hume';

const client = new HumeClient({
  apiKey: HUME_API_KEY, // Replace with environment variable
});

2. Connection

With the Hume client instantiated, establish an authenticated WebSocket connection using the client’s empathicVoice.chat.connect method, and assign WebSocket event handlers.

Establish a connection with EVI
import type { SubscribeEvent } from "hume/api/resources/empathicVoice/resources/chat";
import type { CloseEvent } from "hume/core/websocket/events";

const socket = await client.empathicVoice.chat.connect({
  configId: HUME_CONFIG_ID, // optional
});

// Placeholder event handlers to be updated in later steps
function handleOpen() {}
function handleMessage(msg: SubscribeEvent) {}
function handleError(err: Event | Error) {}
function handleClose(e: CloseEvent) {}

socket.on('open', handleOpen);
socket.on('message', handleMessage);
socket.on('error', handleError);
socket.on('close', handleClose);

3. Audio capture

Next, you’ll capture audio input from the user’s microphone and stream it to EVI over the WebSocket:

  • Request microphone access from the user.
  • Obtain the audio stream using the MediaStream API.
  • Record audio chunks using the MediaRecorder API.
  • Encode each audio chunk in base64.
  • Stream the encoded audio to EVI by sending audio_input messages over the WebSocket with the SDK’s sendAudioInput method.
Audio capture logic
import {
  convertBlobToBase64,
  ensureSingleValidAudioTrack,
  getAudioStream,
  getBrowserSupportedMimeType,
  MimeType,
} from 'hume';
import type { ChatSocket } from 'hume/api/resources/empathicVoice/resources/chat';

let recorder: MediaRecorder | null = null;

async function startAudioCapture(
  socket: ChatSocket,
  timeSliceMs = 80
): Promise<MediaRecorder> {
  const mimeTypeResult = getBrowserSupportedMimeType();
  const mimeType = mimeTypeResult.success
    ? mimeTypeResult.mimeType
    : MimeType.WEBM;

  const micAudioStream = await getAudioStream();
  ensureSingleValidAudioTrack(micAudioStream);

  const recorder = new MediaRecorder(micAudioStream, { mimeType });
  recorder.ondataavailable = async (e: BlobEvent) => {
    if (e.data.size > 0 && socket.readyState === WebSocket.OPEN) {
      const data = await convertBlobToBase64(e.data);
      socket.sendAudioInput({ data });
    }
  };
  recorder.onerror = (e) => console.error("MediaRecorder error:", e);
  recorder.start(timeSliceMs);

  return recorder;
}

Accepted audio formats include: mp3, wav, aac, ogg, flac, webm, avr, cdda, cvs/vms, aiff, au, amr, mp2, mp4, ac3, avi, wmv, mpeg, ircam.

Invoke startAudioCapture within your handleOpen event handler to start streaming audio when a connection is established:

Start audio capture on open
async function handleOpen() {
  console.log("Socket opened");
  recorder = await startAudioCapture(socket!);
}

Lastly, ensure audio capture stops appropriately by updating your handleClose event handler:

Stop audio capture on close
function handleClose(e: CloseEvent) {
  console.log("Socket closed:", e);
  recorder?.stream.getTracks().forEach((t) => t.stop());
  recorder = null;
}

4. Audio playback

Next, you’ll handle playback of audio responses from EVI using the Hume TypeScript SDK’s EVIWebAudioPlayer. Follow these steps:

  • Initialize the audio player when the WebSocket connection opens.
  • Queue audio responses received from EVI for playback.
  • Dispose of the audio player when the WebSocket connection closes to release resources.

Initialize the player within your handleOpen event handler, after starting audio capture:

Initialize player on open
import { EVIWebAudioPlayer } from "hume";

let player = new EVIWebAudioPlayer();

async function handleOpen() {
  console.log("Socket opened");
  recorder = await startAudioCapture(socket!);
  await player.init();
}

Update your handleMessage event handler to enqueue received audio responses for playback:

Enqueue EVI response audio for playback
import type { SubscribeEvent } from "hume/api/resources/empathicVoice/resources/chat";

// Define a WebSocket message event handler to play audio output
async function handleMessage(msg: SubscribeEvent) {
  switch (msg.type) {
    case 'audio_output':
      await player.enqueue(msg);
      break;
  }
}
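
The same handler is also where you would extract the transcript and expression measures mentioned at the top of this guide. A minimal sketch, with field paths following the SDK's SubscribeEvent types (verify against your SDK version):

Transcript and expression sketch
function logChatLine(msg: SubscribeEvent) {
  if (msg.type === "user_message" || msg.type === "assistant_message") {
    // Transcript text for display in your UI
    console.log(`${msg.message.role}: ${msg.message.content}`);
    // Expression measures (emotion scores) accompanying the message, if present
    const scores = msg.models.prosody?.scores;
    if (scores) console.log("Expression scores:", scores);
  }
}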

Finally, update your handleClose event handler to dispose of the audio player when the WebSocket disconnects:

Dispose player on close
function handleClose(e: CloseEvent) {
  console.log("Socket closed:", e);
  recorder?.stream.getTracks().forEach((t) => t.stop());
  recorder = null;
  player?.dispose();
}

5. Interruption

When EVI detects an interruption, it immediately stops sending response messages and waits for the user’s new input. The client must then explicitly handle the interruption by stopping ongoing audio playback.

To stop playback on user interruption, update your handleMessage event handler to invoke EVIWebAudioPlayer.stop when either a user_message or user_interruption message is received:

Stop playback on interruption
async function handleMessage(msg: SubscribeEvent) {
  switch (msg.type) {
    case 'user_message':
    case 'user_interruption':
      player.stop();
      break;
    case 'audio_output':
      await player.enqueue(msg);
      break;
  }
}

Congratulations! You’ve successfully implemented a real-time conversational application using Hume’s Empathic Voice Interface (EVI). In this quickstart, you’ve learned the core aspects of authentication, WebSocket communication, audio streaming, playback handling, and interruption management.

Next, consider exploring these areas to enhance your EVI application:

  • Configuration: Check out the Configuration Guide for detailed instructions on how you can customize EVI for your application needs.
  • Chat history: Visit the Chat History Guide to learn how you can access and manage conversation transcripts and expression measures.

For further details and practical examples, explore the API Reference and our Hume API Examples repo.