TTS .NET Quickstart Guide
Step-by-step guide for integrating the TTS API using Hume’s .NET SDK.
This guide shows how to get started with Hume’s Text-to-Speech capabilities using Hume’s .NET SDK. It demonstrates:
- Converting text to speech with a new voice.
- Saving a voice to your voice library for future use.
- Giving “acting instructions” to modulate the voice.
- Generating multiple variations of the same text at once.
- Providing context to maintain consistency across multiple generations.
The complete code for the example in this guide is available on GitHub.
Environment Setup
Create a new .NET project and install the required packages:
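A minimal setup using the dotnet CLI (the NuGet package manager in Visual Studio works equally well). The package name below is an assumption; check NuGet for Hume’s official .NET SDK package:

```sh
# Create a new console project for the quickstart.
dotnet new console -n HumeTtsQuickstart
cd HumeTtsQuickstart

# Package name is an assumption; search nuget.org for Hume's official .NET SDK.
dotnet add package Hume
```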
Authenticating the HumeApiClient
You must authenticate to use the Hume TTS API. Your API key can be retrieved from the Hume AI platform.
This example uses environment variables. Set your API key as an environment variable:
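For example, in a bash-compatible shell (the variable name `HUME_API_KEY` is an assumption; use whatever name your code reads):

```sh
export HUME_API_KEY=<your-api-key>
```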
Then create a new file `Program.cs` and use your API key to instantiate the `HumeApiClient`.
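A minimal sketch of `Program.cs`, assuming the client constructor accepts the API key directly and that the key is in the `HUME_API_KEY` environment variable (both assumptions; consult the SDK’s README for the exact namespace and constructor):

```csharp
using Hume; // namespace is an assumption; adjust to match the SDK

// Read the API key from the environment (variable name is an assumption).
var apiKey = Environment.GetEnvironmentVariable("HUME_API_KEY")
    ?? throw new InvalidOperationException("HUME_API_KEY is not set.");

// Constructor signature is an assumption; the SDK may use an options object instead.
var client = new HumeApiClient(apiKey);
```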
Helper function
Define a function to aid in writing generated audio to a temporary file:
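One possible helper, which base64-decodes a generation’s audio and writes it to a uniquely named file in the system temp directory (the helper name and parameters are ours, not part of the SDK):

```csharp
// Decodes base64-encoded audio and writes it to a temporary .wav file, returning the path.
static string WriteAudioToTempFile(string base64Audio, string name)
{
    var path = Path.Combine(Path.GetTempPath(), $"hume-tts-{name}.wav");
    File.WriteAllBytes(path, Convert.FromBase64String(base64Audio));
    Console.WriteLine($"Wrote {path}");
    return path;
}
```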
Calling Text-to-Speech
To use Hume TTS, call `client.Tts.SynthesizeJsonAsync` with a `SynthesizeJsonRequest` containing a list of utterances. Inside each utterance, put the `Text` to speak, and optionally provide a `Description` of how the voice speaking the text should sound. If you don’t provide a description, Hume will examine `Text` and attempt to determine an appropriate voice.
The base64-encoded bytes of an audio file containing your speech will be present at `.Generations[0].Audio` in the returned object. By default, there will only be a single variation in the `.Generations` array, and the audio will be in `wav` format.

The `.Generations[0].GenerationId` field contains an ID you can use to refer to this specific generation of speech in future requests.
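A sketch of a first request. Only `SynthesizeJsonAsync`, `SynthesizeJsonRequest`, `Text`, `Description`, `Generations`, `Audio`, and `GenerationId` are named in this guide; the utterance type and list property names below are assumptions about the SDK’s request shape:

```csharp
var response = await client.Tts.SynthesizeJsonAsync(new SynthesizeJsonRequest
{
    // "Utterances" and "PostedUtterance" are assumed names for the request's utterance list.
    Utterances = new List<PostedUtterance>
    {
        new PostedUtterance
        {
            Text = "Welcome to my application!",
            Description = "A warm, friendly customer-support agent."
        }
    }
});

// One variation by default, returned as base64-encoded wav audio.
var generation = response.Generations[0];
WriteAudioToTempFile(generation.Audio, generation.GenerationId);
```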
Saving voices
Use `client.Tts.Voices.CreateAsync` to save the voice of a generated piece of audio to your voice library for future use:
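A sketch, assuming a request type along these lines (only `Voices.CreateAsync` and `GenerationId` come from this guide; the request type and `Name` property are assumptions):

```csharp
// Save the voice from the generation above under a reusable name.
await client.Tts.Voices.CreateAsync(new CreateVoiceRequest
{
    Name = "my-new-voice",                  // name to look the voice up by later
    GenerationId = generation.GenerationId  // which generated audio to save
});
```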
Continuity
Inside an utterance, specify the name or ID of a saved voice to generate more speech with that voice. To generate speech that is meant to follow previously generated speech, specify `Context` with the `GenerationId` of that speech. You can also set `NumGenerations` (up to 5) to generate multiple variations of the same speech at the same time.
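A sketch combining all three: a saved voice, a `Context` carrying the previous `GenerationId`, and `NumGenerations` for multiple variations. The `Voice` and `Context` wrapper types are assumptions:

```csharp
var continuation = await client.Tts.SynthesizeJsonAsync(new SynthesizeJsonRequest
{
    Utterances = new List<PostedUtterance>
    {
        new PostedUtterance
        {
            Text = "And that is how the story ends.",
            // Reference the voice saved earlier by name (type name is an assumption).
            Voice = new PostedUtteranceVoiceWithName { Name = "my-new-voice" }
        }
    },
    // Continue from the earlier generation (wrapper type is an assumption).
    Context = new PostedContextWithGenerationId { GenerationId = generation.GenerationId },
    NumGenerations = 2 // up to 5 variations per request
});

// Each variation arrives as its own entry in Generations.
foreach (var variation in continuation.Generations)
{
    WriteAudioToTempFile(variation.Audio, variation.GenerationId);
}
```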
Acting Instructions
If you specify both `Voice` and `Description`, the `Description` field behaves as “acting instructions”: it keeps the character of the specified `Voice`, but modulates the delivery to match the `Description`.
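A sketch using the same assumed request shape as above, with both a saved `Voice` and a `Description` acting as delivery instructions:

```csharp
var acted = await client.Tts.SynthesizeJsonAsync(new SynthesizeJsonRequest
{
    Utterances = new List<PostedUtterance>
    {
        new PostedUtterance
        {
            Text = "I can't believe you kept this a secret the whole time!",
            Voice = new PostedUtteranceVoiceWithName { Name = "my-new-voice" },
            // With a Voice present, Description modulates delivery rather than defining the voice.
            Description = "Whispering conspiratorially, half-laughing, delighted."
        }
    }
});

WriteAudioToTempFile(acted.Generations[0].Audio, acted.Generations[0].GenerationId);
```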
Streaming speech
You can stream utterances using the `SynthesizeJsonStreamingAsync` method. This lets you process audio chunks as they become available, rather than waiting for the entire speech generation to complete.
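A sketch, assuming `SynthesizeJsonStreamingAsync` returns an async stream of snippets whose `Audio` property carries base64-encoded bytes (the return shape and snippet property names are assumptions):

```csharp
var chunkIndex = 0;
await foreach (var snippet in client.Tts.SynthesizeJsonStreamingAsync(new SynthesizeJsonRequest
{
    Utterances = new List<PostedUtterance>
    {
        new PostedUtterance { Text = "This speech arrives in chunks as it is generated." }
    }
}))
{
    // Write each chunk to its own file as soon as it arrives.
    var path = Path.Combine(Path.GetTempPath(), $"hume-tts-chunk-{chunkIndex++}.wav");
    File.WriteAllBytes(path, Convert.FromBase64String(snippet.Audio));
    Console.WriteLine($"Wrote {path}");
}
```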
You can either write these chunks to files as we’ve done above, or play them in real-time with an audio player. Below is an example of real-time playback using a pipe-based streaming audio player:
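One way to do this is to pipe decoded audio into an external player such as ffplay (part of ffmpeg). This is a sketch, not part of the SDK: it assumes ffplay is on your PATH and that the streamed chunks concatenate into a format ffplay can decode from stdin; for raw PCM output you would add the appropriate `-f`/`-ar` flags.

```csharp
using System.Diagnostics;

// Launch ffplay reading audio from stdin (pipe:0). -nodisp hides the video window,
// -autoexit quits once the stream ends.
var player = Process.Start(new ProcessStartInfo
{
    FileName = "ffplay",
    Arguments = "-autoexit -nodisp -loglevel error -i pipe:0",
    RedirectStandardInput = true,
    UseShellExecute = false
})!;

await foreach (var snippet in client.Tts.SynthesizeJsonStreamingAsync(new SynthesizeJsonRequest
{
    Utterances = new List<PostedUtterance>
    {
        new PostedUtterance { Text = "Playing back audio as soon as each chunk arrives." }
    }
}))
{
    // Decode each chunk and push it straight into the player's stdin.
    var bytes = Convert.FromBase64String(snippet.Audio);
    await player.StandardInput.BaseStream.WriteAsync(bytes);
    await player.StandardInput.BaseStream.FlushAsync();
}

player.StandardInput.Close(); // signal end-of-stream so ffplay can exit
await player.WaitForExitAsync();
```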