For Microsoft Kool-Aid Drinkers, Non-paid MS Evangelists written by a Senior Consultant, Passionate about Tech

Text to Speech in Windows Phone 8

Does anyone out there recall the old cartoon, “Captain Caveman”?  Well, if you don’t recall it, that is probably because there is a great chance you have never even heard of it.  He was basically a really hairy caveman, with a club, who could pull anything out of his hair and eat just about anything you can imagine.  His club would also open up to reveal miniature dinosaurs that would perform various tasks.  Here is a quick video in case you are interested.

So, you are probably wondering about now where I am going with this, huh?  Well, just as Captain Caveman could reach into his hair for goodies, so can we with the Windows Phone SDK.  One very powerful feature we can pull out is the Text to Speech API.  As a Windows Phone developer, you are given a SpeechSynthesizer object that you can use to add some very powerful functionality to your app.  It can provide anything from subtle user feedback to very elaborate speech feedback.

At the time of this writing, SpeechSynthesizer supports 15 languages, and each has both a female and a male voice.  You can view these languages in your phone’s settings under Speech.  In the Speech settings you will see a number of options, but the important ones to note are the Text to Speech voice and the Speech Language.  These settings are what the SpeechSynthesizer uses as its defaults.  Your app can also change the voice programmatically.

[Screenshots: Settings, Speech, and Speech settings screens]
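If you would rather see what is installed from code than from the settings screen, here is a minimal sketch that writes the default voice and every installed voice to the debugger output.  The class and method names (VoiceLister, DumpInstalledVoices) are just my own illustrative names, not part of the SDK, and I am assuming the app already has the speech capability turned on in its manifest.

```csharp
using System.Diagnostics;
using Windows.Phone.Speech.Synthesis;

// Illustrative helper; the class and method names are hypothetical.
public static class VoiceLister
{
    // Writes the device's default voice, then every installed voice,
    // to the Visual Studio Output window while debugging.
    public static void DumpInstalledVoices()
    {
        VoiceInformation current = InstalledVoices.Default;
        Debug.WriteLine("Default: {0} ({1})", current.DisplayName, current.Language);

        foreach (VoiceInformation voice in InstalledVoices.All)
        {
            Debug.WriteLine("{0} | {1} | {2}", voice.DisplayName, voice.Language, voice.Gender);
        }
    }
}
```

Calling VoiceLister.DumpInstalledVoices() from a page constructor or a button handler should be enough to see which language tags (such as "es-MX") are available on your device or emulator.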

Okay, enough about cartoons and gibberish … let’s get started with a simple Windows Phone 8 app that uses the Text to Speech API.

First let’s fire up Visual Studio 2012 and select a blank Windows Phone App project.

[Screenshot: Visual Studio phone project]

Select the Windows Phone 8 platform for this example.


Let’s quickly create a simple UI for our app.  Open up MainPage.xaml and remove all the comments and everything inside the main LayoutRoot grid.  It should look something like this:

<phone:PhoneApplicationPage
    x:Class="PhoneApp2.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:phone="clr-namespace:Microsoft.Phone.Controls;assembly=Microsoft.Phone"
    xmlns:shell="clr-namespace:Microsoft.Phone.Shell;assembly=Microsoft.Phone"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d"
    FontFamily="{StaticResource PhoneFontFamilyNormal}"
    FontSize="{StaticResource PhoneFontSizeNormal}"
    Foreground="{StaticResource PhoneForegroundBrush}"
    SupportedOrientations="Portrait" Orientation="Portrait"
    shell:SystemTray.IsVisible="True">

    <Grid x:Name="LayoutRoot" Background="Transparent">
    </Grid>

</phone:PhoneApplicationPage>

Next, I will add a pivot control (optional; I might use it to expand this project in my next post), a textbox and two buttons.  The code should look similar to the following:

<phone:PhoneApplicationPage
    x:Class="PhoneApp2.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:phone="clr-namespace:Microsoft.Phone.Controls;assembly=Microsoft.Phone"
    xmlns:shell="clr-namespace:Microsoft.Phone.Shell;assembly=Microsoft.Phone"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d"
    FontFamily="{StaticResource PhoneFontFamilyNormal}"
    FontSize="{StaticResource PhoneFontSizeNormal}"
    Foreground="{StaticResource PhoneForegroundBrush}"
    SupportedOrientations="Portrait" Orientation="Portrait"
    shell:SystemTray.IsVisible="True">

    <Grid x:Name="LayoutRoot" Background="Transparent">

        <phone:Pivot HorizontalAlignment="Left" Title="Speech Example" VerticalAlignment="Top">
            <phone:PivotItem CacheMode="{x:Null}" Header="Text">
                <Grid x:Name="ContentPanel" Grid.Row="1" Margin="10,10,14,-10">
                    <TextBox x:Name="tbTextToConvert" HorizontalAlignment="Left" Height="232" Margin="10,10,0,0" TextWrapping="Wrap" Text="TextBox" VerticalAlignment="Top" Width="412"/>
                    <Button Content="Speak in English" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="10,242,0,0" Click="SpeakTextToEnglish" Width="412"/>
                    <Button Content="Speak in Spanish" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="10,314,0,0" Click="SpeakTextToSpanish" Width="412"/>
                </Grid>
            </phone:PivotItem>
            <phone:PivotItem CacheMode="{x:Null}" Header="Voice">
                <Grid/>
            </phone:PivotItem>
        </phone:Pivot>

    </Grid>

</phone:PhoneApplicationPage>

You will notice that the buttons have click events assigned that look like this in XAML:

<Button Content="Speak in English" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="10,242,0,0" Click="SpeakTextToEnglish" Width="412"/>
<Button Content="Speak in Spanish" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="10,314,0,0" Click="SpeakTextToSpanish" Width="412"/>

You will need to add the event handlers to MainPage.xaml.cs.  You can do this a number of ways.  One way is to right-click each Click event name in the XAML and select Navigate to Event Handler; Visual Studio will generate the handler for you if it isn’t already there.  If it is, you will just be taken to the code-behind for that event handler.

[Screenshot: Navigate to Event Handler menu option]

The two event handlers in MainPage.xaml.cs should look similar to this:

private void SpeakTextToEnglish(object sender, RoutedEventArgs e)
{

}

private void SpeakTextToSpanish(object sender, RoutedEventArgs e)
{

}

If you were to run the application now, you would get something like this:

[Screenshot: the Speech Example app's Text screen]

Now, before we do any coding, we want to make sure that we turn on the speech capability flag in the app manifest.  To do this, open the WMAppManifest.xml file in the Properties folder of the project.  Simply double-click the file and it will open in the manifest designer.

[Screenshot: WMAppManifest.xml in the Properties folder]

Next, select the Capabilities tab:

[Screenshot: the Capabilities tab]

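Finally, check the ID_CAP_SPEECH_RECOGNITION option in the list, then save and close the WMAppManifest.xml file.  Despite its name, this is the capability Windows Phone 8 requires for text to speech as well as for speech recognition, so that should be all we need to enable speech in the application.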

[Screenshot: the ID_CAP_SPEECH_RECOGNITION checkbox]
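If you prefer to edit the manifest by hand (right-click the file and open it with the XML editor), ticking that checkbox should amount to an entry like the one below inside the existing Capabilities element.  This is only a sketch of the relevant fragment; your file will already contain other capabilities from the project template, and those should be left alone.

```xml
<Capabilities>
  <!-- Capabilities added by the project template stay as they are -->
  <Capability Name="ID_CAP_SPEECH_RECOGNITION" />
</Capabilities>
```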

Okay, let’s get down to the “nitty-gritty” of our code.  The first thing we want to do is put code behind our SpeakTextToEnglish event handler.  So, open up MainPage.xaml.cs and let’s start coding.  Now, the great thing here is that we really only need a couple of lines of code (excluding comments) if we want to use the default speech settings on the device.  Check this out:

// Requires "using Windows.Phone.Speech.Synthesis;" at the top of MainPage.xaml.cs
private async void SpeakTextToEnglish(object sender, RoutedEventArgs e)
{
    SpeechSynthesizer synth = new SpeechSynthesizer();
    // use the default voice for the device and use await/async to wait until the text is spoken
    await synth.SpeakTextAsync(tbTextToConvert.Text);
}

Amazing, huh?  Notice all I did was create a SpeechSynthesizer object and make an await call to SpeakTextAsync().  Okay, I guess I did add async to the method declaration, but, uh, come on … that isn’t much!

Let’s run the app.  Once it is up and running, without doing anything else, simply press the Speak in English button.  Bang!  The phone reads out whatever is in the textbox, which at this point is the default text, “TextBox”.  Now, type in something and press the Speak in English button again.  Wahoo!

Okay, so, what if you want to change the voice on the fly?  Well, that isn’t a whole lot harder actually.  Let’s go to your SpeakTextToSpanish event handler and add the following code:

// Requires "using System.Linq;" and "using Windows.Phone.Speech.Synthesis;" at the top of MainPage.xaml.cs
private async void SpeakTextToSpanish(object sender, RoutedEventArgs e)
{
    SpeechSynthesizer synth = new SpeechSynthesizer();

    // Grab the Spanish (Mexico) male voice, if one is installed
    var voice = (from x in InstalledVoices.All
                 where x.Language == "es-MX" &&
                       x.Gender == VoiceGender.Male
                 select x).FirstOrDefault();

    // FirstOrDefault() returns null when no voice matches; keep the default voice in that case
    if (voice != null)
    {
        synth.SetVoice(voice);
    }

    // speak with the new voice
    // NOTE:  Remember, this is going to speak English text in a Spanish male voice.
    // However, it isn't going to translate it.  You would need to translate it to another language first.
    await synth.SpeakTextAsync(tbTextToConvert.Text);
}

Run the app again.  What happens?  It sounds like a native Spanish speaker reading English aloud, right?  You can see that we created a new SpeechSynthesizer object, grabbed the installed male Spanish (Mexico) voice from InstalledVoices.All, set the new voice with SetVoice() and called SpeakTextAsync() with the text.  So what is up with this guy?  Well, you gave him English text to speak, and you didn’t ask him to translate from English to Spanish.  To get him to actually speak Spanish, you need to give him Spanish text.  You can use a number of different services to translate text in code.  I personally would use Bing Translator, which has both a translation website and a developer API.

For our example, let’s just use the website to translate our text and then paste it into the call to SpeakTextAsync().  Like this…

await synth.SpeakTextAsync("Hola, mi nombre es Edward Glogowski.");

Now run the app and press the Speak in Spanish button.  Perfecto, right?
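If you end up switching voices in more than one handler, the voice lookup can be pulled out into a small helper.  The sketch below is just one way to do it: VoicePicker and FindVoice are names I made up for illustration (they are not part of the SDK), and falling back to the device’s default voice when nothing matches is my own assumption about what you would want.

```csharp
using System.Linq;
using Windows.Phone.Speech.Synthesis;

// Hypothetical helper; not part of the Windows Phone SDK.
public static class VoicePicker
{
    // Returns the first installed voice matching the language tag and gender,
    // or the device's default voice when no installed voice matches.
    public static VoiceInformation FindVoice(string languageTag, VoiceGender gender)
    {
        VoiceInformation match = InstalledVoices.All
            .FirstOrDefault(v => v.Language == languageTag && v.Gender == gender);

        return match ?? InstalledVoices.Default;
    }
}
```

With that in place, a handler only needs synth.SetVoice(VoicePicker.FindVoice("es-MX", VoiceGender.Male)); before the call to SpeakTextAsync().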

So, to summarize, you can add text to speech functionality to your Windows Phone app with very little effort.  Though translations take extra work, the services are available to build a very robust application around the Text to Speech API.  By now, you can imagine the number of things you can do with Text to Speech in your app: read out labels, textboxes, chat messages and so on.  If you create something, send it my way.  I would love to hear about it!

In my next post, I will show you how to use voice commands to call into your app when it isn’t running.  Maybe I will introduce you to another cartoon character you might not have known existed.  So come back and let’s do some talking to your phone!

 