For Microsoft Kool-Aid Drinkers, Non-paid MS Evangelists written by a Senior Consultant, Passionate about Tech

Voice Commands in Windows Phone 8

To play off of my last post on Text to Speech in Windows Phone, I thought I would continue the cartoon-character theme.  So, has anyone out there heard of Felix the Cat?  Felix was a funny cartoon character from the silent film era.  He always found himself in a fix and had to resort to his bag of tricks.  Well, let’s jump into the Windows Phone SDK bag of tricks and see how we can send voice commands to our Windows Phone 8 application.

Windows Phone 7.x introduced simple voice commands such as “Open Ebay”, “Call Ed Glogowski”, “Find food in Apple Valley”, “Text John Cannon” or even “Note It is my wife’s birthday on Friday”.

Windows Phone 8 has given developers the option to extend the voice commands to call directly into their application.

Let’s build on the solution from my previous post, Text to Speech in Windows Phone, and add voice commands that open the app to one of the two pivot items on the main page: Text or Voice.  In fact, if it is the Voice pivot, let’s put the command text into a TextBlock, speak the text automatically, and let the user hear it again in the default language by pressing a button.  How does that sound?

Once you have our old solution loaded up and ready, the first thing we need to do is add a new Voice Command Definition file to our project.  To do this, right-click the project to open its context menu, select Add, and then select New Item.


From the New Item Dialog, select Voice Command Definition from the Installed/Visual C# tree.  Name it something like VoiceDef.xml.


Now, let’s open up the voice command definition file.  If you skip down to the CommandSet element, you will see that its first two child elements are CommandPrefix and Example.  CommandPrefix is the prefix to your voice commands that identifies your application.  The Example element is a very simple example, displayed to the user, of how to use a voice command with your application’s prefix.  So for now, let’s put in “Speak App” for the CommandPrefix and “Go to text…” for the Example.  It should look something like this:

    <?xml version="1.0" encoding="utf-8"?>

    <VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
      <CommandSet xml:lang="en-US">
        <CommandPrefix>Speak App</CommandPrefix>
        <Example> Go to text... </Example>

Okay, now we need to add a couple of commands that will allow us to actually call into our app.  Go ahead and remove all the commands in the CommandSet element and add the following:

    <Command Name="GotoText">
      <Example> Go to text </Example>
      <ListenFor> go to text</ListenFor>
      <ListenFor> [and] go to text </ListenFor>
      <Feedback> Going to text... </Feedback>
      <Navigate />
    </Command>

    <Command Name="GotoVoice">
      <Example> Go to voice </Example>
      <ListenFor> go to voice</ListenFor>
      <ListenFor> [and] go to voice </ListenFor>
      <Feedback> Going to voice... </Feedback>
      <Navigate Target="/MainPage.xaml?pivot=voice"/>
    </Command>

Now, let’s talk a little about what we just added.  First, each command has a unique name, so we added two commands: GotoText and GotoVoice.  I probably don’t need to call out the obvious, but GotoText will go to our MainPage and select the Text pivot item, while GotoVoice will go to our MainPage and select the Voice pivot item.

The Example elements specify what is displayed when the user prompts for the list of voice commands for an application.

The ListenFor elements are where the magic happens.  Each ListenFor element defines a phrase to listen for after the CommandPrefix.  For the “GotoText” command, one ListenFor element checks whether “go to text” was spoken after the prefix.  The other is almost the same, but adds “[and]”.  Any word in square brackets is optional.  So, in this case, both of these phrases would fire the command: “Speak App go to text” and “Open Speak App and go to text”.  Optional words let the user speak more naturally.  There are other parameters you can use within the ListenFor element, but I won’t go into those right now.
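To make the optional-word rule concrete, here is a minimal sketch that expands a ListenFor pattern into the phrases it would accept.  The ListenForDemo class and its Expand method are hypothetical names invented for this illustration; this is not how the phone’s recognizer is implemented, and it only handles single words in square brackets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListenForDemo
{
    // Hypothetical helper: expands a ListenFor pattern into every phrase it accepts.
    // Only the single-word [optional] syntax discussed above is supported.
    public static List<string> Expand(string pattern)
    {
        var phrases = new List<string> { "" };
        string[] tokens = pattern.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            bool optional = token.StartsWith("[") && token.EndsWith("]");
            string word = optional ? token.Substring(1, token.Length - 2) : token;

            // Every phrase built so far, with the current word appended.
            List<string> withWord = phrases.Select(p => (p + " " + word).Trim()).ToList();

            // An optional word may be skipped or spoken, so keep both variants.
            phrases = optional ? phrases.Concat(withWord).ToList() : withWord;
        }

        return phrases;
    }

    public static void Main()
    {
        foreach (string phrase in Expand("[and] go to text"))
        {
            Console.WriteLine("Speak App " + phrase);
        }
    }
}
```

Running it prints “Speak App go to text” and then “Speak App and go to text”, which are the two ways of phrasing the command described above.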

The Feedback element specifies the text displayed to the user after the command has been recognized, while the phone is loading your app.

The Navigate element is required, but its Target attribute is optional.  You will see that I left Target off in the first command but included it in the second.  For the first command, by our app’s design, we simply want to go to the main page’s Text pivot.  Since that is what happens when you start the app anyway, there isn’t much to do.  For the second command, we could have put in a URI to any page in our app, but again, by design, we want the Voice pivot on the main page.  So I used the URI to the main page and included a parameter for the pivot we want to display on load.
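To see how the pieces fit together, here is a sketch that holds the whole VoiceDef.xml as a string and lists where each command navigates.  The VoiceDefDemo class and its Describe method are hypothetical names for this illustration only, the closing tags are my assumption about how the rest of the file looks, and the namespace URI is the Windows Phone 8 VCD schema namespace (adjust it if your template generated something different).  It uses plain LINQ to XML, so it can run off-device.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class VoiceDefDemo
{
    // The complete file assembled from the fragments above.
    // Single-quoted attributes are used only so the XML fits in a C# string.
    public const string VoiceDef = @"<?xml version='1.0' encoding='utf-8'?>
<VoiceCommands xmlns='http://schemas.microsoft.com/voicecommands/1.0'>
  <CommandSet xml:lang='en-US'>
    <CommandPrefix>Speak App</CommandPrefix>
    <Example> Go to text... </Example>
    <Command Name='GotoText'>
      <Example> Go to text </Example>
      <ListenFor> go to text</ListenFor>
      <ListenFor> [and] go to text </ListenFor>
      <Feedback> Going to text... </Feedback>
      <Navigate />
    </Command>
    <Command Name='GotoVoice'>
      <Example> Go to voice </Example>
      <ListenFor> go to voice</ListenFor>
      <ListenFor> [and] go to voice </ListenFor>
      <Feedback> Going to voice... </Feedback>
      <Navigate Target='/MainPage.xaml?pivot=voice' />
    </Command>
  </CommandSet>
</VoiceCommands>";

    // Hypothetical helper: returns one "Name -> Target" line per Command element.
    public static string[] Describe(string vcdXml)
    {
        XDocument doc = XDocument.Parse(vcdXml);
        XNamespace ns = doc.Root.Name.Namespace;

        return doc.Descendants(ns + "Command")
            .Select(c =>
            {
                // Target is optional; without it the app opens its default page.
                XAttribute target = c.Element(ns + "Navigate").Attribute("Target");
                return (string)c.Attribute("Name") + " -> " +
                       (target != null ? target.Value : "(default page)");
            })
            .ToArray();
    }

    public static void Main()
    {
        foreach (string line in Describe(VoiceDef))
        {
            Console.WriteLine(line);
        }
    }
}
```

For the file above it reports GotoText going to the default page and GotoVoice going to /MainPage.xaml?pivot=voice.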

Okay, that was a lot of talking, or should I say typing.  Can we see some cool stuff yet?  Well, not quite.  In order to use the new voice commands we just created, we need to install the file when the app starts up.  To do this, let’s create a new function, SetupVoiceCommands(), and call it from the constructor for MainPage.  The function simply installs the command set from a file via URI, which is awesome, because that means we could load it from any URI and at almost any time in the application.  This might be useful for applications like games, where the user gets more commands based on how far they are in the game.  The code should look something like the following:

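    public MainPage()
    {
        InitializeComponent();
        SetupVoiceCommands();

        // Sample code to localize the ApplicationBar
        //BuildLocalizedApplicationBar();
    }

    // Requires: using Windows.Phone.Speech.VoiceCommands;
    public async void SetupVoiceCommands()
    {
        try
        {
            await VoiceCommandService.InstallCommandSetsFromFileAsync(new Uri("ms-appx:///VoiceDef.xml", UriKind.RelativeOrAbsolute));
        }
        catch (Exception ex)
        {
            // An unhandled exception in an async void method would crash the app.
            System.Diagnostics.Debug.WriteLine("Installing voice commands failed: " + ex.Message);
        }
    }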

Now let’s make a couple of UI changes.  First, let’s update the Pivot control in MainPage.xaml.  Give it a name so that we can reference it easily in the code-behind.  I called it “pivotMain”.  Then add a TextBlock and a Button to the Voice pivot item.  We will use the TextBlock to display the recognized command and the Button to have the application speak the text in the TextBlock to us.  I named the TextBlock “tVoice”.  So, the XAML for the Pivot control should look like this:

    <phone:Pivot x:Name="pivotMain" HorizontalAlignment="Left" Title="Speech Example" VerticalAlignment="Top">
        <phone:PivotItem CacheMode="{x:Null}" Header="Text">
            <Grid x:Name="ContentPanel" Grid.Row="1" Margin="10,10,14,-10">
                <TextBox x:Name="tbTextToConvert" HorizontalAlignment="Left" Height="232" Margin="10,10,0,0" TextWrapping="Wrap" Text="TextBox" VerticalAlignment="Top" Width="412"/>
                <Button Content="Speak in English" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="10,242,0,0" Click="SpeakTextToEnglish" Width="412"/>
                <Button Content="Speak in Spanish" HorizontalAlignment="Left" VerticalAlignment="Top" Margin="10,314,0,0" Click="SpeakTextToSpanish" Width="412"/>
            </Grid>
        </phone:PivotItem>
        <phone:PivotItem CacheMode="{x:Null}" Header="Voice">
            <Grid Margin="10,10,14,-10">
                <TextBlock x:Name="tVoice" Text="No Text" Margin="0,0,0,77"/>
                <Button Content="Speak" HorizontalAlignment="Left" VerticalAlignment="Top" Click="SpeakText" Width="426" Margin="10,314,0,0"/>
            </Grid>
        </phone:PivotItem>
    </phone:Pivot>

You can see that I added a Click event handler called SpeakText, which is basically a copy and paste of the other event handlers (I know, I should create a common function for the redundant code, but this is only a demo).  It looks like this:

    private async void SpeakText(object sender, RoutedEventArgs e)
    {
        SpeechSynthesizer synth = new SpeechSynthesizer();
        // use the default voice for the device and use await/async to wait until the text is spoken
        await synth.SpeakTextAsync(tVoice.Text);
    }

Now, the only thing left to do is to take care of what happens when we get to the page after speaking the voice command.  All we do is override OnNavigatedTo() to handle what was passed in the URI for that page.  First, we check to see if there is a “pivot” parameter.  If there is one, we check to see if its value is “voice”.  If it is, we set the pivot to the Voice pivot.

Next, we check for a “reco” parameter that the voice command sends back with the text that it found for the command.  Now, with that, we are free to do whatever we want with it.  In our case, we are going to put it into the TextBlock and then use Text to Speech to play the command back to us.  Well, with a little extra text for fun.  See the code below:

    protected async override void OnNavigatedTo(NavigationEventArgs e)
    {
        base.OnNavigatedTo(e);
        if (e.NavigationMode == System.Windows.Navigation.NavigationMode.New)
        {
            if (NavigationContext.QueryString.Keys.Contains("pivot"))
            {
                string page = NavigationContext.QueryString["pivot"];
                if (page.Equals("voice", StringComparison.OrdinalIgnoreCase))
                {
                    pivotMain.SelectedIndex = 1;
                    if (NavigationContext.QueryString.Keys.Contains("reco"))
                    {
                        string text = NavigationContext.QueryString["reco"];
                        tVoice.Text = text;
                        // Speak the Text to us
                        SpeechSynthesizer synth = new SpeechSynthesizer();
                        // use the default voice for the device and use await/async to wait until the text is spoken
                        await synth.SpeakTextAsync("You spoke to me.  Here is what you said " + text);
                    }
                }
            }
        }
    }
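If you would like to check the query-string handling without deploying to a phone, one option is to pull that logic into a small helper that works on any IDictionary<string, string>, which is the type NavigationContext.QueryString exposes.  This is only a sketch: the VoiceNavigation class and its two method names are hypothetical and not part of the SDK or the original solution, and unlike the handler above it reads “reco” independently of the pivot value.

```csharp
using System;
using System.Collections.Generic;

public static class VoiceNavigation
{
    // Hypothetical helper: maps the "pivot" query parameter to a pivot index.
    // Returns 1 for "voice" (any casing); otherwise 0, the Text pivot.
    public static int ResolvePivotIndex(IDictionary<string, string> query)
    {
        string pivot;
        if (query.TryGetValue("pivot", out pivot) &&
            pivot.Equals("voice", StringComparison.OrdinalIgnoreCase))
        {
            return 1;
        }
        return 0;
    }

    // Hypothetical helper: returns the recognized text from the "reco"
    // parameter, or null when the page was not opened by a voice command.
    public static string GetRecognizedText(IDictionary<string, string> query)
    {
        string reco;
        return query.TryGetValue("reco", out reco) ? reco : null;
    }

    public static void Main()
    {
        // Simulates the query string a "go to voice" command would produce.
        var query = new Dictionary<string, string>
        {
            { "pivot", "voice" },
            { "reco", "Speak App go to voice" }
        };

        Console.WriteLine(ResolvePivotIndex(query));
        Console.WriteLine(GetRecognizedText(query));
    }
}
```

Inside OnNavigatedTo() you could then set pivotMain.SelectedIndex from ResolvePivotIndex(NavigationContext.QueryString) and only speak when GetRecognizedText returns something other than null.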

Now, run the app.  Once it has loaded, tap the Windows button to go to the Start screen.  From there, hold the Windows button down and, when prompted, say, “Speak App go to voice.”  Your app should load to the Voice pivot and speak the following: “You spoke to me.  Here is what you said Speak App go to voice.”

So, as you can see, there might have been a lot of text above explaining what we did, but there was actually very little code.  This should be exciting to you as a developer.  Just think how easy it is to add voice commands to your Windows Phone app and provide much quicker access to your application and its functionality.

Hope you enjoyed the article.

