Using Microsoft Translator Service – integrated with Speech

I’m wrapping up the last module of my Pluralsight course on Optical Character Recognition, and in that module I had the pleasure of playing around with Microsoft Translator Service. So why not share this with you? As soon as I find some time I’ll upload the code as an MSDN sample as well as on GitHub.

Originally I was going to share a full OCR demo in this post, with translation and speech, but I’ll leave it up to you to put that together, as I’ve already blogged a great deal about Optical Character Recognition (with WinRT, and with JS + Node).

Here is a condensed how-to:

Sign up for the service using your Microsoft account. It’s a paid service, but you can get a certain number of characters translated for free before you decide to pay. In an application you could charge the user to cover the cost, using a top-up model or a monthly fee with a maximum number of characters translated per month.
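If you go for a monthly fee with a cap, the app has to keep count of what each user sends. Here is a minimal sketch of such a counter. The MonthlyCharacterQuota class is hypothetical (it is not part of the Translator API), and it counts string.Length, i.e. UTF-16 code units, which may not match exactly how the service bills characters.

[sourcecode language="csharp"]
using System;

// Hypothetical helper, not part of the Translator API: enforces a per-user
// monthly character allowance before text is sent to the paid service.
public sealed class MonthlyCharacterQuota
{
    private readonly int _limit;
    private int _year, _month, _used;

    public MonthlyCharacterQuota(int limit) => _limit = limit;

    public int Used => _used;

    // Returns true and records the usage if the text fits in this month's allowance.
    // Counts UTF-16 code units (string.Length), an assumption about billing.
    public bool TryConsume(string text, DateTime now)
    {
        if (now.Year != _year || now.Month != _month)
        {
            // A new calendar month has started: reset the counter
            _year = now.Year;
            _month = now.Month;
            _used = 0;
        }

        if (_used + text.Length > _limit)
        {
            return false;
        }

        _used += text.Length;
        return true;
    }
}
[/sourcecode]

Calling TryConsume() before each translation request lets the app refuse (or offer a top-up) once the cap is reached, instead of silently running up the bill.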

Once you have signed up, sign in and register a new application by providing a few details. It’s similar to registering an app for Bing Maps, if you have used the maps :)

Once you have done that you are ready to use the service.

The flow goes like this:

You make an OAuth request providing the client ID and secret, and in return you get an access token (as well as its expiry time, token type and scope).
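Since the token comes with an expiry time, you don’t have to request a new one before every translation. Below is a minimal sketch of remembering when the token expires, assuming a JSON response with access_token and expires_in (in seconds) fields. It uses System.Text.Json from desktop .NET rather than the WinRT JsonValue type used later in this post, and the CachedToken name and the 30-second safety margin are my own, hypothetical choices.

[sourcecode language="csharp"]
using System;
using System.Text.Json;

// Hypothetical helper: holds an access token and the moment it expires.
public sealed class CachedToken
{
    public string AccessToken { get; }
    public DateTimeOffset ExpiresAt { get; }

    private CachedToken(string accessToken, DateTimeOffset expiresAt)
    {
        AccessToken = accessToken;
        ExpiresAt = expiresAt;
    }

    // Parses the token response. "now" is passed in so the expiry is computed
    // from when the response arrived (and so the class is easy to test).
    public static CachedToken Parse(string json, DateTimeOffset now)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        var token = root.GetProperty("access_token").GetString() ?? string.Empty;

        // Accept expires_in either as a JSON number or as a numeric string
        var expires = root.GetProperty("expires_in");
        int seconds = expires.ValueKind == JsonValueKind.String
            ? int.Parse(expires.GetString())
            : expires.GetInt32();

        return new CachedToken(token, now.AddSeconds(seconds));
    }

    // Treats the token as stale 30 seconds early to allow for request latency
    public bool IsValid(DateTimeOffset now) => now < ExpiresAt - TimeSpan.FromSeconds(30);
}
[/sourcecode]

Before each translation you would check IsValid() and only go back to the OAuth endpoint when it returns false.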

With the token you can make a second request, this time for the translation itself: you pass the token in an Authorization header, along with the text, the language the text is in, and the language you want to translate to.
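A minimal sketch of building that request, with the languages passed in as parameters rather than hardcoded. TranslateRequestBuilder is a hypothetical helper; the URL and the text, from and to query parameters are the ones used in this post, and the token goes in a standard Bearer Authorization header set on the request itself.

[sourcecode language="csharp"]
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: builds (but does not send) the Translate request.
public static class TranslateRequestBuilder
{
    const string ApiUrl = "http://api.microsofttranslator.com/v2/Http.svc/Translate";

    // Escapes each value so '&', '?', spaces and non-ASCII text can't break the query string
    public static string BuildUri(string text, string from, string to) =>
        ApiUrl
        + "?text=" + Uri.EscapeDataString(text)
        + "&from=" + Uri.EscapeDataString(from)
        + "&to=" + Uri.EscapeDataString(to);

    public static HttpRequestMessage Create(string text, string from, string to, string accessToken)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, BuildUri(text, from, to));

        // The token is attached to this request only, not to a shared HttpClient
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        return request;
    }
}
[/sourcecode]

You would then send it with await httpClient.SendAsync(request).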

The call returns some sexy XML which you need to parse; the first node of the root element contains the translated text.
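A minimal sketch of that parsing step with LINQ to XML, assuming a reply shaped like `<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">text</string>`. It reads Root.Value, which returns the element’s text with XML entities decoded; Root.FirstNode.ToString() would hand back escaped text (for example &amp;amp; instead of &amp;) and throws if the element is empty. TranslateResponseParser is a hypothetical name.

[sourcecode language="csharp"]
using System.Xml.Linq;

// Hypothetical helper: extracts the translation from the service's XML reply.
public static class TranslateResponseParser
{
    public static string GetTranslatedText(string xml)
    {
        var root = XDocument.Parse(xml).Root;

        // Value concatenates the element's text and decodes entities;
        // it returns "" for an empty element instead of throwing.
        return root == null ? string.Empty : root.Value;
    }
}
[/sourcecode]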

To add speech with the Windows Runtime, you simply create a new instance of the SpeechSynthesizer class, wrapped in a using statement so the object is properly disposed afterwards. Set the desired voice, or skip that part and let it use the default voice instead.

Await the SynthesizeTextToStreamAsync() method, passing in the text you want spoken. That gives us an audio stream, and we need a way to play it: create a new instance of the MediaElement class and call its SetSource() method, passing in the stream and, as the second parameter, the stream’s content type. And that’s it.

Now let’s take a look at the code, and pardon me for putting everything in one big method with a bunch of private read-only fields. It’s for demo purposes ;) Since my WP plugin has been acting up, I’m adding some images of the code as well, just in case.

First of all, I’ve declared a private field of type HttpClient, found in the System.Net.Http namespace, and initialized it in the constructor.

[sourcecode language="csharp"]
public MainPage()
{
    this.InitializeComponent();
    _httpClient = new HttpClient();
}

private readonly HttpClient _httpClient;
[/sourcecode]

Then I defined the strings I’ll need later. Again, they are just added like this for demo purposes. We need five of them: the client ID, the secret, the request body pieced together from those two, the OAuth URL, and the service URL.

[sourcecode language="csharp"]
static readonly string _id = WebUtility.UrlEncode("MyID");

static readonly string _secret = WebUtility.UrlEncode("Secret");

// Keep the form body on one line: a multi-line verbatim (@"...") string
// would embed line breaks and spaces in the parameter values
readonly string _requestString = string.Format(
    "grant_type=client_credentials&client_id={0}&client_secret={1}&scope={2}",
    _id, _secret, WebUtility.UrlEncode("http://api.microsofttranslator.com"));

readonly string _oauthURL = "https://datamarket.accesscontrol.windows.net/v2/OAuth2-13";

readonly string _apiUrl = "http://api.microsofttranslator.com/v2/Http.svc/Translate?text=";
[/sourcecode]
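As an alternative to formatting the body by hand, HttpClient can build and encode it for you. Here is a minimal sketch using FormUrlEncodedContent with the same four fields as _requestString; the TokenRequestBody name is hypothetical and the values passed in are placeholders. Because FormUrlEncodedContent does the encoding itself, you pass it the raw ID and secret rather than WebUtility.UrlEncode’d ones.

[sourcecode language="csharp"]
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper: the OAuth token request body as form content.
public static class TokenRequestBody
{
    // FormUrlEncodedContent encodes each name and value and sets the
    // application/x-www-form-urlencoded content type
    public static FormUrlEncodedContent Create(string clientId, string clientSecret) =>
        new FormUrlEncodedContent(new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("grant_type", "client_credentials"),
            new KeyValuePair<string, string>("client_id", clientId),
            new KeyValuePair<string, string>("client_secret", clientSecret),
            new KeyValuePair<string, string>("scope", "http://api.microsofttranslator.com"),
        });
}
[/sourcecode]

The result can be passed straight to PostAsync() in place of the StringContent built from _requestString.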

We can now start making the calls, beginning with the token request. Both calls are wrapped in a try/catch, which is omitted here but shown in the full example further down. Note that the shared _httpClient is not wrapped in a using statement: the field lives as long as the page, and disposing it after the first translation would make every following click fail with an ObjectDisposedException. I’m grabbing the token with the ‘native’ JsonValue type (from Windows.Data.Json) and its Parse() method; it didn’t quite make sense to pull in JSON.NET for a single parse in this demo.

[sourcecode language="csharp"]
var content = new StringContent(_requestString, Encoding.UTF8, "application/x-www-form-urlencoded");

var response = await _httpClient.PostAsync(_oauthURL, content);

// Throws HttpRequestException if the token request was rejected
response.EnsureSuccessStatusCode();

var responseBodyAsText = await response.Content.ReadAsStringAsync();

var accessToken = JsonValue.Parse(responseBodyAsText).GetObject()["access_token"].GetString();
[/sourcecode]

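I only grabbed the token in this example, but the response also contains the token type, the expiry time and the scope; the full example below reads them in its “Not used in this example” region.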

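With the access token in place we can make the final call. Do notice that I’ve hardcoded the from and to languages; you obviously want to pass them in to the method and let the user select them.

[sourcecode language="csharp"]
var fromLanguage = "ja";
var toLanguage = "en";

// The user's text has to be URL-encoded before it goes into the query string
var textToTranslate = WebUtility.UrlEncode(input.Text);

var uri = string.Format("{0}{1}&from={2}&to={3}", _apiUrl, textToTranslate, fromLanguage, toLanguage);

// Attach the token to this request only, instead of changing the shared client's default headers
var request = new HttpRequestMessage(HttpMethod.Get, uri);
request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

var translationResponse = await _httpClient.SendAsync(request);
translationResponse.EnsureSuccessStatusCode();

var transl = await translationResponse.Content.ReadAsStringAsync();

// Root.Value returns the decoded text of the root element, i.e. the translation
var parsedText = XDocument.Parse(transl).Root.Value;
[/sourcecode]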

The next part is the speech:

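[sourcecode language="csharp"]
public async Task SynthesizeTextToSpeachAsync(string text)
{
    using (var speechSynthesizer = new SpeechSynthesizer())
    {
        // Prefer a female voice, but fall back to the default voice
        // if none is installed (First() would throw in that case)
        speechSynthesizer.Voice =
            SpeechSynthesizer.AllVoices.FirstOrDefault(x => x.Gender == VoiceGender.Female)
            ?? SpeechSynthesizer.DefaultVoice;

        var stream = await speechSynthesizer.SynthesizeTextToStreamAsync(text);

        // AutoPlay is true by default, so playback starts once the source is set
        new MediaElement().SetSource(stream, stream.ContentType);
    }
}
[/sourcecode]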

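After parsing the response from the service, we can display the translation and call the speech method, making sure the UI update is dispatched to the UI thread (in a click handler the code after an await already resumes there, so this is a safeguard rather than a requirement):

[sourcecode language="csharp"]
// Show the translation in the TextBlock on the UI thread
await this.Dispatcher.RunAsync(CoreDispatcherPriority.Normal,
    () =>
    {
        translation.Text = parsedText;
    });

// Then read it out loud
await SynthesizeTextToSpeachAsync(parsedText);
[/sourcecode]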

The UI is a simple TextBox for input, a TextBlock for output and a button; the translation code lives in the button’s click event handler. Here is the full code:

[sourcecode language="csharp"]
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using System.Xml.Linq;
using Windows.Data.Json;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Core;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;

public sealed partial class MainPage : Page
{
    public MainPage()
    {
        this.InitializeComponent();
        _httpClient = new HttpClient();
    }

    // One HttpClient for the lifetime of the page; it is not disposed after each click
    private readonly HttpClient _httpClient;

    static readonly string _id = WebUtility.UrlEncode("OcrDemoiris");
    static readonly string _secret = WebUtility.UrlEncode("MyOwnUniqueSecretOnlyIKnow");

    // Single-line form body: a multi-line verbatim string would put line breaks into the values
    readonly string _requestString = string.Format(
        "grant_type=client_credentials&client_id={0}&client_secret={1}&scope={2}",
        _id, _secret, WebUtility.UrlEncode("http://api.microsofttranslator.com"));

    readonly string _oauthURL = "https://datamarket.accesscontrol.windows.net/v2/OAuth2-13";

    readonly string _apiUrl = "http://api.microsofttranslator.com/v2/Http.svc/Translate?text=";

    private async void OnTranslate(object sender, RoutedEventArgs eventArgs)
    {
        try
        {
            // 1. Exchange the client ID and secret for an access token
            var content = new StringContent(_requestString, Encoding.UTF8, "application/x-www-form-urlencoded");

            var response = await _httpClient.PostAsync(_oauthURL, content);
            response.EnsureSuccessStatusCode();

            var responseBodyAsText = await response.Content.ReadAsStringAsync();

            // Parse the JSON once and read every field from the same object
            var root = JsonValue.Parse(responseBodyAsText).GetObject();

            #region Not used in this example

            var tokenType = root["token_type"].GetString();
            var expiresIn = root["expires_in"].GetString();
            var scope = root["scope"].GetString();

            #endregion

            var accessToken = root["access_token"].GetString();

            // 2. Request the translation, passing the token as a Bearer header
            var textToTranslate = WebUtility.UrlEncode(input.Text);

            var fromLanguage = "ja";
            var toLanguage = "en";

            var uri = string.Format("{0}{1}&from={2}&to={3}", _apiUrl, textToTranslate, fromLanguage, toLanguage);

            var request = new HttpRequestMessage(HttpMethod.Get, uri);
            request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

            var translationResponse = await _httpClient.SendAsync(request);
            translationResponse.EnsureSuccessStatusCode();

            var transl = await translationResponse.Content.ReadAsStringAsync();

            // 3. The reply is a single XML element whose text is the translation
            var parsedText = XDocument.Parse(transl).Root.Value;

            await this.Dispatcher.RunAsync(CoreDispatcherPriority.Normal,
                () =>
                {
                    translation.Text = parsedText;
                });

            // 4. Speak the translated text
            await SynthesizeTextToSpeachAsync(parsedText);
        }
        catch (HttpRequestException ex)
        {
            // Network failure or a non-success status code from either call
            translation.Text = "Request failed: " + ex.Message;
        }
        catch (Exception ex)
        {
            translation.Text = "Something went wrong: " + ex.Message;
        }
    }

    public async Task SynthesizeTextToSpeachAsync(string text)
    {
        using (var speechSynthesizer = new SpeechSynthesizer())
        {
            // Prefer a female voice, falling back to the default voice if none is installed
            speechSynthesizer.Voice =
                SpeechSynthesizer.AllVoices.FirstOrDefault(x => x.Gender == VoiceGender.Female)
                ?? SpeechSynthesizer.DefaultVoice;

            var stream = await speechSynthesizer.SynthesizeTextToStreamAsync(text);

            // AutoPlay is true by default, so playback starts once the source is set
            new MediaElement().SetSource(stream, stream.ContentType);
        }
    }
}
[/sourcecode]
