Posted March 29, 2018 at 10:42 am in .NET Core, Azure, C# & F#

My dear friend Jonas is currently learning Chinese because his partner is from China, and she in turn is learning Swedish. Since we now work together I get to hear about his progress pretty much every day, and on several occasions he has expressed his frustration with the quality of translation and study apps. I remembered that when we went to Japan many years ago I made an OCR app that translated Japanese, so I decided to see which languages the Azure Translator API supports for speech translation, and ended up giving the API a go.

Here is how you can get started:

  1. Create an Azure Translator API resource

Log into the Azure portal and create a new Translator API resource. There is a free tier that you can use.


  2. Generate an API key for authentication

There are two ways to authenticate: requesting a token or generating an API key. The API key is easier to get started with, so I’ve used that for the example below.
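For comparison, here is a minimal sketch of what the token route could look like. It is not used in the example further down. The issueToken URL and the Bearer header format are assumptions based on the Cognitive Services token service, so check them against the current documentation; TokenAuth and its members are hypothetical names made up for this illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class TokenAuth
{
    // Assumption: endpoint of the Cognitive Services token service.
    public const string IssueTokenUrl =
        "https://api.cognitive.microsoft.com/sts/v1.0/issueToken";

    // Builds the POST request that exchanges a subscription key for a token.
    public static HttpRequestMessage BuildTokenRequest(string subscriptionKey)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, IssueTokenUrl);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        request.Content = new StringContent(string.Empty);
        return request;
    }

    // Sends the request and returns the response body, which is assumed
    // to be the token as plain text.
    public static async Task<string> RequestTokenAsync(HttpClient http, string subscriptionKey)
    {
        using (var request = BuildTokenRequest(subscriptionKey))
        using (var response = await http.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    // Value for an Authorization header on the websocket request (assumed format).
    public static string ToAuthorizationHeader(string token) => "Bearer " + token;
}
```

With a token in hand, the websocket client would presumably send an Authorization header with the value from ToAuthorizationHeader in place of the Ocp-Apim-Subscription-Key header.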


  3. Set up an authenticated websocket connection for the translation

Create a websocket connection and send the audio that you want to translate, making sure the audio file meets the requirements. Add a header named Ocp-Apim-Subscription-Key to the client and set its value to the key you generated earlier. Keep the connection open until you’ve received the response with the translation.

The audio file should be a .wav file with a 16000 Hz sample rate, mono, and a sample bit depth of 16.
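It can save a round trip to check those requirements before sending anything. The sketch below reads the format fields from a WAV header. It assumes the canonical 44-byte layout where the "fmt " chunk directly follows the RIFF header, so files with extra chunks in front would be rejected; WavCheck is a hypothetical helper name, not part of the example app.

```csharp
using System;
using System.Text;

static class WavCheck
{
    // Returns null when the header matches the requirements,
    // otherwise a short description of the first problem found.
    public static string Validate(byte[] wav)
    {
        if (wav == null || wav.Length < 44)
            return "too short to contain a WAV header";
        if (Encoding.ASCII.GetString(wav, 0, 4) != "RIFF" ||
            Encoding.ASCII.GetString(wav, 8, 4) != "WAVE")
            return "not a RIFF/WAVE file";
        if (Encoding.ASCII.GetString(wav, 12, 4) != "fmt ")
            return "unexpected chunk layout";

        // WAV header fields are little-endian, so the bytes are combined by hand.
        int channels = wav[22] | (wav[23] << 8);
        int sampleRate = wav[24] | (wav[25] << 8) | (wav[26] << 16) | (wav[27] << 24);
        int bitsPerSample = wav[34] | (wav[35] << 8);

        if (channels != 1) return "expected mono, found " + channels + " channels";
        if (sampleRate != 16000) return "expected 16000 Hz, found " + sampleRate + " Hz";
        if (bitsPerSample != 16) return "expected 16 bits per sample, found " + bitsPerSample;
        return null;
    }
}
```

Calling WavCheck.Validate(File.ReadAllBytes(path)) before opening the connection would then give a readable error instead of an unexplained response from the service.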


  4. If you are requesting an audio translation, the last message will be of the binary type and will most likely be sent in chunks. When the EndOfMessage property is true, the message is final and you can close and dispose of the connection and streams.
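The chunk handling can be isolated into a small helper, sketched here under the assumption that a whole message fits comfortably in memory. MessageAssembler and WebSocketMessages are hypothetical names for this illustration; the only API used from the framework is WebSocket.ReceiveAsync and the Count and EndOfMessage properties of its result.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

sealed class MessageAssembler
{
    private readonly MemoryStream _pending = new MemoryStream();

    // Adds one received frame. Returns the complete payload once the final
    // frame has arrived, or null while more frames are expected.
    public byte[] Append(byte[] buffer, int count, bool endOfMessage)
    {
        _pending.Write(buffer, 0, count);
        if (!endOfMessage) return null;

        var message = _pending.ToArray();
        _pending.SetLength(0); // reset for the next message
        return message;
    }
}

static class WebSocketMessages
{
    // Keeps receiving until EndOfMessage is true and returns the joined payload.
    public static async Task<byte[]> ReceiveFullMessageAsync(WebSocket socket, CancellationToken token)
    {
        var buffer = new byte[8192];
        var assembler = new MessageAssembler();
        while (true)
        {
            var result = await socket.ReceiveAsync(new ArraySegment<byte>(buffer), token);
            var message = assembler.Append(buffer, result.Count, result.EndOfMessage);
            if (message != null) return message;
        }
    }
}
```

The same accumulation works for text messages, which can also arrive in more than one frame; decoding the joined bytes once avoids splitting a multi-byte UTF-8 character across frames.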

Here is the complete code. The example app is a console app that expects two arguments: the audio file it should translate and an output folder.


using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using static System.Console;

namespace JonasApp
{
    class Program
    {
        static int Main(string[] args)
        {
            // Example paths:
            //var audioPath = @"C:\Users\IrisClasson\Music\irisAudio.wav";
            //var outPath = @"C:\Users\IrisClasson\Music";

            // To get supported languages and voices
            // https://dev.microsofttranslator.com/Languages?api-version=1.0&scope=text,speech,tts

            var from = "en-US";
            var to = "zh-CN";
            var voice = "zh-TW-Yating";

            if (args.Length != 2)
            {
                Error.WriteLine("Usage: JonasApp <.wav-file-path> <output-folder>");
                return 1;
            }

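            if (File.Exists(args[0]) && Directory.Exists(args[1]))
            {
                // Path.Combine picks the right separator for the current OS
                var outFile = Path.Combine(args[1], "translation.wav");
                TranslateSpeechAsync(args[0], outFile, from, to, voice).GetAwaiter().GetResult();
            }
            else
            {
                Error.WriteLine($"ERROR: '{args[0]}' must be an existing file and '{args[1]}' an existing folder");
                return 1;
            }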

            return 0;
        }

        static readonly string _key = "YOUR KEY";
        static readonly string _baseUrl = "wss://dev.microsofttranslator.com/speech/translate";

        public static string Connected = "Connection open";
        public static string AudioSent = "Audio sent";
        public static string Waiting = "Waiting for response";
        public static string Closed = "Connection closed";

        static async Task TranslateSpeechAsync(string inFile, string outFile, string from, string to, string voice)
        {
            using (var client = new ClientWebSocket())
            {
                client.Options.SetRequestHeader("Ocp-Apim-Subscription-Key", _key);

                var uri = $"{_baseUrl}?from={from}&to={to}&api-version=1.0&features=texttospeech&voice={voice}";

                await client.ConnectAsync(new Uri(uri), CancellationToken.None);

                WriteLine(Connected);

                var audioOut = new ArraySegment<byte>(File.ReadAllBytes(inFile));
                await client.SendAsync(audioOut, WebSocketMessageType.Binary, true, CancellationToken.None);

                WriteLine(AudioSent);

                var inBuffer = new byte[10000];
                var segment = new ArraySegment<byte>(inBuffer);
                var fileStream = new FileStream(outFile, FileMode.Create);

                WriteLine(Waiting);

                var keepGoing = true;

                while (client.State == WebSocketState.Open && keepGoing)
                {
                    var result = await client.ReceiveAsync(segment, CancellationToken.None);

                    if (result.MessageType == WebSocketMessageType.Text)
                    {
                        // Decode only the bytes received in this frame, not the whole buffer
                        WriteLine(Encoding.UTF8.GetString(inBuffer, 0, result.Count));
                    }
                    else if (result.MessageType == WebSocketMessageType.Binary)
                    {
                        fileStream.Write(inBuffer, 0, result.Count);
                        keepGoing = !result.EndOfMessage;
                    }
                }

                if (client.State == WebSocketState.Open)
                {
                    await client.CloseAsync(WebSocketCloseStatus.NormalClosure, string.Empty,
                        CancellationToken.None);
                    WriteLine(Closed);
                }

                fileStream.Close();
                fileStream.Dispose();
            }
        }
    }
}
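The query string in TranslateSpeechAsync is built by plain interpolation, which works for the values used here. If the language codes or voice name ever come from user input, escaping them is safer. A minimal sketch, using the same base URL and parameters as the listing; TranslateUri is a hypothetical helper name:

```csharp
using System;

static class TranslateUri
{
    const string BaseUrl = "wss://dev.microsofttranslator.com/speech/translate";

    // Builds the request URI, escaping each caller-supplied value.
    public static Uri Build(string from, string to, string voice)
    {
        var query =
            "from=" + Uri.EscapeDataString(from) +
            "&to=" + Uri.EscapeDataString(to) +
            "&api-version=1.0" +
            "&features=texttospeech" +
            "&voice=" + Uri.EscapeDataString(voice);
        return new Uri(BaseUrl + "?" + query);
    }
}
```

The result can be passed straight to ConnectAsync, for example client.ConnectAsync(TranslateUri.Build(from, to, voice), CancellationToken.None).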

