Getting Started with Azure Translator API

My dear friend Jonas is learning Chinese because his partner is from China, and she in turn is learning Swedish. Since we now work together, I hear about his progress pretty much every day, and on several occasions he has expressed his frustration with the quality of translation and study apps. Many years ago, when we went to Japan, I made an OCR app that translated Japanese. Remembering that, I decided to see which languages the Azure Translator API supports for speech translation, and I ended up giving the API a go.

Here is how you can get started:

Create an Azure Translator API resource

Log in to the Azure portal and create a new Translator API resource. There is a free tier that you can use.

Generate an API key for authentication

There are two ways to authenticate: by requesting a token or by generating an API key. The API key is easier to get started with, so I’ve used that for the example below.
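The console app further down hard-codes the key for brevity. As a minimal sketch of an alternative, the key could be read from an environment variable so it stays out of source control. The variable name TRANSLATOR_KEY and the TranslatorKey class are names made up for this illustration, not something the service requires.

```csharp
using System;

static class TranslatorKey
{
    // Hypothetical variable name chosen for this sketch.
    public const string VariableName = "TRANSLATOR_KEY";

    // Returns the trimmed key, or throws if the variable is missing or blank.
    public static string Read()
    {
        var key = Environment.GetEnvironmentVariable(VariableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                $"Set the {VariableName} environment variable to your API key.");
        }
        return key.Trim();
    }
}
```

With this in place, the value passed to the Ocp-Apim-Subscription-Key header would be TranslatorKey.Read() instead of a string literal.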

Set up an authenticated websocket connection for the translation

Add the Ocp-Apim-Subscription-Key header to the client and set its value to the key you generated earlier. Then open a WebSocket connection and send the audio that you want to translate, making sure the audio file meets the requirements. Keep the connection open until you’ve received the response with the translation.
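The connection URL carries the source language, the target language and the voice as query parameters. Here is a small sketch of building it with each value escaped, using the same endpoint and parameters as the console app in this post. The SpeechTranslateUri class and its Build method are names I made up for the illustration.

```csharp
using System;

static class SpeechTranslateUri
{
    // Same endpoint as the console app in this post.
    const string BaseUrl = "wss://dev.microsofttranslator.com/speech/translate";

    // Builds the WebSocket URI, escaping each caller-supplied value.
    public static Uri Build(string from, string to, string voice)
    {
        var query = "from=" + Uri.EscapeDataString(from)
            + "&to=" + Uri.EscapeDataString(to)
            + "&api-version=1.0"
            + "&features=texttospeech"
            + "&voice=" + Uri.EscapeDataString(voice);
        return new Uri(BaseUrl + "?" + query);
    }
}
```

The result can be passed straight to ClientWebSocket.ConnectAsync, for example SpeechTranslateUri.Build("en-US", "zh-CN", "zh-TW-Yating").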

The audio file should be a .wav file with a 16000 Hz sample rate, a single channel (mono) and a sample bit depth of 16.
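These requirements can be checked before sending anything. The sketch below reads the first 44 bytes of the file and inspects the fields of a canonical WAV header. It assumes the "fmt " chunk comes directly after the RIFF header, which is common but not guaranteed, so a file with extra chunks up front would be rejected even if its audio is fine. WavCheck, HeaderIsValid and FileIsValid are names invented for this example.

```csharp
using System.IO;
using System.Text;

static class WavCheck
{
    // Checks a canonical 44-byte WAV header for mono, 16000 Hz, 16-bit audio.
    // Assumption: the "fmt " chunk starts at byte 12.
    public static bool HeaderIsValid(byte[] header)
    {
        if (header == null || header.Length < 44) return false;

        if (Encoding.ASCII.GetString(header, 0, 4) != "RIFF") return false;
        if (Encoding.ASCII.GetString(header, 8, 4) != "WAVE") return false;
        if (Encoding.ASCII.GetString(header, 12, 4) != "fmt ") return false;

        // WAV fields are little-endian, so assemble them byte by byte.
        int channels = header[22] | (header[23] << 8);
        int sampleRate = header[24] | (header[25] << 8)
            | (header[26] << 16) | (header[27] << 24);
        int bitsPerSample = header[34] | (header[35] << 8);

        return channels == 1 && sampleRate == 16000 && bitsPerSample == 16;
    }

    // Reads the header from disk and applies the same check.
    public static bool FileIsValid(string path)
    {
        var header = new byte[44];
        using (var stream = File.OpenRead(path))
        {
            int read = stream.Read(header, 0, header.Length);
            return read == header.Length && HeaderIsValid(header);
        }
    }
}
```

In the console app, a call to WavCheck.FileIsValid(args[0]) before connecting would catch a wrongly encoded file early.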

If you are requesting an audio translation, the last message will be of the binary type and will most likely be sent in chunks. When the EndOfMessage property is true, the message is final and you can close and dispose of the connection and streams.

Here is the code. The example app is a console app that expects two arguments: the audio file it should translate and the output folder.

using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using static System.Console;

namespace JonasApp
{
   class Program
   {
       static int Main(string[] args)
       {
           // Example paths:
           //var audioPath = @"C:\Users\IrisClasson\Music\irisAudio.wav";
           //var outPath = @"C:\Users\IrisClasson\Music";

           // To get supported languages and voices
           // https://dev.microsofttranslator.com/Languages?api-version=1.0&scope=text,speech,tts

           var from = "en-US";
           var to = "zh-CN";
           var voice = "zh-TW-Yating";

           if (args.Length != 2)
           {
               Error.WriteLine($"Usage: <.wav-file-path> <out_dir>");
               return 1;
           }

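           if (File.Exists(args[0]) && Directory.Exists(args[1]))
           {
               // Path.Combine picks the right separator for the current platform.
               var outFile = Path.Combine(args[1], "translation.wav");
               TranslateSpeechAsync(args[0], outFile, from, to, voice).GetAwaiter().GetResult();
           }
           else
           {
               Error.WriteLine($"ERROR: '{args[0]}' must be an existing file and '{args[1]}' an existing directory");
               return 1;
           }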

           return 0;
       }

       static readonly string _key = "YOUR KEY";
       static readonly string _baseUrl = "wss://dev.microsofttranslator.com/speech/translate";

       public static string Connected = "Connection open";
       public static string AudioSent = "Audio sent";
       public static string Waiting = "Waiting for response";
       public static string Closed = "Connection closed";

       static async Task TranslateSpeechAsync(string inFile, string outFile, string from, string to, string voice)
       {
           using (var client = new ClientWebSocket())
           {
               client.Options.SetRequestHeader("Ocp-Apim-Subscription-Key", _key);

               var uri = $"{_baseUrl}?from={from}&to={to}&api-version=1.0&features=texttospeech&voice={voice}";

               await client.ConnectAsync(new Uri(uri), CancellationToken.None);

               WriteLine(Connected);

               var audioOut = new ArraySegment<byte>(File.ReadAllBytes(inFile));
               await client.SendAsync(audioOut, WebSocketMessageType.Binary, true, CancellationToken.None);

               WriteLine(AudioSent);

               var inBuffer = new byte[10000];
               var segment = new ArraySegment<byte>(inBuffer);

               WriteLine(Waiting);

               // The using block disposes the output file even if receiving throws.
               using (var fileStream = new FileStream(outFile, FileMode.Create))
               {
                   var keepGoing = true;

                   while (client.State == WebSocketState.Open && keepGoing)
                   {
                       var result = await client.ReceiveAsync(segment, CancellationToken.None);

                       if (result.MessageType == WebSocketMessageType.Text)
                       {
                           // Decode only the bytes received in this frame, not the whole buffer.
                           WriteLine(Encoding.UTF8.GetString(inBuffer, 0, result.Count));
                       }
                       else if (result.MessageType == WebSocketMessageType.Binary)
                       {
                           fileStream.Write(inBuffer, 0, result.Count);
                           // The audio arrives in chunks; the last chunk has EndOfMessage set.
                           keepGoing = !result.EndOfMessage;
                       }
                   }
               }

               if (client.State == WebSocketState.Open)
               {
                   await client.CloseAsync(WebSocketCloseStatus.NormalClosure, string.Empty,
                       CancellationToken.None);
                   WriteLine(Closed);
               }
           }
       }
   }
}
