voice control, digital assistants, natural language processing, speech recognition, voice recognition, speech to text, data processing, algorithms

 

Analysts sometimes metaphorically refer to analyzing data as ‘asking it questions.’ But what if that were literally possible? Big companies such as Apple, Google, and Amazon are adding voice command functionality to their products, and consumers are reacting positively. With more investment and bigger breakthroughs on the way, it won’t be long before similar capabilities make their way into business applications. For those who are wondering, here’s how the technology works.

 

The first step toward enabling a program to respond to voice commands is to give it speech recognition. This simply means that it can listen to a person talk and convert what they are saying into text. Speech recognition requires that the program have access to a large database of words containing information about how they are pronounced in various accents and contexts. Depending on the specific application, the user may be required to undergo ‘enrollment,’ meaning they read lines of text aloud so that the computer can learn what their voice sounds like.
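At a high level, the speech-to-text step usually looks like this: capture audio, hand it to a recognizer, and get text back. Here is a minimal sketch using Python’s third-party SpeechRecognition package with its Google Web Speech backend; the specific library and service are just one example among many, not a recommendation.

```python
# Minimal speech-to-text sketch (assumes the third-party
# SpeechRecognition and PyAudio packages are installed).
import speech_recognition as sr

recognizer = sr.Recognizer()

# Capture a short utterance from the default microphone.
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    audio = recognizer.listen(source)

try:
    # Send the audio to a recognition service and get back plain text.
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Speech was unintelligible.")
except sr.RequestError as err:
    print("Recognition service unavailable:", err)
```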

 

A common error when talking about voice command programs is to mistake speech recognition for voice recognition (or vice versa). Voice recognition differs from speech recognition in that it lets the program determine who the speaker is, not just what they are saying. Voice recognition is much more difficult to implement, but it helps safeguard computers from taking commands from the wrong people.

 

After a computer knows how to convert spoken words into written words, the next step is to endow it with natural language processing (NLP). This is the hardest step. ‘Natural languages’ are languages that arose naturally through human use and history, such as English, Spanish, and Arabic. They are very complex and often contain contradictions. Opposed to these are ‘constructed languages’ such as HTML, Java, and Python, which are what programmers use to tell computers what to do. Constructed languages are designed from the beginning to operate as simply and predictably as possible. For computers to make sense of natural languages, some leaps in judgment must be made.

 

Most NLP systems work by first breaking sentences down into smaller parts, figuring out how those parts relate to one another, and then working out what the whole sentence means. For example, if you told your phone to ‘Navigate me to the nearest donut shop,’ it would first investigate what you want by breaking down the sentence. ‘Me’ and ‘donut shop’ are self-explanatory, but ‘navigate’ and ‘nearest’ might cause issues. Your phone could be wondering whether you want driving directions or a guide for public transportation. It also might be unsure whether you mean ‘nearest’ by travel time or nearest by distance. When the grammar or the definitions the user intended are open to interpretation or ambiguous, NLP systems are programmed to make statistical guesses.
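As a rough illustration of that ‘breaking down’ step, here is a small sketch using the spaCy library (one parser among many; the choice is an assumption) to split the donut shop request into tokens and show how they relate grammatically. The intent mapping mentioned at the end is purely hypothetical.

```python
# Sentence-breakdown sketch with spaCy (assumes spaCy and its small English
# model are installed: pip install spacy && python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Navigate me to the nearest donut shop")

# Print each token, its grammatical role, and the word it attaches to.
for token in doc:
    print(f"{token.text:10} {token.dep_:8} -> {token.head.text}")

# Roughly, 'Navigate' comes out as the main verb, 'shop' as the place being
# navigated to, and 'nearest' and 'donut' as modifiers of 'shop'. A voice
# assistant would then map that structure onto something like a hypothetical
# get_directions(destination="donut shop", rank_by="nearest") call, making a
# statistical guess about whether 'nearest' means distance or travel time.
```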

 

Say you know multiple people named ‘Brian’ and you use your phone’s voice control feature to call one of them. One way the phone could determine which Brian you mean would be to check which person named Brian you have contacted most recently or most frequently. This illustrates that more than grammar, definitions, and syntax goes into making NLP systems. Massive amounts of text are analyzed using machine learning to determine how words are most commonly used; those interpretations are weighted by the demographics of the authors; and new analyses are run regularly so that interpretations keep up with the current form the natural language is taking. It would be a lot easier if we could just sing computers their ABC’s.
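To make the ‘Brian’ example concrete, here is a toy sketch of how a phone might weigh recency and frequency when guessing which contact you mean. The names, numbers, and weights are invented purely for illustration and are not how any particular assistant actually scores contacts.

```python
# Toy contact disambiguation: guess which 'Brian' the user means by
# weighting how recently and how often each one has been called.
# All data and weights here are hypothetical.
from datetime import datetime, timedelta

contacts = [
    {"name": "Brian A.", "last_call": datetime.now() - timedelta(days=2), "call_count": 40},
    {"name": "Brian B.", "last_call": datetime.now() - timedelta(days=30), "call_count": 3},
]

def most_likely(candidates):
    """Return the candidate with the best recency-plus-frequency score."""
    def score(contact):
        days_ago = (datetime.now() - contact["last_call"]).days
        recency = 1.0 / (1 + days_ago)            # called recently -> higher
        frequency = contact["call_count"] / 100   # called often -> higher
        return 0.5 * recency + 0.5 * frequency
    return max(candidates, key=score)

brians = [c for c in contacts if c["name"].startswith("Brian")]
print(most_likely(brians)["name"])  # prints "Brian A." with this toy data
```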