Overview

Speech recognition, also referred to as automatic speech recognition (ASR) or speech-to-text, is the translation of human speech into text using software programs and artificial intelligence. The primary purpose of this technology is to determine the most likely sequence of discrete symbols out of all the valid sequences in the given language. Many different processes take place when speech recognition technology is used. Discrete symbols^[1] and artificial intelligence play a central role in this technology, helping to bridge the gap between human language and computers.
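
As a hedged illustration, the goal of choosing the most likely symbol sequence is often written as the decision rule below. This follows the standard statistical formulation of speech recognition rather than anything stated in this article, and the symbols are introduced here only for illustration: X is the observed (digitized) speech signal, W is a candidate symbol sequence, and $\mathcal{L}$ is the set of valid sequences in the language.

$$
\hat{W} = \arg\max_{W \in \mathcal{L}} P(W \mid X) = \arg\max_{W \in \mathcal{L}} \frac{P(X \mid W)\,P(W)}{P(X)} = \arg\max_{W \in \mathcal{L}} P(X \mid W)\,P(W)
$$

The middle step is Bayes' rule. P(X) can be dropped in the last step because it does not depend on W, which leaves an acoustic model P(X | W), scoring how well the sound matches a candidate, and a language model P(W), scoring how plausible the candidate is in the language.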

Key milestones in the development of speech recognition:^[2]

1952: Bell Labs releases "Audrey"
1962: IBM demonstrates the "Shoebox"
2006: The National Security Agency (NSA) uses speech recognition
2008: Google launches a voice search application
2011: Apple announces "Siri"

Process

Before speech can be processed, the device being used needs a microphone. The microphone converts the vibrations of human speech into an analog electrical signal whose shape follows the sound wave.^[3] The system's hardware then converts this electrical signal into a digital signal by sampling it at regular intervals and rounding each sample to a numeric value. These digital signals are translated into discrete symbols, and the software then recognizes them by matching them against specific patterns of speech. This matching of speech patterns is generally referred to as pattern recognition.
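
The analog-to-digital step can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: a pure tone stands in for the microphone's electrical signal, and the 16 kHz sampling rate and 16-bit quantization are common choices for speech but are assumptions here. The function name `sample_and_quantize` is hypothetical.

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate_hz=16000, bits=16):
    """Simulate analog-to-digital conversion of a pure tone (hypothetical helper).

    The continuous signal is measured (sampled) at regular intervals, and each
    measurement is rounded (quantized) to one of a fixed set of integer levels.
    """
    max_level = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of the n-th sample in seconds
        analog = math.sin(2 * math.pi * freq_hz * t)  # value in [-1.0, 1.0]
        samples.append(round(analog * max_level))  # nearest integer level
    return samples

digital = sample_and_quantize(440, 0.01)  # 10 ms of a 440 Hz tone
print(len(digital))  # 160 samples
```

At 16,000 samples per second, 10 ms of sound becomes 160 integers; these numbers, not the original wave, are what the recognition software actually works with.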

Speech Recognition Methodologies

Three main methodologies support speech recognition technology: the acoustic-phonetic approach, the pattern recognition approach, and the artificial intelligence approach. In the name of the first, "acoustic" refers to the study of sound, and "phonetic" refers to the study of speech sounds, such as the phonemes of a language. The acoustic-phonetic approach hypothesizes that a spoken language contains distinct phonetic units, and that each unit is characterized by a set of properties that appear in the speech signal over a specific period of time.
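
To make "properties that appear in the speech signal over a specific period of time" concrete, the sketch below splits a signal into short overlapping frames and measures two classic acoustic properties per frame: short-time energy and zero-crossing rate. Low-energy frames are treated as silence; among the rest, a high zero-crossing rate suggests an unvoiced sound such as /s/, and a low one suggests a voiced sound such as a vowel. The function names, frame sizes, and thresholds are hypothetical and chosen only for illustration; a real acoustic-phonetic system would use many more features.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len samples, one every hop samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def label_frame(frame, energy_floor=0.01, zcr_split=0.3):
    """Toy acoustic-phonetic rule; both thresholds are hypothetical."""
    if short_time_energy(frame) < energy_floor:
        return "silence"
    # Unvoiced sounds (e.g. /s/) cross zero far more often than voiced ones.
    return "unvoiced" if zero_crossing_rate(frame) > zcr_split else "voiced"

# Example at an assumed 8 kHz rate: 25 ms frames (200 samples), 10 ms hop (80 samples).
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(400)]  # smooth 100 Hz wave
high_freq = [0.5 if n % 2 == 0 else -0.5 for n in range(400)]  # flips sign every sample
quiet = [0.0] * 400
for name, signal in [("tone", tone), ("high_freq", high_freq), ("quiet", quiet)]:
    labels = [label_frame(f) for f in frame_signal(signal, 200, 80)]
    print(name, labels)
```

Each 400-sample signal yields three frames; the smooth tone is labeled voiced, the rapidly alternating signal unvoiced, and the all-zero signal silence.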

The pattern recognition approach uses statistical methods and algorithms to compare a speech sample with stored patterns and determine which pattern the speech most closely matches. The artificial intelligence approach is a hybrid of the other two methodologies and has been developed through extensive research. One component of this approach is natural language processing (NLP).^[4] Natural language processing analyzes human language and helps the technology make statistically based decisions from the given information, in a way that loosely resembles how humans interpret language.^[5]
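
The comparison step of the pattern recognition approach can be sketched with dynamic time warping (DTW), one classic template-matching technique. The article does not name a specific algorithm, so DTW is a swapped-in example. DTW stretches the time axis so that a word spoken slowly can still line up with a stored template. The names `dtw_distance` and `recognize`, the templates, and the feature values are all hypothetical, using one number per frame; real systems compare multi-dimensional feature vectors.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local mismatch between two frames
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

def recognize(sample, templates):
    """Return the label of the stored template closest to the sample."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))

# Hypothetical stored templates (one feature value per frame).
templates = {"yes": [1, 3, 4, 3, 1], "no": [4, 4, 1, 1, 0]}
# A slower utterance of "yes": same shape, with some frames repeated.
sample = [1, 1, 3, 4, 4, 3, 1]
print(recognize(sample, templates))  # yes
```

Because DTW may repeat frames during alignment, the slower sample matches the "yes" template with zero distance even though the two sequences have different lengths.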

Pros and Cons

Speech recognition offers many benefits: it lets people communicate by voice with their own devices (or any device, if needed), assists individuals who are visually or hearing impaired, enables hands-free technology, and contributes to the advancement of technology more broadly.^[6]

Speech recognition also has disadvantages to keep in mind, including limited accuracy and misinterpretation of speech, difficulty handling accents, sensitivity to background noise, and, in some cases, a cost in time and productivity.^[7]

Modern-Day Utilization

One of the most prominent examples is virtual assistants such as Siri, Alexa, Google Assistant, and Cortana, all of which rely on speech recognition technology to function. Virtual assistants are a big part of many people's lives today: some use them to get through their daily tasks, and others use them for extra help when multitasking. Alexa, Amazon's assistant, is used mainly through smart speakers that you can talk to in order to play music, among many other things. Siri and Google Assistant are mostly used on handheld devices such as iPhones and Android phones, while Cortana has been used mainly on Windows computers. All of these assistants depend on speech recognition and play a major role in technology today.^[8]