April 22, 2016
Speech recognition services, such as Apple’s Siri and Google’s “OK Google”, have become a convenient alternative to tedious and time-consuming typing on mobile phones. Speech recognition technology has been around for many years and is used in products such as Dragon (Nuance), Cortana (Microsoft) and Alexa (Amazon). With speech interfaces this widespread, it is natural that people assume the terms “speech recognition” and “voice recognition” are synonymous. They are not.

Speech recognition is carried out by software that captures spoken words and converts them into digital data, for example to run a search or to dictate text. It can save a great deal of time compared with typing.

Voice recognition compares a sample of a spoken phrase with a stored digital template. It is used for identification and authentication in security systems such as access control and time and attendance tracking. The system builds digital voice patterns that allow a speaker to be matched with a very high probability. Each person’s voice combines physiological and behavioral characteristics. Physiological characteristics depend on the size and shape of the mouth cavity, throat, larynx and nasal cavity, on body weight, and on other factors. Behavioral characteristics depend on language, level of education and place of residence, and show up as particular intonation, accent and dialect.
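The comparison of a spoken sample with a stored template can be sketched roughly as follows. This is only an illustration: it assumes each recording has already been reduced to a short list of numeric voice features, and the function names, the toy feature values and the 0.9 threshold are all hypothetical rather than taken from any real product.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector cannot match anything
    return dot / (norm_a * norm_b)

def verify_speaker(sample, template, threshold=0.9):
    """Accept the sample if it is close enough to the stored template.

    The threshold is an assumed value chosen for this toy data only.
    """
    return cosine_similarity(sample, template) >= threshold

# Toy feature vectors (made-up numbers, not real voice measurements).
template = [0.8, 0.1, 0.3, 0.5]
same_speaker = [0.79, 0.12, 0.28, 0.52]
other_speaker = [0.1, 0.9, 0.6, 0.05]

print(verify_speaker(same_speaker, template))
print(verify_speaker(other_speaker, template))
```

A real system would extract hundreds of characteristics from the audio and use a trained model instead of a fixed similarity measure, but the accept-or-reject decision against a template follows the same idea.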

Voice recognition technology

Voice recognition has to be a much more exacting technology than speech recognition, so it requires much more thorough processing and analysis. Running a successful voice recognition program first requires collecting a data set of speech samples. The more voice samples are collected, the higher the quality of the model.


The effectiveness of voice recognition is directly related to a thorough registration (enrollment) procedure. Registration is usually a simple and quick process that requires the user to say a key phrase or a series of numbers three or four times. It is best if the person pronounces the phrase naturally, in a place without background noise. Speaking naturally means using the same tone and volume as if you were talking to an acquaintance standing next to you. Many people make the mistake of pronouncing the phrase unnaturally, like a robot; try to avoid this. The quality of voice processing is also affected by the device used.
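One simple way to picture registration is as averaging the repeated recordings into a single template. The sketch below assumes each repetition of the key phrase has already been converted into a list of numeric features; the `enroll` function and the sample values are hypothetical, and real products may combine samples in a different way.

```python
def enroll(samples):
    """Average several feature vectors of the same phrase into one template."""
    if not samples:
        raise ValueError("at least one sample is required")
    length = len(samples[0])
    if any(len(s) != length for s in samples):
        raise ValueError("all samples must have the same number of features")
    # zip(*samples) groups the i-th feature of every repetition together.
    return [sum(values) / len(samples) for values in zip(*samples)]

# Three repetitions of the same key phrase (made-up feature values).
repetitions = [
    [1.0, 2.0, 3.0],
    [1.2, 1.8, 3.0],
    [0.8, 2.2, 3.0],
]
enrolled_template = enroll(repetitions)
print(enrolled_template)
```

Averaging smooths out small differences between repetitions, which is one reason why saying the phrase several times in a natural voice and a quiet place tends to give a more reliable template.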

Voice processing

Some solutions perform authentication locally on the device, but in that case the number of false positives increases significantly. Large databases make it possible to analyze hundreds of test conditions. Companies that plan to implement voice authentication should aim for solutions that meet the following targets: a probability of false acceptance (false admission) of 0.01% and a probability of false rejection (false non-admission) of 1% to 3%. Keep in mind that most solutions do not rely on voice as the sole factor for authentication. In multi-factor authentication solutions, voice recognition is only one of two or more factors used to identify a user.
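The two error probabilities mentioned above can be estimated from test data along the following lines. This is a minimal sketch: the score lists, the 0.9 threshold and the function names are assumptions made for illustration, and the default limits in `meets_targets` simply restate the 0.01% and 3% figures from the text.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false acceptance rate, false rejection rate) at a threshold.

    genuine_scores: match scores from the real user's attempts.
    impostor_scores: match scores from other people's attempts.
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr

def meets_targets(far, frr, max_far=0.0001, max_frr=0.03):
    """Check the rates against the targets quoted in the text (0.01%, 3%)."""
    return far <= max_far and frr <= max_frr

# Made-up scores for a tiny test set; a real evaluation needs far more data.
genuine = [0.95, 0.91, 0.88, 0.97, 0.65]
impostor = [0.20, 0.35, 0.10, 0.92, 0.40]

far, frr = error_rates(genuine, impostor, threshold=0.9)
print(far, frr, meets_targets(far, frr))
```

Raising the threshold lowers false acceptance but increases false rejection, so the two targets have to be balanced against each other. Measuring a false acceptance rate as low as 0.01% also needs many thousands of impostor attempts, which is one reason large databases matter.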


Speech recognition and voice recognition are two separate technologies that both work with speech: the first is used for search and dictation, and the second for user authentication. User voice verification involves analyzing the voice recording based on hundreds of unique characteristics and comparing the results with a stored template. Soon, security professionals will be turning to this technology much more often to provide access control.

What is voice recognition? Definition from Techopedia

Voice recognition is a computing technique in which specialized programs and systems are built to identify, distinguish, and authenticate the voice of an individual speaker.

Voice recognition assesses a person’s voice biometrics, such as the frequency and flow of their voice and their natural accent.

Voice recognition is also known as speaker recognition.

Techopedia explains voice recognition
Systems based on voice recognition are primarily designed to recognize the voice of the person speaking. Before attempting to recognize a speaker’s voice, voice recognition methods require some training, in which the underlying system learns the speaker’s voice, accent, and tone. Typically, this is done using a series of written words and phrases that the person must read aloud into a built-in or external microphone.

Voice recognition systems are related to speech recognition systems, but the former only identify the speaker, while the latter can understand and evaluate what is said.