How do voice recognition biometrics work?



March 21, 2023

Author: Stytch Team

Floating light blue icons for different biometric methods on a dark blue backdrop: a microphone for voice, an eye for retinal / iris, a face for facial recognition, and a thumbprint.

How voice recognition technology supports fast, frictionless logins

In recent years, advances in authentication technology have enabled developers to enhance the cybersecurity of their app or website without overburdening their users or disrupting the user experience. One of the most powerful innovations in this regard is biometrics, or tech-forward tools that can verify a user’s identity using only their physical or behavioral characteristics.

While fingerprint scans and facial recognition remain the most popular forms of biometrics, emerging methods like voice recognition are still on the fringe.

Below, we explore how voice authentication works, what its benefits are, and why it hasn’t seen the same level of adoption as other biometric methods.

The basics of biometrics

Biometrics is considered something-you-are authentication, because it relies on attributes that are inherent to a user’s body or behavior. 

This sets biometrics apart from something-you-know auth factors (like passwords), which require users to create and remember complex credentials, and something-you-have auth factors (like one-time passcodes or security keys), which require users to prove possession of a registered phone number, email inbox, or physical device.

Instead, biometric factors use built-in verification tools to capture, analyze, and recognize a user’s distinctive, measurable features. There are many different types of biometric authentication, with flows that vary according to the specific feature involved. Some of the most common include:

Fingerprint recognition

Fingerprint recognition tools scan and map the unique ridges and valleys of a user’s fingerprint when they press their finger to their laptop or mobile device.

Facial recognition

Facial recognition tools scan and map the proportions and contours of a user’s face and translate them into a unique numerical code known as a “faceprint.”

Iris/retina recognition

Iris and retina recognition tools scan and analyze the distinctive texture patterns of the iris or the distinctive blood vessel patterns of the retina in a user’s eye.

Voice recognition

Voice recognition tools capture and analyze the sound qualities and other unique properties of a user’s voice, which are determined by their particular jaw movements, throat shape, and behavioral or linguistic inflections.

Many biometric auth solutions are equipped with liveness detection capabilities, so they can distinguish between a real, live user and a mere reproduction or copy — like a photographic image or voice recording — in order to detect and prevent fraud.

Let’s dive deeper into what voice recognition technology does and how it works.

What does voice recognition do?

Essentially, voice recognition tools allow users to access, activate, and/or interact with digital platforms simply by speaking to them. They’ve become well known in recent years with the rise of smartphone assistants (like Siri) and smart home assistants (like Amazon’s Alexa on Echo devices or Google’s Home and Nest systems).

However, there’s a big difference between voice recognition technology and the speech recognition technology many of these voice-controlled systems rely on.

  • Automatic speech recognition (ASR) technology is trained to identify and act upon what is being said, regardless of who is saying it. Think, someone instructing an Alexa device to play a specific song or using a speech-to-text program to transcribe what different participants say during a meeting.
  • Voice recognition (or speaker recognition) technology is trained to identify who is speaking, based on the unique attributes of their voice. For this reason, voice recognition is typically used to offer a personalized user experience or to provide extra protection in sensitive or risky use cases. Think, someone using a voice command to set or disarm a home security system through Google Home.

When voice is used as an authentication factor, the biometric flow relies on voice recognition technology to verify that the person speaking is the legitimate, registered user they claim to be.

How does voice recognition work?

Voice recognition tools must be programmed or trained to identify a specific user’s voice. 

This typically involves taking one or more speech samples and creating a unique digital template or “voiceprint,” similar to the fingerprints and faceprints used in other biometrics. This voiceprint is stored within the system and compared against any sample obtained during a login attempt.

A voiceprint takes into account physical/physiological as well as behavioral/inflectional attributes, such as a user’s:

  • Pitch
  • Timbre
  • Intensity
  • Accent or pronunciation
  • Cadence
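The enrollment and comparison flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the four-number feature vectors, the function names (`average_template`, `cosine_similarity`, `verify`), and the 0.95 threshold are all assumptions made for the example. Real systems derive far richer representations from raw audio.

```python
import math
from statistics import fmean


def average_template(samples):
    """Average several enrollment feature vectors into one voiceprint."""
    return [fmean(column) for column in zip(*samples)]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(voiceprint, attempt, threshold=0.95):
    """Accept the attempt only if it is close enough to the stored voiceprint."""
    return cosine_similarity(voiceprint, attempt) >= threshold


# Hypothetical, made-up features scaled to 0..1:
# [pitch, timbre, intensity, cadence]
enrollment_samples = [
    [0.62, 0.40, 0.55, 0.71],
    [0.60, 0.42, 0.53, 0.69],
    [0.64, 0.38, 0.57, 0.73],
]
voiceprint = average_template(enrollment_samples)

same_speaker = [0.61, 0.41, 0.54, 0.70]   # close to the enrolled samples
other_speaker = [0.20, 0.85, 0.30, 0.25]  # a very different voice

print(verify(voiceprint, same_speaker))
print(verify(voiceprint, other_speaker))
```

The threshold is the key design choice here: raising it makes impostors less likely to pass but also rejects more legitimate users on a bad day.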

Some voice recognition systems are text-dependent: they require a user to speak a specific, predetermined phrase or sentence, which they must then repeat with every login. Others are text-independent and allow a user to say whatever they please.
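The difference between the two approaches, often called text-dependent (fixed phrase) and text-independent (any phrase) verification, can be shown as a small acceptance check. This is only a sketch under stated assumptions: `LoginAttempt`, `accept`, the 0.9 threshold, and the idea that a transcript and a voice match score are already available are all hypothetical and stand in for much more involved processing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoginAttempt:
    transcript: str     # what was said (assumed to come from a speech recognition step)
    voice_score: float  # how closely the voice matches the stored voiceprint, 0..1


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(phrase.lower().split())


def accept(attempt: LoginAttempt,
           required_phrase: Optional[str] = None,
           threshold: float = 0.9) -> bool:
    """Text-dependent when required_phrase is set, text-independent otherwise."""
    if required_phrase is not None:
        if normalize(attempt.transcript) != normalize(required_phrase):
            return False  # right voice, wrong words
    return attempt.voice_score >= threshold


passphrase = "my voice is my password"

# Text-dependent: both the words and the voice must match.
print(accept(LoginAttempt("My voice is my password", 0.93), passphrase))
print(accept(LoginAttempt("open sesame", 0.93), passphrase))

# Text-independent: only the voice matters.
print(accept(LoginAttempt("open sesame", 0.93)))
```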

Accuracy of voice biometrics

The precision of voice recognition technology has improved significantly in recent years, and it continues to make steady progress. Like any authentication method, voice biometrics is not foolproof, but reported accuracy rates top 95% on average and often reach 99%. Compare that to passwords, only 80% of which can be considered even remotely secure.
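A single accuracy figure hides two different kinds of mistakes: letting an impostor in (a false accept) and locking a legitimate user out (a false reject). The sketch below shows how both rates move as the match threshold changes. The scores are invented purely for illustration, and `error_rates` is a hypothetical helper, not part of any library.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false accept rate, false reject rate) at a given threshold."""
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Made-up match scores: attempts by the real user vs. attempts by other people.
genuine = [0.97, 0.97, 0.95, 0.91, 0.88, 0.99, 0.93, 0.96, 0.94, 0.98]
impostor = [0.40, 0.55, 0.62, 0.91, 0.30, 0.48, 0.71, 0.52, 0.66, 0.35]

for threshold in (0.85, 0.90, 0.95):
    far, frr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.2f}  FAR={far:.0%}  FRR={frr:.0%}")
```

With this toy data, a stricter threshold removes the false accept but rejects more genuine attempts, which is the trade-off any biometric system has to tune.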

The benefits of voice biometrics for authentication

Compared to other biometric authentication factors, voice biometric authentication has a couple of distinct advantages. Voice recognition still works when users are wearing accessories that sometimes get in the way of other methods, like hats, gloves, masks, or sunglasses. Voice biometrics is also contactless.

That said, a few shortcomings make voice a less attractive biometric method than others.

Shortcomings of voice biometrics

Some of the downsides that have hampered voice biometric authentication adoption to date include:

  • Advanced cyber attacks like “voice spoofing” that attempt to record and/or replicate a user’s voice and gain access to an online account. While liveness detection tools help defend against such attacks, hackers are always adapting and improving upon their methods in an effort to outsmart the latest cybersecurity efforts.
  • Environmental interference like loud background noise, static, and poor connections that can muddle the quality of an audio sample.
  • Physical impairments like respiratory illnesses, allergies, vocal cord injuries, and other conditions that may alter the sound qualities of a user’s voice.
  • Privacy concerns around the ethics or safety of devices potentially “listening to” and capturing users’ personal communications.

While some voice recognition solutions are turning to artificial intelligence (AI) and machine learning models to better understand and adapt to changes in the human voice across circumstances, these friction points and spoofability factors make voice a less attractive biometric option than, say, face recognition or fingerprints.

When in doubt, choose what users will adopt

At the end of the day, we at Stytch believe that the security of any given authentication method is effectively moot if users won’t adopt it. So while the tech of voice biometrics may feel like the stuff of sci-fi movies (and kinda fun!), we’re far more interested in the biometric methods that are gaining traction and popularity, in large part due to the pioneering work of companies like Apple, which has worked hard to make biometrics a seamless, integrated part of its user experience.

Discover Stytch’s biometric authentication

Stytch offers many passwordless authentication solutions as part of our comprehensive product suite, including Native Mobile Biometrics.

