Argon2 vs. bcrypt vs. scrypt: which hashing algorithm is right for you?
As an engineer, you’ll likely come across the concepts of data hashing and encryption whenever you’re handling sensitive data. Both hashing and encryption are important to cryptography and are often confused, and choosing the right algorithms for your use case is not necessarily a straightforward decision.
In this article, we'll cover the basic definitions of hashing and encryption, and compare three common hashing algorithms you're likely to come across in your work: Argon2, bcrypt, and scrypt. We'll look at their origins, their strengths and weaknesses, and in what circumstances you'd likely use each.
Let's dive in.
What is encryption?
Encryption is a process that involves scrambling or coding information so that only someone with a key to how you scrambled or coded it can read it. This is key: you only encrypt information when you expect or intend at least certain parties to decrypt it and access it later. It is a two-way, reversible process. In computer science, encrypting is typically done with complex mathematical algorithms.
What is hashing?
Similar to encryption, hashing is another mathematical technique used to obfuscate data you want to keep unavailable to other people. But there are two major distinctions that make hashing different:
- Hashing goes one way – if you hash something, it is not intended to be "un-hashed."
- Hashing converts data into a fixed-length output, also known as a hash value. With encryption, the length of the output typically varies with the length of the input.
The conversion from data to hash value is done by an algorithm, and the choices and operations involved are what make each hashing algorithm unique. Every hashing algorithm has its own history and design. Some were created to support additional parameters that you can adjust to meet your security or computational needs. What parameters are available depends on the hashing algorithm you choose to use.
This is why the choice of hashing algorithm is so important: you want to make sure it meets the computational and security needs of your use case.
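As a small illustration of the fixed-length property, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. SHA-256 is used only because it is easy to demonstrate; it is a general-purpose hash, not one of the password hashing algorithms compared later in this article, and the helper name `sha256_hex` is made up for this example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_hash = sha256_hex("a")
long_hash = sha256_hex("a much longer input string " * 100)

# Both digests are 64 hex characters (32 bytes), regardless of input size.
print(len(short_hash), len(long_hash))

# Different inputs produce different hash values.
print(short_hash != long_hash)
```

Whatever the input length, the digest length stays the same, which is one of the two distinctions from encryption described above.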
Why do we hash passwords?
Password storage is one of the paramount concerns for any application or web service that deals with authentication. Depending on how and where an application stores passwords, its users' credentials could be exposed to bad actors in the event of a data breach or cyber attack.
Password hashing is a great way apps and online services can protect users' accounts in the case of a cyber attack or data leak. A stored hash has two big advantages over storing a plain text password or even an encrypted password.
Irreversibility
Once a hash value is generated, there is no practical way to derive the original input from the hash value alone. Hash values usually appear as a random string of characters.
Consistency
At the same time, companies that store only a password's hash can still verify a user's identity from their username and password, because the same input string, hashed with the same salt and parameters, always generates the same hash. So if a user enters their password and it generates the same hash as the one in the application's database, the application can safely verify the user without storing the password itself.
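The verification flow described above can be sketched roughly as follows. This is an illustrative example built on `hashlib.scrypt` from Python's standard library, not a description of any particular provider's implementation. The function names and the parameter values (`n=2**14`, `r=8`, `p=1`) are assumptions chosen for the demo.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte hash from a password and a salt (demo parameters)."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )


def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the submitted password and compare it to the stored hash."""
    candidate = hash_password(password, salt)
    # compare_digest runs in constant time to avoid leaking timing information.
    return hmac.compare_digest(candidate, stored_hash)


# At sign-up: only the salt and the hash are stored, never the password.
salt = os.urandom(16)
stored = hash_password("correct horse battery staple", salt)

# At login: the same input produces the same hash, so the check passes.
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))  # False
```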
Is password hashing unhackable?
Like any cybersecurity measure, password hashing is not completely attack-proof, though it can greatly increase the cost of hacking or account takeovers.
The main way hackers can break a hashing system requires a few steps, each of which can be thwarted by additional security measures:
- To gain access to a hashed password, the hackers must first breach the account storage information of their attack target. If these passwords are hashed, this information on its own is not usable – hence why password hashing is such a valued protection!
- Once hackers have a list of hashed passwords, they can create a database of words they think are likely to be used as passwords (strings like "password123," "qwerty," or perhaps even passwords they've obtained from other breaches). They then hash these likely passwords using the hashing algorithm they either suspect or know their attack target uses. This creates a database of password hashes.
- With their possible or probable hashes in hand, the attackers then compare their database with entries in the hacked database of password hashes. If they find any matches, they can then gain access to those accounts.
While we've summed up this process in three neat steps, it's a very time- and compute-intensive process, which serves as a strong deterrent against most hackers.
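To make the three steps concrete, here is a toy sketch of the comparison an attacker performs. It assumes a deliberately weak setup (unsalted SHA-256) so it runs instantly. The usernames, passwords, and word list are all invented for illustration.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Hash a password with unsalted SHA-256 (weak on purpose, for the demo)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Step 1: a hypothetical leaked table of username -> password hash.
leaked = {
    "alice": sha256_hex("qwerty"),
    "bob": sha256_hex("x9!Lr#2pQv"),
}

# Step 2: hash a list of likely passwords with the suspected algorithm.
wordlist = ["password123", "qwerty", "letmein"]
precomputed = {sha256_hex(word): word for word in wordlist}

# Step 3: look for matches between the two sets of hashes.
cracked = {
    user: precomputed[hash_value]
    for user, hash_value in leaked.items()
    if hash_value in precomputed
}

print(cracked)  # {'alice': 'qwerty'}
```

Only the account with a guessable password is recovered. With a slow, salted algorithm, every guess in step 2 would have to be recomputed for each user's salt, which is what makes the real process so expensive.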
How does hashing work?
All hashing algorithms involve taking a piece of information called the "input" and a number of other parameters that determine the hash's complexity, computing requirements, and additional security measures.
Note that not all hashing algorithms use all the parameters below. Indeed, the parameters a hashing algorithm uses are a big part of what distinguishes one from another.
Some examples of parameters include:
- Input: Every hashing algorithm needs an input to hash. The input is usually a plain text string of varying lengths (like a password).
- Salt: The "salt" refers to a string of characters that is appended or prepended to the input before it is hashed. A salt changes the hash value by increasing the length and complexity of the input. Salting is a common practice to help guard against dictionary attacks (a kind of brute force attack that hashes a list of likely passwords and looks for matches) and rainbow tables (a more sophisticated method than dictionary attacks that relies on precomputed tables of hashes).
- Length: Length refers to the number of bytes or HEX characters in the generated hash value.
- Cycles: The number of cycles describes how many times, or iterations, the algorithm's hashing function will run. More cycles create a stronger hash but require more time to compute. The number of cycles is determined by the work factor through an exponential relation: cycles = 2^(work factor).
- Work factor: The work factor is the exponent that sets the number of cycles, so a work factor of 10 means 2^10 = 1,024 iterations. The higher the work factor, the harder it is for a hacker to crack the hashes, but also the greater the computational cost to the application and/or its auth provider.
- Threads: Threads refers to the number of concurrent threads, or degrees of parallelism, that the algorithm will utilize to compute the hash.
- Memory/memory hardness: The amount of memory to be used in the hashing process. Algorithms are evaluated on many factors, one of which is "memory hardness": how much RAM a given computation requires, such that an attacker cannot cheaply trade memory for extra CPU time. Hashing algorithms need to strike a balance: memory hard enough to be expensive to crack, but not so demanding that they overload the application.
- CPU: The cost or work factor that increases the memory and CPU usage needed to generate the hash. Some scrypt implementations require this parameter to be a power of 2.
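As a rough sketch of how these parameters show up in practice, the example below uses `hashlib.scrypt` from Python's standard library. Mapping the list above onto scrypt's arguments is our own approximate reading: `n` plays the role of the CPU/memory cost (cycles), `r` is a block size that scales memory use, `p` is a parallelization factor, and `dklen` is the output length. The values are demo assumptions, not recommendations.

```python
import hashlib
import os

password = b"hunter2"  # input: the string to hash

work_factor = 14
n = 2 ** work_factor  # cycles = 2^(work factor); must be a power of 2 here
print(n)  # 16384


def derive(salt: bytes) -> bytes:
    """Hash the password with the given salt and fixed demo parameters."""
    return hashlib.scrypt(
        password,
        salt=salt,  # salt: random bytes mixed into the input
        n=n,        # CPU/memory cost
        r=8,        # block size, scales memory use
        p=1,        # parallelization factor
        dklen=32,   # length: number of bytes in the generated hash
    )


salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = derive(salt_a)
hash_b = derive(salt_b)

print(len(hash_a))               # 32
print(hash_a != hash_b)          # same password, different salts
print(hash_a == derive(salt_a))  # same password, same salt
```

The same password produces unrelated hashes under different salts, which is why a precomputed table built without the salt is of no use to an attacker.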
What are the different types of hashing algorithms?
With so many options for hashing algorithms, like SHA-1, SHA-256, MD5, Argon2, scrypt, and bcrypt, it's important to understand the differences and choose the right one for your needs. For password protection, Argon2, bcrypt, and scrypt are recommended due to their configurable memory and cost parameters that can increase computational strength against attacks.
Argon2
Argon2 was designed by Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich from the University of Luxembourg. They released their specification paper on Argon2 in 2015 and that same year won the Password Hashing Competition, organized by a global panel of security and cryptographic experts. In their paper, the designers state their motivation for creating Argon2 was "to maximize the cost of password cracking" and that "passwords, despite all their drawbacks, remain the primary form of authentication."
bcrypt
Bcrypt was designed by Niels Provos and David Mazières. They presented their paper "A Future-Adaptable Password Scheme" in 1999 at the USENIX Annual Technical Conference. They based their hashing algorithm on Blowfish, an encryption algorithm created by Bruce Schneier in 1993, to take advantage of its purposefully expensive key setup phase. Provos and Mazières took the concept further by designing bcrypt to have adjustable cost. In their paper, they state that "the computational cost of any secure password scheme must increase as hardware improves."
scrypt
Scrypt was created by Colin Percival, who presented his conference paper in 2009 at BSDCan, the BSD (Berkeley Software Distribution) conference. It was originally developed for Tarsnap, an encrypted online backup service for UNIX operating systems, which Percival also created. Scrypt was designed to be a memory-hard algorithm that would be maximally secure against hardware brute-force attacks.
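To get a feel for what "memory-hard" means in numbers, the sketch below estimates scrypt's main working buffer, which is roughly 128 * N * r bytes for cost parameter N and block size r. The helper name is made up for this example, and the figure is an approximation that ignores smaller buffers.

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's main working buffer: 128 * N * r bytes."""
    return 128 * n * r


for work_factor in (14, 15, 16, 17):
    n = 2 ** work_factor
    mib = scrypt_memory_bytes(n, 8) / (1024 * 1024)
    print(f"N = 2^{work_factor}, r = 8 -> about {mib:.0f} MiB")
```

Each step up in the work factor doubles the memory needed per hash (16, 32, 64, then 128 MiB in this example), which raises the cost of running many guesses in parallel on specialized hardware.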
Which algorithm is right for you – Argon2 vs. bcrypt vs. scrypt
While there are of course deeper nuances to Argon2, bcrypt, and scrypt, the choice between them boils down to weighing computing and time requirements against memory hardness and the number of tunable parameters.
Argon2 is a great memory-hard password hashing algorithm, which makes it good for offline key derivation. But it requires more time, which is less ideal for web applications.
bcrypt can deliver hashing times under one second, but its only tunable parameter is the cost factor: it does not include parameters like threads or memory hardness.
scrypt (Stytch's personal choice!) is maximally hard against brute force attacks, but not quite as memory hard or time-intensive as Argon2.
At Stytch, once we've salted and hashed all passwords using scrypt, we store them in an encrypted database that we manage. This ensures our Passwords solution is secure and built for performance.
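If you want to weigh time cost on your own hardware, a simple benchmark helps. The sketch below times `hashlib.scrypt` at a few work factors. It covers only scrypt because that is what Python's standard library ships; bcrypt and Argon2 would need third-party packages. The function name, parameter values, and the 64 MiB `maxmem` ceiling are assumptions for the demo, and the timings will differ from machine to machine.

```python
import hashlib
import os
import time


def time_scrypt(work_factor: int, r: int = 8, p: int = 1) -> float:
    """Return the seconds taken to compute one scrypt hash with N = 2^work_factor."""
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.scrypt(
        b"benchmark password",
        salt=salt,
        n=2 ** work_factor,
        r=r,
        p=p,
        maxmem=64 * 1024 * 1024,  # allow up to 64 MiB for the larger settings
        dklen=32,
    )
    return time.perf_counter() - start


# Time one hash at each work factor from 10 to 14.
timings = {wf: time_scrypt(wf) for wf in range(10, 15)}

for wf, seconds in timings.items():
    print(f"work factor {wf}: {seconds * 1000:.1f} ms")
```

A common approach is to pick the highest setting that keeps a single login comfortably within your latency budget, then revisit it as hardware improves.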
If you’re looking for more information on each hashing algorithm, read more about the differences here.