Saph

The Stupid Algorithm for Password Hashing

Playground

Input

Parts (strings, will be encoded as UTF-8):

Memory size (amount of 64-byte chunks, default: 16384):

Iterations (default: 8):

Output

Why?

At the dawn of the humanity, we all worked with passwords stored in plain in our databases. The easy and simple process of just storing the user’s five letter, all lowercase password in a column called “password”. If said user wanted to log in, you’d just do a SELECT comparing the user’s input with the stored value (of course without checking for quotes, because we told users not to put them in!) and called it a day.

However, wise people thought, “it’s easy, but it’s not safe!”. And thus we started hashing first the passwords using the almighty MD5 hash.

This kinda did the job, but then some people realized “this is bullshit, what if we just calculated all possible MD5s for short-length passwords”. Thus, people started putting in extra, random data in the hash, to foil attacker’s rainbow tables.

However, with the ever-growing power of CPUs and GPUs, plain old hashes proved to be ineffective, since you could feasibly bruteforce them in a matter of hours (mostly because they never were intended to be used for this - they’re meant to be fast!).

Thus functions dedicated for hashing passwords securely were born.

PBKDF2 is one of the first ones, and it’s still used to this day. This employs a standard hashing algorithm as its basis, and works from that on. It is computationally intensive, but requires little memory, making it easier still to bruteforce using an ASIC. Besides, there are known issues with collisions.

Then, two other algorithms came along.

Bcrypt fixed the issues with memory requirements, but they relied on the availability of Blowfish, a AES-contender in practice no longer in use anywhere aside from this function. This means that there’s a good chance that any implementation of Bcrypt would need its own shaded copy of Bcrypt, which increases code bloat for no good reason.

scrypt did away with Blowfish, but increases the complexity by depending on PBKDF2, which in turns depends in SHA256; and introduces a dependency on Salsa20/8, a very interesting stream cipher but that it’s not available widely yet either.

From 2015 onwards, the so-called Password Hashing Contest, or PHC for short, published a recommendation on a new algorithm called Argon2. And this is when all went downhill.

From simply storing the password in plain in a database, we are being suddenly now recommended to use an algorithm with:

Each variation, of course, giving a different hash. This is just insane, and IMHO suffers a lot from the second system-effect.

I don’t need parallelism in a password hashing algorithm - 99.99% of the time the hashing will be done in a single thread, be it the server or the client.

I don’t need to have to choose between four different variations - we just care about having one that works, it does well, and we can trust that if we update the library suddenly our hashes aren’t gonna change.

I just need a simple algorithm that increases the complexity of hashing a password for stopping bruteforce attempts, and uses primitives available in most systems, to reduce bloat and ensure the slowdown comes from the cryptographic primitives and not from executing our own function in a scripting language. Nothing else, nothing more.

Thus the Stupid Algorithm for Password Hashing was born.

The algorithm

The algorithm is heavily based on battcrypt, another PHC finalist. However, the cryptographic primitive Blowfish has been changed to AES.

The algorithm is defined as follow:

; Calculate starting hash
hash := sha256(sha256(part[1]) || sha256(part[2]) || ... sha256(part[n]))

; Initial memory with all zeros
mem := zeros(64 * memsize)

for iteration := 1 to iterations do
	; Encrypt memory using the current hash as key and IV
	key := hash[0 ~ 15]
	iv := hash[16 ~ 31]
	mem := aes128cbc_encrypt(mem, key, iv)

	; Calculate the order in which they have to be hashed
	order := [0, 1, ... memsize - 2, memsize - 1]
	for a := 0 to memsize - 1 do
		; Parse lower 32-bits as little endian
		b :=
				enc[c * 64 + 0] * 1 +
				enc[c * 64 + 1] * 256 +
				enc[c * 64 + 2] * 65536 +
				enc[c * 64 + 3] * 16777216

		; Cap to memory size
		b := b mod memsize

		; Swap indexes
		swap(order[a], order[b])
	end

	; Build new byte array in correct order
	reordered := []
	for chunk in order do
		start := chunk * 64
		end := chunk * 64 + 63
		reordered := reordered || enc[start ~ end]
	end

	; Calculate new hash
	hash := sha256(reordered)
end

result := hash

Rationale

I am no expert cryptographer, but I have been reading for years on the security and cryptography topic and I am pretty confident of the security of this construction.

This algorithm is pretty opinionated, but all decisions have a rationale behind them:

Implementations

Test vectors

To ensure my implementations were correct, I hand crafted the following test vectors, running the algorithm by hand step by step: