When I talk about “implementing BLAKE2b in Kotlin”, I don’t mean inventing a new hash function. I mean taking the official BLAKE2b spec and turning it into a clean, testable, production‑grade implementation that behaves exactly like the reference code and matches Cardano’s usage of BLAKE2b‑224/256 for addresses, scripts, and general hashing.1

In this article I stay at the level of algorithm and state design for Kotlin/JVM, with short, hedged code sketches rather than a full listing. The goal is that when you write or review the implementation, you know exactly:

  • what the internal state looks like on the JVM,
  • how the compression function is structured,
  • where endianness and counters usually go wrong,
  • and how to prove correctness against reference implementations.

Introduction

BLAKE2b is a modern hash function specified in RFC 7693. It’s optimized for 64‑bit CPUs, immune to length‑extension attacks, and widely used as a “sane default” hash in real systems. Cardano uses BLAKE2b for addresses, scripts, verification keys, and general hashing; Polkadot and others use it as well.1

On Cardano you typically see these digest sizes:

- BLAKE2b-224 (hash28) – verification keys, script hashes, addresses
- BLAKE2b-256 (hash32) – general-purpose ledger hashing
- BLAKE2b-512 (hash64) – key derivation and internal uses

All of these are the same BLAKE2b algorithm with a different digest_size parameter.2

From a Kotlin/JVM implementation I want:

  • predictable behaviour across platforms (JVM now, maybe KMP later),
  • good throughput on large streams of data (blocks, CBOR blobs),
  • and enough test coverage that I can safely use it inside Cardano‑facing services.

Prerequisites

You should already be comfortable with the skeleton of a cryptographic hash:

  • internal state,
  • compression function,
  • fixed‑size blocks and streaming updates,
  • padding and finalisation.

If you’ve implemented SHA‑2 or followed its pseudocode before, BLAKE2b will feel familiar but slightly more flexible.1

On the Kotlin/JVM side, you should:

  • be happy with ByteArray,
  • understand little‑endian encoding of 64‑bit words,
  • and be comfortable with bitwise operations on Long (and possibly ULong if you go multiplatform).

If you only care about the JVM, Long plus careful masking is enough. If you want Kotlin Multiplatform, you’ll want a small “portable core” with unsigned math and minimal dependencies. Existing BLAKE2 libraries in the Kotlin ecosystem are a good reference for how to wrap the algorithm in a multiplatform‑friendly API.3


Theory: BLAKE2b at a Glance

BLAKE2b is a HAIFA‑style iterated construction (Merkle–Damgård extended with a block counter and a finalization flag, which is why it resists length extension) with:

  • a chaining state of 8×64‑bit words,
  • a compression function built from a “G” round function,
  • and a parameter block to configure digest length, keying, salting, personalization, and tree hashing.1

Shapes that matter for implementation:

- Block size:        128 bytes  (16 × 64-bit words)
- State vector h:    8 × 64-bit words
- Working vector v: 16 × 64-bit words (h || IV, then mixed)
- Rounds:           12 per block (each with 8 G-mixes)
- Digest size:       1..64 bytes (we care about 28, 32, 64)

RFC 7693 defines:

  • the fixed initialization vectors (IV),
  • the sigma permutation table selecting message words per round,
  • and the exact rotation constants used by G.1

For each 128‑byte block:

1. Load 16 little-endian 64-bit words from the message block into m[0..15].
2. Initialize v[0..7] from the current state h[0..7].
3. Initialize v[8..15] from the fixed IV, tweaked with byte counters and flags.
4. Run 12 rounds of the G-mixing function, using sigma to pick message words.
5. XOR the mixed v back into h to get the new chaining state.

BLAKE2b maintains two 64‑bit counters t0, t1 (total bytes hashed so far) and a final‑block flag f0 (and f1 for tree mode). The final block is processed with f0 set, which slightly alters the mixing to mark the end of the stream.1
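
As a sketch, the working‑vector setup (steps 2–3 above, plus the counter/flag tweak just described) and the feed‑forward of step 5 might look like this in Kotlin. Here iv stands for the RFC 7693 IV table, and the names mirror the state layout described in the next section:

// Sketch: build the 16-word working vector for one compression.
private fun initWorkingVector(
    v: LongArray, h: LongArray, iv: LongArray,
    t0: Long, t1: Long, f0: Long, f1: Long
) {
    h.copyInto(v, destinationOffset = 0, startIndex = 0, endIndex = 8)
    iv.copyInto(v, destinationOffset = 8, startIndex = 0, endIndex = 8)
    v[12] = v[12] xor t0   // low 64 bits of the byte counter
    v[13] = v[13] xor t1   // high 64 bits of the byte counter
    v[14] = v[14] xor f0   // all-ones on the final block
    v[15] = v[15] xor f1   // tree-mode flag; zero for sequential hashing
}

// Sketch: fold the mixed working vector back into the chaining state.
private fun feedForward(h: LongArray, v: LongArray) {
    for (i in 0 until 8) h[i] = h[i] xor v[i] xor v[i + 8]
}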


State Layout in Kotlin/JVM (No Code Yet)

On the JVM, I like to think of a BLAKE2b instance as a small, mutable “engine” object:

BLAKE2bState
------------
h[0..7]       internal chaining state
t0, t1        128-bit byte counter (two 64-bit words)
f0, f1        final-block flags
buf[0..127]   pending block buffer
bufLen        number of bytes currently in buf
digestLen     desired output length (bytes)
keyLen        key length (if using keyed mode)

This state accumulates input and calls compress() whenever buf reaches 128 bytes.

Inside compress() you use temporary working arrays:

Working arrays in compress()
----------------------------
v[0..15]   16 words = 8 from h + 8 from IV / t / f
m[0..15]   16 message words from the current block

On the JVM all of this should be preallocated per instance:

  • arrays of Long reused across calls,
  • a single buf reused for every block.

You do not want to allocate new arrays per block and rely on the GC in a tight hashing loop.
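
Putting the state table and the preallocation advice together, a minimal engine skeleton might look like this (names are illustrative, not an existing library API):

// Sketch: mutable BLAKE2b engine with all working memory preallocated.
class Blake2bEngine(
    private val digestLen: Int,
    private val keyLen: Int = 0
) {
    private val h = LongArray(8)               // chaining state
    private var t0 = 0L; private var t1 = 0L   // 128-bit byte counter
    private var f0 = 0L; private var f1 = 0L   // final-block flags
    private val buf = ByteArray(128)           // pending block buffer
    private var bufLen = 0                     // bytes currently in buf
    private val v = LongArray(16)              // working vector, reused per block
    private val m = LongArray(16)              // message words, reused per block
}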

The G function itself is a fixed sequence of:

  • modular additions on 64‑bit words,
  • XORs,
  • and rotations by specific constants.

You lift these constants and the permutation table straight from RFC 7693. There is zero room for creativity here.1
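
A sketch of G and one round in Kotlin, with the rotation distances (32, 24, 16, 63) and the column/diagonal pattern taken straight from RFC 7693; signed Long addition already wraps mod 2^64, so no masking is needed:

// Sketch: the G mixing step (RFC 7693, Section 3.1).
private fun g(v: LongArray, a: Int, b: Int, c: Int, d: Int, x: Long, y: Long) {
    v[a] = v[a] + v[b] + x
    v[d] = (v[d] xor v[a]).rotateRight(32)
    v[c] = v[c] + v[d]
    v[b] = (v[b] xor v[c]).rotateRight(24)
    v[a] = v[a] + v[b] + y
    v[d] = (v[d] xor v[a]).rotateRight(16)
    v[c] = v[c] + v[d]
    v[b] = (v[b] xor v[c]).rotateRight(63)
}

// Sketch: one of the 12 rounds. Four column mixes, then four diagonal
// mixes, with message words selected by the round's SIGMA row `s`.
private fun round(v: LongArray, m: LongArray, s: IntArray) {
    g(v, 0, 4, 8, 12, m[s[0]], m[s[1]])
    g(v, 1, 5, 9, 13, m[s[2]], m[s[3]])
    g(v, 2, 6, 10, 14, m[s[4]], m[s[5]])
    g(v, 3, 7, 11, 15, m[s[6]], m[s[7]])
    g(v, 0, 5, 10, 15, m[s[8]], m[s[9]])
    g(v, 1, 6, 11, 12, m[s[10]], m[s[11]])
    g(v, 2, 7, 8, 13, m[s[12]], m[s[13]])
    g(v, 3, 4, 9, 14, m[s[14]], m[s[15]])
}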


Cardano‑Specific Parameters

In Cardano we mostly use unkeyed BLAKE2b for hashing ledger objects and keys. The keyed mode (MAC) exists, but is rarely used on‑chain; what matters more is digest length and how we serialize the preimage.4

For Cardano‑oriented Kotlin code I usually define three presets:

BLAKE2b-224:
  digestLen = 28 bytes
  keyLen    = 0
  fanout = 1, depth = 1 (sequential hashing)
  leafLength, nodeOffset, nodeDepth, innerLen = 0
  salt, personal = zero

BLAKE2b-256:
  digestLen = 32 bytes
  same parameters otherwise

BLAKE2b-512:
  digestLen = 64 bytes
  same parameters otherwise

These parameters go into the parameter block, a 64‑byte structure that gets XORed into the IV at initialization. The spec defines the layout byte by byte: digest length, key length, fanout, depth, leaf length, node offset, node depth, inner length, salt, personalization.1

Your Kotlin API doesn’t have to expose the parameter block directly. Callers can choose digest size (and maybe key); you build the parameter block internally.
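
For these unkeyed, sequential presets, XORing the parameter block into the IV collapses to a single non‑zero word: digest length in byte 0, key length in byte 1, fanout and depth (both 1) in bytes 2–3. A minimal initialization sketch, assuming an IV table named iv:

// Sketch: chaining-state init for unkeyed, sequential BLAKE2b, where
// only the first 64-bit word of the parameter block is non-zero.
fun initChainingState(h: LongArray, iv: LongArray, digestLen: Int, keyLen: Int = 0) {
    require(digestLen in 1..64) { "digest length must be 1..64 bytes" }
    iv.copyInto(h, destinationOffset = 0, startIndex = 0, endIndex = 8)
    // byte 0 = digest length, byte 1 = key length,
    // byte 2 = fanout (1), byte 3 = depth (1); all other bytes zero.
    h[0] = h[0] xor (0x01010000L or (keyLen.toLong() shl 8) or digestLen.toLong())
}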


Implementation Walkthrough (Conceptual, JVM‑First)

I’ll outline the steps I follow on the JVM; each maps cleanly to Kotlin code.

1. Define the public API

I like a streaming interface that mirrors MessageDigest, plus helpers:

- reset()
- update(byte[], offset, length)
- doFinal(out[], offset)
- digest(byte[]): ByteArray   (one-shot convenience)

On top, I add factories:

- blake2b224(): BLAKE2b
- blake2b256(): BLAKE2b
- blake2b512(): BLAKE2b

On the JVM this makes it easy to integrate with existing code that expects MessageDigest‑like semantics while still using a Kotlin implementation internally.5
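
A possible shape for that surface in Kotlin (the class name and factories are illustrative, not an existing library API):

// Sketch: a MessageDigest-like streaming API with Cardano-oriented factories.
class Blake2b private constructor(private val digestLen: Int) {
    fun reset() { /* re-initialize h, counters, flags, and the buffer */ }
    fun update(input: ByteArray, offset: Int = 0, length: Int = input.size) {
        /* buffered streaming; see the update path below */
    }
    fun doFinal(out: ByteArray, offset: Int = 0) {
        /* pad, final compress, serialize, truncate */
    }
    fun digest(input: ByteArray): ByteArray {   // one-shot convenience
        reset()
        update(input)
        return ByteArray(digestLen).also { doFinal(it) }
    }
    companion object {
        fun blake2b224() = Blake2b(28)
        fun blake2b256() = Blake2b(32)
        fun blake2b512() = Blake2b(64)
    }
}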

2. Constants and tables

Lift these from RFC 7693 and keep them private and immutable:

- IV[0..7]          initial 64-bit constants
- SIGMA[12][16]     message word permutations per round
- rotation constants for G

Represent SIGMA as an IntArray (2D or flattened) and never modify it. Treat it as static data, not state.1
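
For reference, the IV constants (as signed Longs) and the first two SIGMA rows look like this; copy the remaining rows verbatim from RFC 7693 rather than retyping them:

// RFC 7693 IV: the same constants as the SHA-512 initial hash values.
private val IV = longArrayOf(
    0x6a09e667f3bcc908uL.toLong(), 0xbb67ae8584caa73buL.toLong(),
    0x3c6ef372fe94f82buL.toLong(), 0xa54ff53a5f1d36f1uL.toLong(),
    0x510e527fade682d1uL.toLong(), 0x9b05688c2b3e6c1fuL.toLong(),
    0x1f83d9abfb41bd6buL.toLong(), 0x5be0cd19137e2179uL.toLong()
)

// First two of twelve SIGMA rows (rows 10 and 11 repeat rows 0 and 1).
private val SIGMA = arrayOf(
    intArrayOf(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15),
    intArrayOf(14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3)
    // ... rows 2..9 from RFC 7693 ...
)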

3. State initialization

On reset() / constructor:

1. Copy IV[0..7] into h[0..7].
2. Build the 64-byte parameter block from digestLen, keyLen, tree params,
   salt, personal.
3. Interpret the parameter block as 8×64-bit little-endian words and XOR
   them into h[0..7].
4. Zero t0, t1, f0, f1, bufLen.
5. If keyed: treat the key as the first block (padded to 128 bytes) and
   feed it through update().

Cardano’s unkeyed mode usually means step 5 is skipped, but I still implement it for completeness.1
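
Step 5 in keyed mode is just a padded block pushed through the normal update path. A sketch, assuming the update() method from the next step is available on the engine:

// Sketch: keyed mode hashes the key as a full zero-padded first block,
// so it counts 128 bytes toward t0/t1 like any other block.
private fun absorbKey(key: ByteArray) {
    require(key.size in 1..64) { "BLAKE2b keys are 1..64 bytes" }
    val block = ByteArray(128)   // zero-padded to one full block
    key.copyInto(block)
    update(block, 0, block.size)
}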

4. Update path

The update() method does buffered streaming:

- While len > 0:
    - If buf is already full (128 bytes) and more input remains:
        - Increase t0/t1 by 128.
        - Call compress(lastBlock = false).
        - Set bufLen = 0.
    - Copy as many bytes as possible into buf.

Crucially, a full buffer is only compressed once you know more input follows; otherwise an input whose length is an exact multiple of 128 bytes would have its last block compressed without the final flag. (A Kotlin sketch follows below.)

The byte counter must include any keyed block you prepended; the spec is very precise about how many bytes have been “compressed so far”.1
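
A sketch of this update loop as an engine method, assuming buf, bufLen, t0, t1, and compress() from the engine skeleton above; the invariant is that the true last block is still sitting in buf when finalisation runs:

fun update(input: ByteArray, offset: Int, length: Int) {
    var pos = offset
    var remaining = length
    while (remaining > 0) {
        if (bufLen == 128) {      // full block AND more input follows
            t0 += 128
            if (t0 == 0L) t1++    // carry: t0 is always a multiple of 128 here
            compress(lastBlock = false)
            bufLen = 0
        }
        val n = minOf(128 - bufLen, remaining)
        input.copyInto(buf, destinationOffset = bufLen, startIndex = pos, endIndex = pos + n)
        bufLen += n
        pos += n
        remaining -= n
    }
}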

5. Finalisation

On doFinal():

1. Increase t0/t1 by bufLen (the remaining uncompressed bytes).
2. Set f0 = 0xFFFF...FFFF  (all bits 1) to mark the final block.
3. Zero-pad the tail of buf up to 128 bytes (compression input).
4. Call compress(lastBlock = true).
5. Serialize h[0..7] into 64 bytes, little-endian.
6. Truncate to digestLen bytes for the final output.
7. Optionally wipe internal state (h, buf, key material).

If you get finalisation wrong, RFC test vectors will fail loudly, which is perfect.
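
A finalisation sketch, relying on the update() invariant above (buf holds the 0..128 pending bytes) and on a little‑endian helper like the one sketched in the next section:

fun doFinal(out: ByteArray, outOffset: Int) {
    t0 += bufLen.toLong()        // count the tail bytes
    if (java.lang.Long.compareUnsigned(t0, bufLen.toLong()) < 0) t1++  // carry
    f0 = -1L                     // all 64 bits set: final-block flag
    buf.fill(0, fromIndex = bufLen, toIndex = 128)   // zero-pad the tail
    compress(lastBlock = true)
    val full = ByteArray(64)
    for (i in 0 until 8) longToLeBytes(h[i], full, i * 8)   // little-endian h
    full.copyInto(out, destinationOffset = outOffset, startIndex = 0, endIndex = digestLen)
}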


Kotlin/JVM‑Specific Details and Pitfalls

On the JVM Long is signed, but BLAKE2b uses 64‑bit modular arithmetic. As long as you only use:

  • + (wrap‑around addition),
  • bitwise ops (xor, and, or),
  • and rotations,

you can treat Long as if it were uint64. Kotlin’s rotateLeft/rotateRight extensions make the G function easy to read.

The real foot‑gun is little‑endian encoding:

  • When loading words from the message, you must decode 8 bytes in little‑endian order into a Long.
  • When writing the digest, you must write h[0], h[1], … as little‑endian bytes and then truncate.

I strongly recommend tiny helpers like:

- leBytesToLong(bytes, offset): Long
- longToLeBytes(word, out, offset)

and unit tests just for those.
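
Possible shapes for those helpers (worth their own round‑trip unit tests):

// Sketch: decode 8 bytes at `offset` as a little-endian 64-bit word.
fun leBytesToLong(bytes: ByteArray, offset: Int): Long {
    var result = 0L
    for (i in 7 downTo 0) {
        result = (result shl 8) or (bytes[offset + i].toLong() and 0xFF)
    }
    return result
}

// Sketch: write `word` as 8 little-endian bytes starting at `offset`.
fun longToLeBytes(word: Long, out: ByteArray, offset: Int) {
    for (i in 0 until 8) {
        out[offset + i] = (word ushr (8 * i)).toByte()
    }
}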

For performance on the JVM:

  • avoid allocating ByteBuffer objects in the hot path,
  • reuse arrays inside the state object,
  • and avoid logging or allocation inside compress().

If you later move to KMP, keep the core algorithm in commonMain (with unsigned types and simple helpers) and wrap it with JVM‑specific IO and interop.

Production note. The nastiest bug I’ve seen here was a “just faster” Kotlin BLAKE2b where little‑endian conversion was subtly wrong. Small ASCII test strings still passed; large binary inputs didn’t. Comparing against the C reference implementation plus RFC 7693 test vectors exposed it immediately.


Testing Against Reference Implementations

For cryptographic primitives, test vectors are non‑optional. RFC 7693 includes reference test vectors,1 and the official BLAKE2 repo adds comprehensive vectors (keyed and unkeyed, many digest sizes) plus C code.6

My standard test strategy:

1. Import RFC test vectors:
   - empty string
   - "abc"
   - long messages
   - different digest sizes (224/256/512)
   - keyed and unkeyed

2. For each vector, test:
   - one-shot API: digest(input) == expectedDigest
   - incremental API: feed the same input in random chunk sizes; result
     must still match.

3. Cross-check against a known-good implementation:
   - C reference from the official BLAKE2 repo
   - a well-reviewed Java/JVM library (BouncyCastle or similar)
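
A sketch of the chunking check in kotlin.test style, using the illustrative Blake2b API from earlier; the expected digests themselves should come from RFC 7693 Appendix A or the official BLAKE2 repo, not be retyped by hand:

import kotlin.test.Test
import kotlin.test.assertContentEquals

class Blake2bStreamingTest {
    @Test
    fun incrementalMatchesOneShot() {
        val input = "abc".toByteArray()
        val oneShot = Blake2b.blake2b512().digest(input)

        // Worst-case chunking: one byte at a time through the streaming API.
        val streaming = Blake2b.blake2b512()
        for (b in input) streaming.update(byteArrayOf(b), 0, 1)
        val out = ByteArray(64)
        streaming.doFinal(out, 0)

        assertContentEquals(oneShot, out)
    }
}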

For Cardano‑specific usage I also hash:

  • known addresses,
  • policy IDs,
  • and other ledger objects

and compare the output against cardano-cli or a trusted Haskell implementation. That ties “algorithm correctness” to “correct preimage encoding”.


Performance Considerations

BLAKE2b is fast by design, but you can slow it down with a naive JVM port.

The essentials:

- Reuse buffers:
    preallocate state, working arrays, and buf inside the instance.

- Keep compress() tight:
    no per-call allocations, no logging, no virtual calls in the inner loop.

- Batch calls to update():
    avoid hashing tiny slices in a tight loop if you can pass a larger buffer.

- Benchmark with realistic workloads:
    Cardano blocks, CBOR-encoded transactions, or whatever you hash in production.

If you don’t care about re‑implementing from scratch, you can:

  • wrap a mature Java BLAKE2b implementation in a Kotlin API, or
  • use an existing KotlinCrypto module for BLAKE2, and treat this article as a conceptual reference rather than a spec you must implement.3

Production note. In a Cardano‑oriented JVM service, BLAKE2b‑256 was never the real bottleneck. CBOR parsing and hex/base16 encoding dominated CPU time. Once the hash implementation was correct and in‑place, its cost disappeared into the noise floor.


Security Considerations

BLAKE2b’s security comes from its design and analysis, but you can still break things at the implementation or integration layer.7

Some JVM‑oriented points I keep in mind:

  • Constant‑time keyed usage. If you use keyed BLAKE2b (MAC mode), avoid branching or early returns based on secret data. The core algorithm is already constant‑time enough in practice; don’t introduce branches on key bytes or secret message properties.

  • Domain separation. Don’t reuse the same BLAKE2b key across different logical domains. If you need to separate uses, use the personalization and salt fields or different keys. Don’t improvise “prefix this string and hope it’s fine”.1

  • Canonical serialization. For Cardano‑style unkeyed hashing, the main risk is ambiguous preimages. Define a canonical serialization (CBOR or well‑specified binary) and test that different logical values cannot collide at the byte level. The hash can only protect what you feed it.

  • No parameter tweaks. Do not change round counts or constants. BLAKE2b’s 12 rounds and fixed tables are part of the security story; “BLAKE2b but 8 rounds for speed” is a different, unanalyzed algorithm.7

On the JVM specifically, also treat secrets carefully: if you use keys, consider wiping key and state arrays once you’re done, within the limits of what the JVM allows.
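
A best‑effort wipe on the engine from earlier might look like this; note the JVM gives no hard guarantee that copies don’t survive elsewhere on the heap:

// Sketch: zero the mutable engine state after use (best effort only).
fun wipe() {
    h.fill(0L)
    v.fill(0L)
    m.fill(0L)
    buf.fill(0)
    t0 = 0; t1 = 0; f0 = 0; f1 = 0; bufLen = 0
}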


Conclusion

Implementing BLAKE2b in Kotlin/JVM is mostly a translation exercise: take RFC 7693 and implement it faithfully as a small, well‑structured state machine.

You:

  • define a state object with chaining words, counters, flags, and a buffer,
  • implement the compression function and G rounds exactly as specified,
  • handle little‑endian conversions carefully,
  • keep the hot path allocation‑free on the JVM,
  • and validate everything against RFC test vectors and trusted implementations.

Once you have that, you own a solid BLAKE2b‑224/256/512 primitive for your JVM stack. You can safely plug it into Cardano‑related services (addresses, script hashes, “hash‑then‑sign” paths), or use it as a general‑purpose hash wherever you’d previously have reached for SHA‑2.


References

  • RFC 7693 – The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC), including full algorithm specification and test vectors.1
  • Official BLAKE2 reference implementations in C and optimized variants, with additional vectors and design notes.6
  • Cardano documentation on BLAKE2b‑224/256 usage for addresses, verification keys, scripts, and policy IDs.4
  • Kotlin/JVM BLAKE2b implementations and libraries you can study or wrap instead of writing everything from scratch.3