Benchmarking (Stock) OpenSSL on Karu

By Markku-Juhani O. Saarinen

OpenSSL is the de facto standard cryptographic library for Linux systems. Hence, it is natural to use it for benchmarking the impact of our basic cryptographic features.

I was happy to notice that the stock OpenSSL 3.5.6 currently shipped with RISC-V Debian (trixie) already has comprehensive support for the Zvk vector crypto extensions. Well, this shouldn’t be so surprising — they were already ratified in 2023. For more information about these extensions, see Chapter 33 of the Unprivileged ISA spec.

The OpenSSL “Processor Capabilities Vector”

OpenSSL uses the RISC-V processor capabilities vector to specify processor capabilities available on a given system. The library can dynamically load implementations optimized for your specific processor variant. This is not unique to RISC-V: ARM, x86, PowerPC, and other CPUs have similar capability vectors, as additional cryptographic capabilities have been added to those ISAs over time.

On any given machine, you can dump the string from the command line with openssl info -cpusettings. With the current Karu64, you get:

karu@karudeb:~$ openssl info -cpusettings
OPENSSL_riscvcap=RV64GC_ZBA_ZBB_ZBS_ZKT_V_ZVKB_ZVKG_ZVKNED_ZVKNHA_ZVKNHB_ZVKSED_ZVKSH vlen:256

The capabilities OpenSSL reports for Karu (at the time of writing) are:

| String | Description |
|---|---|
| RV64GC | Generic 64-bit ISA with Floating Point. |
| Zba | Address Generation (scalar). |
| Zbb | Basic bit-manipulation (scalar). |
| Zbs | Single-bit instructions. |
| Zkt | Data Independent Execution Latency (DIEL). |
| V | Vector Extension for Application Processors. |
| Zvkb | Vector Cryptography Bit-manipulation. |
| Zvkg | Vector GCM (AEAD mode) and GHASH. |
| Zvkned | Vector AES Block Cipher. |
| Zvknha | Vector SHA-256 Secure Hash. |
| Zvknhb | Vector SHA-512 Secure Hash. |
| Zvksed | Vector SM4 Block Cipher. |
| Zvksh | Vector SM3 Secure Hash. |
| VLEN | Physical vector register size. |
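
For scripting, the capability string above can be split into individual tokens. The following is a minimal sketch, assuming the output format shown above (an OPENSSL_riscvcap= prefix, underscore-separated extension names, and a trailing vlen: field). The function names riscvcap_tokens, riscvcap_vlen, and riscvcap_has are hypothetical helpers, not part of OpenSSL.

```shell
# Hypothetical helpers for the string printed by `openssl info -cpusettings`.
# Assumed input format (taken from the output shown above):
#   OPENSSL_riscvcap=<EXT>_<EXT>_... vlen:<bits>

# Print one lowercase extension token per line.
riscvcap_tokens() {
    printf '%s\n' "$1" |
        sed -e 's/^OPENSSL_riscvcap=//' -e 's/ *vlen:.*$//' |
        tr '[:upper:]' '[:lower:]' |
        tr '_' '\n'
}

# Print the vector register length in bits (prints nothing if absent).
riscvcap_vlen() {
    printf '%s\n' "$1" | sed -n 's/.*vlen:\([0-9][0-9]*\).*/\1/p'
}

# Succeed if extension $2 (lowercase) appears as a whole token in string $1.
riscvcap_has() {
    riscvcap_tokens "$1" | grep -qx "$2"
}

# Usage on the target system (not run here):
#   caps=$(openssl info -cpusettings)
#   riscvcap_has "$caps" zvkned && echo "vector AES reported"
```

Matching whole tokens rather than substrings matters here, because names such as zvknha and zvknhb differ only in their last letter.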

Linux also knows about Zvkt (Vector DIEL), as can be seen from /proc/cpuinfo, but OpenSSL doesn’t seem to report it as a separate capability. In any case, Karu itself implements both “constant-time” extensions, which means that a specific subset of cryptography and non-cryptography instructions always has data-independent execution latency.

Note

Karu lacks support for most Scalar (non-vector) Cryptography extensions (Zk.. rather than Zvk..) as those were mostly superseded by the vector equivalents in RVA23U64. Already back in 2020, when I helped to write a paper explaining the design rationale for scalar symmetric cryptography extensions, it was clear that Vector Cryptography would be used in application-class processors; Zk is essentially intended for low-end microcontrollers only. However, current Karu has one serious Zk gap: It lacks the Zkr Entropy Source extension for true random bits. The entropy source extension is actually used for both vector and scalar cryptography. Its design rationale is documented in this paper (free e-Print) that appeared in different versions from 2020 to 2022.

You can set capabilities dynamically, on the command line!

The OpenSSL command-line utility (of the same name) can pick up the capability string from an environment variable, so we can pass it on the command line and study the effect of various extensions on performance.

Let’s simply pass the rv64gc (base) ISA string to the built-in benchmark function for AES-128:

karu@karudeb:~$ OPENSSL_riscvcap=rv64gc openssl speed -bytes 16384 -evp aes-128-ecb

The result is decidedly unimpressive, even for a processor running at 75 MHz, quite possibly because a constant-time implementation of AES requires a lot of overhead when AES extensions are not available:

Doing AES-128-ECB ops for 3s on 16384 size blocks: 36 AES-128-ECB ops in 2.96s
version: 3.5.6
built on: Mon May  4 18:39:11 2026 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -fzero-call-used-regs=used-gpr -Wa,--noexecstack -g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/build/reproducible-path/openssl-3.5.6=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DZSTD -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_riscvcap=RV64GC env:rv64gc
The 'numbers' are in 1000s of bytes per second processed.
type          16384 bytes
AES-128-ECB        199.26k

Now the same with vector AES extension Zvkned:

karu@karudeb:~$ OPENSSL_riscvcap=rv64gc_v_zvkned openssl speed -bytes 16384 -evp aes-128-ecb
[...]
The 'numbers' are in 1000s of bytes per second processed.
type          16384 bytes
AES-128-ECB       6564.63k

So, plain AES operations are 6564.63 / 199.26 ≈ 33 times faster with the extension! (This is for one run; there is some variance with such a 3-second test.)
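
The comparison above can be scripted. This is a minimal sketch, assuming the results line has the form shown in the output above (the algorithm label followed by a throughput value ending in k). The names run_speed, speed_kbps, and speedup are hypothetical helpers introduced here for illustration.

```shell
# Run the built-in benchmark with capability string $1 and cipher $2,
# using the same openssl speed options as in the examples above.
run_speed() {
    OPENSSL_riscvcap="$1" openssl speed -bytes 16384 -evp "$2"
}

# Read `openssl speed` output on stdin and print the throughput
# (in 1000s of bytes per second) for the results line labelled $1.
speed_kbps() {
    awk -v alg="$1" '$1 == alg && $2 ~ /k$/ { sub(/k$/, "", $2); print $2 }'
}

# Print the ratio fast/base with one decimal place.
speedup() {
    awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", fast / base }'
}

# Usage on the target system (not run here):
#   base=$(run_speed rv64gc          aes-128-ecb | speed_kbps AES-128-ECB)
#   fast=$(run_speed rv64gc_v_zvkned aes-128-ecb | speed_kbps AES-128-ECB)
#   speedup "$base" "$fast"
```

Note that the label in the results line (AES-128-ECB) is not spelled the same way as the cipher name given to -evp (aes-128-ecb), so the sketch takes them as separate arguments.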

Initial OpenSSL Benchmarks

The KaruDeb repo includes an automated script openssl_zvk_bench that runs this test on relevant ciphers. Here are some summary numbers from my first run. These are wall-clock timings measured by the standard openssl speed command on the 75 MHz FPGA board running Linux 7.1.1 (with full MMU and DDR4 memory overhead):

| Case | Algorithm | Scalar kB/s | Best cap set | Best kB/s | Best speedup |
|---|---|---|---|---|---|
| aes-128-ecb | AES-128-ECB | 196.2 | zvkb_zvkned | 6553.6 | 33.40x |
| aes-128-ctr | AES-128-CTR | 170.1 | zvkb_zvkned | 5589.0 | 32.87x |
| aes-128-gcm | AES-128-GCM | 124.8 | zvkb_zvkg_zvkned | 4203.6 | 33.69x |
| aes-128-xts | AES-128-XTS | 169.6 | zvkned | 825.8 | 4.87x |
| ghash | ghash | 382.2 | zvkg | 14455.9 | 37.82x |
| sha256 | sha256 | 209.5 | zvkb_zvknha | 2414.2 | 11.52x |
| sha512 | sha512 | 372.8 | zvkb_zvknhb | 2077.1 | 5.57x |
| sm3 | sm3 | 194.4 | zvkb_zvksh | 1553.3 | 7.99x |
| sm4-ecb | SM4-ECB | 222.6 | zvkb_zvksed | 1784.8 | 8.02x |
| chacha20 | ChaCha20 | 403.4 | v_zbb_zvkb | 1363.7 | 3.38x |

Even though we write “Best cap set”, there is no harm in having all of the capabilities enabled simultaneously.
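
Because the clock rate is known, the throughput figures can also be expressed as cycles per byte. The sketch below assumes a core clock of exactly 75 MHz and takes kB/s to mean 1000 bytes per second, as the openssl speed output states. The helper name cycles_per_byte is hypothetical. Since the inputs are wall-clock measurements under Linux, the result includes OS and memory overhead and is only a rough upper bound on the cost of the instructions themselves.

```shell
# cycles/byte = clock frequency (Hz) / throughput (bytes per second)
#   $1 = throughput in kB/s (1000s of bytes per second)
#   $2 = clock in MHz (defaults to 75, the FPGA clock mentioned above)
cycles_per_byte() {
    awk -v kbps="$1" -v mhz="${2:-75}" \
        'BEGIN { printf "%.1f\n", (mhz * 1e6) / (kbps * 1000) }'
}

# Usage with the AES-128-ECB figures from the first run:
#   cycles_per_byte 6553.6    # best cap set
#   cycles_per_byte 196.2     # scalar
```

Under these assumptions, the AES-128-ECB figures work out to about 11.4 cycles per byte with the vector extensions, versus roughly 382 for the scalar path.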

Tip

We also use the openssl command-line utility for additional end-to-end known-answer tests (KATs) for its symmetric crypto implementations; this script is openssl_zvk_kat.

