This repository contains code for encoding and decoding base64 using SIMD instructions. Depending on the CPU architecture, vectorized encoding is 2 to 4 times faster than the scalar versions; decoding is 2 to 2.7 times faster.
There are several versions of the procedures, utilizing the following instruction sets:

Vectorization approaches were described in a series of articles:

* `Base64 encoding with SIMD instructions`__,
* `Base64 decoding with SIMD instructions`__,
* `Base64 encoding & decoding using AVX512BW instructions`__ (includes AVX512VBMI and AVX512VL),
* `AVX512F base64 coding and decoding`__.

__ http://0x80.pl/notesen/2016-01-12-sse-base64-encoding.html
__ http://0x80.pl/notesen/2016-01-17-sse-base64-decoding.html
__ http://0x80.pl/notesen/2016-04-03-avx512-base64.html
__ http://0x80.pl/articles/avx512-foundation-base64.html

`Daniel Lemire`__ and I also wrote the paper
`Faster Base64 Encoding and Decoding Using AVX2 Instructions`__, which was published in
`ACM Transactions on the Web`__.
Performance results from various machines are located
Each of the two algorithms has its own subdirectory, and both subdirectories have the same structure. Each project contains four programs:

* ``verify`` --- does simple validation of particular parts of the algorithms,
* ``check`` --- validates whole procedures,
* ``speed`` --- compares the speed of different variants of the procedures,
* ``benchmark`` --- similar to ``speed``, but works on small buffers and calculates the CPU cycle rate (available only for Intel architectures).

Change to either directory, for example ``decode``, and then use one of the following commands:

.. list-table::
   :header-rows: 1

   * - command
     - tools
     - instruction sets
   * - ``make``
     - ``verify``, ``check``, ``speed``, ``benchmark``
     - scalar, SSE, BMI2
   * - ``make avx2``
     - ``verify_avx2``, ``check_avx2``, ``speed_avx2``, ``benchmark_avx2``
     - scalar, SSE, BMI2, AVX2
   * - ``make avx512``
     - ``verify_avx512``, ``check_avx512``, ``speed_avx512``, ``benchmark_avx512``
     - scalar, SSE, BMI2, AVX2, AVX512F
   * - ``make avx512bw``
     - ``verify_avx512bw``, ``check_avx512bw``, ``speed_avx512bw``, ``benchmark_avx512bw``
     - scalar, SSE, BMI2, AVX2, AVX512F, AVX512BW
   * - ``make avx512vbmi``
     - ``verify_avx512vbmi``, ``check_avx512vbmi``, ``benchmark_avx512vbmi``
     - scalar, SSE, BMI2, AVX2, AVX512F, AVX512BW, AVX512VBMI
   * - ``make xop``
     - ``verify_xop``, ``check_xop``, ``speed_xop``, ``benchmark_xop``
     - scalar, SSE and AMD XOP
   * - ``make arm``
     - ``verify_arm``, ``check_arm``, ``speed_arm``
     - scalar, ARM Neon

Use ``make run`` (for SSE) or ``make run_ARCH`` to run all programs for the given ``ARCH``; values of ``ARCH`` include "sse", "avx2", "avx512" and "avx512bw".
BMI2 presence is determined from ``/proc/cpuinfo`` or its counterpart on the given platform.
When an AVX2 or AVX512 target is used, BMI2 is enabled by default.
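
The ``/proc/cpuinfo`` check mentioned above can be approximated in C++ by scanning the ``flags`` line for the word ``bmi2``. This is only a sketch, assuming the Linux layout of that file; the helper name ``has_cpu_flag`` is hypothetical, and the build in this repository may well perform the detection differently.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the repository): returns true if `flag`
// appears as a whole word on any "flags" line of a /proc/cpuinfo-style
// stream. The Linux x86 layout "flags : word word ..." is assumed.
bool has_cpu_flag(std::istream& cpuinfo, const std::string& flag) {
    std::string line;
    while (std::getline(cpuinfo, line)) {
        // Only lines that start with "flags" are of interest.
        if (line.compare(0, 5, "flags") != 0) continue;

        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;

        // Split the part after the colon into whitespace-separated words.
        std::istringstream words(line.substr(colon + 1));
        std::string word;
        while (words >> word) {
            if (word == flag) return true;
        }
    }
    return false;
}
```

A caller would typically open ``/proc/cpuinfo`` with ``std::ifstream`` and pass the stream together with ``"bmi2"``. Matching whole words avoids confusing ``bmi1`` with ``bmi2``.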
To compile the AVX512 versions of the programs, at least GCC 5.3 is required; GCC 4.9.2 doesn't have AVX512 support.
Intel Software Development Emulator__ in order to run AVX512
The emulator path should be added to the
Neither encoding nor decoding matches the base64 specification: there is no processing of the data tail, i.e. the encoder never produces '=' chars at the end, and the decoder doesn't handle them at all.
None of these shortcomings is present in the brilliant library by Alfred Klomp: https://github.com/aklomp/base64.
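
To illustrate what the missing tail processing involves, here is a plain scalar sketch of an encoder that does emit the '=' padding, using the standard base64 alphabet. It is not code from this repository; the function name ``base64_encode_with_tail`` is hypothetical, and a vectorized procedure would only need such a routine for the last one or two input bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical scalar reference (not part of the repository): standard
// base64 encoding, including the '=' padded tail.
std::string base64_encode_with_tail(const std::string& input) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Reads one input byte as an unsigned 32-bit value.
    auto byte_at = [&input](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<std::uint8_t>(input[i]));
    };

    std::string out;
    out.reserve((input.size() + 2) / 3 * 4);

    // Full 3-byte groups: 24 bits become four 6-bit indices.
    const std::size_t full = input.size() / 3 * 3;
    for (std::size_t i = 0; i < full; i += 3) {
        const std::uint32_t v =
            (byte_at(i) << 16) | (byte_at(i + 1) << 8) | byte_at(i + 2);
        out += alphabet[(v >> 18) & 0x3f];
        out += alphabet[(v >> 12) & 0x3f];
        out += alphabet[(v >> 6) & 0x3f];
        out += alphabet[v & 0x3f];
    }

    // Tail: one or two leftover bytes, padded with '=' to four characters.
    const std::size_t rest = input.size() - full;
    if (rest == 1) {
        const std::uint32_t v = byte_at(full) << 16;
        out += alphabet[(v >> 18) & 0x3f];
        out += alphabet[(v >> 12) & 0x3f];
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t v = (byte_at(full) << 16) | (byte_at(full + 1) << 8);
        out += alphabet[(v >> 18) & 0x3f];
        out += alphabet[(v >> 12) & 0x3f];
        out += alphabet[(v >> 6) & 0x3f];
        out += '=';
    }
    return out;
}
```

With this sketch, the two-byte input ``fo`` encodes to ``Zm8=`` and the one-byte input ``f`` to ``Zg==``, whereas the encoders in this repository would stop before the tail.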