StringZilla 🦖
The world wastes a minimum of $100M annually due to inefficient string operations.
A typical codebase processes strings character by character, resulting in too many branches and data-dependencies, neglecting 90% of a modern CPU's potential.
LibC is different.
It attempts to leverage SIMD instructions to boost some operations, and is often used by higher-level languages, runtimes, and databases.
But it isn't perfect.
1️⃣ First, even on common hardware, including over a billion 64-bit ARM CPUs, common functions like `strstr` and `memmem` only achieve 1/3 of the CPU's throughput.
2️⃣ Second, SIMD coverage is inconsistent: acceleration in forward scans does not guarantee speed in the reverse-order search.
3️⃣ Finally, high-level languages can't always use LibC, as their strings are often not NULL-terminated or may contain the Unicode "Zero" character in the middle of the string.
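The third point can be illustrated with a minimal sketch in C. The helper `find_bytes` below is hypothetical (it is neither a LibC nor a StringZilla function) and uses a deliberately naive loop; it only shows why a length-aware interface is needed once a string may contain a zero byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: a naive length-aware
 * substring search that treats '\0' as an ordinary byte. */
static const char *find_bytes(const char *haystack, size_t h_len,
                              const char *needle, size_t n_len) {
    assert(haystack != NULL && needle != NULL);
    if (n_len == 0) return haystack;  /* an empty needle matches at the start */
    if (n_len > h_len) return NULL;   /* the needle cannot fit */
    for (size_t i = 0; i + n_len <= h_len; ++i)
        if (memcmp(haystack + i, needle, n_len) == 0) return haystack + i;
    return NULL;
}
```

For the 11-byte buffer `"hello\0world"`, `strstr(text, "world")` returns `NULL` because it stops at the embedded zero byte, while `find_bytes(text, 11, "world", 5)` returns `text + 6`.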
That's why StringZilla was created: to provide predictably high performance, portable to any modern platform, operating system, and programming language.
StringZilla is the GodZilla of string libraries, using SIMD and SWAR to accelerate string operations on modern CPUs. It is up to 10x faster than the default and even other SIMD-accelerated string libraries in C, C++, Python, and other languages, while covering broad functionality. It accelerates exact and fuzzy string matching, edit distance computations, sorting, lazily-evaluated ranges to avoid memory allocations, and even random-string generators.
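To give a feel for what SWAR ("SIMD within a register") means, here is a minimal sketch in C of a single-byte search that inspects 8 bytes per iteration using ordinary 64-bit arithmetic. The function name `swar_find_byte` is hypothetical, and the code relies on the well-known "has zero byte" bit trick; it is not StringZilla's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: find the first occurrence of
 * byte `c` in `data[0..len)`, testing 8 bytes at a time (SWAR). */
static const char *swar_find_byte(const char *data, size_t len, char c) {
    assert(data != NULL || len == 0);
    const uint64_t ones = 0x0101010101010101ull;
    const uint64_t highs = 0x8080808080808080ull;
    const uint64_t pattern = ones * (uint8_t)c; /* broadcast `c` to all 8 lanes */

    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, data + i, 8);  /* portable unaligned load */
        uint64_t x = word ^ pattern; /* lanes equal to `c` become 0x00 */
        /* Non-zero if and only if at least one lane of `x` is 0x00. */
        if ((x - ones) & ~x & highs) break;
    }
    /* The scalar loop pinpoints the match inside the flagged word
     * and also handles the tail shorter than 8 bytes. */
    for (; i < len; ++i)
        if (data[i] == c) return data + i;
    return NULL;
}
```

The word-level test replaces eight comparisons and branches with a handful of integer operations and one branch. Locating the exact byte with a scalar loop keeps the sketch independent of endianness, at the cost of some speed.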
- 🐂 C: Upgrade LibC's `<string.h>` to `<stringzilla.h>` in C 99
- 🐉 C++: Upgrade STL's `<string>` to `<stringzilla.hpp>` in C++ 11
- 🐍 Python: Upgrade your `str` to faster `Str`
- 🍎 Swift: Use the `String+StringZilla` extension
- 🦀 Rust: Use the `StringZilla` traits crate
- 🐚 Shell: Accelerate common CLI tools with `sz_` prefix
- 📚 Researcher? Jump to Algorithms & Design Decisions
- 💡 Thinking to contribute? Look for "good first issues"
- 🤝 And check the guide to set up the environment
- Want more bindings or features? Let me know!
Who is this for?
- For data-engineers parsing large datasets, like the CommonCrawl, RedPajama, or LAION.
- For software engineers optimizing strings in their apps and services.
- For bioinformaticians and search engineers looking for edit-distances for USearch.
- For DBMS devs, optimizing `LIKE`, `ORDER BY`, and `GROUP BY` operations.
- For hardware designers, needing a SWAR baseline for strings-processing functionality.
- For students studying SIMD/SWAR applications to non-data-parallel operations.
Performance
| C | C++ | Python | StringZilla |
|---|---|---|---|
| find the first occurrence of a random word from text, ≅ 5 bytes long | | | |
| `strstr` ¹ x86: 7.4 · arm: 2.0 GB/s | `.find` x86: 2.9 · arm: 1.6 GB/s | `.find` x86: 1.1 · arm: 0.6 GB/s | `sz_find` x86: 10.6 · arm: 7.1 GB/s |
| find the last occurrence of a random word from text, ≅ 5 bytes long | | | |
| ⚪ | `.rfind` x86: 0.5 · arm: 0.4 GB/s | `.rfind` x86: 0.9 · arm: 0.5 GB/s | `sz_rfind` x86: 10.8 · arm: 6.7 GB/s |
| split lines separated by `\n` or `\r` ² | | | |
| `strcspn` ¹ x86: 5.42 · arm: 2.19 GB/s | `.find_first_of` x86: 0.59 · arm: 0.46 GB/s | `re.finditer` x86: 0.06 · arm: 0.02 GB/s | `sz_find_charset` x86: 4.08 · arm: 3.22 GB/s |
| find the last occurrence of any of 6 whitespaces ² | | | |
| ⚪ | `.find_last_of` x86: 0.25 · arm: 0.25 GB/s | ⚪ | `sz_rfind_charset` x86: 0.43 · arm: 0.23 GB/s |
| Random string from a given alphabet, 20 bytes long ⁵ | | | |
| `rand() % n` x86: 18.0 · arm: 9.4 MB/s | `uniform_int_distribution` x86: 47.2 · arm: 20.4 MB/s | `join(random.choices(...))` x86: 13.3 · arm: 5.9 MB/s | `sz_generate` x86: 56.2 · arm: 25.8 MB/s |
| Get sorted order, ≅ 8 million English words ⁶ | | | |
| `qsort_r` x86: 3.55 · arm: 5.77 s | `std::sort` x86: 2.79 · arm: 4.02 s | `numpy.argsort` x86: 7.58 · arm: 13.00 s | `sz_sort` x86: 1.91 · arm: 2.37 s |
| Levenshtein edit distance, ≅ 5 bytes long | | | |
| ⚪ | ⚪ | via `jellyfish` ³ x86: 1,550 · arm: 2,220 ns | `sz_edit_distance` x86: 99 · arm: 180 ns |
| Needleman-Wunsch alignment scores, ≅ 10 K aminoacids long | | | |
| ⚪ | ⚪ | via `biopython` ⁴ x86: 257 · arm: 367 ms | `sz_alignment_score` x86: 73 · arm: 177 ms |
StringZilla has a lot of functionality, most of which is covered by benchmarks across C, C++, Python and other languages.
You can find those in the `./scripts` directory, with usage notes listed in the `CONTRIBUTING.md` file.
Notably, if the CPU supports misaligned loads, even the 64-bit SWAR backends are faster than either standard library.
Most benchmarks were