cloudflare/trie-hard
Rust
Captured source
source ↗cloudflare/trie-hard
Description: Novel implementation of a Trie data structure optimized for small, sparse maps
Language: Rust
License: Apache-2.0
Stars: 600
Forks: 17
Open issues: 4
Created: 2024-09-06T10:20:11Z
Pushed: 2026-04-23T20:46:53Z
Default branch: main
Fork: no
Archived: no
README: 
Don't just Trie, Trie Hard
This crate is an implementation of the trie data structure that is optimized for reading from small maps where a large number of misses are expected. This is still a work in progress in the sense that as none of the other features you would expect to find in a trie (like prefix search), but it is used in production in Cloudflare's Pingora to detect and remove specific headers from 30 million requests every second before they are proxied to their final destinations.
Performance
There are several other trie implementations for rust that are more full-featured, so if you are looking for a more robust tool, you will probably want to check out `radix_trie` which seems to have the best features and performance. On the other hand, if you want raw speed and have the same narrow use case, you came to the right place!
Here is a chart showing the time taken to read 10k entries from a map that consists of 119 entries containing only lower-case characters, numbers, and -. As you can see, when miss rate gets above 50% the performance of trie-hard surpasses std::HashMap and improves as miss rates get higher.
!Trie Hard read is faster than HashMap for small maps where miss rate is high
The improvement of performance as miss rate grows is one of the features of tries, and you can see the same trend in the performance of radix-trie.
!Trie Hard read is faster than RadixTrie but follows the same curve
This also shows the difference in performance between radix-trie and trie-hard, but this should not be interpreted as a reason to use trie-hard over radix-trie. radix-trie is a full-featured trie and balances the read performance with write performance. For fairness, here is a breakdown of the loading performance of the two trie implementations.
| Item | Time to load 15.5k words | | ---------- | ------------------------ | | Trie Hard | 11.92 ms | | Radix Trie | 3.49 ms |
For insertion radix is ~3x times faster and also supports incremental changes whereas trie-hard is only designed for bulk loading.
How Does it Work?
Trie Hard achieves its speed in 2 ways.
1. All node and edge information is kept in contiguous regions on the heap. This prevents jumping around in memory during gets, and maximizes the chance of child nodes already appearing in cache. 2. Relationships between nodes and edges is encoded into individual bits in unsigned integers.
Example
Let's say we want to store the (uninteresting) strings, "and", "ant", "dad", "do", "dot" in our trie. Let's work through an example of how the data is organized and how we can read from it.
Construction
Trie-hard only supports bulk loading so, we run
let trie = ["and", "ant", "dad", "do", "dot"].into_iter().collect::>();
The first thing trie-hard does is find all the used bytes in the entries. It assigns each a unique 1-bit mask. For our simple case, this will look like
| Byte | Mask | | ------ | --------- | | b'a' | 0b00001 | | b'd' | 0b00010 | | b'n' | 0b00100 | | b'o' | 0b01000 | | b't' | 0b10000 |
The number of masks required determines the underlying integer type used to represent the mask. In our case, we only have 5 bits, so the underlying type will be u8. The implication of needing a bit for each unique byte is that potentially we would require a 256-bit integer (u256), so an implementation of U256 is provided as part of this crate. _Note_ The U256 type should not be used directly. It only implements (and tests) the few operations needed to work with trie-hard. The use case trie-hard was designed for only needs to support at most 37 unique bytes, but in practice only 30 unique bytes appear in the stored map. This means we are storing our underlying tree information in u32s.
Next we will construct the graph representing the tree starting with the bytes that appear first in the input strings. Only a and d appear in the first position, so for the node representing the first byte (and also the root of the trie), we create a mask indicating only a and d are allowed.
let root = Node {
// a (0b00001) + d (0b00010) = 0b00011
mask: 0b11,
// ..
}This tells us that if a byte other than a or d appears in the first position, the key being tested does not appear in the trie. This ability to make an exclusion decision at every step is what makes tries more appealing than even hashmaps in some cases. Searching for a string in a hashmap requires hashing the entire string whereas a trie can potentially determine that a string is not part of a set within a single byte.
If the byte is a or d we still need to know which node to go to next. All nodes in the graph are stored in a contiguous vector (with the root node at index zero). Each node will contain the information on where its child appears in the array of nodes. In our example the root node will point to nodes with indexes 1 and 2. Where 1 is the index with keys starting with a and 2 is the node for keys starting with d. It is important that these child nodes are ordered by their corresponding byte.
let root = Node {
mask: 0b11,
// 1 -> keys that start with b'a'
// 2 -> keys that start with b'd'
children: vec![1, 2],
// ..
}> _Note_. In the final map, the child indices are not stored in per-node vecs. They are instead stored in a central vec, and only the index into that vec corresponding with this node is stored in the node.
At this point we can visualize the conceptual trie and trie-hard like this
| Conceptual | Trie-Hard | |…
Excerpt shown — open the source for the full document.
Notability
notability 5.0/10Solid new repo with decent traction.