Probabilistic Data Structures

Interactive visualizations of HyperLogLog, Balls-into-Bins, and the Hat Problem

HyperLogLog estimates the number of distinct elements using only O(loglogn)O(\log\log n) space. Each item is hashed, and we compute ρ\rho — the position of the leftmost 1-bit — like counting coin flips until heads (Pr[ρ=k]=2k\Pr[\rho = k] = 2^{-k}). The first pp bits select one of m=2p=16m = 2^p = 16 registers, each storing the max ρ\rho seen. Since max{ρ}log2n\max\{\rho\} \approx \log_2 n, the harmonic-mean estimator n^=αmm2/2Rj\hat{n} = \alpha_m m^2 / \sum 2^{-R_j} recovers cardinality with relative error 1.04/m\approx 1.04/\sqrt{m}.

ItemsRegisters (max ρ)R00R10R20R30R40R50R60R70R80R90R100R110R120R130R140R150
Estimated cardinality
0
Actual unique items
0
0 total insertions
Relative error
proof
0%
Theoretical: 26.0%\approx 26.0\%
1.04m=1.0416\frac{1.04}{\sqrt{m}} = \frac{1.04}{\sqrt{16}}
Space used
16 regs (10 bytes)
O(m)=O(2p)O(m) = O(2^p) registers

HyperLogLog: Union & Intersection

HyperLogLog registers are mergeable: the element-wise max gives the sketch for the union. For intersection, we maintain bottom-kk MinHash sketches (the kk smallest hash values per server) to estimate Jaccard similarity, then compute ABJ^(A,B)AB^|A \cap B| \approx \hat{J}(A,B) \cdot |\widehat{A \cup B}|.

Server A
Recent items
No items yet
HLL Est.
0
Actual
0
MinHash
0/16
Server B
Recent items
No items yet
HLL Est.
0
Actual
0
MinHash
0/16
Register values (per server + union)
R0R1R2R3R4R5R6R7R8R9R10R11R12R13R14R15Server A0000000000000000Server B0000000000000000Union (max)0000000000000000
Bottom-k MinHash — Jaccard Similarity (k = 16)
Merged bottom-16 of Server AServer BĴ = 0/0= 0Server A onlyServer B onlyBoth
PairĴ (MinHash)J (actual)
Server AServer B0.0000.000
Union (HLL)
0
Actual: 0 · Error: 0%
hatn=HLL ⁣(maxsRj(s))hat{n}_{\cup} = \text{HLL}\!\bigl(\max_s R^{(s)}_j\bigr)
Intersection (Jaccard × Union)
0
Actual: 0 · Error: 0%
ABJ^(A,B)n^AB|A \cap B| \approx \hat{J}(A,B) \cdot \hat{n}_{A \cup B}
Intersection (incl-excl)
0
Actual: 0 · Error: 0%
n^A+n^Bn^AB\hat{n}_A + \hat{n}_B - \hat{n}_{A \cup B}