A Couple of (Probabilistic) Worst-case Bounds for Robin Hood Linear Probing
https://www.pvk.ca/Blog/2019/09/29/a-couple-of-probabilistic-worst-case-bounds-for-robin-hood-linear-probing/ [www.pvk.ca]
2019-09-30 03:59
tags:
compsci
hash
perf
I like to think of Robin Hood hash tables with linear probing as arrays sorted on uniformly distributed keys, with gaps. That makes it clearer that we can use these tables to implement algorithms based on merging sorted streams in bulk, as well as ones that rely on fast point lookups. A key question with randomised data structures is how badly we can expect them to perform.
source: L
Preemption Is GC for Memory Reordering
https://www.pvk.ca/Blog/2019/01/09/preemption-is-gc-for-memory-reordering/ [www.pvk.ca]
2019-01-11 06:03
tags:
concurrency
cpu
perf
programming
systems
Interrupt processing (returning from an interrupt handler, actually) is fully serialising on x86, and on other platforms, no doubt: any userspace instruction either fully executes before the interrupt, or is (re-)executed from scratch some time after the return back to userspace. That’s something we can abuse to guarantee ordering between memory accesses, without explicit barriers.
And then it gets crazy.
source: HN
How to Print Integers Really Fast
https://www.pvk.ca/Blog/2017/12/22/appnexus-common-framework-its-out-also-how-to-print-integers-faster/ [www.pvk.ca]
2017-12-25 04:37
tags:
benchmark
c
library
perf
programming
Human readable formats wasting CPU cycles to print integers is a common problem, and we quickly found a few promising approaches and libraries.
source: L
Rendezvous Hashing: My Baseline “Consistent” Distribution Method
https://www.pvk.ca/Blog/2017/09/24/rendezvous-hashing-my-baseline-consistent-distribution-method/ [www.pvk.ca]
2017-09-26 19:32
tags:
compsci
hash
perf
programming
Regardless of the reason for consistent hashing’s popularity, I feel the go-to technique should instead be rendezvous hashing. Its basic form is simple enough to remember without really trying (one of those desert island algorithms), it is more memory efficient than consistent hashing in practice, and its downside–a simple implementation assigns a location in time linear in the number of hosts–is not a problem for small deployments, or even medium (a couple racks) scale ones if you actually think about failure domains.