inks

site: www.pvk.ca

A Couple of (Probabilistic) Worst-case Bounds for Robin Hood Linear Probing

https://www.pvk.ca/Blog/2019/09/29/a-couple-of-probabilistic-worst-case-bounds-for-robin-hood-linear-probing/ [www.pvk.ca]

2019-09-30 03:59

tags: compsci hash perf

I like to think of Robin Hood hash tables with linear probing as arrays sorted on uniformly distributed keys, with gaps. That makes it clearer that we can use these tables to implement algorithms based on merging sorted streams in bulk, as well as ones that rely on fast point lookups. A key question with randomised data structures is how badly we can expect them to perform.

source: L

Preemption Is GC for Memory Reordering

https://www.pvk.ca/Blog/2019/01/09/preemption-is-gc-for-memory-reordering/ [www.pvk.ca]

2019-01-11 06:03

tags: concurrency cpu perf programming systems

Interrupt processing (returning from an interrupt handler, actually) is fully serialising on x86, and on other platforms, no doubt: any userspace instruction either fully executes before the interrupt, or is (re-)executed from scratch some time after the return back to userspace. That’s something we can abuse to guarantee ordering between memory accesses, without explicit barriers.

And then it gets crazy.

source: HN

How to Print Integers Really Fast

https://www.pvk.ca/Blog/2017/12/22/appnexus-common-framework-its-out-also-how-to-print-integers-faster/ [www.pvk.ca]

2017-12-25 04:37

tags: benchmark c library perf programming

Human readable formats wasting CPU cycles to print integers is a common problem, and we quickly found a few promising approaches and libraries.

source: L

Rendezvous Hashing: My Baseline “Consistent” Distribution Method

https://www.pvk.ca/Blog/2017/09/24/rendezvous-hashing-my-baseline-consistent-distribution-method/ [www.pvk.ca]

2017-09-26 19:32

tags: compsci hash perf programming

Regardless of the reason for consistent hashing’s popularity, I feel the go-to technique should instead be rendezvous hashing. Its basic form is simple enough to remember without really trying (one of those desert island algorithms), it is more memory efficient than consistent hashing in practice, and its downside–a simple implementation assigns a location in time linear in the number of hosts–is not a problem for small deployments, or even medium (a couple racks) scale ones if you actually think about failure domains.