It all starts with a bug report to LibreSSL that the openssl tool crashes when it tries to print NULL. This bug doesn’t manifest on OpenBSD because libc will convert NULL strings to ”(null)” when printing. However, this behavior is not required, and as observed, it’s not universal. When snprintf silently accepts NULL, that simply leads to propagating the error.
There’s an argument to be made that silly error messages are better than crashing browsers, but stacking layers of sand seems like a poor means of building robust software in the long term.
As soon as development for the next release of OpenBSD restarted, some developers began testing a patch that would remove this crutch from printf.
One of the early victims was not actually a program that printed NULL, but well, we’ll see. talloc (or libtalloc as the OpenBSD port calls it) is a malloc replacement library that adds hierarchical pools.
The talloc configure script runs for a while, checking for all sorts of various features.
IPv6 is the future of the internet, so obviously we want to make sure that our malloc replacement is ready to go in IPv6 only environments. Time to throw away that legacy IPv4 malloc!
This carries on for a few minutes. libtalloc itself takes less than five seconds to compile, but the configure script will spend 100x that long probing for functions to get and set filesystem extended attributes. Code reuse is the key to efficiency.
Finally, we get to the real prize.
That test involves the following bit of code.
Now we have three problems.
The first is passing an int to a format string that expects a size_t. On 64 bit platforms, they will likely be different sizes. This only works by the implementation artifact that loading a 32 bit value into a register clears the high bits.
The second is a bit silly. Why are there backslashes esacping the dollar signs? Because somebody copied a test for some other language, (perl, sh, ruby?), that has string interpretation. C does not. The second is actually accompanied by a compiler warning, warning: unknown escape sequence '\$', but nobody looks at those.
Third, we have the test for printing NULL. Except we use 0 and only pass 32 bits of it. No matter, we then check to make sure it printed at least 3 characters. Why 3? What’s wrong with just printing an empty string?
So a couple observations. The test checking that our snprintf function conforms to the C99 standard actually contains, at a minimum, 3 deviations from the standard. It should say “Checking for non-conformant vsnprintf”.
Even if snprintf should print ”(null)” or “nil” or whatever, that breaks a pretty basic identity.
Of course, we’re dealing with NULL pointers, so all bets are off, but I wonder what people who expect printf NULL to work expect out of strlen? Does it return 0? Does it crash?
After all the configure testing, eventually the build will fail, because somebody forgot to actually add the replacement object file to the Makefile.
If the replacement function has never been used, that’s hardly reassuring that it is actually better tested than the version we have in libc.
Jumping back up a few levels, this isn’t about just one library, but about the difficulty of correcting mistakes that seep into existing software. Eventually the wrong behavior becomes so common, it becomes not just the “right” behavior, it becomes a kind of “standard” that everybody is expected to adhere to. The OpenBSD project would like to prohibit NULL pointers in this context because we believe it will make our own software more robust, but we can’t make this change without far reaching consequences.