The common way to approach software portability is to establish a baseline and then program to that least common denominator. The portability layers in OpenSSL, however, go way beyond least. This is a fully realized experiment in worst common denominator programming. Some examples.
Let’s start with a fair replacement. There are still some lesser operating systems that lack strlcpy, so including an implementation for them is reasonable. At least OPENSSL_strlcpy is a compatible implementation with the real strlcpy. Perfect example of least common denominator compatibility. (I have a strong preference for omitting the namespace prefix in cases like this, only including the replacement code when necessary, but it is what it is.)
A setuid program (and its libraries) is in a tricky situation. It has some privileges it may need, but many it won’t, and it’s running in a hostile environment. It needs to avoid accidentally opening files it shouldn’t like /etc/passwd. OpenSSL checks for this with a function called OPENSSL_issetugid. On OpenBSD and FreeBSD, it’s a call to the real issetugid function, which is a system call. issetugid needs to be a syscall; you can at best only emulate it poorly in userland. Providing a wrapper that does what it can may be better than nothing, although it can also lead to a false sense of security. If you can’t do it right, at least do it wrong...? The real problem here though is that OpenBSD and FreeBSD are not the only ones with issetugid. Even bad operating systems get better over time, but ineffective wrapper functions don’t.
(Simply checking current euid and ruid is insufficient. The running program may still have file handles open to sensitive files or the contents of those files still in memory. Once privileged, always privileged.)
The real snprintf returns the number of characters it wanted to print, regardless of buffer size. The imitation BIO_snprintf returns the number of characters printed, unless truncation occurs, in which case it returns -1. So they’re mostly the same, except in those situations where bad things like overflow are happening, in which case normal C programmers need to remember that OpenSSL C is not normal.
That’s how one checks real snprintf for truncation, but not BIO_snprintf! Ironically, the code would have worked even so, because -1 implicitly converted to size_t is going to be quite large, but then somebody went full retard and cast sizeof to int. Never do this. (The cast would be mostly harmless with real snprintf. Only the combination of wrapper and cast cause the error. The planets must have been in alignment. Still, don’t cast sizeof.)
There is a function named OPENSSL_isservice which returns whether the current process is a Windows Service. It’s actually used in various places, to decide whether to print an error or display a MessageBox. Call me crazy, but shouldn’t that be something the application would handle? Why does this library need to make such decisions? We can abstract this, therefore we must! Or maybe I’m just not seeing the potential here and what’s missing is a OPENSSL_iscronjob function to decide between printing messages to stderr or syslog.
For all those operating systems lacking fancy functions like memcmp, OpenSSL provides a replacement. Two, actually. CRYPTO_memcmp is a constant time function designed to prevent timing side channels, which is quite useful. There’s also OPENSSL_memcmp. Now, considering that OPENSSL_malloc calls CRYPTO_malloc, you might think OPENSSL_memcmp would call CRYPTO_memcmp. Incorrect. OPENSSL_memcmp is a naive implementation that exits as soon as a mismatch occurs. You need to remember not only to call the wrapper (fair), but which wrapper to call (unfair). Make a mistake, and you’ve just introduced a side channel attack.
Long ago in the wild west days of the Internet, before socklen_t was something you could count on, you sometimes had to roll your own. Usually it’s an int, but sometimes not. In either case, when you port to an OS that doesn’t have socklen_t, the solution is to figure out what the appropriate type is and add your own typedef. Carefully. You do this off in some compat.h header where people don’t really need to see it.
OpenSSL went down a different road. They rolled their own fake socklen_t from a union of int and size_t, along with a 20 line comment that details why it, in theory, should work. And then it lists the exceptions, but explains that maybe it’s not so bad if it doesn’t work. And then they include some runtime code that tries to assert that the terrible thing they’ve done actually did work. This code is inserted directly inline with all the regular socket code that normal people care about. Naturally, this code runs on every platform instead of just the oddballs that lack socklen_t. Better that everybody lose than somebody win.
Does any of this matter? Does it really make the code worse? Consider this OpenSSL commit. Because the code was written to workaround the absence of fcntl.h, the code compiled even when the include was erroneously omitted. But then it fell through into the race prone, unsafe fallback code that creates world readable secret files and only changes their permission after the fact. If the unsafe fallback code hadn’t existed, the unsafe fallback code wouldn’t have been used by accident. (LibreSSL fixed this several weeks earlier by deleting all the fallback code.)
The code wrappers also can’t quite figure what their NULL argument policy is going to be. Unlike their real world counterparts, functions like OPENSSL_realloc and OPENSSL_free apparently can’t handle NULL so you see tons of code like this hypothetical example:
which could be collapsed to the much simpler:
On the other hand, the OPENSSL_strdup wrapper does handle NULL, unlike the real strdup. It may be sensible to upgrade all the standard functions to handle NULL to simplify some callers, but that doesn’t explain why other much more frequently used functions were downgraded!
If you’re not familiar with it, OpenVMS was a partially successful communist plot to destroy the POSIX standard. It’s not related to the OpenBSD project.
You can try to puzzle out the method behind the madness, but it’s easier to rewrite it.
This is really the worst of the worst. The OpenSSL dev team should have told HP to shove it and fix the bugs in their damn OS. The bizarro workarounds for Win16 I can understand because they come from an era when companies didn’t make patches and users didn’t apply them. (Why they remain in the code to this day is another matter.) But nobody should be working around 64-bit VMS problems. Demand better. HP is paid handsomely to maintain their abomination of an operating system; let them maintain it.
Instead of using their considerable political capital as a phenomenally successful (practically “must run”) piece of software to leverage fixes out of HP, the OpenSSL devs went and internalized the brain damage into their own project. Now HP has less incentive to do their job, and countless (but hopefully a dwindling few) other projects will run into the same issue.
Understanding an alien codebase is always difficult. Rewriting a function for clarity often involves deciding between “should do what it looks like it does” or “should do what it actually does”. This is a special kind of unpossible challenge when all the functions being called are just like the functions you know, except when they’re not. The most consistent thing in OpenSSL is that every standard function will be wrapped or reimplemented in some way, and the wrappers will be almost identical, but every one must be subtly different.
The NFS code in OpenBSD is fraught with peril and relies upon evil, evil macros like nfsm_mtofh which modify local variables that aren’t even parameters, but the one good thing I can say is that when I read code using that macro, I’m never confused into thinking it’s just an alias for the standard C function mtofh.
For more reading, consider all the bad things that happen when projects include their own strtod. The standard C library implementation is going to improve and incorporate fixes faster than your lazily maintained half forgotten copy of some code scraped off usenet.
Any one or two dozen compat hacks would be understandable. As would the diabolical brace formatting. Or the dreadful (in the sense of literally inspiring dread) comments:
(Yes, you are running that code. Even on unix. OpenSSL uses it everywhere.)
But taken all together, it’s like “drowning in an ocean composed of pufferfish that are pregnant with tiny Freddy Kruegers -- each detail is horrendous in isolation, but the aggregate sum is delightfully arranged into a hate flower that blooms all year.” If you haven’t read James Mickens’s last column To Wash It All Away (do so!), practically the whole thing could be adapted to OpenSSL with about five minutes of editing.
Every old codebase contains the scarred remains of yesteryear’s battles, but the problems here are so many and so varied, they’re fractal in nature. Nevertheless, buried somewhere in here are all the valuable hacks that make the difference between a spec TLS stack and a working TLS stack.