About two days ago, I was poking around with OpenSSL to find a way to mitigate Heartbleed. I soon discovered that in its default config, OpenSSL ships with exploit mitigation countermeasures, and when I disabled the countermeasures, OpenSSL stopped working entirely. That sounds pretty bad, but at the time I was too frustrated to go on. Last night I returned to the scene of the crime.
OpenSSL uses a custom freelist for connection buffers because long ago and far away, malloc was slow. Instead of telling people to find themselves a better malloc, OpenSSL incorporated a one-off LIFO freelist. You guessed it. OpenSSL misuses the LIFO freelist. In fact, the bug I’m about to describe can only exist and go unnoticed precisely because the freelist is LIFO.
OpenSSL reads data from the connection into a temporary buffer, allocating a new one if necessary. For reference, consult the ssl/s3_pkt.c functions ssl3_read_n and ssl3_read_bytes. Take care not to confuse the record buffer s->s3->rrec with the read buffer s->s3->rbuf. The buffer setup and release functions themselves live in ssl/s3_both.c.
On line 1059, we find a call to ssl3_release_read_buffer after we have read the header, which will free the current buffer.
There’s one small problem. We’re not actually done with it yet. It still has some interesting data in it that we will want to read later. Fortunately, this is only a small problem, because the LIFO freelist will give it right back to us! It has to chill on the freelist for a few microseconds, but then the next call to ssl3_read_n will call setup and start right back where we left off. Same buffer, same contents.
Unless, of course, there is no freelist and releasing the read buffer actually, you know, releases it, which is what happens when you compile with OPENSSL_NO_BUF_FREELISTS. Now that first buffer is gone forever, and it’s a different buffer that we start reading from. But this new, different buffer isn’t very likely to have the same data as the old buffer. OpenSSL gets very confused when it can’t find the data it expects and aborts the connection.
(Riddle: what is the value of rb->left?)
The solution is quite simple. Don’t release the buffer if we’re not done with it. There are probably many ways to shave this yak; here’s one:
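A sketch of such a patch (reconstructed against the ssl3_read_bytes code of the era, so the exact context lines may differ) simply stops honoring the release option on the read path:

```diff
--- ssl/s3_pkt.c
+++ ssl/s3_pkt.c
@@ ssl3_read_bytes @@
-		if (s->mode & SSL_MODE_RELEASE_BUFFERS)
-			ssl3_release_read_buffer(s);
```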
That’s actually a little overkill, disabling the entire buffer release option. A more refined patch still releases the buffer, but only when empty.
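The refined version, roughly matching the fix that eventually went upstream (again a sketch; context lines may differ), keeps the option but checks rb->left first:

```diff
--- ssl/s3_pkt.c
+++ ssl/s3_pkt.c
@@ ssl3_read_bytes @@
-		if (s->mode & SSL_MODE_RELEASE_BUFFERS)
+		if (s->mode & SSL_MODE_RELEASE_BUFFERS &&
+		    s->s3->rbuf.left == 0)
 			ssl3_release_read_buffer(s);
```

If there are still unread bytes sitting in the read buffer, we hold on to it; once it’s empty, releasing it is safe.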
This bug would have been utterly trivial to detect when introduced had the OpenSSL developers bothered testing with a normal malloc (not even a security focused malloc, just one that frees memory every now and again). Instead, it lay dormant for years until I went looking for a way to disable their Heartbleed accelerating custom allocator.
Building exploit mitigations isn’t easy. It’s difficult because the attackers are relentlessly clever. And it’s aggravating because there’s so much shitty software that doesn’t run properly even when it’s not under attack, meaning that many mitigations cannot be fully enabled. But it’s absolutely infuriating when developers of security sensitive software are actively thwarting those efforts by using the world’s most exploitable allocation policy and then not even testing that one can disable it.
I’ve left the original post intact, but here are a few clarifications and followup thoughts. At the time, I was focused on the flaw and what could have prevented it, but not the potential effects of the bug. Also, I didn’t specify what configurations were affected.
The bug only affects connections with the SSL_MODE_RELEASE_BUFFERS option set. This isn’t a compile option; it’s a runtime option set by the application. nginx is a notable program that sets this option. The bug exists whether or not you compile with OPENSSL_NO_BUF_FREELISTS. The default config, without that option, masks the bug in many cases, but also makes potential exploitation easier.
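For reference, the option is enabled per-context (or per-connection) via the mode-setting API; a fragment like the following sketches how an application such as nginx opts in (illustrative only, not nginx's actual code):

```c
#include <openssl/ssl.h>

/* Sketch: only connections whose SSL_CTX (or SSL) has this mode set
 * are affected by the bug. */
void enable_release_buffers(SSL_CTX *ctx)
{
	SSL_CTX_set_mode(ctx, SSL_MODE_RELEASE_BUFFERS);
}
```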
As for effects and possible exploitation, it allows a form of blind traffic injection. While I said “same buffer, same contents” above, that’s not always true. A quick glance at the code reveals that while the buf is chilling on the freelist, it can be stolen by another thread. It also appears that certain traffic patterns can trigger it in nginx even when not threaded? Once again, we’re leaving my field of expertise.
Consider Alice and Bob POSTing pictures to a web server. There’s a chance that a buffer from Alice’s connection will be released, and the next free buffer taken instead, which has Bob’s data in it. Typically, this will cause the connection to fail, but if Bob’s buffer is specially crafted to match the state of Alice’s connection, the connection will proceed. If Bob is the bad guy, he has just overwritten Alice’s data. (The attack also works the other way. If Alice is the bad girl, she arranges for her connection state to be compatible with Bob’s buffer, and now some of Bob’s data has bled into her connection which would presumably allow her to download it.)
The powers that be have decided this is CVE-2010-5298.