bluesnapper design and experiences

For about two years I ran a little side project for OpenBSD called bluesnapper to provide automatic binary updates. There have been other attempts at binary updates, but they’re really more like packages than patches, and they targeted stable. I think my user base may have peaked around two active users, but I still learned quite a bit and consider the experiment a success.

The name bluesnapper was chosen for the client because the server side is named redsnapper. I needed a silly name that had something to do with snaps, picked redsnapper, and wrote the server first. Ideally, the names would be swapped, but such is life with legacy code.

background

At the time I started on this endeavor, my cable ISP was flaky and there was a nontrivial chance it would disappear for thirty minutes mid-snapshot download. Unacceptable. My backup internet, an only slightly more reliable 4G connection, wasn’t very fast. I could download the snap ahead of time, and install using a local http server, but I wasn’t always sure when I wanted to update and didn’t see the point of automatically downloading snaps I wasn’t going to use. I didn’t want to work around the problem, I wanted to solve it.

The problem, taking a step back, was not my internet connection. It was the number of bytes that needed to be transferred to upgrade from one snapshot to the next. Why do I need to keep downloading a telnet client that never changes? Boom. Idea. Let’s only download the changes!

redsnapper

Somewhere off in the cloud, where the tubes are fat and always working, there is a perl script called redsnapper.pl which distills OpenBSD snapshots into diffs. There’s also an entire suite of ten-line shell scripts which run first, prepping the scene. They download a new snapshot if it changed from last time, extract it, and then move some directories around so I have the old and new snapshots laid out on disk, like two side by side OpenBSD installs. Finally we call in redsnapper. It walks the new tree, comparing with the old, diffing existing files and saving new files.
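The tree walk described above can be sketched roughly as follows. This is an illustration in Python rather than the original Perl, and the function name `classify_tree` is hypothetical. It only sorts files into "changed" (candidates for bsdiff) and "added" (saved whole), and it makes no special provision for symlinks or device files.

```python
import filecmp
import os

def classify_tree(old_root, new_root):
    """Walk new_root and sort its files into 'changed' and 'added' lists.

    Paths are returned relative to new_root. Files that exist only in
    old_root are ignored, since deletions are not recorded.
    """
    changed, added = [], []
    for dirpath, _dirnames, filenames in os.walk(new_root):
        for name in filenames:
            new_path = os.path.join(dirpath, name)
            rel = os.path.relpath(new_path, new_root)
            old_path = os.path.join(old_root, rel)
            if not os.path.isfile(old_path):
                # No counterpart in the old snapshot: store the whole file.
                added.append(rel)
            elif not filecmp.cmp(old_path, new_path, shallow=False):
                # Contents differ: this pair would be handed to bsdiff.
                changed.append(rel)
    return sorted(changed), sorted(added)
```

Identical files drop out entirely, which is where the savings come from: an unchanged telnet client costs nothing.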

There is one wrinkle with file layout. The /etc directory contains files which the user may have modified and which we shouldn’t patch automatically. Instead, clones of etc are saved under /var/db/bluesnapper in their stock condition, allowing sysmerge to apply the changes with some intelligence.

The real heavy lifting, making usefully small binary diffs, I had nothing to do with. That’s all done by Colin Percival’s excellent bsdiff utility.

I save everything (binary diffs and new files) as blobs in a sqlite database. I probably could have used tar, but I have more experience programmatically creating sqlite databases. It’s like tar++. Or tarNG, as the kids would say. Then we move the database into a web server directory. The diff databases are not stored compressed, although they are fairly compressible. They are gzipped by nginx when serving and decompressed by the client when downloading though, so transport data is minimized. This wasn’t really by design, just something that kind of happened, and in the early days it was easier to poke uncompressed files by hand with sqlite without the extra step of gunzip.

The information saved for each diff is the name, the diff itself, and the md5 of the original file.
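For a concrete picture, here is a minimal sketch of such a database using Python's standard `sqlite3` module. The table and column names (`diffs`, `file`, `diff`, `md5`) are taken from the query quoted in the bluesnapper section; the column types, the primary key, and the helper names are assumptions. The real redsnapper.pl schema may differ, and this sketch does not cover how whole new files were stored.

```python
import hashlib
import sqlite3

def create_diff_db(path):
    """Open (or create) a diff database with one table of blobs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS diffs ("
        "file TEXT PRIMARY KEY, diff BLOB, md5 TEXT)"
    )
    return db

def add_diff(db, name, diff_bytes, original_bytes):
    """Store one binary diff along with the md5 of the file it applies to."""
    digest = hashlib.md5(original_bytes).hexdigest()
    db.execute(
        "INSERT INTO diffs (file, diff, md5) VALUES (?, ?, ?)",
        (name, diff_bytes, digest),
    )
    db.commit()
```

Storing the md5 of the original next to each diff is what lets the client decide, file by file, whether a patch is safe to apply.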

The cron job to create diffs was timed to produce, ideally, two per week: one that would be ready on Monday for people who work with OpenBSD and one that would be ready on Friday for people who play with OpenBSD. Even when basically nothing changes, the diff files still weigh in at about a megabyte. Diffs have to be applied sequentially, so creating one per day would increase the download requirement significantly. I thought about creating combo diffs, which would sum up all the changes over 10 diffs, but my typical usage scenario didn’t anticipate many people falling that far behind.

In addition to creating diffs, redsnapper creates a fingerprint for each snapshot by calculating the md5 of some key files. This information is used by a new bluesnapper client to bootstrap itself and identify which diffs need to be downloaded.
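One way such a fingerprint could work is sketched below, in Python rather than the original Perl. The post does not say which files are "key" or how their md5s are combined, so the list of files, the combining scheme, and the names `fingerprint` and `pending_snapshots` are all assumptions made for illustration.

```python
import hashlib
import os

def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def fingerprint(root, key_files):
    """Combine the md5s of the listed files (relative to root) into one digest."""
    h = hashlib.md5()
    for rel in sorted(key_files):  # sorted so the caller's ordering is irrelevant
        h.update(rel.encode())
        h.update(b"\0")
        h.update(md5_of_file(os.path.join(root, rel)).encode())
        h.update(b"\n")
    return h.hexdigest()

def pending_snapshots(ordered, local_fp):
    """Given [(snapshot_name, fingerprint), ...] oldest first, return the
    names of the snapshots newer than the one matching local_fp."""
    for i, (_name, fp) in enumerate(ordered):
        if fp == local_fp:
            return [name for name, _fp in ordered[i + 1:]]
    raise LookupError("local system matches no known snapshot")
```

A fresh client would compute `fingerprint` over its own installed files, look it up in the server's list, and fetch the diffs for every snapshot after the match.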

bluesnapper

bluesnapper.pl just does the inverse of redsnapper. Download a couple small text files to determine which snap we’re at and where we need to get to, then download and apply each diff in order. select file, diff, md5 from diffs and away we go. bluesnapper is careful not to overwrite any file which has changed, applies the diff to a copy first, then moves it back over the original, and never deletes files (in fact, redsnapper doesn’t even record deleted files). In theory, very little can go wrong.
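The per-file safety checks can be sketched like this, in Python rather than the original Perl. The function `apply_one` and its `patch_fn` parameter are hypothetical: `patch_fn(old_bytes, diff_bytes)` stands in for whatever invokes bspatch and returns the new contents. The sketch writes the patched result to a temporary file in the same directory and renames it over the original, which approximates the copy-then-move approach described above.

```python
import hashlib
import os
import tempfile

def apply_one(target, diff_bytes, expected_md5, patch_fn):
    """Patch target if it still matches expected_md5.

    Returns True if the file was patched, False if it was skipped
    (missing, locally modified, or already patched).
    """
    try:
        with open(target, "rb") as f:
            old = f.read()
    except FileNotFoundError:
        return False
    if hashlib.md5(old).hexdigest() != expected_md5:
        return False  # never overwrite a file that has changed
    new = patch_fn(old, diff_bytes)
    # Build the new file beside the original so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new)
        os.chmod(tmp, os.stat(target).st_mode & 0o7777)  # keep permission bits
        os.replace(tmp, target)  # atomic rename over the original
    except Exception:
        os.unlink(tmp)
        raise
    return True
```

Because a patched file no longer matches the recorded md5, running the same diff a second time skips it. That is the property that makes an interrupted update safe to rerun.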

The hardest part of using the client is that it requires manually setting up the /var/db/bluesnapper/etc directory structure.

issues

On two occasions, using bluesnapper would leave your system in a slightly inconsistent state and require manual intervention.

Not long after release, an X11 file turned into an X11 directory. By design, bluesnapper never deletes existing files, so it was unable to create the directory, and therefore unable to populate the directory either. Repairing by hand required downloading a snapshot and extracting those files (subsequent bluesnapper updates would only include diffs for those files). This happened exactly once.

Sometimes new system calls are added to the kernel (or new variants of existing calls). bluesnapper would update both the kernel and userland without rebooting, meaning that about halfway through the update, you’d have cp and mv binaries which didn’t actually work with the running kernel, and then things would fall apart. This sounds pretty disastrous, but was actually easy to recover from. Reboot and run bluesnapper again. It will skip patching all the already patched files, continue where it left off, and then finish successfully.

Over time, as I recompiled and tested changes on my own, my userland would start to drift from the official snapshots, requiring a refresh (regular snapshot upgrade).

Fixing half of the first issue was long on my list of todo items. The redsnapper server has all the latest files unpacked and potentially available for individual download. If bluesnapper sees that a file it wants to patch is missing or different, it should offer to download the latest version. At least this way, the manual intervention necessary for file/directory conflicts would consist solely of removing the item in the way.

The bluesnapper client should also update just the kernel and then wait for a reboot to update userland. Just like Windows Update!

The /etc directory situation needs a better solution. It should just work.

I wanted to add pkg support, but it’s a much harder problem. The easy way is to require that you run all (and only) the packages I support and have bluesnapper treat /usr/local like everything else, but that’d be a pretty big set. Alternatively, bluesnapper keeps a set of key packages up to date on your hard drive, in pkg.tgz form, and you use pkg_add to update them. I was working on this last approach, but creating binary diffs of compressed files doesn’t work. redsnapper would have to explode the pkg, then make the diff. bluesnapper would have to do the same: untar the old pkg, apply the diff, retar into the new pkg. And I wouldn’t cover all the pkgs, just the big ones like firefox, so you’d have to set the pkg cache directory just right and so on and so forth.

finish

bluesnapper never caught on with users, which is fine. I didn’t promote it beyond the original announcement. Personally I certainly saved more time by using it than I spent building it, and I think I’ve proved that binary diff updates are viable for OpenBSD, even if now isn’t the time.

The source for bluesnapper.pl has long been available. I’ve thrown up redsnapper.pl as well. Please ignore my terribly embarrassing insecure tmp directory handling.

Posted 30 Mar 2013 20:41 by tedu Updated: 10 Oct 2014 00:40
Tagged: openbsd perl programming software