
string interfaces

A little while ago, deraadt converted the tame API to use strings instead of using CPP macros to assemble a bit mask. String based interfaces are a little unusual in C, but they’re quite handy in some cases.

In the case of tame (now called pledge, btw), the strings are easier to read. CPP macros need to be prefixed with a namespace to avoid collisions, but then one has to use the prefix half a dozen times on the same line.

tame("stdio cmsg getpw proc dns", NULL);
tame(TAME_STDIO | TAME_CMSG | TAME_GETPW | TAME_PROC | TAME_DNS, NULL);

The interface is also more easily extended. Although we’re not out of bits yet, using a string allows for the possibility of more flags without altering the syscall interface. It’s even possible to permit suboptions, like “stdio:stderr”, although this is only a hypothetical extension.
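
To make the extensibility point concrete, here is a minimal sketch of how a space separated request string might be turned back into a bit mask on the receiving side. The F_* names, their values, and parse_flags are all made up for illustration; this is not how tame or pledge is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; the words mirror the example above, the values are invented. */
#define F_STDIO	0x01u
#define F_CMSG	0x02u
#define F_GETPW	0x04u
#define F_PROC	0x08u
#define F_DNS	0x10u

static const struct {
	const char *name;
	unsigned int bit;
} flagnames[] = {
	{ "stdio", F_STDIO },
	{ "cmsg", F_CMSG },
	{ "getpw", F_GETPW },
	{ "proc", F_PROC },
	{ "dns", F_DNS },
};
#define NFLAGS (sizeof(flagnames) / sizeof(flagnames[0]))

/*
 * Convert a space separated request string into a bit mask.
 * Returns 0 on success, -1 if any word is unknown (mask is left untouched).
 */
int
parse_flags(const char *req, unsigned int *mask)
{
	unsigned int m = 0;

	assert(req != NULL && mask != NULL);
	for (;;) {
		size_t len, i;

		req += strspn(req, " ");	/* skip separators */
		len = strcspn(req, " ");	/* length of the next word */
		if (len == 0)
			break;			/* end of string */
		for (i = 0; i < NFLAGS; i++) {
			if (strlen(flagnames[i].name) == len &&
			    memcmp(req, flagnames[i].name, len) == 0)
				break;
		}
		if (i == NFLAGS)
			return -1;		/* unknown word */
		m |= flagnames[i].bit;
		req += len;
	}
	*mask = m;
	return 0;
}
```

Adding a new flag means adding one row to the table. Callers that never mention the new word are unaffected, and the function signature stays the same.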

And while I doubt Theo was much concerned with my ability to use tame in languages other than C, avoiding the preprocessor accommodates that much better. Other system interfaces are quite the hassle to use via FFI-like bindings.

/usr/X11R6/include> grep -R ^#define * | wc -l
   42837

That’s a lot of boilerplate to copy!

Other uses of similar interfaces include SSL_CTX_set_cipher_list and tls_config_set_ciphers. The same approach is used in crypt_newhash. (The original crypt interface wasn’t explicitly designed to be extensible, but being string based, people found a way to hack new hashes in there too.)

Other extensible interfaces are often a pain to use.

pthread_attr_t attr;
pthread_t thread;

pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 4096);
pthread_attr_setguardsize(&attr, 0);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&thread, &attr, start, NULL);

Yikes. Versus:

pthread_magic_create(&thread, "stack:4096 guard:0 detached", start);
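
No such function exists, but a rough sketch of one is not much code. Everything below is an assumption for illustration: the option words, the helper names, and the extra arg parameter (which the one-liner above leaves out). Only the pthread_attr_* and pthread_create calls are real.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Apply one word, either "detached" or "key:number". Returns 0 or an errno value. */
static int
apply_option(pthread_attr_t *attr, const char *word, size_t len)
{
	char buf[64], *val, *end;
	unsigned long n;

	if (len >= sizeof(buf))
		return EINVAL;
	memcpy(buf, word, len);
	buf[len] = '\0';

	if (strcmp(buf, "detached") == 0)
		return pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED);

	/* every other option takes a decimal number after the colon */
	if ((val = strchr(buf, ':')) == NULL)
		return EINVAL;
	*val++ = '\0';
	if (*val < '0' || *val > '9')
		return EINVAL;
	errno = 0;
	n = strtoul(val, &end, 10);
	if (errno != 0 || *end != '\0')
		return EINVAL;

	if (strcmp(buf, "stack") == 0)
		return pthread_attr_setstacksize(attr, n);
	if (strcmp(buf, "guard") == 0)
		return pthread_attr_setguardsize(attr, n);
	return EINVAL;			/* unknown option */
}

/* Apply every word in opts to an already initialized attr. */
int
attr_from_string(pthread_attr_t *attr, const char *opts)
{
	int rv = 0;

	assert(attr != NULL && opts != NULL);
	while (rv == 0) {
		size_t len;

		opts += strspn(opts, " ");
		len = strcspn(opts, " ");
		if (len == 0)
			break;
		rv = apply_option(attr, opts, len);
		opts += len;
	}
	return rv;
}

/* Hypothetical wrapper: parse the options, then create the thread. */
int
pthread_magic_create(pthread_t *thread, const char *opts,
    void *(*start)(void *), void *arg)
{
	pthread_attr_t attr;
	int rv;

	if ((rv = pthread_attr_init(&attr)) != 0)
		return rv;
	if ((rv = attr_from_string(&attr, opts)) == 0)
		rv = pthread_create(thread, &attr, start, arg);
	pthread_attr_destroy(&attr);
	return rv;
}
```

The parser is written once. Every caller after that gets the one-line version, and a typo in an option name comes back as EINVAL instead of compiling silently.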

There’s also the semi extensible interface used by GLX.

   int attribs[64];
   int i = 0;

   attribs[i++] = GLX_RGBA;
   attribs[i++] = GLX_DOUBLEBUFFER;
   attribs[i++] = GLX_RED_SIZE;
   attribs[i++] = 1;
   attribs[i++] = GLX_GREEN_SIZE;
   attribs[i++] = 1;
   attribs[i++] = GLX_BLUE_SIZE;
   attribs[i++] = GLX_DEPTH_SIZE;
   attribs[i++] = 1;
   attribs[i++] = None;

   visinfo = glXChooseVisual(dpy, scr, attribs);

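This is like the worst of both worlds. It’s tedious to use, but still offers very little safety. (The example above has a hosed blue size: the value after GLX_BLUE_SIZE is missing, so GLX_DEPTH_SIZE gets consumed as the blue size, and the stray 1 that follows is read as a useless GLX_USE_GL hanging on the end.)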

   glXGimmeAVisual(dpy, scr, "rgba double red:1 green:1 blue:1 depth:1");
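
A sketch of the parsing half of such a function, under stated assumptions: glXGimmeAVisual is imaginary, and the A_* constants below are stand-ins for the real GLX_* values so that the example builds without X11 headers. The point is that the parser knows which names take a value, so the missing blue size from the array version gets rejected instead of silently shifting everything after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for None and the GLX_* constants; the values are arbitrary. */
enum { A_END, A_RGBA, A_DOUBLEBUFFER, A_RED_SIZE, A_GREEN_SIZE,
    A_BLUE_SIZE, A_DEPTH_SIZE };

static const struct {
	const char *name;
	int attrib;
	int takes_value;
} attribnames[] = {
	{ "rgba", A_RGBA, 0 },
	{ "double", A_DOUBLEBUFFER, 0 },
	{ "red", A_RED_SIZE, 1 },
	{ "green", A_GREEN_SIZE, 1 },
	{ "blue", A_BLUE_SIZE, 1 },
	{ "depth", A_DEPTH_SIZE, 1 },
};
#define NATTRIBS (sizeof(attribnames) / sizeof(attribnames[0]))

/*
 * Fill out[] from a spec such as "rgba double red:1".
 * Returns the number of ints written, terminator included, or -1 on error.
 */
int
parse_attribs(const char *spec, int *out, size_t max)
{
	size_t n = 0;

	assert(spec != NULL && out != NULL);
	for (;;) {
		const char *colon;
		size_t len, klen, i;

		spec += strspn(spec, " ");
		len = strcspn(spec, " ");
		if (len == 0)
			break;
		colon = memchr(spec, ':', len);
		klen = colon != NULL ? (size_t)(colon - spec) : len;
		for (i = 0; i < NATTRIBS; i++) {
			if (strlen(attribnames[i].name) == klen &&
			    memcmp(spec, attribnames[i].name, klen) == 0)
				break;
		}
		if (i == NATTRIBS)
			return -1;	/* unknown name */
		if (attribnames[i].takes_value != (colon != NULL))
			return -1;	/* missing or unexpected value */
		if (n + 3 > max)
			return -1;	/* need room for name, value, terminator */
		out[n++] = attribnames[i].attrib;
		if (colon != NULL) {
			char *end;
			long v = strtol(colon + 1, &end, 10);

			/* the number must fill the rest of the word exactly */
			if (end == colon + 1 || end != spec + len)
				return -1;
			out[n++] = (int)v;
		}
		spec += len;
	}
	if (n + 1 > max)
		return -1;
	out[n++] = A_END;
	return (int)n;
}
```

With this, "rgba blue depth:1" is an error at the call site rather than a visual with a mysterious blue size.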

Although using strings subverts C’s already weak type checking, that’s probably not a major concern. One can screw up bit masks by using || in place of |. Or, as above, one can incorrectly pack the magic array. It’s usually much easier to visually audit a string than the C code used to plaster a dozen options together.

Of course, one should not use a string interface if it doesn’t make sense, but we need not reject the concept out of hand without at least some consideration. Perhaps a few more real life examples will help explore the subject.

The getopt function’s option string could be considered a degenerate case. The argument is not actually a string; it’s an array of characters exactly like the attribs array used for glXChooseVisual. It just so happens that C has magic syntax for arrays of this particular type.

Format strings for the printf family are the most familiar. It’s quite the rare programmer who only uses print_int and print_string and print_double functions. And most C compilers will syntax check format strings these days, so it is at least theoretically possible to do so, though admittedly unlikely for all but the most popular of interfaces.
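
For interfaces that reuse printf’s own syntax, the checking can be borrowed. The sketch below relies on the format attribute, which is a GCC and Clang extension rather than standard C; msg_format and PRINTF_LIKE are names invented for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The format attribute is a GCC/Clang extension; elsewhere the macro expands to nothing. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt, args) __attribute__((format(printf, fmt, args)))
#else
#define PRINTF_LIKE(fmt, args)
#endif

/* Argument 3 is the format string, the values to check start at argument 4. */
int msg_format(char *buf, size_t len, const char *fmt, ...) PRINTF_LIKE(3, 4);

/* Format a message into buf; returns whatever vsnprintf returns. */
int
msg_format(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

With warnings enabled, a call such as msg_format(buf, sizeof(buf), "%s", 42) should then be flagged at compile time, the same way a bad printf call would be. A string syntax of one’s own design gets no such help without compiler support.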

The posix_spawn function is too complicated for me to want to write an example, but look up the documentation for the file actions struct: posix_spawn_file_actions_addclose, posix_spawn_file_actions_adddup2, and posix_spawn_file_actions_addopen. It amounts to writing, by hand, a bytecode program equivalent to the shell’s file redirection syntax. A string like “3>1 4>2” would be much simpler to use, and not substantially harder to implement. The hard part here is the bytecode interpreter, not the parser.
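
To back up the claim that the parser is the easy part, here is a minimal sketch that turns such a string into file actions. The helper name and the exact syntax are assumptions: "N>M" is taken to mean that descriptor N in the child becomes a copy of descriptor M, which is what the shell spells N>&M. Only dup2 style redirections are handled.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <spawn.h>
#include <stdlib.h>

/*
 * Add one dup2 action per "N>M" word in spec.
 * fa must already be initialized. Returns 0 or an errno value.
 */
int
file_actions_from_string(posix_spawn_file_actions_t *fa, const char *spec)
{
	assert(fa != NULL && spec != NULL);
	for (;;) {
		char *end;
		long newfd, oldfd;
		int rv;

		while (*spec == ' ')
			spec++;
		if (*spec == '\0')
			return 0;

		/* left side: the descriptor to be replaced in the child */
		if (!isdigit((unsigned char)*spec))
			return EINVAL;
		newfd = strtol(spec, &end, 10);
		if (*end != '>' || newfd > INT_MAX)
			return EINVAL;
		spec = end + 1;

		/* right side: the descriptor to copy from */
		if (!isdigit((unsigned char)*spec))
			return EINVAL;
		oldfd = strtol(spec, &end, 10);
		if ((*end != ' ' && *end != '\0') || oldfd > INT_MAX)
			return EINVAL;
		spec = end;

		/* adddup2(fa, from, to) arranges for dup2(from, to) in the child */
		rv = posix_spawn_file_actions_adddup2(fa, (int)oldfd, (int)newfd);
		if (rv != 0)
			return rv;
	}
}
```

The caller still has to call posix_spawn_file_actions_init first, pass the result to posix_spawn, and destroy it afterwards. The dozen lines of adddup2 calls in between collapse into one string.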

Speaking of interpreters and parsers, how about regcomp? It’s not strictly necessary to use a string to compile a regular expression. One might also envision an interface like:

regemit(&re, REG_ANCHOR, REG_START);
regemit(&re, REG_ANY, REG_ZERO_OR_MANY);
regemit(&re, REG_DIGIT, REG_ONE_OR_MANY);
regemit(&re, ' ', REG_ONE);

Personally, I’d prefer regcomp(&re, "^.*[[:digit:]]+ ", REG_EXTENDED). (The REG_EXTENDED flag matters here. In basic syntax, + is just a literal plus sign.)
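
For comparison with the imaginary regemit calls, the whole string based version fits in one short function. This is a minimal sketch using the real POSIX regcomp, regexec and regfree; only the wrapper name is invented.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if line contains a run of digits followed by a space,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
int
has_digits_then_space(const char *line)
{
	regex_t re;
	int matched;

	assert(line != NULL);
	/*
	 * REG_EXTENDED makes + mean "one or more"; in basic syntax it is
	 * a literal character. REG_NOSUB skips match offsets we don't need.
	 */
	if (regcomp(&re, "^.*[[:digit:]]+ ", REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	matched = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return matched;
}
```

A real program would compile the pattern once and reuse it. Recompiling on every call just keeps the example short.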

One may object that string parsing is complicated, or difficult, or insecure. Addressing the last issue, there can only be a security concern when the string and the parser originate at different privilege/trust levels. Especially with static strings compiled into a program, this is often not the case. Something to be mindful of, but often a red herring. As for complexity, one need only get the implementation right once. As often as not, the more typical binary interfaces are fronted by ad hoc parsing code to handle either the command line or a config file. It seems more likely that one programmer will get this right than that every programmer will.

Posted 29 Sep 2015 17:03 by tedu Updated: 10 Oct 2015 00:36
Tagged: c programming