the design of strtonum

A few years ago I added a new function to OpenBSD libc, strtonum, to solve yet again the problem of converting a string into a number. In OpenBSD, we happen to like the function, but other projects have made various objections. The man page gives a brief explanation of the function, but it’s not a full history. I’ll try to clear the air by explaining strtonum’s rationale and responding to its criticisms.

First, the libc function evolved from a similar function I added to ping in response to a bug report about its number parsing. It was immediately obvious that similar problems plagued other programs and so we should fix all of them. strtol was obviously not the solution because it had been around for years and still nobody was using it correctly. We needed something very simple.

You can read the strtonum man page for details, but let’s review the strtonum interface. It takes as input a string and a range of accepted values, from min to max. It returns the integer number. For error handling, it takes a pointer to a string (referred to as errstr) and on error sets it to a message indicating the error. The interface is generally similar to atoi or strtol, the functions it is intended to replace.
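To make the shape of that interface concrete, here is a minimal sketch of a function with the same signature, built on strtoll. The name sketch_strtonum is made up for illustration and this is not the libc source; the three messages ("invalid", "too small", "too large") and the EINVAL/ERANGE values are the ones the man page documents.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in with strtonum's signature; an illustration only,
 * not the OpenBSD libc implementation.
 */
long long
sketch_strtonum(const char *s, long long minval, long long maxval,
    const char **errstr)
{
	const char *msg = NULL;
	long long v = 0;
	int saved = errno;	/* put back on success */
	int err = 0;
	char *end;

	errno = 0;
	if (minval > maxval) {
		msg = "invalid";
		err = EINVAL;
	} else {
		v = strtoll(s, &end, 10);
		if (end == s || *end != '\0') {
			/* no digits at all, or trailing characters */
			msg = "invalid";
			err = EINVAL;
		} else if ((v == LLONG_MIN && errno == ERANGE) || v < minval) {
			msg = "too small";
			err = ERANGE;
		} else if ((v == LLONG_MAX && errno == ERANGE) || v > maxval) {
			msg = "too large";
			err = ERANGE;
		}
	}
	if (errstr != NULL)
		*errstr = msg;		/* NULL means success */
	errno = (msg != NULL) ? err : saved;
	return (msg != NULL) ? 0 : v;
}
```

A caller checks errstr and nothing else: if it is NULL the return value is good, otherwise print the message and bail.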

The list of complaints I’ve read about strtonum is long and varied. Some are real; some result from people not bothering to study the function closely. In the seven years since strtonum was introduced and first criticized, however, nobody has presented a new function that resolves all the problems. I believe this is because the problem is intractable, and trying to please everyone results in a committee-designed disaster. strtonum grew somewhat organically from our experiences using it. It pleases us.

First, the misconceptions. strtonum does not set errno except on error conditions. It’s very careful about that, but critics don’t read the implementation carefully enough. It will always set errstr to NULL on success, however. The primary mission of strtonum was always to make detecting success or failure easy.

strtonum is not a complete replacement for strtol. Notably, it doesn’t handle bases other than ten or trailing non-digit characters. It is a complete replacement for atoi. The man page does advertise it as a better strtol, which I think is fair, in that it’s a better atoi replacement than strtol, and lots of strtol usage is people searching for a better atoi. In particular, if strtonum is acceptable to use, it’s far simpler to use than strtol. Not every use of strtol can be replaced by strtonum. In that case, you are stuck with strtol. We wanted to make the easy stuff super easy. It’s ok for the hard stuff to still be hard.
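For comparison, this is roughly what it takes to parse a plain int with strtol and catch every failure. It is a sketch along the lines of the idiom usually recommended for strtol; the function name parse_int_strtol and its 0/-1 return convention are invented for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a base 10 int with strtol.
 * Returns 0 and stores the value on success, -1 on any failure.
 */
int
parse_int_strtol(const char *s, int *out)
{
	char *end;
	long v;

	errno = 0;		/* strtol only sets errno on failure */
	v = strtol(s, &end, 10);
	if (end == s || *end != '\0')
		return -1;	/* empty, not a number, or trailing junk */
	if (errno == ERANGE && (v == LONG_MAX || v == LONG_MIN))
		return -1;	/* did not fit in a long */
	if (v > INT_MAX || v < INT_MIN)
		return -1;	/* fit in a long, but not in an int */
	*out = (int)v;
	return 0;
}
```

That is four separate checks, and forgetting any one of them still compiles and mostly works, which is how the haphazard versions end up in a source tree.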

The interface is defined in terms of the long long type, instead of something more sensible like intmax_t. In practice, these types are always going to be the same on OpenBSD. Again, we designed this function for OpenBSD first, without a lot of consideration for bizarre embedded systems with strange int sizes. The type in the interface is easily changed without affecting the nature of the function’s operation, and I don’t think any usage in OpenBSD actually exceeds the bounds of the plain int type. Even if intmax_t probably would have been a better choice, I think we’re sticking with long long just because it frustrates people unwilling to admit it doesn’t make a difference.

strtonum returns the integer directly and the error string via a parameter. Partly, this is a historic accident, because the original ping version did something similar. And that’s because I wanted the function to work just like atoi or strtol, both of which return the value. Returning the error and passing the value back through a pointer may be the more common idiom, but it has a big drawback. Because strtonum performs range checking, it’s currently safe to assign the return value to a smaller type without risk of overflow. If the function had instead accepted a pointer to the value, that would have required all callers to use that particular type, be it long long, intmax_t, or strtonum_t, whatever. It would no longer be such an easy drop-in replacement.
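To see the drawback, here is a sketch of what the rejected pointer-style interface might look like. Both parse_num_ptr and parse_port are hypothetical names made up for this example, and unlike strtonum this version makes no attempt to preserve errno.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical out-parameter variant: returns NULL on success or an error
 * message, and stores the value through a long long pointer.
 */
const char *
parse_num_ptr(const char *s, long long minval, long long maxval,
    long long *out)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, 10);
	if (end == s || *end != '\0')
		return "invalid";
	if ((v == LLONG_MIN && errno == ERANGE) || v < minval)
		return "too small";
	if ((v == LLONG_MAX && errno == ERANGE) || v > maxval)
		return "too large";
	*out = v;
	return NULL;
}

/* The caller wants an unsigned short but has to go through a long long. */
const char *
parse_port(const char *s, unsigned short *port)
{
	long long tmp;
	const char *errstr = parse_num_ptr(s, 1, 65535, &tmp);

	if (errstr == NULL)
		*port = (unsigned short)tmp;	/* range was already checked */
	return errstr;
}
```

Every caller with a variable that isn’t a long long needs that temporary and the copy. With the value as the return type, the same call is a single assignment.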

strtonum only works with base 10. It’s true and it’s by design. Hexadecimal and octal numbers are not unknown, but they are less commonly used for quantities. If it’s important that users are able to enter octal, suck it up and use strtol. Same for trailing characters.

strtonum’s range does not accept max values in the upper half of the unsigned long long range. millert tried to make it work, but it gets confusing and weird. What happens when the min and max values are more than 2^64 apart? The implementation could probably be worked out, but then there are more error conditions to using the function and more documentation and more cognitive load for the caller. No. Using signed types lets the caller specify negative numbers but still leaves more than enough room. Again, strtonum is for parsing things like iteration counts, timeouts, and so forth.

So why does strtonum return a string and not an integer code for errors? Because it’s easier that way. We could have defined error codes and then a strtonum_strerror function a la gai_strerror, but why not save the caller some work? We didn’t want people switching on the error code and making up their own messages. That’s not simple. The errstr is somewhat redundant to errno, but that’s because the ERANGE errno doesn’t convey enough info. For example, strtol sets it for underflow, too, leading to confusing error messages when a negative number is really too small, not too large. (strtonum does the same, but the preferred value is in errstr, and that will say too small.) Sometimes the errstr may not be the best message to print to the user. Such is life. In those circumstances, the caller can do something clever.
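The ERANGE ambiguity is easy to demonstrate with plain strtol. In this sketch (range_errstr is a made-up name), errno is ERANGE in both directions, and the only way to tell them apart is to look at which limit came back.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns NULL if strtol did not overflow, otherwise
 * a message saying which way it overflowed. errno alone cannot tell:
 * it is ERANGE either way.
 */
const char *
range_errstr(const char *s)
{
	long v;

	errno = 0;
	v = strtol(s, NULL, 10);
	if (errno != ERANGE)
		return NULL;
	return (v == LONG_MIN) ? "too small" : "too large";
}
```

A caller that just prints strerror(errno) gets the same text for both cases, and on BSD systems that text mentions "too large" even when the number was hugely negative.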

The errstr is not localized. Indeed. OpenBSD is not localized. shrug

The name strtonum is too generic. I’ll concede this point, mostly because people see the name, immediately assume it’s going to handle every string to number corner case they can imagine, and are then bitterly disappointed when it doesn’t. But calling the function parse_everyday_number would be ridiculous. If the C standard ever does add a strtonum function, we will rename ours.

I have no regrets about strtonum. The OpenBSD project has used it happily to replace countless atoi and strtol calls. If it can’t be used everywhere, that’s an acceptable tradeoff for making it easy to use where it does fit. Like I said, the strtonum detractors have failed to come up with anything better. That’s because if you truly need a flexible function, strtol is it. We made strtonum better by making it simpler, not by making it bigger. strtol isn’t really all that bad, it’s just awkward. The haphazard error checking in the source tree was ample evidence that correct usage didn’t happen automatically.

Posted 2011-07-27 17:11:55 by tedu Updated: 2015-03-02 21:07:29
Tagged: c openbsd programming software