connect doesn't restart

There was an interesting bug where pkg_add failed when the terminal was resized. The bug was actually in ftp, specifically in the way it calls connect. When the terminal is resized, SIGWINCH is delivered, which interrupts the connect system call. Some syscalls restart automatically after the signal is handled, but connect is not among them. This may be a little surprising, because the previous bug involved the server side counterpart to connect, accept. On the server, accept restarts, but on the client, connect does not.

Behind the scenes, what’s happening? As the man page says, connect “initiates a connection on a socket”. It doesn’t say much about finishing the connection, though, which may be a bit surprising. Depending on whether the socket is blocking or nonblocking, there are two ways that may happen. This all assumes TCP, which involves some interplay of SYNs and ACKs that does not take place instantaneously. (Which explains why accept behaves differently. It is never in a half connected state.)

In the nonblocking case, the SYN is sent, and then the user should poll to wait until the socket is writeable, which indicates the connection is completed, successfully or not. To check for success, one uses getsockopt to check for SO_ERROR. If there’s no error, full steam ahead.
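A minimal sketch of that nonblocking sequence, in case it helps to see it spelled out. The function name connect_nonblock is made up for illustration, and the code assumes a stream socket on which nothing else is pending.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>

/*
 * connect_nonblock is a hypothetical helper, not taken from any real program.
 * Returns 0 once the connection is established, -1 with errno set otherwise.
 */
int
connect_nonblock(int s, const struct sockaddr *name, socklen_t namelen)
{
	struct pollfd pfd;
	socklen_t len;
	int flags, error;

	/* switch the socket to nonblocking mode */
	flags = fcntl(s, F_GETFL);
	if (flags == -1 || fcntl(s, F_SETFL, flags | O_NONBLOCK) == -1)
		return -1;

	/* kick off the handshake; it may even complete right away */
	if (connect(s, name, namelen) == 0)
		return 0;
	if (errno != EINPROGRESS)
		return -1;

	/* wait until the socket is writeable */
	pfd.fd = s;
	pfd.events = POLLOUT;
	while (poll(&pfd, 1, -1) == -1) {
		if (errno != EINTR)
			return -1;
	}

	/* writeable means finished, not necessarily succeeded */
	error = 0;
	len = sizeof(error);
	if (getsockopt(s, SOL_SOCKET, SO_ERROR, &error, &len) == -1)
		return -1;
	if (error != 0) {
		errno = error;
		return -1;
	}
	return 0;
}
```

The socket is left in nonblocking mode afterwards. A caller that wants blocking reads and writes would need to clear O_NONBLOCK again.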

In the blocking case, things get complicated. By default, connect will wait for the connection to be completed and return success or failure. This is what most programmers probably expect, and it’s how a fair amount of code is written. The wrinkle is the signal case. When interrupted, connect will return an error, EINTR. But the socket is still connecting! The SYN is still out there.

Where does this leave us? If we call connect again, we (should) get an error that the connection is already in progress, EALREADY, because it is. We’ve (perhaps unexpectedly) fallen into a state very similar to the nonblocking case. We now need to poll the socket for completion.
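One way to handle this, sketched under assumptions: treat an interrupted connect like the nonblocking case and wait for the result instead of calling connect again. The names connect_wait and connect_intr are hypothetical, chosen for this example, and this is not necessarily how any particular program does it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: wait for a connect already in progress on s.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
connect_wait(int s)
{
	struct pollfd pfd;
	socklen_t len;
	int error;

	pfd.fd = s;
	pfd.events = POLLOUT;
	/* poll can be interrupted by the same signal; that one we retry */
	while (poll(&pfd, 1, -1) == -1) {
		if (errno != EINTR)
			return -1;
	}

	/* the connection attempt has finished; find out how it went */
	error = 0;
	len = sizeof(error);
	if (getsockopt(s, SOL_SOCKET, SO_ERROR, &error, &len) == -1)
		return -1;
	if (error != 0) {
		errno = error;
		return -1;
	}
	return 0;
}

/*
 * Hypothetical wrapper for a blocking socket: same arguments as connect,
 * but an interrupted call is finished by waiting, not by reconnecting.
 */
int
connect_intr(int s, const struct sockaddr *name, socklen_t namelen)
{
	if (connect(s, name, namelen) == 0)
		return 0;
	/* calling connect again here would typically fail with EALREADY */
	if (errno == EINTR)
		return connect_wait(s);
	return -1;
}
```

A caller would use connect_intr wherever it previously called connect directly. Every other error is passed through unchanged.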

Posted 15 Aug 2016 21:00 by tedu Updated: 15 Aug 2016 21:00
Tagged: c openbsd programming