whose xterm is it anyway?

As part of the effort to find idle hogs, I noticed some xterms were heavier than others.

 9960 tedu       2    0 7056K   12M sleep     select    0:08  0.00% xterm
15257 tedu       2    0 6808K   12M sleep     select    0:01  0.00% xterm
10960 tedu       2    0 6924K   12M idle      select    0:01  0.00% xterm
25365 tedu       2    0 6796K   12M sleep     select    0:01  0.00% xterm

How did inmate 9960 come to acquire 8 whole seconds of CPU time? For that matter, which xterm is it? The answer to the second will likely reveal the first.

Looking around, all my xterms are currently idle. Just as indicated by top. How do we turn a pid into a window?

brute force

The brute, or even brutal, force technique is to quit each xterm one by one until 9960 goes away. A nicer approach is to send SIGSTOP to each xterm and see which one stops responding. (Alas, if xterm is setgid, you may not be able to SIGCONT it afterwards. Less nice.) Or run find / in each xterm while watching top to see who lights up. All a bit intrusive, but wasn’t it Heisenberg who proved there can be no observation without modification? Actually no, though the observer effect is a real thing. Nevertheless, we can do a better job of observing xterms without pummeling them to see which one bruises.

Let’s start with ps.

 2157 p9  Ss      0:00.02 -ksh (ksh)
17271 p9  R+      0:00.00 ps
30407 pa  Ss      0:00.05 -ksh (ksh)
   79 pa  S+      0:00.16 vim kern_sig.c
20564 pb  Is+     0:00.01 -ksh (ksh)
21379 pc  Is+     0:00.03 -ksh (ksh)
  236 pd  Is      0:00.02 -ksh (ksh)
 3583 pd  S+      0:00.38 top

The second column is the controlling terminal. So we have some hints. I now know which terminal is running ps, and which is running top, and which is finding out why SIGCONT doesn’t work. But no xterms, unless we run ps x.

 9960 ??  Is      0:07.80 xterm
24106 ??  Is      0:00.66 xterm
24358 ??  Is      0:00.25 xterm

xterms don’t have controlling terminals; instead they control the terminal. But this is still useful info to have.

> fstat -p 9960
USER     CMD          PID   FD MOUNT        INUM MODE       R/W    SZ|DV
tedu     xterm       9960 text /usr       702660 -rwxr-sr-x   r   596224
tedu     xterm       9960   wd /home     1611008 drwxr-xr-x   r     2560
tedu     xterm       9960    0 /          182659 crw-------  rw    ttyC0
tedu     xterm       9960    1 /          183026 crw-rw-rw-   w     null
tedu     xterm       9960    2 /          183026 crw-rw-rw-   w     null
tedu     xterm       9960    3* unix stream 0x0
tedu     xterm       9960    4 /          182379 crw-rw-rw-  rw    ptyp1

There it is. We’re looking at p1.

> pgrep -lf -t p1
6988 -ksh
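
The pid to pty step can be scripted instead of eyeballed. A minimal sketch, assuming fstat prints the device name as the last field and that the pty master is named like ptyp1; the helper name pty_of_fstat is made up for illustration.

```shell
# Sketch: read `fstat -p PID` output on stdin and print the short tty
# name (e.g. "p1") of the first pty master the process holds open.
# Assumption: the device name is the last field and looks like "ptyp1".
pty_of_fstat() {
    awk '$NF ~ /^pty[a-zA-Z][0-9a-zA-Z]$/ { print substr($NF, 4); exit }'
}

# Hypothetical usage on a live system (not run here):
#   pgrep -lf -t "$(fstat -p 9960 | pty_of_fstat)"
```

Stripping the leading "pty" leaves the same short name that ps shows in its terminal column, which is also what pgrep -t wants.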

Another approach would be to run ps -O ppid (or pgrep -lf -P 9960) and look for the shell with a parent of 9960, and walk back up. Either way, it’s one of the dozen xterms sitting there with an idle shell, which is a hint not an answer. Running around and pasting echo $$ in each shell would find the suspect. Or I could run write tedu ttyp1 and look for the graffiti.
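
The parent pid route is just a filter over ps output. A rough sketch, assuming input columns in the order pid, ppid, tt, command; children_of is a hypothetical helper, not an existing tool.

```shell
# Sketch: read "PID PPID TT COMMAND..." lines on stdin and print the pid
# and terminal of every process whose parent is the pid given as $1.
# The header line is skipped naturally because "PPID" never equals a pid.
children_of() {
    awk -v parent="$1" '$2 == parent { print $1, $3 }'
}

# Hypothetical usage on a live system (not run here):
#   ps -ax -o pid,ppid,tt,command | children_of 9960
```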

We can also continue further on this path, inspecting the working directory for each shell, and then narrowing our search to those xterms, but maybe it’s time to switch techniques.

just ask

A smarter approach would be to just ask. In theory, every xterm has a _NET_WM_PID property that is equal to its pid. This can be retrieved by running xprop and clicking the window. Or using the -id argument. Then we need all the xterm window IDs, which can be obtained via xwininfo.

> xwininfo -root -children | grep XTerm | awk '{print $1}' | \
    xargs -n1 -I % sh -c "echo %; xprop -id % _NET_WM_PID"
0xe0000d
_NET_WM_PID(CARDINAL) = 24106
0xc0000d
_NET_WM_PID(CARDINAL) = 9960
0xa0000d
_NET_WM_PID(CARDINAL) = 25365
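
Picking the right pair out of that output by eye is fine for three windows, less so for a dozen. A small sketch that does the matching, assuming the alternating "window ID, then property" layout shown above; window_of_pid is a name invented here.

```shell
# Sketch: read alternating lines on stdin (a window ID, then that
# window's _NET_WM_PID line) and print the ID whose pid matches $1.
# Windows without the property never match and are silently skipped.
window_of_pid() {
    awk -v pid="$1" '
        /^0x/ { id = $1; next }
        /^_NET_WM_PID/ && $NF == pid { print id }
    '
}

# Hypothetical usage: pipe the xwininfo | xargs xprop pipeline above
# into `window_of_pid 9960`.
```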

Armed with the window ID, we can feed it back to xwininfo.

> xwininfo -id 0xc0000d
xwininfo: Window id: 0xc0000d "Thanks for flying Vim"
  Corners:  +-2542+15  -3831+15  -3831-12  +-2542-12
  -geometry 115x67+-2542-12

Alright, so this xterm is off screen somewhere, but the geometry maybe gives us another hint as to which it is based on size. And it once upon a time ran vim, which fiddles with the title. Interesting, but we’d like something a little more obvious.

> xwd -id 0xc0000d | xwud 
X Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  73 (X_GetImage)
  Serial number of failed request:  95
  Current serial number in output stream:  95
xwud: Error => Unable to read dump file header.
xwud: Resource temporarily unavailable

Damn. I was hoping for “Woah! A new exact duplicate of 9960 has appeared. So that’s which one it is.” But no dice. This depends on the suspect window being on screen. But if we can get all the windows on screen (dwm “0” screen), either this or the above approach can work.

For funsies, there’s a Stack Overflow answer dedicated to finding the pid for an X11 window, which is the reverse process.

inferno

We’re moving well past the point of no return now. Instead of using X to spy on our xterm, we can do so ourselves. This can be done using gdb, for instance. Unfortunately, other people would do it that way. How hard can it be to write a one-off, single-purpose debugger?

Step one of our journey is gazing into the xterm source code. Eventually one will discover that there is a LineData structure with a pointer to what appears to be character data. There’s an array of these, one for each line. But there is not an obvious pointer to this array. Instead it’s accessed using a variety of casts, offsets, and pointer arithmetic, but the base pointer is visbuf in something called TScreen, a giant structure that takes over 500 lines of code to declare. That is embedded in an XTermWidget, and (thank the heavens!) there is a global pointer to one of these called term, bringing our trek to an end.

All we need to do now is write a debugger that iteratively reads each:

((LineData *)(term->screen.visbuf + offset))->chardata.

OpenBSD includes a useful sysctl for examining the address space of another process. Through arcane magic not explained here (procmap), I know the xterm I’m looking at has a text segment of 540672 bytes. We can find it programmatically thusly:

local function findexecbase(pid, execsize)
        local mib = ffi.new("int[3]")
        mib[0] = CTL_KERN
        mib[1] = KERN_PROC_VMMAP
        mib[2] = pid
        local numents = 200
        local ents = ffi.new("struct vmentry[?]", numents)
        local entsize = ffi.sizeof("struct vmentry")
        local oldsize = ffi.new("size_t[1]")
        oldsize[0] = entsize * numents
        local rv = C.sysctl(mib, 3, ents, oldsize, nil, 0)
        if rv == -1 then
                return nil
        end
        -- sysctl updates oldsize to the bytes written; divide by the entry size
        for i = 0, tonumber(oldsize[0]) / entsize - 1 do
                local ent = ents[i]
                if (tonumber(ent.kve_end) - tonumber(ent.kve_start)) == execsize and
                                ent.kve_prot == PROT_RW then
                        return ent.kve_start
                end
        end
end
local addr = findexecbase(pid, 540672)

Using further magic (I’m cheating a bit, but basically nm xterm | grep term$), we know the offset from there to term, and then we can start chasing pointers with ptrace. The offsets were calculated by compiling an xterm with a printf of the interesting values.

local function pread(addr)
        local v = C.ptrace(PT_READ_D, pid, addr, 0)
        v = tonumber(v)
        if v < 0 then
                v = v + 4294967296
        end
        return v
end

local function preadptr(addr)
        local p1 = pread(addr)
        local p2 = pread(addr + 4)
        return p1 + p2 * 4294967296
end

rv = C.ptrace(PT_ATTACH, pid, 0, 0)
rv = C.waitpid(pid, nil, 0)
addr = addr + 4912632 -- offset of term
addr = preadptr(addr) --read term
addr = addr + 392 -- offset of term->screen
addr = addr + 15496 -- offset of screen.visbuf
addr = preadptr(addr)
print("SCREEN DUMP")
for row = 0, 10 do
        local datadr = preadptr(addr + row * 48 + 24)
        local s = { }
        for i = 0, 80 do
                local v = pread(datadr + i * 4)
                table.insert(s, string.char(v))
        end
        print(table.concat(s))
end
rv = C.ptrace(PT_DETACH, pid, 0, 0)
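
The arithmetic in pread and preadptr is easy to get wrong, so here it is worked outside Lua. This is only a sketch of the same sums in shell, assuming 64-bit shell arithmetic and a little-endian target; u32, join64 and row_slot are illustrative names, and the 48 and 24 are the offsets from the listing above, not anything looked up independently.

```shell
# Sketch: ptrace hands back a signed 32-bit word, so a value with the
# high bit set arrives negative. Adding 2^32 recovers the unsigned value.
u32() {
    if [ "$1" -lt 0 ]; then
        echo $(( $1 + 4294967296 ))
    else
        echo "$1"
    fi
}

# Join the low word ($1) and the high word ($2) into one 64-bit address.
join64() {
    echo $(( $(u32 "$1") + $(u32 "$2") * 4294967296 ))
}

# Address of the character data pointer for a row: base + row * 48 + 24.
row_slot() {
    echo $(( $1 + $2 * 48 + 24 ))
}
```

For example, a low word of -16 and a high word of 1 join to 8589934576, which is 0x1fffffff0.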

Let it rip and...

SCREEN DUMP
    if (row >= 0 && row <= max_row) {                                            
        result = (LineData *) scrnHeadAddr(screen, buffer, (unsigned) row);      
        if (result != 0) {                                                       
#if 1                           /* FIXME - these should be done in setupLineData,
            result->lineSize = (Dimension) MaxCols(screen);                      
#if OPT_WIDE_CHARS                                                               
            if (screen->wide_chars) {                                            
                result->combSize = (Char) screen->max_combining;                 
            } else {                                                             
                result->combSize = 0;                                            
            }

Hey! Now that does look familiar. It’s the source code to the line getting function in xterm. Now I know exactly which window it is.

epilogue

This was a pretty big waste of time. As soon as I saw that one xterm was busier than the rest, I knew exactly which one it was: the one I read mail in, which has to redraw the screen for every email. This was trivially confirmed using any of the brute force techniques, which work well enough with some educated guesswork guiding them. Learning to script gdb may have been faster, but a lot less fun.

Posted 25 Sep 2015 05:46 by tedu Updated: 25 Sep 2015 05:46
Tagged: lua openbsd programming