As part of the effort to find idle hogs, I noticed some xterms were heavier than others.
How did inmate 9960 come to acquire 8 whole seconds of CPU time? For that matter, which xterm is it? The answer to the second will likely reveal the first.
Looking around, all my xterms are currently idle. Just as indicated by top. How do we turn a pid into a window?
The brute, or even brutal, force technique is to quit each xterm one by one until 9960 goes away. A nicer approach is to send SIGSTOP to each xterm and see which one stops responding. (Alas, if xterm is setgid, you may not be able to SIGCONT it afterwards. Less nice.) Or run find / in each xterm while watching top to see who lights up. All a bit intrusive, but wasn’t it Heisenberg who proved there can be no observation without modification? Actually no, though observer effect is a real thing. Nevertheless, we can do a better job of observing xterms without pummeling them to see which one bruises.
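For the record, the SIGSTOP trick can be sketched in a few lines of Python. This is only an illustration, not what I actually ran: freeze_briefly is a made-up name, and the kill and sleep parameters exist purely so the thing can be exercised without freezing a real process. It also reports the setgid failure case instead of leaving the window silently frozen.

```python
import os
import signal
import time


def freeze_briefly(pid, seconds, kill=os.kill, sleep=time.sleep):
    """Stop a process, wait, then try to resume it.

    Returns True if the process was resumed, False if SIGCONT was refused
    (the setgid xterm case), in which case the process is left stopped.
    """
    kill(pid, signal.SIGSTOP)      # the window stops responding now
    sleep(seconds)                 # go poke at windows in the meantime
    try:
        kill(pid, signal.SIGCONT)  # may raise PermissionError
    except PermissionError:
        return False
    return True
```

Calling freeze_briefly(9960, 5) would give five seconds to find the window that ignores keystrokes.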
Let’s start with ps.
The second column is the controlling terminal. So we have some hints. I now know which terminal is running ps, and which is running top, and which is finding out why SIGCONT doesn’t work. But no xterms, unless we run ps x.
xterms don’t have controlling terminals; instead they control the terminal. But this is still useful info to have.
There it is. We’re looking at p1.
Another approach would be to run ps -O ppid (or pgrep -lf -P 9960) and look for the shell with a parent of 9960, and walk back up. Either way, it’s one of the dozen xterms sitting there with an idle shell, which is a hint, not an answer. Running around and pasting echo $$ in each shell would find the suspect. Or I could run write tedu ttyp1 and look for the graffiti.
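The parent lookup is easy enough to script. Here is a rough Python sketch that picks the children of a given pid out of ps output. The function names are mine, and it assumes the output of ps -ax -o pid,ppid,tty,command, which I believe works on both BSD and Linux ps, but check your man page.

```python
import subprocess


def children_of(ps_output, parent_pid):
    """Return (pid, tty, command) for every process whose PPID is parent_pid.

    Expects a header line followed by one process per line, with the
    columns pid, ppid, tty, command in that order.
    """
    found = []
    for line in ps_output.splitlines()[1:]:      # skip the header line
        fields = line.split(None, 3)             # command may contain spaces
        if len(fields) < 4:
            continue
        pid, ppid, tty, command = fields
        if not (pid.isdigit() and ppid.isdigit()):
            continue
        if int(ppid) == parent_pid:
            found.append((int(pid), tty, command.strip()))
    return found


def run_ps():
    """Run ps and return its output as text (assumed column keywords)."""
    result = subprocess.run(
        ["ps", "-ax", "-o", "pid,ppid,tty,command"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Something like children_of(run_ps(), 9960) should then name the shell living inside the suspect xterm, along with its terminal.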
We can also continue further on this path, inspecting the working directory for each shell, and then narrowing our search to those xterms, but maybe it’s time to switch techniques.
A smarter approach would be to just ask. In theory, every xterm has a _NET_WM_PID property that is equal to its pid. This can be retrieved by running xprop and clicking the window, or by passing the window ID with the -id argument. For the latter we need all the xterm window IDs, which can be obtained via xwininfo.
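Gluing xwininfo and xprop together might look something like the Python below. It is a sketch under assumptions: it expects the usual xwininfo -root -tree layout, where each window line starts with a hex ID and xterms carry the ("xterm" "XTerm") instance and class pair, and the usual _NET_WM_PID(CARDINAL) = N line from xprop. The function names are invented, and an xterm started with a different -name or -class would slip through.

```python
import re
import subprocess


def xterm_window_ids(tree_output):
    """Pick the IDs of xterm windows out of `xwininfo -root -tree` output."""
    ids = []
    for line in tree_output.splitlines():
        m = re.match(r'\s*(0x[0-9a-fA-F]+)\s', line)
        if m and '("xterm" "XTerm")' in line:
            ids.append(m.group(1))
    return ids


def parse_wm_pid(xprop_output):
    """Return the pid from xprop output, or None if the property is absent."""
    m = re.search(r'_NET_WM_PID\(CARDINAL\) = (\d+)', xprop_output)
    return int(m.group(1)) if m else None


def window_for_pid(pid):
    """Ask X which xterm window claims to belong to pid (needs a display)."""
    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    for wid in xterm_window_ids(run(["xwininfo", "-root", "-tree"])):
        if parse_wm_pid(run(["xprop", "-id", wid, "_NET_WM_PID"])) == pid:
            return wid
    return None
```

With a display available, window_for_pid(9960) returns the window ID to feed back to xwininfo, or None if no xterm owns up to that pid.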
Armed with the window ID, we can feed it back to xwininfo.
Alright, so this xterm is off screen somewhere, but the geometry maybe gives us another hint as to which it is, based on size. And it once upon a time ran vim, which fiddles with the title. Interesting, but we’d like something a little more obvious.
Damn. I was hoping for “Woah! A new exact duplicate of 9960 has appeared. So that’s which one it is.” But no dice. It depends on the suspect window being on screen. But if we can get all the windows on screen (dwm “0” screen), either this or the above approach can work.
For funsies, there’s a Stack Overflow answer dedicated to finding the pid for an X11 window, which is the reverse process.
We’re moving well past the point of no return now. Instead of using X to spy on our xterm, we can do so ourselves. This can be done using gdb, for instance. Unfortunately, other people would do it that way. How hard can it be to write a one-off, single-purpose debugger?
Step one of our journey is gazing into the xterm source code. Eventually one will discover that there is a LineData structure with a pointer to what appears to be character data. There’s an array of these, one for each line. But there is not an obvious pointer to this array. Instead it’s accessed using a variety of casts, offsets, and pointer arithmetic, but the base pointer is visbuf in something called TScreen, a giant structure that takes over 500 lines of code to declare. That is embedded in an XTermWidget, and (thank the heavens!) there is a global pointer to one of these called term, bringing our trek to an end.
All we need to do now is write a debugger that iteratively reads each:
((LineData *)(term->screen.visbuf + offset))->chardata.
OpenBSD includes a useful sysctl for examining the address space of another process. Through arcane magic not explained here (procmap), I know the xterm I’m looking at has a text segment of 540672 bytes. We can find it programmatically thusly:
Using further magic (I’m cheating a bit, but basically nm xterm | grep term$), we know the offset from there to term, and then we can start chasing pointers with ptrace. The offsets were calculated by compiling an xterm with a printf of the interesting values.
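The pointer chase itself is short once the offsets are known. Below is a Python sketch of just that logic, not the debugger itself. The memory reader is a callback so the chase can be tried against fake memory; a real one would wrap ptrace reads. Every constant is an invented placeholder, and the sketch assumes 64-bit little-endian pointers and 32-bit character cells, none of which is guaranteed for any particular xterm build.

```python
import struct

PTR = struct.Struct("<Q")    # assumption: 64-bit little-endian pointers

# Placeholder offsets; real values depend on the exact xterm build.
SCREEN_VISBUF_OFF = 0x40     # hypothetical offset of screen.visbuf in the widget
LINEDATA_STRIDE = 0x20       # hypothetical distance between LineData entries
CHARDATA_OFF = 0x10          # hypothetical offset of chardata in LineData
CELL_SIZE = 4                # assumption: one 32-bit code point per cell


def read_ptr(read, addr):
    """Read one pointer-sized value from the target at addr."""
    return PTR.unpack(read(addr, PTR.size))[0]


def read_line(read, term_addr, row, cols):
    """Follow term->screen.visbuf to one row and return its text.

    read(addr, size) must return size bytes of the target's memory.
    term_addr is the address of the global `term` pointer.
    """
    term = read_ptr(read, term_addr)                    # term
    visbuf = read_ptr(read, term + SCREEN_VISBUF_OFF)   # term->screen.visbuf
    line = visbuf + row * LINEDATA_STRIDE               # visbuf + offset
    chardata = read_ptr(read, line + CHARDATA_OFF)      # ->chardata
    raw = read(chardata, cols * CELL_SIZE)
    cells = struct.unpack("<%dI" % cols, raw)
    # Blank cells are stored as zero; show them (and garbage) as spaces.
    text = "".join(chr(c) if 0 < c <= 0x10FFFF else " " for c in cells)
    return text.rstrip()
```

Here term_addr would be the start of the mapping found above plus the offset from nm, and looping row over the window height dumps the whole screen.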
Let it rip and...
Hey! Now that does look familiar. It’s the source code to the line getting function in xterm. Now I know exactly which window it is.
This was a pretty big waste of time. As soon as I saw that one xterm was busier than the rest, I knew exactly which one it was: the one I read mail in, which has to redraw the screen for every email. This was trivially confirmed using any of the brute force techniques which work well enough with some educated guesswork guiding them. Learning to script gdb may have been faster, but a lot less fun.