So I found the answer to my [[!fill rate confusion|corrected_rectangles]] and it turned out to not be all that interesting in the end---no pretty graphs this time. And it should have been obvious to me---though admittedly the EXA run time was too fast for me to see what was happening.
What I did was eliminate all the variables of the cairo-perf test
suite by writing a tiny [[!standalone test case|rectangles.c]]. I
happened to be running an XAA server at the time, and when I ran the
test it gave exactly the same results as x11perf -rect500. They both
rendered 501 500x500 rectangles per second. But there was an obvious
difference: x11perf flashed wildly while my test stayed a constant
black.
So I took a quick glance with xtrace. (By the way, this is the long [[!sought-after|understanding_rectangles]] X protocol tracer that actually decodes Render requests, and it's much easier to use than any of xmon, xscope, or wireshark. Hurrah! And many thanks to Behdad for pointing it out to me.) Anyway, xtrace showed immediately that my test was sending rectangles in batches of 256 per request while x11perf was sending only 1 per request.
And how could I have missed the obvious fact that x11perf alternates between black and white when drawing rectangles while my program was sending only white? (Cue forehead smacking sounds.) I changed my program to do the same (the file linked to above contains this change), and it now behaves exactly like x11perf.
At least I was correct that the speedup I saw is due to the optimization in EXA that avoids any redundant filling of rectangles that overlap. So with that optimization, guess what happens when you send 256 rectangles that overlap almost entirely? Wow, it goes about 200 times faster.
OK, so that's actually a pretty boring result. I can't see that it's all that useful to send lots of overlapping rectangles to the X server (but if your application is doing this for any reason, use EXA and it will go faster).
Oh, and just to leave on another note of mystery: after I had seen many runs of both x11perf and my test agreeing on 501 rectangles/sec., I restarted the server and started getting 772 rectangles/sec. At first I thought this was due to a different X server build and configuration file that I had switched to, but when I switched back, the original one also gave me 772 rectangles/sec.
Incidentally, the 501 rectangles/sec. rate corresponds to the 125M pixels/sec. fill rate I reported in my previous post. So now I'm getting a 193M pixels/sec. fill rate and I have no idea what changed. (And I'm also wondering what the expected maximum fill rate is for an r100. Anyone know? I guess it depends on how fast the memory is on my card, and I'm not exactly sure how fast it might be.)