Correcting bugs in the rectangles test

Owen Taylor was kind enough to take a close look at my [[!recent post|understanding_rectangles]] comparing the performance of EXA and NoAccel rectangle fills on an r100. He was also careful enough to notice that the results looked really fishy.

Here are some of the problems he noted from looking at the graphs:

  1. The EXA line looks to have an impossibly large fill rate

  2. The NoAccel line looks asymptotically linear rather than quadratic as expected.

  3. No chart of numbers was provided to allow for any closer examination.

I went back to the code for my test case and did find a bug that explains some of the problems he saw. The random positioning of rectangles wasn't correctly accounting for their size to keep them within the visible portion of the window. So, as the rectangle gets larger the region that is likely to be clipped by the destination window also gets larger. And that explains the linear rather than quadratic growth.
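The effect of that bug can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual cairo-perf code: the 512-pixel window, the uniform integer placement, and the names `visible_extent` and `mean_visible_area` are all hypothetical.

```python
def visible_extent(origin, size, window):
    """Length of the span [origin, origin + size) that lies inside [0, window)."""
    return max(0, min(origin + size, window) - max(origin, 0))


def mean_visible_area(size, window, account_for_size):
    """Average visible area of a size x size rectangle placed at a random origin.

    Assumed placement rules (hypothetical):
      buggy: origin uniform over [0, window), ignoring the rectangle's size
      fixed: origin uniform over [0, window - size], so the rectangle always fits
    """
    limit = window - size + 1 if account_for_size else window
    mean_extent = sum(visible_extent(x, size, window) for x in range(limit)) / limit
    # x and y are chosen independently, so the mean area is the product
    # of the mean visible width and the mean visible height.
    return mean_extent ** 2


WINDOW = 512  # assumed window size, not taken from the original test
for size in (64, 256, 512):
    buggy = mean_visible_area(size, WINDOW, account_for_size=False)
    fixed = mean_visible_area(size, WINDOW, account_for_size=True)
    print(size, buggy, fixed)
```

Under these assumptions the buggy placement fills noticeably fewer pixels than size squared as the rectangle approaches the window size, while the corrected placement always fills the whole rectangle.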

So here's a corrected version of the original graphs:


And, again, a closer look at the small rectangles:


And, this time I'll provide a chart of numbers as well:

Time to render 10000 rectangles with XRenderFillRectangles

| Rectangle size | NoAccel (ms) | EXA (ms) |
|----------------|-------------:|---------:|
| 1x1            |        1.456 |    2.356 |
| 2x2            |        1.529 |    2.288 |
| 4x4            |        1.884 |    2.352 |
| 8x8            |        3.039 |    2.356 |
| 16x16          |        3.255 |    2.357 |
| 32x32          |        7.608 |    2.377 |
| 64x64          |       26.479 |    2.430 |
| 128x128        |      101.325 |    5.376 |
| 256x256        |     1295.105 |   22.549 |
| 512x512        |    15354.022 |   89.744 |
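One quick way to examine these numbers: if rendering time were purely proportional to the number of pixels filled, each doubling of the edge length would multiply the time by 4. The sketch below computes that ratio for each step in the table. The helper name `doubling_ratios` is hypothetical; the timings are copied from the table.

```python
# Timings in milliseconds for 10000 rectangles, keyed by edge length in pixels.
noaccel_ms = {1: 1.456, 2: 1.529, 4: 1.884, 8: 3.039, 16: 3.255,
              32: 7.608, 64: 26.479, 128: 101.325, 256: 1295.105, 512: 15354.022}
exa_ms = {1: 2.356, 2: 2.288, 4: 2.352, 8: 2.356, 16: 2.357,
          32: 2.377, 64: 2.430, 128: 5.376, 256: 22.549, 512: 89.744}


def doubling_ratios(times_ms):
    """Ratio of each timing to the timing at half the edge length."""
    sizes = sorted(times_ms)
    return {f"{a}->{b}": times_ms[b] / times_ms[a] for a, b in zip(sizes, sizes[1:])}


for label, ratio in doubling_ratios(noaccel_ms).items():
    print("NoAccel", label, round(ratio, 2))
for label, ratio in doubling_ratios(exa_ms).items():
    print("EXA    ", label, round(ratio, 2))
```

A ratio near 1 means the time is dominated by fixed per-rectangle overhead; a ratio near 4 is what pure per-pixel cost would give.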

So that addresses the second and third of Owen's issues. But what about that fill rate? First, how can I know my card's maximum fill rate? I'm told that the standard approach is to use x11perf -rect500. Let's see what that gives for NoAccel:

NoAccel $ x11perf -rect500
    900 reps @   6.1247 msec (   163.0/sec): 500x500 rectangle

And then for EXA:

$ x11perf -rect500
   3000 reps @   1.9951 msec (   501.0/sec): 500x500 rectangle

So that shows fill rates of about 41M pixels/sec for NoAccel and about 125M pixels/sec for EXA, (500*500*163 = 40750000 and 500*500*501 = 125250000).
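The conversion from x11perf's operations-per-second figure to a fill rate is just area times rate. A minimal sketch (the function name `fill_rate` is hypothetical, and it counts every pixel of every rectangle, overlapping or not):

```python
def fill_rate(width, height, ops_per_sec):
    """Pixels per second, counting every pixel of every rectangle drawn."""
    return width * height * ops_per_sec


noaccel = fill_rate(500, 500, 163.0)  # x11perf -rect500 under NoAccel
exa = fill_rate(500, 500, 501.0)      # x11perf -rect500 under EXA

print(f"NoAccel: {noaccel / 1e6:.2f}M pixels/sec")
print(f"EXA:     {exa / 1e6:.2f}M pixels/sec")
```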

Meanwhile, my results above for the 10000 512x512 rectangles give fill rates of 171M pixels/sec for NoAccel and 29210M pixels/sec for EXA, (512*512*10000/15.354022 =~ 170733114 and 512*512*10000/.089744 =~ 29210197896).

So my test is reporting a NoAccel fill rate that is 4x faster than what x11perf reports, and an EXA fill rate that is 233x (!) faster than what x11perf reports. So, something is definitely still fishy here. A fill rate of close to 30 billion pixels/sec. from an old r100 just cannot be possible, (as another datapoint, I just got a new Intel 965 and with x11perf I measure a fill rate of 843 million pixels/sec. on it).
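The same comparison can be written out as a short sketch. The helper name `fill_rate_from_batch` is hypothetical; the timings come from the 512x512 row of the table and the x11perf figures from the runs above.

```python
def fill_rate_from_batch(size, count, elapsed_ms):
    """Pixels per second for `count` size x size rectangles drawn in elapsed_ms."""
    return size * size * count / (elapsed_ms / 1000.0)


# Fill rates derived from x11perf -rect500 (pixels/sec).
X11PERF_NOACCEL = 500 * 500 * 163
X11PERF_EXA = 500 * 500 * 501

cairo_noaccel = fill_rate_from_batch(512, 10000, 15354.022)
cairo_exa = fill_rate_from_batch(512, 10000, 89.744)

print(f"NoAccel: {cairo_noaccel / 1e6:.0f}M pixels/sec, "
      f"{cairo_noaccel / X11PERF_NOACCEL:.1f}x x11perf")
print(f"EXA:     {cairo_exa / 1e6:.0f}M pixels/sec, "
      f"{cairo_exa / X11PERF_EXA:.1f}x x11perf")
```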

So what could be happening here? It could be that my cairo-perf measurement framework is totally broken. It does at least seem to be returning consistent numbers from one run to the next, though. And the results do appear to have the correct trend as can be seen from these two graphs showing the measured fill rates:

But again, notice from the Y-axis values of the cairo-perf plot that the numbers are just plain too large to be believed.

I don't yet have a good answer for what could explain the difference here. I did notice that exaPolyFillRect converts the list of rectangles into a region, which should prevent areas overlapped by multiple rectangles from being filled multiple times. For x11perf there is no overlap at 100x100 or smaller, but a lot of overlap at 500x500. Similarly, the overlap gets more probable at larger sizes with the cairo-perf test. The existence of optimizations like that suggests that these tests might legitimately be able to report numbers larger than the actual fill rate of the video card.

But that code should also be common whether calling XRenderFillRectangles like my cairo-perf test does, or XFillRectangles like the x11perf test does. So that optimization doesn't explain what I'm seeing here. (I also reran my cairo-perf test with XRenderFillRectangles changed to XFillRectangles and saw no difference.)
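To make the overlap effect concrete, here is a small sketch comparing the pixels a test would count (the sum of all rectangle areas) with the pixels that actually need filling once overlapping rectangles are merged. This is not EXA's region code, just an illustration using coordinate compression; the function names and the example rectangles are hypothetical.

```python
def requested_area(rects):
    """Sum of the areas of all rectangles, counting overlaps repeatedly."""
    return sum(w * h for _x, _y, w, h in rects)


def union_area(rects):
    """Area covered by at least one rectangle; each rect is (x, y, w, h)."""
    xs = sorted({x for x, _y, _w, _h in rects} | {x + w for x, _y, w, _h in rects})
    ys = sorted({y for _x, y, _w, _h in rects} | {y + h for _x, y, _w, h in rects})
    area = 0
    # Every cell of the compressed grid is either fully inside or fully
    # outside each rectangle, so testing the cell's bounds is sufficient.
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            if any(rx <= x0 and x1 <= rx + rw and ry <= y0 and y1 <= ry + rh
                   for rx, ry, rw, rh in rects):
                area += (x1 - x0) * (y1 - y0)
    return area


rects = [(0, 0, 500, 500), (100, 100, 500, 500)]  # two overlapping 500x500 fills
print(requested_area(rects), union_area(rects))
```

When the rectangles overlap heavily, the requested area can be much larger than the union, which is how a benchmark that counts requested pixels could report more than the hardware's real fill rate.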

Anybody have any ideas what might be going on here? Email me or the xorg list (subscription required, of course).