Eliminating glyph fallbacks

Sometimes things get worse before they get better.

A few days ago, I presented a patch for storing glyphs as pixmaps which improved performance, but not as dramatically as one would have hoped.

I profiled the result and found that there were still a lot of software fallbacks going on. Tracking things down, (hints: enable DEBUG_TRACE_FALL in xserver/exa/exa_priv.h and I830DEBUG in xf86-video-intel/src/i830.h), I found a simple case statement that was falling back to software for any compositing operation targeting an A8 buffer. Fortunately, it looks like this fallback was due to a limitation in older graphics card that doesn't exist on the i965. So a very simple patch eliminates the software fallback.

So lets take a look at before-and-after profiles:

aa10text-fallbacks/ (144000 chars./sec.) symbols profile
aa10text-no-fallbacks/ (95000 chars./sec.) symbols profile

Yikes! The patch takes us from 144k chars/sec. to only 95k chars/sec. I'm regressing performance! But look again, and see that the libexa time has been cut dramatically, and the libpixman time has been eliminated altogether. That's exactly what we would hope to see for eliminating software fallbacks. So I've finally gotten this text-rendering benchmark to involve no software fallbacks. Hurrah!

Meanwhile, the intel_drv and vmlinux time have increased dramatically. Take a look at how hot those hotspots are in their profiles:

intel_drv:

samples  %        symbol name
29614    41.2170  i965_prepare_composite
26641    37.0792  I830WaitLpRing
9143     12.7253  i965_composite
1618      2.2519  I830Sync

vmlinux:

samples  %        symbol name
28775    25.3748  delay_tsc
21956    19.3616  system_call
7535      6.6446  getnstimeofday
5109      4.5053  schedule

So this is just the same, old synchronous compositing bug I identified earlier. Performance has gotten worse since I'm stressing out the driver and this bug more.

Dave Airlie has been doing some recent work that should let us fix that bug once and for all. Hopefully it won't be too long before I can actually post some positive progress here.

PS. I've also gotten one report that my patch for storing glyphs as Pixmaps speeds glyph rendering up initially, but after the X server has been running for about an hour or so, things get really slow. Shame on me for not doing any testing more extensive than starting the X server and then running a single client for a few minutes, (either firefox or x11perf). The report is that most of the time is disappearing into ExaOffscreenMarkUsed. Well the good news is that Dave's work eliminates that function entirely, (along with lots of migration code in EXA), so hopefully there's not any big problem to fix there. I'll have to test more thoroughly after synching up with Dave.