C++ Optimization Trick
Maybe this is old hat, but when I was writing a software rasterizer several years ago I came up with a trick for getting optimized variations for different render states (textured, flat shaded, Gouraud shaded, wireframe, z-write, z-test, cull-ccw, etc) without hand-writing multiple permutations of the rasterizer as is recommended by LaMothe and others.
It requires an optimizing C++ compiler that supports template functions. The trick is to pack as much state as possible into a single POD value like a 32-bit integer, write your function so that all of the if() tests for render state look at the integer for branching decisions, and call a template function that takes this value as a template parameter in a constant context like a switch statement.
Here's the base function full of nasty branches:
result_t func(int flags, const data_t& data) { if (flags & OPT_A) { // ... } if (flags & OPT_B) { if (flags & OPT_C) { // ... } else { // ... } } else { // ... } }
Write this wrapper template function:
template <int flags> result_t func(const data_t& data) { return func(flags, data); }
Then call the versions you need optimized with this wrapper function:
result_t call_func(int flags, const data_t& data) { switch (flags) { case 0: return func<0>(data); case OPT_A: return func<OPT_A>(data); case OPT_A | OPT_B: return func<OPT_A | OPT_B>(data); case OPT_A | OPT_C: return func<OPT_A | OPT_C>(data); case OPT_A | OPT_B | OPT_C: return func<OPT_A | OPT_B | OPT_C>(data); default: return func(flags, data); } }
Or with macros can be used to improve readability:
#define OPTIMIZE_CASE(f,n,d) case (n): return f<n>(d) result_t call_func(int flags, const data_t& data) { switch (flags) { OPTIMIZE_CASE(func, 0, data); OPTIMIZE_CASE(func, OPT_A, data); OPTIMIZE_CASE(func, OPT_A | OPT_B, data); OPTIMIZE_CASE(func, OPT_A | OPT_C, data); OPTIMIZE_CASE(func, OPT_A | OPT_B | OPT_C, data); default: return func(flags, data); } }
The compiler will expand the template to produce separate versions of the function with the specified flags as a constant template parameter, and most of the branches will be optimized away.
Note that you don't need to include every permutation, just the variations you expect to be used. When I used this in a game prototype I had the default clause emit warnings so that I knew when I'd tripped over a render state that I had not yet marked for optimization.
Comments
Post a Comment