Implementing 2.5D culling for tiled deferred rendering (in OpenCL)

Introduction

In this post I’ll cover how to add 2.5D light culling to an existing tile-based deferred shading solution in OpenCL.

An old screenshot from Linux Game Engine showcasing Tiled Deferred Shading

What?

2.5D light culling is a method for improving tiled deferred shading performance (especially when a tile covers a large depth range, such as at depth discontinuities) by dividing the per-tile depth interval into several parts.

It was invented by Takahiro Harada [1] [3], and it appeared in the 4th GPU Pro book [2].

There are, of course, several other methods to choose from when one decides to improve tiled deferred performance, for example clustered deferred shading [5].

Emil Persson also covered the topic [4].

Why?

The reason I chose 2.5D culling over the other methods is that it is really simple: it provides most of the performance improvement while costing only a little implementation effort.

How?

In order to subdivide the depth interval, one has to add a bit mask to each tile. Each bit of this mask identifies one “slot” along the Z axis. When a pixel in a tile falls into a slot, you mark that slot as used.

Then you compute a bit mask for each light while you’re culling lights for each tile; &-ing the two bit masks then tells you whether the light intersects any occupied slot.
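In code, that final test boils down to a single AND (a minimal sketch; tile_depth_mask and light_bitmask stand for the two masks described above):

if( (tile_depth_mask & light_bitmask) != 0 )
{
    //the light overlaps at least one occupied depth slot, so keep it in this tile's light list
}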

Calculating the per-tile bit masks

Assuming you already have the min/max depth (vs_min_depth/vs_max_depth) for each tile computed, all you have to do is divide this interval into 32 slots (since a uint can store 32 bits). I defined the nth slot as:

[vs_min_depth + (n-1) * range ; vs_min_depth + n * range]

where range, the width of one slot, is calculated like this:

float range = fabs( vs_max_depth - vs_min_depth + 0.00001f ) / 32.0f;

The tiny correction is there to make sure that the pixel at vs_max_depth still lands in the last slot instead of one past it (floor would otherwise return 32), and that range never becomes zero when the whole tile has the same depth.
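As a quick sanity check with made-up numbers: if vs_min_depth = 1.0 and vs_max_depth = 33.0, then range is roughly 1.0, and a pixel at view-space depth 17.5 lands in slot floor((17.5 - 1.0) / 1.0) = 16.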

Then all you need to do is make sure that each pixel occupies the right slot. To do this you need to make sure that the first slot starts at 0, so the represented depth range becomes [0 … vs_max_depth - vs_min_depth].

You can do this by adjusting the per-pixel depth value:

vs_depth -= vs_min_depth;

Then you only have to calculate the depth slot and mark the corresponding slot as used.

float depth_slot = floor(vs_depth / range);

depth_mask = depth_mask | (1u << (uint)depth_slot); //the cast is needed: you can't shift by a float
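In the actual OpenCL kernel the tile’s depth_mask typically lives in local memory, shared by the whole work-group, so marking a slot has to be an atomic OR. A minimal sketch of the per-tile pass under those assumptions (OpenCL 1.1 atomics; vs_depth reconstructed per pixel, vs_min_depth/vs_max_depth already reduced per tile):

__local uint depth_mask; //one mask per tile; zero it from a single work-item before use

float range = fabs( vs_max_depth - vs_min_depth + 0.00001f ) / 32.0f; //slot width
uint depth_slot = (uint)floor( (vs_depth - vs_min_depth) / range );  //0..31
atomic_or( &depth_mask, 1u << depth_slot ); //mark this pixel's slot as used
barrier( CLK_LOCAL_MEM_FENCE ); //after this every work-item sees the complete tile mask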

Calculating the per-light bit masks

For the sake of simplicity, we assume that each light is represented as a sphere (spot lights too, which is wasteful but keeps the culling easy).

Then you need to determine where the light starts and ends on the Z axis:
float light_z_min = -(light_pos.z + att_end); //view space looks down -z, so negate to get positive depths
float light_z_max = -(light_pos.z - att_end);

Adjust the light’s depth bounds, just like the per-pixel depth, so that they fall into the right slots:

light_z_min -= vs_min_depth;

light_z_max -= vs_min_depth;

Calculate the min/max depth slots:

float depth_slot_min = floor(light_z_min / range);
float depth_slot_max = floor(light_z_max / range);

Then you just need to fill out the light’s bit mask:

depth_slot_min = clamp( depth_slot_min, -1.0f, 32.0f ); //clamp so that we don't iterate from infinity to infinity...
depth_slot_max = clamp( depth_slot_max, -1.0f, 32.0f );

for( int c = (int)depth_slot_min; c <= (int)depth_slot_max; ++c ) //naive implementation
  if( c >= 0 && c < 32 )
    light_bitmask = light_bitmask | (1u << c);
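For reference, here is the per-light part gathered into one helper (again only a sketch; compute_light_bitmask is a name I made up, and it assumes the same view-space convention and the per-tile vs_min_depth and range from above):

uint compute_light_bitmask( float3 light_pos, float att_end, float vs_min_depth, float range )
{
  //view space looks down -z, so negate to get positive depths, then shift like the pixels
  float light_z_min = -( light_pos.z + att_end ) - vs_min_depth;
  float light_z_max = -( light_pos.z - att_end ) - vs_min_depth;

  int depth_slot_min = (int)clamp( floor( light_z_min / range ), -1.0f, 32.0f );
  int depth_slot_max = (int)clamp( floor( light_z_max / range ), -1.0f, 32.0f );

  uint light_bitmask = 0;

  for( int c = depth_slot_min; c <= depth_slot_max; ++c ) //naive implementation
    if( c >= 0 && c < 32 )
      light_bitmask |= 1u << c;

  return light_bitmask;
}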

The full source code is here: http://pastebin.com/h7yiUYTD

Also, I’ll hopefully post every Friday. Any suggestions or constructive criticism are welcome in the comments section 🙂

References:

[1] https://sites.google.com/site/takahiroharada/

[2] http://bit.ly/Qvgcsb

[3] http://dl.acm.org/citation.cfm?id=2407764

[4] http://www.humus.name/Articles/PracticalClusteredShading.pdf

[5] http://www.cse.chalmers.se/~uffe/clustered_shading_preprint.pdf