thoughts about computer graphics, game engine programming

The philosophy of LGE


LGE (short for Linux Game Engine) is a homebrew game engine that I started writing 5 years ago. Since then, I have pushed post-processing and lighting (and other effects) about as far as I could, delivering quality equal to or better than today’s game engines. It was also great practice, since there’s always something to do.

To get the engine to this level I needed to “cheat”, in the sense that I certainly didn’t have unlimited time to do the work of the usual 40-50+ people working on a commercial game engine.

This is where I had to establish certain rules in order to meet my goals. If you follow these rules, you might get a usable game engine in much less time than usual; however:

  • it will NOT account for all the problems: I certainly didn’t know about many of them, but you can (more or less) alleviate this by reading http://www.gameenginebook.com/ (I didn’t know about it at the time, so I had to rewrite the engine several times… but mistakes are there to be learned from)
  • there will be few tools, if any (that is where, IMO, a game engine starts growing into a million lines of code)
  • you will certainly not make a GAME with the engine you’re writing in a sensible amount of time. If you want to create a game, use an existing game engine such as Unity, Cryengine 3 or Unreal Engine 4.
Philosophy of Linux Game Engine

The rules (philosophy)

  • if something is already written AND it works the way you’d like AND it’s easy to use AND it seems to be maintained AND it has good documentation, then don’t rewrite it just to have your own implementation of that feature. Examples: STL functionality, mesh loading library, image loading library, math library etc.
  • do write your own code for things that are highly engine-specific (and don’t meet the previous criteria). Examples: engine-wide communication (messaging), entity-component systems, resource management
  • once most of your engine’s features are complete, go ahead and profile it! If you then find that some of the functionality covered by the previous rules is too slow, go ahead and roll your own! Examples: memory allocation (the default malloc/free and new/delete are fine until you run into issues), string library, custom mesh format etc. However, if you want your engine to be highly multithreaded, you should design it with that in mind from the start (but you don’t want to write your own thread pool: use Intel Threading Building Blocks)
  • you DON’T want to write your own GUI, physics (use Bullet) and the like. There are libraries that already do this really well. For the editor GUI I’m using libberkelium (an embedded browser), so I can build the GUI in a website building application and even test it using a dummy backend.
  • visuals without compromises, in real time: this isn’t really something you have to do, but with LGE it felt logical, since this is the most rewarding part for me and I never got bored with it. Interestingly, in 2009-10 I worried about adding HDR and bloom, because on a GeForce 8600 GT they only ran at 30-something FPS (at around 768p), but today I have gazillions of post-processing effects and I know they’d run on any recently upgraded PC. So the goal was the following: stay with conventional real-time rendering (i.e. not path tracing), don’t care much about performance, and add any and every effect that makes the image look realistic, close to today’s best-looking commercial engines. I mostly achieved this goal; I’m only missing good directional light shadows and global illumination, but these will be implemented soon too.
  • don’t try to research new rendering technology (unless you really have to): you almost certainly won’t have time for it, since you are not a team of 40-50 people with at least 3-4 members dedicated to research. Instead, implement what the big guys do (they did have the time to research and optimize all this) and add your own ideas. There is tons of published work available online; you just have to look for it.
  • support only the latest technology: your engine will only be done in, say, 5 years, so it’s pointless to support last-gen technology (like DX9), because by then it will be thoroughly outdated. Use the latest DirectX and OpenGL versions; this way your engine will still be up to date when you finally decide it’s done (it never is, but let’s assume it does the things you wanted it to, and is therefore “done”)
  • try to make your engine as flexible as possible, and look at how the big guys do this: this isn’t really a rule either, but I wanted the engine to be this way. My favourite example is Unity: you can basically put your game objects together like Lego bricks, and each script instance is attached to a game object, so the script always has direct access to the object it belongs to. This is incredibly intuitive to use; it “just makes sense”. You can also rewrite the post-processing and rendering pipeline in Unity, which makes the engine adaptable to whatever style a game is going for.
  • dynamic everything: not a rule either, I just wanted the engine to be a bit different from current-gen game engines, where most of the world consists of static objects that you can’t interact with in any way (except maybe colliding with them). This makes the world feel… dead… Plus, if only some objects are dynamic, like a football, you miss the realism of breaking a window with that football. Get it? However, this requires everything to work at least the way it does in Bullet: if an object comes to rest it becomes static (goes to sleep), but it can be woken up later by another object. This also means that you need dynamic lighting, shadows, GI, physics, (totally) destructible environments, and the list goes on. This is incredibly hard to do in real time, which may be the reason current-gen game engines still contain static objects.
  • cross-platform / cross-compiler development: besides giving you as many target platforms as possible, this is great because different compilers catch different bugs (GCC vs. Visual C++ vs. Clang). You should definitely develop for Windows and Linux, and if possible (if you have a Mac) for OS X too. If your game engine becomes a success on these platforms, it will be much simpler to port it to consoles and mobile.
  • CMake-based build system: this keeps your build system compiler-agnostic and lets you handle many platforms easily. Also, members of your future team will use different IDEs and compilers, so your engine should build fine from, say, Eclipse, Qt Creator or Visual Studio.
  • open source (if you want to): it benefits everybody if they can learn from your code. However, you might not want to go this way if you plan to go commercial…
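The sleep/wake behaviour from the “dynamic everything” rule can be sketched in a few lines of C. This is only an illustration of the idea, not Bullet’s API: the Body struct, both functions and the two threshold constants are hypothetical names and values.

```c
#include <stdbool.h>

/* Hypothetical rigid-body state (NOT Bullet's btRigidBody). */
typedef struct {
    float speed;       /* magnitude of the linear velocity */
    float still_time;  /* how long the body has been (almost) at rest, in seconds */
    bool  sleeping;    /* sleeping bodies are treated as static */
} Body;

/* Hypothetical tuning values. */
#define SLEEP_SPEED_THRESHOLD 0.05f /* below this speed the body counts as "at rest" */
#define SLEEP_DELAY           2.0f  /* seconds of rest before the body goes to sleep */

/* Called once per simulation step for every body. */
void update_sleep_state(Body *b, float dt)
{
    if (b->sleeping)
        return;                     /* stays static until something wakes it */

    if (b->speed < SLEEP_SPEED_THRESHOLD) {
        b->still_time += dt;
        if (b->still_time >= SLEEP_DELAY) {
            b->sleeping = true;     /* stop simulating it, treat it as static */
            b->speed = 0.0f;
        }
    } else {
        b->still_time = 0.0f;       /* it moved again: restart the rest timer */
    }
}

/* Called when another (awake) object touches this one,
 * e.g. a football hitting a sleeping window pane. */
void wake_on_contact(Body *b)
{
    b->sleeping = false;
    b->still_time = 0.0f;
}
```

A sleeping body is skipped by the simulation, which is what keeps a world full of dynamic objects affordable; the price is that everything able to disturb it (contacts, scripted forces) has to go through the wake-up path.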


Implementing 2.5D culling for tiled deferred rendering (in OpenCL)


In this post I’ll cover how to add 2.5D light culling to an existing tile-based deferred shading implementation in OpenCL.

An old screenshot from Linux Game Engine showcasing Tiled Deferred Shading

2.5D light culling is a method for improving tiled deferred shading performance in tiles with a large depth range (e.g. where a foreground object overlaps the background), by dividing each tile’s depth interval into several parts.

It was introduced by Takahiro Harada [1] [3], and it also appeared in GPU Pro 4 [2].

There are, of course, several other methods to choose from when one decides to improve tiled deferred performance, for example clustered deferred shading [5].

Emil Persson also covered the topic [4].


I chose 2.5D culling over the other methods because it is really simple and provides most of the performance improvement while costing very little.


To subdivide the depth interval, add a bit mask to each tile. Each bit of this mask identifies one “slot” along the Z axis, and when a pixel of the tile falls into a slot, you mark that slot as used.

Then, while culling lights for each tile, you compute a bit mask for each light as well; AND-ing the two bit masks tells you whether the light intersects any occupied slot.
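Once both masks exist, the per-tile decision is a single AND. A minimal C sketch of that final step, assuming the tile’s mask is already built and every candidate light (one that already passed the usual 2D tile test) already has its own mask; cull_lights_2_5d and its parameters are hypothetical names, not taken from the LGE source:

```c
#include <stddef.h>
#include <stdint.h>

/* Keep only the lights whose depth slots overlap an occupied slot of the tile.
 *   tile_mask   : bit n set = some pixel of the tile lies in depth slot n
 *   light_masks : bit n set = the light's bounding sphere covers depth slot n
 * Writes the indices of the surviving lights to out_indices and returns their count. */
size_t cull_lights_2_5d(uint32_t tile_mask,
                        const uint32_t *light_masks, size_t light_count,
                        size_t *out_indices)
{
    size_t kept = 0;
    for (size_t i = 0; i < light_count; ++i) {
        /* Non-zero AND: the light touches at least one slot that contains geometry. */
        if ((tile_mask & light_masks[i]) != 0u)
            out_indices[kept++] = i;
    }
    return kept;
}
```

For example, a tile whose geometry sits only in the first and last slot (mask 0x80000001) rejects a light whose sphere covers only the empty slots in between, even though that light lies within the tile’s min/max depth.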

Calculating the per-tile bit masks

Assuming you already have the min/max depth (vs_min_depth/vs_max_depth) computed for each tile, all you have to do is divide this interval into 32 slots (since a uint can store 32 bits). I did it like this: the nth slot (n = 1…32) starts and ends at

[min + (n-1) * range ; min + n * range]

where range, the width of a single slot, is calculated like this:

float range = fabs( vs_max_depth - vs_min_depth + 0.00001f ) / 32.0f;

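The tiny correction is meant to make sure that the pixel at vs_max_depth goes into the last slot rather than one past it (the pixel at vs_min_depth always goes into the first slot), and it also keeps range from becoming zero when the tile is flat (vs_min_depth == vs_max_depth).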

Then all you need to do is make sure that each pixel occupies the right slot. To do this, the first slot has to start at 0, so the represented depth range becomes [0…vs_max_depth - vs_min_depth].

You can do this by adjusting the per-pixel depth value:

vs_depth -= vs_min_depth;

Then you only have to calculate the depth slot and mark the corresponding slot as used.

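int depth_slot = (int)floor(vs_depth / range);

depth_slot = min(depth_slot, 31); //float rounding can still push the pixel at vs_max_depth to slot 32, so clamp it into the last slot

depth_mask |= 1u << depth_slot; //if depth_mask is a __local uint shared by the tile's work-items, use atomic_or(&depth_mask, 1u << depth_slot) instead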

Calculating the per-light bit masks

For simplicity, we assume that each light is represented by a bounding sphere (spot lights too, which keeps things easy, although it is wasteful).

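Then you need to determine where the light starts and ends along the Z axis. View space looks down -Z here, so depth is -z, and att_end is the radius at which the light’s attenuation ends:
float light_z_min = -(light_pos.z + att_end);
float light_z_max = -(light_pos.z - att_end);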

Shift the light’s depth range by vs_min_depth, exactly like the pixel depths, so that both use the same slot origin:

light_z_min -= vs_min_depth;

light_z_max -= vs_min_depth;
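As a worked example of the two steps above, here is a small C helper (the function and struct names are hypothetical) under the same assumptions: an OpenGL-style view space looking down -Z, and att_end as the light’s radius. A light at z = -20 with att_end = 3, in a tile whose vs_min_depth is 10, covers the shifted depth interval [7, 13].

```c
/* Depth interval covered by a spherical light, shifted by the tile's
 * vs_min_depth so it is measured in the same space as the pixel depths.
 * Assumes view space looks down -Z (depth = -z) and att_end is the radius. */
typedef struct {
    float min;  /* near side of the sphere, relative to vs_min_depth */
    float max;  /* far side of the sphere, relative to vs_min_depth */
} DepthRange;

DepthRange light_depth_range(float light_pos_z, float att_end, float vs_min_depth)
{
    DepthRange r;
    r.min = -(light_pos_z + att_end) - vs_min_depth;
    r.max = -(light_pos_z - att_end) - vs_min_depth;
    return r;
}
```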

Calculate the min/max depth slots:

float depth_slot_min = floor(light_z_min / range);
float depth_slot_max = floor(light_z_max / range);

Then you just need to fill out the light’s bit mask:

depth_slot_min = max( depth_slot_min, -1.0f ); //clamp so that the loop below never runs over a huge (or infinite) range
depth_slot_min = min( depth_slot_min, 32.0f );

depth_slot_max = max( depth_slot_max, -1.0f );
depth_slot_max = min( depth_slot_max, 32.0f );

uint light_bitmask = 0;
for( int c = (int)depth_slot_min; c <= (int)depth_slot_max; ++c ) //naive implementation: one iteration per slot
    if( c >= 0 && c < 32 ) //-1 and 32 only mean "in front of" / "behind" the tile, they are not real slots
        light_bitmask |= 1u << c;

Source code: http://pastebin.com/h7yiUYTD

Also, I’ll hopefully post every Friday. Suggestions and constructive criticism are welcome in the comments section 🙂


[1] https://sites.google.com/site/takahiroharada/

[2] http://bit.ly/Qvgcsb

[3] http://dl.acm.org/citation.cfm?id=2407764

[4] http://www.humus.name/Articles/PracticalClusteredShading.pdf

[5] http://www.cse.chalmers.se/~uffe/clustered_shading_preprint.pdf

predictions from the past about game engines

In 2010, when I started developing my custom homebrew game engine called “Linux Game Engine” (LGE for short), I didn’t really know much about engine development. However, I did have some ideas about what was wrong with the game engines of that time.

I observed the following: whenever I played around with a game engine, I regularly had to wait for the tools to do some background number crunching, such as baking lightmaps or generating pathfinding data. This observation was based on Unreal Engine 3 and Cryengine 2, two of the biggest engines of that time.

This led to the goal: write a game engine that is truly WYSIWYG, meaning no compromises in quality, no waiting for loading or number crunching, and the fastest possible iteration on content (all this in an AAA context).

So the goal, in today’s terms, was basically to write a mix of Cryengine 3, Unreal Engine 4 and Unity 4/5, combining the strengths of all of them. This seemed impossible even back then, but as I didn’t have any other project to improve my coding skills on, I started developing the engine anyway. The engine developed and improved as I did, and there were at least 3-4 complete rewrites until I got some of the things right.

The engine was meant to be a tool with which one could easily develop an FPS (without the intent to make an actual FPS game; that alone is way too much effort). Of course, I later discovered that I could actually write most of the engine independently of the game’s genre.

Back to the present: most game engines now try to keep iteration times as short as possible. Cryengine 3 includes WYSIWYP (What You See Is What You Play), Unreal Engine 4’s global illumination solver has been redesigned to be much faster, and Unity 5 includes Enlighten, which is pretty much real-time. So we could say my predictions (or goals) were correct at the time.

In conclusion, if you ever embark on writing a game engine, keep in mind that it will only be somewhat “ready” 4-5 years later, so you have to predict what the technology will be like at that time and develop the engine with that future in mind. Of course, this is impossible most of the time.

hello world

hello world post. yay.

This’ll be about:

  • game engine programming
  • computer graphics
  • some other technical thoughts