Wednesday, March 10, 2010

My bid for AMD's 48 cores

This is my entry for AMD's challenge to put a 48-core computer to use.

OpenGL shading, DirectX rendering, Toy Story, Jurassic Park, Call of Duty graphics, ray tracing... it is all child's play, minor-league stuff when it comes to high-quality rendering. These render methods are all very crude approximations of what light actually does. To do a faithful simulation of light, you need to solve what CG scientists call 'the Rendering Equation'.
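For reference, this is Kajiya's rendering equation; it states that the outgoing radiance at a surface point x is the emitted radiance plus the integral, over the hemisphere, of all reflected incoming radiance:

L_o(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} f_r(x, \omega_i, \omega_o) \, L_i(x, \omega_i) \, (\omega_i \cdot n) \, d\omega_i

The catch is that the equation is recursive: the incoming radiance L_i at one point is the outgoing radiance L_o at another, which is exactly why solving it faithfully is so expensive.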

The best-known method for solving this is Henrik Wann Jensen's Photon Mapping algorithm. The concept is downright simple. Unlike ray tracing, where rays are fired from the eye into the scene, Photon Mapping shoots photons from the light sources into the scene. In a post-process step, the scene is sampled for photon distribution. So how many photons are we talking here? Well, typically between 100M and a billion to get a noise-free, high-quality image. And after doing all that work, what does it yield? Well, you get things like caustics, diffuse inter-reflections and soft shadows that are not obtainable with conventional render methods. The images below illustrate my point (taken from Henrik's web site).
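To make the concept concrete, here is a minimal sketch of the emission pass in C++. The scene is reduced to a single ground plane, and the Vec3, Photon, trace and shootPhotons names are my own illustrative choices, not taken from Jensen's implementation:

// Minimal sketch of the photon emission pass, using a single ground plane
// as a stand-in scene.
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Vec3 { float x, y, z; };

struct Photon {
    Vec3 position;   // where the photon landed
    Vec3 direction;  // incoming direction at the hit point
    Vec3 power;      // RGB flux carried by this photon
};

// Uniform random direction on the unit sphere (rejection sampling).
static Vec3 randomDirection() {
    for (;;) {
        Vec3 d = { 2.0f * rand() / RAND_MAX - 1.0f,
                   2.0f * rand() / RAND_MAX - 1.0f,
                   2.0f * rand() / RAND_MAX - 1.0f };
        float len2 = d.x * d.x + d.y * d.y + d.z * d.z;
        if (len2 > 1e-4f && len2 <= 1.0f) {
            float inv = 1.0f / std::sqrt(len2);
            return { d.x * inv, d.y * inv, d.z * inv };
        }
    }
}

// Stand-in scene: intersect a ray with the ground plane y = 0.
static bool trace(const Vec3& origin, const Vec3& dir, Vec3* hit) {
    if (dir.y >= 0.0f) return false;   // pointing away from the plane
    float t = -origin.y / dir.y;
    *hit = { origin.x + t * dir.x, 0.0f, origin.z + t * dir.z };
    return true;
}

// Shoot 'count' photons from a point light; each photon carries an
// equal share of the light's total power.
static std::vector<Photon> shootPhotons(const Vec3& lightPos,
                                        const Vec3& lightPower, long count) {
    std::vector<Photon> map;
    map.reserve(count);
    Vec3 share = { lightPower.x / count, lightPower.y / count,
                   lightPower.z / count };
    for (long i = 0; i < count; ++i) {
        Vec3 dir = randomDirection();
        Vec3 hit;
        if (trace(lightPos, dir, &hit))
            map.push_back({ hit, dir, share });
    }
    return map;
}

int main() {
    Vec3 light = { 0.0f, 10.0f, 0.0f };          // point light above plane
    Vec3 power = { 1000.0f, 1000.0f, 1000.0f };  // total flux, RGB
    std::vector<Photon> map = shootPhotons(light, power, 100000);
    std::printf("stored %zu photons\n", map.size());
    return 0;
}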

In the field of parallel computing, a class of problems is designated as 'embarrassingly parallel'. Those problems are easy to divide into chunks that can be computed independently of each other. Ray tracing is an example of a problem in this class. Photon Mapping, however, is just as embarrassingly parallel, maybe even more so. When shooting, say, 480M photons into a 3D scene, it is as simple as giving each of the 48 cores 10M photons. After generating those 48 photon maps, each can be sampled (with final gathering) independently, as there is no need to merge the maps. The combining of results can be performed in screen space, on a per-pixel basis, as sketched below.
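A minimal sketch of that per-core split, assuming a C++11 compiler: each of 48 worker threads builds its own private photon map, so no locks and no map merging are needed. The buildMap() function is a hypothetical stand-in for a real emission pass like the one sketched earlier:

#include <thread>
#include <vector>

struct Photon { float pos[3]; float power[3]; };

// Stand-in: a real implementation would trace 'count' photons through the
// scene, seeded per thread so the workers explore different random paths.
static std::vector<Photon> buildMap(long count, unsigned seed) {
    (void)count; (void)seed;
    return std::vector<Photon>();
}

int main() {
    const int  cores        = 48;
    const long totalPhotons = 480000000L;            // 480M photons in total
    const long perCore      = totalPhotons / cores;  // 10M photons per core

    std::vector<std::vector<Photon> > maps(cores);
    std::vector<std::thread> workers;
    for (int c = 0; c < cores; ++c)
        workers.emplace_back([&maps, c, perCore] {
            // Each worker writes only to its own map; no synchronization.
            maps[c] = buildMap(perCore, (unsigned)(c + 1));
        });
    for (std::thread& w : workers)
        w.join();

    // Final gathering then samples each of the 48 maps independently, and
    // the partial results are combined per pixel in screen space, so the
    // maps themselves never have to be merged.
    return 0;
}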

Generating these physically correct images typically takes many hours per image. A parallel algorithm will exhibit near-linear speedups, meaning that 48 cores will render the same image almost 48 times faster than a single core would. In my previous job, I worked at the SARA supercomputing centre. While working there, I had access to 128-core Itanium and 512-core MIPS supercomputers. These were not clusters, but real shared-memory supers, with all those processors running under a single OS image. During occasional idle time after work hours, I would test my photon mapping implementation, and I can confirm that Photon Mapping indeed scales linearly, even with a lot of cores thrown at it.
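To put a number on 'near linear': Amdahl's law gives the attainable speedup on n cores when a fraction p of the runtime parallelizes:

S(n) = \frac{1}{(1 - p) + p/n}

In photon mapping the serial fraction is tiny (essentially scene setup and the per-pixel combine), so p sits very close to 1; with an illustrative p = 0.999, S(48) comes out at roughly 46, consistent with the near-linear scaling I observed.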

To summarize... if I had access to a 48-core machine, I would use it to investigate how feasible it is to do Photon Mapping at more interactive rates, using low photon counts and low screen resolutions.