Monday, April 29, 2013

Some ray tracing performance on Javascript engines

I am playing with ideas to represent world geometries. Playing with deformable voxels as "surface nets", I finally implemented a deformable recursive grid to store the voxels.

I guess the idea is really close to brick maps as for example, exposed by Cyril Crassin in his nice Ph.D. here.

The idea is super simple: this is just a grid of grid of .... of voxels.

Anyway, the code is here for those who want to look at it:

https://github.com/bsegovia/cube-template/blob/master/src/ogl.cpp

(look for brick something)

One of the interestings parts is obviously to raytrace this thing. Ray tracing a grid is super simple as exposed by Amanatides here. Extending it to recursive grid is as simple as some c++ template horror and tracking both tmin and tmax while recursing (see "intersect" function).

This led to benchmarking the javascript engines. Here is some very simple level:


with the depth map as it is raytraced:


Native C has been compiled with gcc 4.7 and the JS versions with emscripten.

Here are the performance:

Native C: 919,000 ray/s
Firefox nightly: 148,000 ray/s
Firefox nightly + asm.js: 508,000 ray/s
Chromium 25: 111,000 ray/s

asm.js proves to be super efficient and as advertised, I got 50% of native performance.

Obviously, native code is not using any SIMDified code so, in practice, the gap is still much larger. Also, for JS versions, multi-threaded code will only work with webworkers which will only support message passing. This may bring some extra hit compared to a regular shared memory implementation.

Anyway, this is still really encouraging!



Tuesday, April 16, 2013

Beignet (OpenCL for IVB on Linux) got its first release!

Youhou!

The OpenCL code base I initiated last year is now officially supported by Intel. The guys there added plenty of stuffs. It is pretty cool.

The official announce is here:
http://lists.freedesktop.org/archives/intel-gfx/2013-April/026747.html

There is a short news on Phoronix:
http://www.phoronix.com/scan.php?page=news_item&px=MTM1Mjc

Even something on Slashdot!

I am pretty eager to see further developments. OpenCL, though far from perfect, is a nice API easy to use and that can give decent performance.

This is anyway pretty cool since most of the initial code I did was a lonely project and I had never been sure it will be officially supported at some point.

Sunday, March 31, 2013

Cube ported to JS

Hello all,

following the hype, I spent some time porting Cube (1), the minimalistic FPS, to javascript with Emscripten.

Here is a screenshot:



Emscripten is amazing. The guys did a really good job to wraps useful libraries like SDL, SDL_mixer and so on.

On my side, most of the effort was therefore basically on adapting the old OGL 1.x code to a more recent webGL like version of it.

Emscripten developers put a large effort on making OGL 1.x directly running in the browser but I wanted to have a cleaner and faster implementation only using webgl features

Cube is an interesting code base compiler wise. It is indeed pretty CPU bound since a large time is spent to build the vertex arrays for every frame instead of caching them into VBOs as every game does today.

On my machine (Intel NUC with i3 IVB), the code is mostly 4 times slower than the native version. I tried asm.js with the latest nightly build but with no performance difference. Not sure why.

Code is slow on Chrome.

Code is here:
https://github.com/bsegovia/cube-gles

You  need to download cube data yourself here:
http://sourceforge.net/projects/cube/

Enjoy!

Wednesday, January 16, 2013

Bits of OpenCL on Linux for IvyBridge are published

Some OpenCL code I was working on at Intel has been recently publically pushed and is now available for download.

A phoronix news about this is here: http://www.phoronix.com/scan.php?page=news_item&px=MTI3MTU

The code is here: http://cgit.freedesktop.org/beignet/

The code basically contains both the run-time code which is basically the OpenCL host code (clKernel*, clProgram*) and a compiler back-end which is responsible to take bits of LLVM IR code and to output IvyBridge ISA from it.

Small problem is that I am not working for Intel anymore so, I will not work that much on it.

The code is decent even if some parts are a bit over-the-top, c++ style wise. However, the most important parts are already here (instruction scheduling, instruction selection infrastructure, register allocation) even if a serious amount of work is still to be done.

Since I am not working for Intel anymore, I cannot speak for Keith Packard and the others but I guess any patch will be welcomed. For those interested in low-level hacking, do not forget that the complete IvuBridge documentation is still here: https://01.org/linuxgraphics/documentation/2012-intel-core-processor-family