Simulating particles with compute shaders

For my specialization, I created a new particle system in our group's engine using compute shaders. I added vector fields with support for .fga and .vdb files and experimented with curl noise for more interesting behavior.

The new system was faster than the old one and supported more complex behaviors, which made it a great addition to our engine and improved our games.


For my portfolio, I wanted to do something in the graphics pipeline since I find it fascinating. In the project group, I was responsible for our engine's particle system. It therefore felt natural to push the boundaries of this particle system, which I found to be very unoptimized.

The game industry is challenged every day as consumers demand games with higher fidelity and more detail. This pushes hardware and software to their limits, and every possibility to optimize the engine has to be considered. Games rich in detail require many particles, and many particles mean many calculations, which demand computing power. A particle system containing a few hundred particles is fine, but dozens of systems active at the same time with a hundred thousand particles each will slow down most computers. The CPU has its limits since it is a generalist that needs to handle many different kinds of operations. The GPU, in contrast, is specifically optimized for large batches of similar calculations. Moving the particle work from the CPU to the GPU is therefore a great way to optimize the engine.

An example of Unity's new Visual Effect Graph, which uses compute shaders for its particles

The behavior of the particles is processed on the GPU by a type of shader called a compute shader, which does not output pixels but performs general work across the number of threads it is dispatched with. I had never worked with compute shaders before but wanted to push my limits and try something new with our engine. To use compute shaders in our engine, I first had to modify our render hardware interface so that it could handle them. Once that was done, I started looking into how to transfer data to the GPU.
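To illustrate how the thread count relates to the particle count, here is a minimal C++ sketch of the calculation typically done on the CPU side before a dispatch. The helper name ThreadGroupCount and the group size of 256 are assumptions for illustration, not values taken from the engine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of thread groups to dispatch so that every
// particle gets at least one thread. threadsPerGroup must match the group
// size declared in the compute shader (256 is an assumed value).
std::uint32_t ThreadGroupCount(std::uint32_t particleCount,
                               std::uint32_t threadsPerGroup = 256)
{
    assert(threadsPerGroup > 0);
    // Round up: a partially filled last group is still needed.
    const std::uint32_t fullGroups = particleCount / threadsPerGroup;
    const std::uint32_t remainder = particleCount % threadsPerGroup;
    return fullGroups + (remainder != 0 ? 1u : 0u);
}
```

Because the count is rounded up, the last group usually contains threads without a particle, so the shader has to return early when its thread index is beyond the particle count.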

I started with a structured buffer, which I successfully filled with data and transferred to the shader. After some more research, I replaced the structured buffer with a more specialized buffer type called an append/consume buffer. I found that append/consume buffers are commonly used by engine programmers to hold the simulation state of particles. Such a buffer can be used either as an append buffer or as a consume buffer. As an append buffer, data is enqueued (appended); as a consume buffer, data is dequeued (consumed). The two buffer types are interchangeable.

The advantage of append/consume buffers is their ability to handle the death and emission of particles without a separate list keeping track of which particles are dead or alive. Dead particles are simply not appended to the buffer. New particles are injected by an emit compute shader that runs before the update compute shader. To update a particle, it is first consumed from the consume buffer and then updated (velocity, color, position etc.). It is then appended to the append buffer to be rendered in the current frame. The append buffer containing the particles then becomes the consume buffer, while the old consume buffer becomes the append buffer, and the cycle begins all over again.
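The cycle described above can be sketched on the CPU in C++. This is only an illustration of the ping-pong idea, not the engine's shader code: two std::vector objects stand in for the GPU buffers, and the names Particle, ParticleBuffers, Emit and Update are hypothetical. It also assumes that emitted particles are added to the buffer that the update pass consumes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Particle {
    float position[3];
    float velocity[3];
    float age;       // seconds since emission
    float lifetime;  // seconds until the particle dies
};

// CPU stand-in for the two GPU buffers (assumption: on the GPU each of
// these would be an append/consume buffer with its own counter).
struct ParticleBuffers {
    std::vector<Particle> consume;  // particles that survived the last frame
    std::vector<Particle> append;   // particles that survive this frame
};

// Emit pass: runs before the update so new particles get updated too.
void Emit(ParticleBuffers& buffers, const Particle& particle)
{
    buffers.consume.push_back(particle);
}

// Update pass: consume every particle, update it, append it if still alive.
void Update(ParticleBuffers& buffers, float dt)
{
    buffers.append.clear();
    while (!buffers.consume.empty()) {
        Particle p = buffers.consume.back();  // "consume"
        buffers.consume.pop_back();

        p.age += dt;
        if (p.age >= p.lifetime) {
            continue;  // dead particles are simply not appended
        }
        for (std::size_t i = 0; i < 3; ++i) {
            p.position[i] += p.velocity[i] * dt;
        }
        buffers.append.push_back(p);  // "append"
    }
    // Swap roles: this frame's output is next frame's input.
    std::swap(buffers.consume, buffers.append);
}
```

After Update returns, the consume side holds exactly the live particles and the append side is empty, so no separate dead/alive list is needed.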


I struggled with how to handle data in append/consume buffers since the documentation was lacking and this was a whole new system to me. I took everything in small steps and kept using the old CPU particle system for rendering, which meant copying the data back to the CPU. After a week, I had a system that sent data back and forth between the two buffers, and I could observe the result on the screen.


Finally, I had something moving in the window! I improved the system by removing the last part of the CPU particle system, which was still used for rendering, and switched to indirect drawing on the GPU. This meant the data always stayed on the GPU, avoiding the bandwidth cost of sending it back and forth between the GPU and CPU. I now had a GPU particle system where I could explore adding new and interesting behaviors.
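With indirect drawing, the draw parameters are read from a GPU buffer instead of being passed from the CPU. As a rough sketch, the C++ struct below mirrors the four-value argument layout that non-indexed indirect draws use in Direct3D 11, Vulkan and OpenGL. The names DrawInstancedIndirectArgs and MakeParticleDrawArgs are hypothetical, and drawing one point per particle in a single instance is an assumption about how the particles are rendered.

```cpp
#include <cstdint>

// Four 32-bit values, in this order, are what a non-indexed indirect draw
// reads from the argument buffer.
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};

static_assert(sizeof(DrawInstancedIndirectArgs) == 16,
              "indirect arguments must be four tightly packed 32-bit values");

// Assumption: each live particle is drawn as one point vertex.
DrawInstancedIndirectArgs MakeParticleDrawArgs(std::uint32_t aliveCount)
{
    return DrawInstancedIndirectArgs{aliveCount, 1u, 0u, 0u};
}
```

In a real GPU-driven setup the alive count would not come from the CPU as it does here. It would be written into the argument buffer on the GPU, for example by copying the append buffer's counter, so that the CPU never needs to read it back.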


I added all the behaviors from my old particle system such as start velocity, lifetime, color over lifetime etc.
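As an example of what such a behavior can look like, here is a small C++ sketch of color over lifetime with only a start and an end color. The names Color, Lerp and ColorOverLifetime are hypothetical, and a full implementation would likely support a gradient with more than two keys.

```cpp
#include <algorithm>

struct Color {
    float r, g, b, a;
};

float Lerp(float from, float to, float t)
{
    return from + (to - from) * t;
}

// Blend from the start color to the end color as the particle ages.
Color ColorOverLifetime(const Color& start, const Color& end,
                        float age, float lifetime)
{
    // Normalized age in [0, 1]; a non-positive lifetime counts as fully aged.
    float t = 1.0f;
    if (lifetime > 0.0f) {
        t = std::min(std::max(age / lifetime, 0.0f), 1.0f);
    }
    return Color{Lerp(start.r, end.r, t), Lerp(start.g, end.g, t),
                 Lerp(start.b, end.b, t), Lerp(start.a, end.a, t)};
}
```

Fading from opaque yellow to transparent red, for instance, would use a start color of (1, 1, 0, 1) and an end color of (1, 0, 0, 0).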

To benchmark the system, I spawned 1 million particles. During the test, the game never dropped below 100 fps. By comparison, 1 million particles in the old system ran at about 12 fps.

So what now?


I had seen that Unity's new particle system used something interesting called vector fields, which control the velocity of the particles. I had to implement it!

I found some vector fields in .fga format and wrote an importer to convert them into a 3D texture. A 3D texture was an easy and intuitive way to sample vectors from a 3D space. It worked well in action as seen in the picture to the right.
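The import step can be sketched in C++ as follows. The sketch assumes the commonly described .fga text layout: comma-separated numbers with the grid resolution first, then the minimum and maximum bounds, then one vector per cell with x varying fastest. That layout, and the names VectorField, ParseFga and SampleNearest, are assumptions for illustration. The nearest-cell lookup stands in for the filtered sampling a 3D texture would provide on the GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Vec3 {
    float x, y, z;
};

struct VectorField {
    int resX = 0, resY = 0, resZ = 0;
    Vec3 boundsMin{}, boundsMax{};
    std::vector<Vec3> cells;  // assumed order: x fastest, then y, then z
};

VectorField ParseFga(const std::string& text)
{
    // Treat commas as whitespace so stream extraction can read the numbers.
    std::string cleaned = text;
    std::replace(cleaned.begin(), cleaned.end(), ',', ' ');
    std::istringstream in(cleaned);

    std::vector<float> values;
    for (float v; in >> v;) {
        values.push_back(v);
    }
    if (values.size() < 9) {
        throw std::runtime_error("FGA header is incomplete");
    }

    VectorField field;
    field.resX = static_cast<int>(values[0]);
    field.resY = static_cast<int>(values[1]);
    field.resZ = static_cast<int>(values[2]);
    field.boundsMin = {values[3], values[4], values[5]};
    field.boundsMax = {values[6], values[7], values[8]};
    if (field.resX <= 0 || field.resY <= 0 || field.resZ <= 0) {
        throw std::runtime_error("FGA resolution must be positive");
    }

    const std::size_t cellCount = static_cast<std::size_t>(field.resX) *
                                  static_cast<std::size_t>(field.resY) *
                                  static_cast<std::size_t>(field.resZ);
    if (values.size() != 9 + cellCount * 3) {
        throw std::runtime_error("FGA vector count does not match resolution");
    }
    field.cells.reserve(cellCount);
    for (std::size_t i = 0; i < cellCount; ++i) {
        field.cells.push_back(
            {values[9 + i * 3], values[10 + i * 3], values[11 + i * 3]});
    }
    return field;
}

// Nearest-cell lookup for a world-space position inside the bounds.
Vec3 SampleNearest(const VectorField& field, const Vec3& p)
{
    auto toIndex = [](float v, float lo, float hi, int res) {
        float t = hi > lo ? (v - lo) / (hi - lo) : 0.0f;
        t = std::min(std::max(t, 0.0f), 1.0f);
        const int i = static_cast<int>(t * static_cast<float>(res));
        return std::min(i, res - 1);
    };
    const int ix = toIndex(p.x, field.boundsMin.x, field.boundsMax.x, field.resX);
    const int iy = toIndex(p.y, field.boundsMin.y, field.boundsMax.y, field.resY);
    const int iz = toIndex(p.z, field.boundsMin.z, field.boundsMax.z, field.resZ);
    const std::size_t index =
        (static_cast<std::size_t>(iz) * field.resY + iy) * field.resX + ix;
    return field.cells[index];
}
```

On the GPU the cells array would be uploaded as the texels of the 3D texture, and the particle update would add the sampled vector to the particle's velocity.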


Nowadays most rendering engines use a format called VDB for efficient storage and use of volumetric data, so I became interested in adding support for that format as well. I downloaded the library from the OpenVDB website and integrated it into the engine. The process was easy since NVIDIA had developed a header-only version of the library called NanoVDB. The system now had support for both .fga and .vdb files.


I explored other applications where I could use my knowledge of vector fields. Here I randomized the velocity of the particles to create interesting patterns. I did this by creating another 3D texture filled with data from a curl noise algorithm. I then sampled this new 3D texture to get an additional velocity for the particles, which produces the curling effect.
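The idea behind curl noise is to take the curl of a smooth vector potential, which gives a divergence-free field, so the particles swirl without clumping together or spreading apart. The C++ sketch below approximates the curl with central differences. The sine-based Potential function is only a stand-in for a real noise function such as Perlin or simplex noise, and all names are hypothetical.

```cpp
#include <cmath>

struct Vec3d {
    double x, y, z;
};

// Stand-in potential field (assumption: a real implementation would use
// three channels of smooth noise here instead of sines).
Vec3d Potential(const Vec3d& p)
{
    return Vec3d{std::sin(1.3 * p.y + 0.7 * p.z),
                 std::sin(1.1 * p.z + 0.9 * p.x),
                 std::sin(1.7 * p.x + 0.5 * p.y)};
}

// Curl of the potential, approximated with central differences of step h.
Vec3d Curl(const Vec3d& p, double h = 1e-3)
{
    const Vec3d xp = Potential(Vec3d{p.x + h, p.y, p.z});
    const Vec3d xm = Potential(Vec3d{p.x - h, p.y, p.z});
    const Vec3d yp = Potential(Vec3d{p.x, p.y + h, p.z});
    const Vec3d ym = Potential(Vec3d{p.x, p.y - h, p.z});
    const Vec3d zp = Potential(Vec3d{p.x, p.y, p.z + h});
    const Vec3d zm = Potential(Vec3d{p.x, p.y, p.z - h});

    const double inv = 1.0 / (2.0 * h);
    // Partial derivatives of the potential's components.
    const double dPz_dy = (yp.z - ym.z) * inv;
    const double dPy_dz = (zp.y - zm.y) * inv;
    const double dPx_dz = (zp.x - zm.x) * inv;
    const double dPz_dx = (xp.z - xm.z) * inv;
    const double dPy_dx = (xp.y - xm.y) * inv;
    const double dPx_dy = (yp.x - ym.x) * inv;

    return Vec3d{dPz_dy - dPy_dz, dPx_dz - dPz_dx, dPy_dx - dPx_dy};
}
```

To fill a 3D texture, Curl would be evaluated once per texel at that texel's position, and the result stored as the texel's vector.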


I had a lot of fun with my project and felt I got further than I had anticipated. I managed to make a fully GPU-based particle system and started learning about OpenVDB, which I will explore further after this project is finished. I never got the chance to fully explore OpenVDB by running it in a CUDA kernel for volumetric clouds and wind simulations. Imagine if I could get a fire simulation from Houdini to work in the engine!

I am very impressed by the power of the GPU, which can handle millions of calculations per frame in contrast to the CPU. The power of even the most common GPU is formidable and is a testament to the ingenuity of the engineers who have pushed the hardware to such levels. It is easy to forget how much work goes into rendering a game. Seeing it visualized in the form of particles on the screen has given me a deeper appreciation for the engine we work on every day.