The Particle-In-Cell (PIC) method is widely used for plasma simulation, and GPUs appear to be the most efficient platform for running it. In this work we propose a technique that speeds up one of the most time-consuming operations in the GPU implementation of the PIC method: particle reordering, that is, the redistribution of particles between cells performed after the particle push. Reordering maintains data locality, which is the key performance issue of the PIC method. We propose to divide reordering into two stages. First, the particles that are about to leave a given cell are gathered into separate arrays, one per neighbor cell (26 in the 3D case). Second, each neighbor cell copies the particles from the array addressed to it into its own particle array. The second stage is performed by 26 independent threads with no synchronization or waiting and requires no critical sections, semaphores, mutexes, atomic operations, etc. This reduces the reordering time by more than a factor of 10 compared with the straightforward reordering algorithm.
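The two stages can be illustrated with a small sequential C++ sketch. It is not the authors' GPU implementation. It assumes unit-size cells on a periodic grid and particles that move less than one cell per step (so only the 26 neighbors can receive them), and the names `Particle`, `Grid`, `gatherLeaving`, `pullArriving` and `reorder` are hypothetical. The final copy loop in `pullArriving` marks the work that would be split across 26 independent threads on a GPU.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical particle record; a real PIC code would also store momentum, charge, etc.
struct Particle { double x, y, z; };

// In 3D each cell has 3*3*3 - 1 = 26 neighbor cells.
constexpr int kDirs = 26;

// Map an offset in {-1,0,1}^3 other than (0,0,0) to a slot 0..25.
int slotOf(int di, int dj, int dk) {
    int code = (di + 1) * 9 + (dj + 1) * 3 + (dk + 1);  // 0..26; 13 means "no move"
    return code < 13 ? code : code - 1;
}

// Inverse of slotOf: recover the offset stored in a slot.
void offsetOf(int slot, int& di, int& dj, int& dk) {
    int code = slot < 13 ? slot : slot + 1;
    di = code / 9 - 1;
    dj = (code / 3) % 3 - 1;
    dk = code % 3 - 1;
}

// Unit cells on an nx*ny*nz grid with periodic boundaries (an assumption of this sketch).
struct Grid {
    int nx, ny, nz;
    std::vector<std::vector<Particle>> cells;                   // particles owned by each cell
    std::vector<std::array<std::vector<Particle>, kDirs>> out;  // stage-1 outgoing arrays
    Grid(int x, int y, int z) : nx(x), ny(y), nz(z), cells(x * y * z), out(x * y * z) {}
    static int wrap(int a, int n) { return (a % n + n) % n; }
    int index(int i, int j, int k) const {
        return (wrap(k, nz) * ny + wrap(j, ny)) * nx + wrap(i, nx);
    }
};

// Stage 1: each cell moves the particles that left it into the array for their direction.
// Assumes no particle moves more than one cell per time step.
void gatherLeaving(Grid& g) {
    for (int k = 0; k < g.nz; ++k)
        for (int j = 0; j < g.ny; ++j)
            for (int i = 0; i < g.nx; ++i) {
                int c = g.index(i, j, k);
                std::vector<Particle>& ps = g.cells[c];
                std::size_t kept = 0;
                for (std::size_t p = 0; p < ps.size(); ++p) {
                    Particle q = ps[p];
                    int di = static_cast<int>(std::floor(q.x)) - i;
                    int dj = static_cast<int>(std::floor(q.y)) - j;
                    int dk = static_cast<int>(std::floor(q.z)) - k;
                    assert(std::abs(di) <= 1 && std::abs(dj) <= 1 && std::abs(dk) <= 1);
                    if (di == 0 && dj == 0 && dk == 0) { ps[kept++] = q; continue; }
                    // Wrap the position back into the periodic domain.
                    q.x -= g.nx * std::floor(q.x / g.nx);
                    q.y -= g.ny * std::floor(q.y / g.ny);
                    q.z -= g.nz * std::floor(q.z / g.nz);
                    g.out[c][slotOf(di, dj, dk)].push_back(q);
                }
                ps.resize(kept);  // staying particles were compacted in place
            }
}

// Stage 2: each cell copies, from each of its 26 neighbors, the one array aimed at it.
// Every outgoing array has exactly one reader and each copy writes a disjoint slice,
// so on a GPU the 26 copies per cell could run as independent threads without atomics.
void pullArriving(Grid& g) {
    for (int k = 0; k < g.nz; ++k)
        for (int j = 0; j < g.ny; ++j)
            for (int i = 0; i < g.nx; ++i) {
                std::vector<Particle>& ps = g.cells[g.index(i, j, k)];
                std::array<const std::vector<Particle>*, kDirs> src{};
                std::array<std::size_t, kDirs> start{};
                std::size_t end = ps.size();
                for (int s = 0; s < kDirs; ++s) {
                    int di, dj, dk;
                    offsetOf(s, di, dj, dk);
                    // The neighbor at offset d holds our particles in its slot for -d.
                    src[s] = &g.out[g.index(i + di, j + dj, k + dk)][slotOf(-di, -dj, -dk)];
                    start[s] = end;  // exclusive prefix sum gives each copy its own slice
                    end += src[s]->size();
                }
                ps.resize(end);
                for (int s = 0; s < kDirs; ++s)  // the 26 independent copies
                    std::copy(src[s]->begin(), src[s]->end(),
                              ps.begin() + static_cast<std::ptrdiff_t>(start[s]));
            }
    for (auto& arrays : g.out)
        for (auto& a : arrays) a.clear();
}

// One full reordering step, to be called after the particle push.
void reorder(Grid& g) { gatherLeaving(g); pullArriving(g); }
```

Sizing each cell's particle array once, with an exclusive prefix sum over the 26 incoming arrays, is what gives every per-direction copy its own non-overlapping destination range; that is the property that lets the copies proceed without locks or atomic operations.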
| Field | Value |
|---|---|
| Number of pages | 8 |
| Journal | Vestnik Udmurtskogo Universiteta: Matematika, Mekhanika, Komp'yuternye Nauki |
| Publication status | Published, 1 Jan 2018 |