The convergence of highly parallel many-core graphics processors with conventional multi-core processors is becoming a reality. To allow algorithms and data structures to scale efficiently on these new platforms, several important factors needs to be considered.
(i) The algorithmic design needs to utilize the inherent parallelism of the problem at hand. Sorting, which is one of the classic computing components in computer science, has a high degree of inherent parallelism. In this thesis we present the first efficient design of Quicksort for graphics processors and show that it performs well in comparison with other available sorting methods.
(ii) The work needs to be distributed efficiently across the available processing units. We present an evaluation of a set of dynamic load balancing schemes for graphics processors, comparing blocking methods with non-blocking.
(iii) The required synchronization needs to be efficient, composable and easy to use. We present a methodology to easily compose the two most common operations provided by a data structure – the insertion and deletion of elements. By exploiting a common construction found in most non-blocking data structures, we created a move operation that can atomically move elements between different types of non-blocking data structures, without requiring a specific design for each coupling. We also present, to the best of our knowledge, the first application of software transactional memory to graphics processors. Two different STM designs, one blocking and one obstruction-free, were evaluated on the task of implementing different types of common concurrent data structures on a graphics processor.
Keywords: parallel, lock-free, graphics processors, multi-core, sorting, load balancing, composition, software transactional memory
|Institution||Chalmers University of Technology|