Project Proposal

Sanjana Kuchibhotla, Aditri Gupta
15-418
March 26, 2025

Parallel Graph Coloring Algorithm

Summary

We plan to implement parallel graph coloring algorithms on both the CPU (with OpenMP) and the GPU (with CUDA). We want to explore the challenges of parallelizing graph coloring, expanding on the basic algorithm with techniques like speculative coloring and load balancing. We will first implement a sequential version of the algorithm, then parallelize it with OpenMP, and finally port it to CUDA, in order to show the differences and tradeoffs among these approaches.

Background

Graph coloring is a classical NP-hard problem in which each vertex of a graph is assigned a "color" such that no two vertices sharing an edge receive the same color, while the total number of colors used is minimized. This problem has numerous applications, including scheduling for bandwidth allocation, assigning aircraft to flights, register allocation in compiler optimization, and, most importantly, solving sudoku puzzles.

The sequential algorithm for graph coloring typically follows a greedy approach: for each vertex v in the graph, examine all of v's neighbors to identify their colors, then assign v the smallest color not used by any neighbor.
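
As a concrete reference point, here is a minimal sketch of that greedy baseline in C++. The adjacency-list representation and the function name are our own placeholders, not a fixed design.

    #include <vector>

    // Greedily colors vertices 0..n-1 in index order; adj[v] lists the
    // neighbors of v. Returns the color assigned to each vertex.
    std::vector<int> greedy_color(const std::vector<std::vector<int>>& adj) {
        int n = adj.size();
        std::vector<int> color(n, -1);
        std::vector<bool> used(n, false);  // scratch: colors taken by neighbors
        for (int v = 0; v < n; v++) {
            // Mark the colors of v's already-colored neighbors.
            for (int u : adj[v])
                if (color[u] != -1) used[color[u]] = true;
            // Assign the smallest unmarked color.
            int c = 0;
            while (used[c]) c++;
            color[v] = c;
            // Reset the scratch marks for the next vertex.
            for (int u : adj[v])
                if (color[u] != -1) used[color[u]] = false;
        }
        return color;
    }

Since a vertex of degree d can be blocked by at most d colors, this assigns each vertex a color of at most its degree, so the scratch array of size n is always large enough.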

Parallelizing graph coloring is interesting precisely because processing adjacent vertices concurrently can introduce race conditions. Several parallel algorithms have been proposed for this problem, including Jones-Plassmann-style independent-set coloring and the speculative color-and-repair approach of Gebremedhin and Manne.

These algorithms employ different strategies, such as independent set identification, randomization, and conflict resolution, to enable parallelism while maintaining correctness. They also incur high synchronization overhead: frequent synchronization may be needed to maintain correctness, which in turn limits scalability.
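
As one illustration of the independent-set and randomization strategy, here is a rough Jones-Plassmann-style sketch. It is written sequentially for clarity; the two phases marked below are the loops that run in parallel in a real implementation, and the names and structure are our own simplification rather than code from a specific paper.

    #include <random>
    #include <vector>

    // Each vertex draws a random priority. In each round, uncolored
    // vertices whose priority beats every uncolored neighbor (ties broken
    // by index) form an independent set and can be colored concurrently.
    void jones_plassmann(const std::vector<std::vector<int>>& adj,
                         std::vector<int>& color) {
        int n = adj.size();
        std::vector<double> prio(n);
        std::mt19937 rng(15418);
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        color.assign(n, -1);
        for (int v = 0; v < n; v++) prio[v] = dist(rng);

        int remaining = n;
        while (remaining > 0) {
            // Phase 1 (parallelizable): find the independent set.
            std::vector<int> indep;
            for (int v = 0; v < n; v++) {
                if (color[v] != -1) continue;
                bool local_max = true;
                for (int u : adj[v]) {
                    if (color[u] == -1 &&
                        (prio[u] > prio[v] || (prio[u] == prio[v] && u > v))) {
                        local_max = false;
                        break;
                    }
                }
                if (local_max) indep.push_back(v);
            }
            // Phase 2 (parallelizable): members of the set color themselves
            // with the smallest color unused by any neighbor; no two members
            // are adjacent, so these writes never conflict.
            for (int v : indep) {
                std::vector<bool> used(n, false);
                for (int u : adj[v])
                    if (color[u] != -1) used[color[u]] = true;
                int c = 0;
                while (used[c]) c++;
                color[v] = c;
            }
            remaining -= (int)indep.size();
        }
    }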

The Challenge

There are many challenges in this problem, since graph coloring is not trivially parallelizable. Conflict resolution in particular is necessary: if two adjacent vertices are processed in parallel, they may be assigned the same color, so an extra step is needed to detect and repair such conflicts. In addition, a graph's structure does not divide evenly across threads and processors, so load imbalance can become an issue. There is also the possibility of exploring methods like speculative coloring, where vertices are optimistically colored and conflicts are fixed afterward, as sketched below.
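
To make the speculative idea concrete, here is a minimal OpenMP sketch of the color-then-repair loop, under our own assumptions about the graph representation; it illustrates the technique rather than committing us to a final design.

    #include <omp.h>
    #include <vector>

    // Speculatively color every vertex in the worklist in parallel, then
    // detect adjacent same-color pairs and recolor one endpoint next round.
    void speculative_color(const std::vector<std::vector<int>>& adj,
                           std::vector<int>& color) {
        int n = adj.size();
        color.assign(n, -1);
        std::vector<int> worklist(n);
        for (int v = 0; v < n; v++) worklist[v] = v;

        while (!worklist.empty()) {
            // Phase 1: optimistic coloring. Reads of a neighbor's color may
            // race with its write: that is the speculation being made.
            #pragma omp parallel for schedule(dynamic, 64)
            for (int i = 0; i < (int)worklist.size(); i++) {
                int v = worklist[i];
                std::vector<bool> used(adj[v].size() + 1, false);
                for (int u : adj[v]) {
                    int c = color[u];
                    if (c >= 0 && c < (int)used.size()) used[c] = true;
                }
                int c = 0;
                while (used[c]) c++;
                color[v] = c;
            }
            // Phase 2: conflict detection. Of each conflicting pair, the
            // higher-numbered vertex is queued to be recolored.
            std::vector<int> conflicts;
            #pragma omp parallel
            {
                std::vector<int> local;
                #pragma omp for nowait
                for (int i = 0; i < (int)worklist.size(); i++) {
                    int v = worklist[i];
                    for (int u : adj[v])
                        if (color[u] == color[v] && u < v) {
                            local.push_back(v);
                            break;
                        }
                }
                #pragma omp critical
                conflicts.insert(conflicts.end(), local.begin(), local.end());
            }
            worklist = std::move(conflicts);
        }
    }

In each round the lowest-numbered conflicting vertex keeps its color, so the worklist strictly shrinks and the loop terminates.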

With CUDA, there is also the possibility of divergent execution, since vertices in the graph have different degrees, and high-degree vertices can cause load imbalance within a warp. In addition, synchronization across thread blocks is not straightforward, since the problem is not naturally suited to CUDA's block-level synchronization model.
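
As a sketch of where that divergence comes from, consider a per-vertex coloring kernel over a CSR graph. The array names (row_ptr, col_idx) and the MAX_COLORS bound are our own placeholders, not a committed design.

    #define MAX_COLORS 128  // assumed bound for the per-thread scratch bitmap

    // One thread per worklist vertex. The neighbor loop's trip count is the
    // vertex degree, so threads in the same warp that own vertices of very
    // different degree diverge, and high-degree vertices stall their warp.
    __global__ void color_kernel(const int* row_ptr, const int* col_idx,
                                 int* color, const int* worklist, int work_n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= work_n) return;
        int v = worklist[i];

        bool used[MAX_COLORS] = {false};  // colors taken by v's neighbors
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; e++) {  // diverges by degree
            int c = color[col_idx[e]];
            if (c >= 0 && c < MAX_COLORS) used[c] = true;
        }
        int c = 0;
        while (c < MAX_COLORS && used[c]) c++;
        color[v] = c;  // speculative write; conflicts repaired in a later pass
    }

Conflict detection would run as a separate kernel between rounds, since a grid-wide barrier is available only at kernel boundaries; this is exactly the cross-block synchronization difficulty noted above.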

Resources

Reference papers and articles:

Hardware

We will use the GHC machines, as well as the PSC machines, to evaluate our code at higher thread counts. For our CUDA implementation, we will use the NVIDIA RTX 2080 GPUs installed on the GHC machines.

Goals and Deliverables

Platform Choice

We will use both OpenMP and CUDA for our solution. OpenMP is a good choice because it is designed for dynamic workloads on multicore CPUs: since graph coloring has significant potential for workload imbalance, OpenMP's control over thread scheduling and its shared-memory model make it a good tool for this problem. We chose CUDA as an additional exploration, to see how this problem runs on a GPU and what kinds of performance challenges arise there. Because the OpenMP implementation differs substantially from the GPU implementation in CUDA, we want to explore the differences between the two solutions and compare the tradeoffs and challenges of each approach.

Schedule

Week 1: Implement the baseline sequential graph coloring algorithm.
Week 2: Start implementing the initial OpenMP solution.
Week 3: Finish and optimize the OpenMP implementation; start the CUDA implementation.
Milestone (April 15): Working OpenMP solution and a basic CUDA implementation.
Week 4: Finish and optimize the CUDA solution.
Week 5: Compare the OpenMP and CUDA solutions, run benchmarks, and complete the analysis and report.