Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server

Abstract—Graph processing is an increasingly important application domain and is typically communication-bound. In this work, we analyze the performance characteristics of three high-performance graph algorithm codebases using hardware performance counters on a conventional dual-socket server. Unlike many other communication-bound workloads, graph algorithms struggle to fully utilize the platform’s memory bandwidth, so increasing memory bandwidth utilization could be just as effective as decreasing communication. Based on our observations of simultaneously low compute and bandwidth utilization, we find there is substantial room for a different processor architecture to improve performance without requiring a new memory system.