Quantifying the Energy Efficiency of Object Recognition and Optical Flow

In this report, we analyze the computational and performance aspects of current
state-of-the-art object recognition and optical flow algorithms. First, we identify important
algorithms for object recognition and optical flow, then we perform a pattern decomposition
to identify key computations. We include profiles of the runtime and energy efficiency
(GFLOPS/W) for our implementation of these applications on a commercial architecture.
Finally, we include an analysis of memory-bandwidth boundedness for optical flow to
identify opportunities for communication-avoiding algorithms.

Our results were measured on an Intel i7-4770K (Haswell) reference platform. A five-layer
convolutional neural network used for object classification achieves 0.70 GFLOPS/W,
which is 21% of the theoretical compute bound for this Haswell processor. On the
Horn-Schunck, Lucas-Kanade, and Brox optical flow methods, our implementations achieve
0.0338, 0.0103, and 0.0203 GFLOPS/W, respectively. Our implementation achieves 7.9% of the
theoretical bandwidth bound, assuming no cross-iteration memory optimization, for
Horn-Schunck optical flow using the Jacobi solver, and 9.7% of the bandwidth bound for the
conjugate-gradient solver. To improve performance, we will focus first on increasing
bandwidth utilization, then on cross-iteration memory optimizations such as blocking and
tiling the Jacobi solver and employing communication-avoiding linear solvers.
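
The relationship between the achieved figures and the theoretical bounds can be made concrete with a little arithmetic. The following is a minimal sketch, not the measurement code used for this report: the function names and the workload numbers in the first call are hypothetical, chosen only for illustration. The only values taken from the text are 0.70 GFLOPS/W and 21%.

```python
def efficiency_gflops_per_watt(flops: float, seconds: float, watts: float) -> float:
    """Energy efficiency: (floating-point operations / time) / average power, in GFLOPS/W."""
    return flops / seconds / watts / 1e9


def implied_bound(achieved: float, fraction_of_bound: float) -> float:
    """Bound implied by an achieved value and the fraction of the bound it represents."""
    return achieved / fraction_of_bound


# Hypothetical workload (illustrative numbers, not measurements from this report):
# 7e9 floating-point operations executed in 2 s at an average power of 5 W.
example_efficiency = efficiency_gflops_per_watt(flops=7e9, seconds=2.0, watts=5.0)

# Figures from the text: the CNN achieves 0.70 GFLOPS/W, stated as 21% of the compute bound.
cnn_compute_bound = implied_bound(achieved=0.70, fraction_of_bound=0.21)

print(f"example efficiency: {example_efficiency:.2f} GFLOPS/W")
print(f"implied CNN compute bound: {cnn_compute_bound:.2f} GFLOPS/W")
```

Under these assumptions, the stated 21% implies a compute bound of roughly 3.3 GFLOPS/W for this processor. The same ratio applies to the bandwidth-bound percentages quoted for the Jacobi and conjugate-gradient solvers, with achieved and peak memory traffic in place of achieved and peak efficiency.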

We also compare the runtime-accuracy tradeoffs for each optical flow method. We find
that each method has distinct advantages over the other methods in terms of the
runtime-accuracy tradeoff, so we will continue to develop and support all three methods in
the future.

Aspire Lab – UC Berkeley