Technical report detail

Computational and Parallel Deep Learning Performance Benchmarks for the Xeon Phi

by Tim Dettmers, Hanieh Soleimani

Deep learning is a recent predictive modeling approach that can yield near-human performance on a range of tasks. Deep learning models have gained popularity by achieving state-of-the-art results on many tasks, such as language translation and object recognition, but they are computationally intensive and thus require computers with accelerators and weeks of computation time. Here we test the computational and parallel performance of deep learning implementations on the Intel Xeon Phi accelerator. We find that performance is poor for deep learning algorithms in which successive operations have different dimensions. In particular, general matrix multiplication and random number generation slow down by about 20% and 7%, respectively, compared to when successive operations are of the same size. While parallelization performance appears to be in line with GPUs, we conclude that the Xeon Phi is unsuitable for deep learning in its current state.
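The dimension-sensitivity finding above can be illustrated with a small timing experiment. The sketch below is not the report's actual benchmark code; it is a minimal NumPy-based illustration of the setup being described: repeatedly multiplying matrices of one fixed shape versus cycling through shapes that change between successive calls (here chosen so that every product costs the same number of floating-point operations, isolating the effect of shape changes).

```python
import time
import numpy as np

def time_matmuls(shapes, iters=20):
    """Time `iters` passes over a sequence of matrix products.

    `shapes` is a list of (m, k, n) triples; each triple produces one
    (m, k) x (k, n) multiplication per pass.
    """
    mats = [(np.random.rand(m, k).astype(np.float32),
             np.random.rand(k, n).astype(np.float32))
            for (m, k, n) in shapes]
    start = time.perf_counter()
    for _ in range(iters):
        for a, b in mats:
            a @ b  # general matrix multiplication (GEMM)
    return time.perf_counter() - start

# Case 1: successive multiplications all have identical dimensions.
same = time_matmuls([(512, 512, 512)] * 4)

# Case 2: dimensions differ between successive calls; each shape still
# costs 512^3 multiply-adds, so the FLOP count matches case 1.
varied = time_matmuls([(512, 512, 512), (256, 1024, 512),
                       (1024, 256, 512), (512, 256, 1024)])

print(f"same-size: {same:.3f}s, varied-size: {varied:.3f}s")
```

On the Xeon Phi, the report measures roughly a 20% penalty for the varied-size case; on ordinary CPU BLAS backends the gap, if any, will differ.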

Technical report 2016/04, July 2016

BibTex entry

@techreport{16computational,
  author      = {Tim Dettmers and Hanieh Soleimani},
  title       = {Computational and Parallel Deep Learning Performance Benchmarks for the Xeon Phi},
  institution = {University of Lugano},
  number      = {2016/04},
  year        = 2016,
  month       = jul
}