SupMR: Circumventing Disk and Memory Bandwidth Bottlenecks for Scale-up MapReduce

Appeared in IEEE International Parallel & Distributed Processing Symposium Workshops.

Abstract

Reading input from primary storage (i.e., the ingest phase) and aggregating results (i.e., the merge phase) are important pre- and post-processing steps in large batch computations. Unfortunately, today's data sets are so large that the ingest and merge job phases are now performance bottlenecks. In this paper, we mitigate the ingest and merge bottlenecks by leveraging the scale-up MapReduce model. We introduce an ingest chunk pipeline and a merge optimization that together increase CPU utilization (50-100%) and yield job phase speedups (1.16x - 3.13x) for the ingest and merge phases. Our techniques are based on well-known algorithms and scale-out MapReduce optimizations, but applying them to a scale-up computation framework to mitigate the ingest and merge bottlenecks is novel.
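The core idea behind an ingest chunk pipeline is to overlap reading input chunks from storage with map-side computation, so the CPU is not idle during ingest. The sketch below is a minimal producer-consumer illustration of that general pattern in Python; it is not the paper's SupMR implementation, and the chunk representation and `process` function are hypothetical stand-ins.

```python
import threading
import queue

def pipelined_ingest(chunks, process):
    """Overlap ingest (producing chunks) with computation (consuming them).

    Illustrative only: a bounded queue decouples the ingest thread from the
    compute loop, mimicking how a chunk pipeline keeps the CPU busy while
    waiting on storage. Not the actual SupMR pipeline.
    """
    q = queue.Queue(maxsize=4)  # bounded buffer between ingest and compute
    results = []

    def ingest():
        for chunk in chunks:    # stands in for reading from primary storage
            q.put(chunk)
        q.put(None)             # sentinel: ingest finished

    t = threading.Thread(target=ingest)
    t.start()
    while True:
        chunk = q.get()
        if chunk is None:
            break
        results.append(process(chunk))  # compute starts before ingest ends
    t.join()
    return results

# Example: per-chunk word counts, a trivial stand-in for map-side work
chunks = [["a", "b", "a"], ["b", "b"]]
counts = pipelined_ingest(chunks, lambda c: {w: c.count(w) for w in set(c)})
```

With real input, the ingest thread would block on disk reads while the consumer processes earlier chunks, which is where the overlap pays off.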

Publication date:
May 2014

Authors:
Michael Sevilla
Ike Nassi
Kleoni Ioannidou
Scott A. Brandt
Carlos Maltzahn

Available media

Full paper text: PDF

Bibtex entry

@inproceedings{sevilla-ipdps2014,
  author       = {Michael Sevilla and Ike Nassi and Kleoni Ioannidou and Scott A. Brandt and Carlos Maltzahn},
  title        = {{SupMR}: Circumventing Disk and Memory Bandwidth Bottlenecks for Scale-up {MapReduce}},
  booktitle    = {IEEE International Parallel \& Distributed Processing Symposium Workshops},
  month        = may,
  year         = {2014},
}
Last modified 10 Apr 2024