Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup
Appeared in Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2009).
Abstract
Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements; critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, and existing deduplication techniques require this locality to provide reasonable throughput. We present Extreme Binning: a scalable deduplication technique for backup requests made up of individual files, with no locality among consecutive files in a given window of time. Because such workloads lack locality, existing techniques perform poorly on them. Extreme Binning exploits file similarity instead of locality and makes only one disk access per file to maintain throughput. The backup system scales gracefully with the data: more backup nodes can be added easily to boost throughput. In such a multi-node backup system, every file is allocated, using a stateless routing algorithm, to exactly one node, allowing for maximum parallelization. Each backup node is autonomous, with no dependencies across nodes, which makes data management tasks robust and low-overhead.
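The abstract does not spell out the routing function, so the following is only a minimal sketch of what a stateless, similarity-based router could look like. It assumes that each file is summarized by the smallest of its chunk hashes and that this value, taken modulo the node count, selects the backup node. The fixed-size chunking, the use of SHA-1, and the names `chunk_ids`, `representative_id`, and `route` are all hypothetical choices made for illustration.

```python
import hashlib
from typing import List


def chunk_ids(data: bytes, chunk_size: int = 4096) -> List[bytes]:
    """Split data into fixed-size chunks and hash each one.

    Fixed-size chunking is an assumption made for brevity; a real
    deduplication system would more likely use content-defined chunking.
    """
    if not data:
        # Give empty files a single well-defined chunk ID.
        return [hashlib.sha1(b"").digest()]
    return [
        hashlib.sha1(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def representative_id(data: bytes) -> bytes:
    """Summarize a file by its minimum chunk hash (assumed similarity signal).

    Files that share most of their chunks are likely to share the same
    minimum hash, so similar files tend to get the same representative.
    """
    return min(chunk_ids(data))


def route(data: bytes, num_nodes: int) -> int:
    """Pick a node from the file content alone; no shared state is consulted."""
    return int.from_bytes(representative_id(data), "big") % num_nodes


if __name__ == "__main__":
    original = b"A" * 4096 + b"B" * 4096
    reordered = b"B" * 4096 + b"A" * 4096  # same chunks, different order
    print(route(original, 4), route(reordered, 4))
```

Because the node is derived purely from the file's own content, any front end can route a file without coordinating with the others, which is one way to read "stateless" here. Two files made of the same set of chunks always land on the same node under this scheme.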
Publication date:
September 2009
Authors:
Deepavali Bhagwat
Kave Eshghi
Darrell D. E. Long
Mark Lillibridge
Projects:
Deduplication
Available media
Full paper text: PDF
Bibtex entry
@inproceedings{bhagwat-mascots09,
  author       = {Deepavali Bhagwat and Kave Eshghi and Darrell D. E. Long and Mark Lillibridge},
  title        = {Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup},
  booktitle    = {Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2009)},
  month        = sep,
  year         = {2009},
}
    
