Improved Deduplication through Parallel Binning
Appeared in Proceedings of the 31st IEEE International Performance, Computing and Communications Conference (IPCCC '12).
Abstract
Many modern storage systems use deduplication to compress data by avoiding storing the same data twice. Deduplication relies on information about previously stored data, but accessing information about all stored data can cause a severe bottleneck. Similarity-based deduplication accesses only information on past data that is likely to be similar, and therefore more likely to yield good deduplication. We present an adaptive deduplication strategy that extends Extreme Binning, and we investigate, both theoretically and experimentally, the effects of the additional bin accesses.
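The binning idea in the abstract can be illustrated with a small sketch. This is a hypothetical Python model written loosely in the spirit of Extreme Binning, not the paper's adaptive strategy. It assumes each file is already split into chunks and that a file's representative is its minimum SHA-1 chunk ID. The class name `BinningDeduplicator` and the parameter `extra_bins` (which also probes the bins keyed by the next-smallest chunk IDs) are illustrative assumptions. The whole-file hash and on-disk bin layout of the original Extreme Binning design are omitted.

```python
import hashlib
from collections import defaultdict


def chunk_ids(chunks):
    """Return the SHA-1 hex digest of each chunk (chunking itself is assumed done)."""
    return [hashlib.sha1(c).hexdigest() for c in chunks]


class BinningDeduplicator:
    """Hypothetical similarity-based deduplicator, loosely modeled on Extreme Binning.

    Each file is routed to the bin keyed by its minimum chunk ID (its
    representative). With extra_bins > 0 (an assumption for illustration),
    the bins keyed by the next-smallest chunk IDs are also consulted,
    trading extra bin accesses for a chance to recognize more duplicates.
    """

    def __init__(self, extra_bins=0):
        self.extra_bins = extra_bins
        self.bins = defaultdict(set)  # representative chunk ID -> set of chunk IDs
        self.bin_accesses = 0         # number of existing bins loaded so far

    def store(self, chunks):
        """Store a file given as a list of byte chunks; return the number of new chunks."""
        ids = chunk_ids(chunks)
        if not ids:
            return 0
        # Representatives: the smallest distinct chunk IDs, primary (minimum) first.
        reps = sorted(set(ids))[: 1 + self.extra_bins]
        known = set()
        for rep in reps:
            if rep in self.bins:  # membership test avoids creating empty bins
                self.bin_accesses += 1
                known |= self.bins[rep]
        # Distinct chunk IDs of this file not found in any probed bin.
        new = {cid for cid in ids if cid not in known}
        # Only the primary bin is updated with this file's chunk IDs.
        self.bins[reps[0]].update(ids)
        return len(new)
```

Because only the primary bin is ever updated, the bin contents are the same for every value of `extra_bins`; the extra probes change only how many duplicates are recognized and how many bins are read, which is one simple way to expose the trade-off the abstract refers to.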
Publication date:
December 2012
Authors:
Zhike Zhang
Deepavali Bhagwat
Witold Litwin
Darrell D. E. Long
Thomas Schwarz
Projects:
Deduplication
Deduplication Optimization
BibTeX entry
@inproceedings{zhang-ipccc12,
  author       = {Zhike Zhang and Deepavali Bhagwat and Witold Litwin and Darrell D. E. Long and Thomas Schwarz},
  title        = {Improved Deduplication through Parallel Binning},
  booktitle    = {Proceedings of the 31st IEEE International Performance, Computing and Communications Conference (IPCCC '12)},
  month        = dec,
  year         = {2012},
}