Duplicate Data Elimination in a SAN File System
Appeared in Proceedings of the Twenty-first Symposium on Mass Storage Systems (MSS).
Abstract
Duplicate Data Elimination (DDE) is our method for identifying and coalescing identical data blocks in Storage Tank, a SAN file system. On-line file systems pose a unique set of performance and implementation challenges for this feature. Existing techniques, which are used to improve both storage and network utilization, do not satisfy these constraints. Our design employs a combination of content hashing, copy-on-write, and lazy updates to achieve its functional and performance goals. DDE executes primarily as a background process. The design also builds on Storage Tank’s FlashCopy function to ease implementation.
We include an analysis of selected real-world data sets that is aimed at demonstrating the space-saving potential of coalescing duplicate data. Our results show that DDE can reduce storage consumption by up to 80% in some application environments. The analysis explores several additional features, such as the impact of varying file block size and the contribution of whole file duplication to the net savings.
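As a rough illustration of the kind of measurement the abstract describes, the sketch below estimates how many fixed-size blocks could be coalesced by content hashing, and how that estimate changes with block size. It is not the paper's implementation: the function name `dedup_savings`, the choice of SHA-256, and the sample data are assumptions made for this example, and copy-on-write and lazy updates are not modeled.

```python
import hashlib


def dedup_savings(files, block_size):
    """Hypothetical helper: count total and unique fixed-size blocks.

    Returns (total_blocks, unique_blocks, fraction_saved), where
    fraction_saved is the share of blocks that duplicate an earlier block.
    """
    seen = set()   # digests of distinct block contents seen so far
    total = 0
    for data in files:
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]  # last block may be short
            total += 1
            # SHA-256 is an assumption; any collision-resistant hash would do.
            seen.add(hashlib.sha256(block).digest())
    unique = len(seen)
    saved = 1 - unique / total if total else 0.0
    return total, unique, saved


# Two made-up "files" that share some 4 KiB regions.
files = [b"A" * 8192 + b"B" * 4096, b"A" * 4096 + b"C" * 4096]
for bs in (4096, 8192):
    print(bs, dedup_savings(files, bs))
```

With this sample data, 4 KiB blocks expose three duplicate-free contents among five blocks, while 8 KiB blocks find no duplicates at all, which is the kind of block-size sensitivity the analysis examines.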
Publication date:
April 2004
Authors:
Bo Hong
Demyn Plantenberg
Darrell D. E. Long
Miriam Sivan-Zimet
Projects:
Secure Networks
Available media
Full paper text: PDF
Bibtex entry
@inproceedings{MSST-Hong-2004,
  author    = {Bo Hong and Demyn Plantenberg and Darrell D. E. Long and Miriam Sivan-Zimet},
  title     = {Duplicate Data Elimination in a {SAN} File System},
  booktitle = {Proceedings of the Twenty-first Symposium on Mass Storage Systems (MSS)},
  month     = apr,
  year      = {2004},
}