A Hybrid Approach for Efficient Provenance Storage
Appeared in The 21st ACM Conference on Information and Knowledge Management (CIKM).
Abstract
Efficient provenance storage is an essential step towards the adoption of provenance. In this paper, we analyze the provenance collected from multiple workloads with a view towards efficient storage. Based on our analysis, we characterize the properties of provenance with respect to long term storage. We then propose a hybrid scheme that takes advantage of the graph structure of provenance data and the inherent duplication in provenance data. Our evaluation indicates that our hybrid scheme, a combination of web graph compression (adapted for provenance) and dictionary encoding, provides the best tradeoff in terms of compression ratio, compression time and query performance when compared to other compression schemes.
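As a rough illustration of the two ideas the abstract combines, the sketch below shows dictionary encoding of repeated strings and gap encoding of sorted neighbor lists, a basic building block of web graph compression. It is not the paper's algorithm: the function names, the sample path strings, and the node IDs are all hypothetical, and the adaptations for provenance that the paper describes are not reproduced here.

```python
def dictionary_encode(values):
    """Replace each repeated string with a small integer ID.

    Returns (ids, table), where table[i] is the string assigned ID i.
    """
    index = {}   # string -> ID, assigned in order of first appearance
    table = []   # ID -> string, used for decoding
    ids = []
    for value in values:
        if value not in index:
            index[value] = len(table)
            table.append(value)
        ids.append(index[value])
    return ids, table


def gap_encode(neighbors):
    """Store a list of node IDs as the smallest ID followed by successive gaps."""
    ordered = sorted(neighbors)
    if not ordered:
        return []
    gaps = [ordered[0]]
    for previous, current in zip(ordered, ordered[1:]):
        gaps.append(current - previous)
    return gaps


def gap_decode(gaps):
    """Invert gap_encode by taking a running sum of the gaps."""
    nodes = []
    total = 0
    for gap in gaps:
        total += gap
        nodes.append(total)
    return nodes


if __name__ == "__main__":
    # Hypothetical attribute values that repeat across provenance records.
    ids, table = dictionary_encode(["/bin/sh", "/usr/bin/gcc", "/bin/sh"])
    print(ids, table)

    # Hypothetical ancestor node IDs for one provenance node.
    encoded = gap_encode([1005, 1001, 1002])
    print(encoded, gap_decode(encoded))
```

Dictionary encoding pays off when the same attribute values recur many times, and gap encoding pays off when a node's neighbors have nearby IDs, since small gaps take fewer bits than full IDs under a variable-length integer code.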
Publication date:
October 2012
Authors:
Yulai Xie
Kiran-Kumar Muniswamy-Reddy
Dan Feng
Yan Li
Darrell D. E. Long
Zhipeng Tan
Lei Chen
Projects:
Scalable File System Indexing
Dynamic Non-Hierarchical File Systems
Ultra-Large Scale Storage
Available media
Full paper text: PDF
Bibtex entry
@inproceedings{xie-cikm12,
  author       = {Yulai Xie and Kiran-Kumar Muniswamy-Reddy and Dan Feng and Yan Li and Darrell D. E. Long and Zhipeng Tan and Lei Chen},
  title        = {A Hybrid Approach for Efficient Provenance Storage},
  booktitle    = {The 21st ACM Conference on Information and Knowledge Management (CIKM)},
  month        = oct,
  year         = {2012},
}
    
