Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems
Published as Storage Systems Research Center Technical Report UCSC-SSRC-08-01.
Abstract
As storage systems reach the petabyte scale, it has become increasingly difficult for users and storage administrators to understand and manage their data. File metadata, such as inode and extended attributes, are a valuable source of information that can aid in locating and identifying files, and can also facilitate administrative tasks, such as storage provisioning and recovery from backups. Unfortunately, most storage systems have no way to quickly and easily search file metadata at large scale.
To address these issues, we developed Spyglass, an indexing system that efficiently gathers, indexes, and queries file metadata in large-scale storage systems. Our analysis of file metadata from real-world workloads showed that metadata has spatial locality in the storage namespace and that the distribution of metadata is highly skewed. Based on these findings, we designed Spyglass to use index partitioning and signature files to quickly prune the file search space. We also developed techniques to efficiently handle index versioning, facilitating both fast updates and queries across historical indexes. Experiments on systems with up to 300 million files show that the Spyglass prototype is as much as several thousand times faster than current database solutions while requiring only a fraction of the space.
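To make the pruning idea concrete, the following is a minimal sketch, not the Spyglass implementation. It assumes files are grouped into partitions by namespace subtree and that each partition keeps a small bit-array signature over one attribute (here, the file owner). All names in it (`Partition`, `PartitionedIndex`, `sig_bit`, `SIG_BITS`, and the `depth` parameter) are hypothetical and chosen only for illustration.

```python
import zlib
from collections import defaultdict

SIG_BITS = 64  # hypothetical signature width, chosen for illustration


def sig_bit(value: str) -> int:
    """Map an attribute value to a single bit using a stable hash."""
    return 1 << (zlib.crc32(value.encode("utf-8")) % SIG_BITS)


class Partition:
    """Metadata for one namespace subtree plus a signature of its owners."""

    def __init__(self):
        self.files = []     # list of (path, metadata dict)
        self.signature = 0  # bitwise OR of sig_bit(owner) for every file

    def add(self, path, meta):
        self.files.append((path, meta))
        self.signature |= sig_bit(meta["owner"])


class PartitionedIndex:
    """Groups files by the first `depth` directory components of their path."""

    def __init__(self, depth=2):
        self.depth = depth
        self.partitions = defaultdict(Partition)

    def _key(self, path):
        dirs = path.strip("/").split("/")[:-1]  # drop the file name
        return "/" + "/".join(dirs[: self.depth])

    def add(self, path, meta):
        self.partitions[self._key(path)].add(path, meta)

    def search_owner(self, owner):
        """Return matching paths and the number of partitions scanned."""
        bit = sig_bit(owner)
        hits, scanned = [], 0
        for part in self.partitions.values():
            if not part.signature & bit:
                continue  # signature rules this partition out: skip it
            scanned += 1
            hits.extend(p for p, m in part.files if m["owner"] == owner)
        return sorted(hits), scanned


index = PartitionedIndex(depth=2)
index.add("/home/alice/a.txt", {"owner": "alice", "size": 10})
index.add("/home/alice/b.txt", {"owner": "alice", "size": 20})
index.add("/home/bob/c.txt", {"owner": "bob", "size": 30})
hits, scanned = index.search_owner("alice")
print(hits, scanned)
```

A signature can report a false positive when two values hash to the same bit, so a partition that passes the test is still scanned in full. It never reports a false negative, so skipping a partition whose bit is unset is safe. Pruning of this kind pays off when related metadata clusters in a few subtrees, which is the spatial locality the abstract describes.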
Publication date:
May 2008
Authors:
Andrew Leung
Minglong Shao
Timothy Bisson
Shankar Pasupathy
Ethan L. Miller
Projects:
Scalable File System Indexing
Ultra-Large Scale Storage
Available media
Full paper text: PDF
Bibtex entry
@techreport{leung-ssrctr0801,
  author       = {Andrew Leung and Minglong Shao and Timothy Bisson and Shankar Pasupathy and Ethan L. Miller},
  title        = {Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems},
  institution  = {University of California, Santa Cruz},
  number       = {UCSC-SSRC-08-01},
  month        = may,
  year         = {2008},
}
    
