Examining Extended and Scientific Metadata for Scalable Index Designs
Published as Storage Systems Research Center Technical Report UCSC-SSRC-12-07.
Abstract
The sheer volume of modern data makes manual file management impractical. Search-oriented file systems, where data and metadata are indexed for fast search, are increasingly viewed as a necessity everywhere from desktops to HPC. However, current techniques have been designed and tested for file system metadata, such as POSIX metadata, and fail to account for the wide variety of metadata users would like to search. In particular, the scientific world has been vocal about a desire to search extended and content metadata. While file system metadata is well characterized by a variety of workload studies, scientific metadata is much less well understood. We characterize scientific metadata in order to better understand the implications for index design. We demonstrate that previously suggested index structures, such as k-d trees, R-trees, and row-major databases, are not well suited to scientific metadata. Finally, we provide suggestions for a system design based on our findings.
Publication date:
December 2012
Authors:
Aleatha Parker-Wood
Brian Madden
Michael McThrow
Darrell D. E. Long
Projects:
Scalable File System Indexing
Dynamic Non-Hierarchical File Systems
Ultra-Large Scale Storage
Available media
Full paper text: PDF
BibTeX entry
@techreport{ssrctr-12-07,
  author = {Aleatha Parker-Wood and Brian Madden and Michael McThrow and Darrell D. E. Long},
  title = {Examining Extended and Scientific Metadata for Scalable Index Designs},
  institution = {University of California, Santa Cruz},
  number = {UCSC-SSRC-12-07},
  month = dec,
  year = {2012},
}