Examining Extended and Scientific Metadata for Scalable Index Designs

Published as Storage Systems Research Center Technical Report UCSC-SSRC-12-07.

Abstract

The sheer volume of modern data makes manual file management impractical. Search-oriented file systems, in which data and metadata are indexed for fast search, are increasingly viewed as a necessity in settings ranging from desktops to HPC. However, current techniques have been designed and tested for file system metadata, such as POSIX metadata, and fail to account for the wide variety of metadata users would like to search. In particular, the scientific community has been vocal about its desire to search extended and content metadata. While file system metadata is well characterized by a variety of workload studies, scientific metadata is much less well understood. We characterize scientific metadata in order to better understand its implications for index design. We demonstrate that previously suggested index structures, such as k-d trees, R-trees, and row-major databases, are not well suited to scientific metadata. Finally, we provide suggestions for a system design based on our findings.

Publication date:
December 2012

Authors:
Aleatha Parker-Wood
Brian Madden
Michael McThrow
Darrell D. E. Long

Projects:
Scalable File System Indexing
Dynamic Non-Hierarchical File Systems
Ultra-Large Scale Storage

BibTeX entry

@techreport{ssrctr-12-07,
  author       = {Aleatha Parker-Wood and Brian Madden and Michael McThrow and Darrell D. E. Long},
  title        = {Examining Extended and Scientific Metadata for Scalable Index Designs},
  institution  = {University of California, Santa Cruz},
  number       = {UCSC-SSRC-12-07},
  month        = dec,
  year         = {2012},
}
Last modified 24 May 2019