Measuring XML Structured-ness with Entropy

Date
2011-06-03
Abstract
XML is semi-structured. It can be used to annotate unstructured data, to represent structured data, and almost anything in between. Yet it is unclear how to formally characterize, let alone quantify, the structured-ness of XML. In this paper we propose and evaluate entropy-based metrics for XML structured-ness. The metrics measure the structural uniformity of paths and subtrees, respectively. We empirically study the correlation of these metrics on real and synthetic data sets.
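
To make the idea of a path-based entropy concrete, the following is a minimal sketch and not the paper's actual metric definition, which this record does not reproduce. It assumes that "structural uniformity of paths" can be approximated by the Shannon entropy of the distribution of root-to-leaf tag paths; the function names `root_to_leaf_paths` and `path_entropy` are hypothetical and chosen only for illustration.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def root_to_leaf_paths(xml_text):
    """Return the tag path (e.g. 'r/a/b') of every leaf element, in document order."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        children = list(node)
        if not children:
            # A leaf element: record the full tag path leading to it.
            paths.append(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def path_entropy(xml_text):
    """Shannon entropy (in bits) of the root-to-leaf path distribution.

    Assumption for illustration only: lower entropy is read as a more
    uniform, and in that sense more structured, document.
    """
    counts = Counter(root_to_leaf_paths(xml_text))
    total = sum(counts.values())
    # p * log2(1/p), summed over distinct paths.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


uniform = "<r><a/><a/><a/><a/></r>"  # every leaf shares one path
mixed = "<r><a/><b/><c/><d/></r>"    # every leaf has a distinct path
print(path_entropy(uniform), path_entropy(mixed))
```

Under this assumption, a document whose leaves all share one path scores zero, while a document whose leaves all follow different paths scores the maximum for its size. A subtree-based variant would need a canonical encoding of subtree shapes, which is left out here.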