TJFast: Efficient Processing of XML Twig Pattern Matching

No Thumbnail Available
Date
2004-11-29T06:46:05Z
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Finding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. Previous twig join algorithms were proposed as an I/O optimal solution when the twig pattern only involves ancestor-descendant relationships. In this paper, we propose a new efficient holistic twig join algorithm, namely TJFast, which is based on a variation of Dewey ID labeling scheme. In order to answer a twig query, TJFast only needs to access the labels of elements whose tags appear in the leaf nodes of query. TJFast guarantees the I/O optimality for queries with only ancestor-descendant edges connecting branching nodes and their child nodes. In other words, unlike previous algorithms, the optimality of TJFast allows parent-child edges to appear in some edges. Besides TJFast, we also present a novel streaming strategy, called tag+level streaming, where two elements appear in the same stream if and only if they have the same tag name and level number. We develop a twig join algorithm, called TJFast+ based on tag+level streaming. We show that TJFast+ can identify even larger query class to guarantee I/O optimality than TJFast. Therefore, in this paper, by proposing TJFast/TJFast, we significantly enlarge the optimal query class compared to previous algorithms. Finally, we report our experimental results to show that our algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results and query performance.
Description
Keywords
Citation