Efficient Processing of XML Pattern Matching: A String Matching-based Approach

No Thumbnail Available
Date
2004-02-01T00:00:00Z
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
In this paper, we propose a new approach of indexing XML documents and processing twig patterns in an XML database. Every XML document in the database is labeled with a variation of Dewey ID labeling scheme, namely Extended Dewey ID. The unique feature of this labeling scheme is that from the label of an element alone, we can use finite state transducers (FST) to derive the names of elements along the path from the root to this element. This feature enables us to directly reduce XML path pattern matching into string matching. We then develop a holistic twig join algorithm, called TwigComPath. The algorithm is quite different from the previous strategies in that it solves XML pattern matching problem by string matching and comparing instead of binary relationship matching and stitching. Furthermore, our algorithm only needs to visit the labels of elements that satisfy the leaf node predicates in a twig (or path) pattern; hence, it has performance advantages over the methods that need to visit the labels of all nodes in the pattern. Finally, we provide experimental results to demonstrate the perform-ance benefits of our proposed approaches.
Description
Keywords
Citation