Browsing by Author "LU, Jiaheng"
Now showing 1 - 6 of 6
Results Per Page
Sort Options
- ItemAutomatic XML Keyword Query Refinement(2009-06-23T02:36:06Z) BAO, Zhifeng; LU, Jiaheng; LING, Tok Wang; MENG, XiaofengExisting XML keyword search methods focus on how to find relevant and meaningful data fragments for a keyword query, assuming each keyword is intended as part of it. However, user's queries usually contain irrelevant or mismatched terms, spelling errors etc, which causes the search results to be either empty or meaningless. In this paper, we introduce the problem of automatic XML keyword query refinement, where automatic means the search engine should be able to adaptively decide whether a query Q needs to be refined during the processing of Q, and at the same time find a list of promising refined query candidates and their matching results over XML data, without any user interaction or a second try. In order to achieve this goal, we build a primary framework which consists of two core parts: (1) we build a novel query ranking model to evaluate the quality of a refined query RQ, which takes into account of the relevance of RQ w.r.t Q over XML data, the morphological/semantical similarity between Q and RQ, and the dependence of keywords of RQ in XML data. (2) We integrate the exploration of RQ candidates and the generation of their matching results as a single problem at the same time of processing Q, which is fulfilled within a one-time scan of related keyword inverted lists optimally. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.
- ItemEffective XML Keyword Search with Nearest Common Object Node Semantics(2012-09-20) LE, Thuy Ngoc; LING, Tok Wang; LIN, Chunbin; LU, JiahengLowest Common Ancestor (LCA) semantics and its extensions such as SLCA, MLCA, VLCA and ELCA. However, these approaches commonly do not return a complete answer set for a query because they can only find the common ancestors of a set of keywords but cannot find their common information appearing at their descendants in an XML document. In this paper, we introduce a new semantics, called Nearest Common Objects Node (NCON), which guarantees that both common ancestors and common descendants are included in the answer set for a query and therefore enables us to answer a query more completely. We also propose an NCON-based approach for XML keyword search, which exploits not only the index of the original XML document, but also the index of its reversed XML document, and devise optimization techniques to facilitate the process of finding NCONs. We have developed XComplete, a system for our NCON-based approach, which essentially uses the NCON semantics and post-processing techniques, altogether enable XComplete to return an answer set with completeness, meaningfulness, no irrelevance, no duplicate and comprehension to users. The results of our extensive experiments show that our proposed approach outperforms the existing LCA-based approaches in terms of both effectiveness and efficiency.
- ItemFrom region encoding to extended Dewey: on efficient processing of XML twig pattern matching(2005-07-04T09:27:43Z) LU, Jiaheng; LING, Tok Wang; CHAN, Chee-Yong; CHEN, TingFinding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. A number of algorithms have been proposed to process a twig query based on region encoding labeling scheme. While region encoding supports efficient determination of ancestor-descendant (or parent-child) relationship between two elements, we observe that the information within a single label is very limited. In this paper, we propose a new labeling scheme, called extended Dewey. This is a powerful labeling scheme, since from the label of an element alone, we can derive all the elements names along the path from the root to the element. Based on extended Dewey, we design a novel holistic twig join algorithm, called TJFast. Unlike all previous algorithms based on region encoding, to answer a twig query, TJFast only needs to access the labels of the leaf query nodes. Through this, not only do we reduce disk access, but we also support the efficient evaluation of queries with wildcards in branching nodes, which is very difficult to be answered by algorithms based on region encoding. Finally, we report our experimental results to show that our algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results and query performance.
- ItemICRA: Effective Semantics for Ranked XML Keyword Search(2007-05-30) CHEN, Bo; LU, Jiaheng; LING, Tok WangKeyword search is a user-friendly way to query XML databases. Most previous efforts in this area focus on keyword proximity search in XML based on either tree data model or graph (or digraph) data model. Tree data model cannot capture connections such as ID references in XML databases. In the contrast, techniques based on graph (or digraph) data model capture connections, but are generally inefficient to compute. In this paper, we propose interconnected object trees model for keyword search to achieve the efficiency of tree model and meanwhile to capture the connections such as ID references in XML by fully exploiting the property and schema information of XML databases. In particular, we propose ICA (Interested Common Ancestor) semantics to find all predefined interested objects that contain all query keywords. We also introduce novel IRA (Interested Related Ancestors) semantics to capture the conceptual connections between interested objects and include more objects that only contain some query keywords. Then, a novel ranking metric, RelevanceRank, is studied to dynamically assign higher ranks to objects that are more relevant to a given keyword query according to the conceptual connections in IRAs. We design and analyze efficient algorithms for keyword search based on our data model; and experiment results show our approach is efficient and outperforms most existing systems in terms of result quality. A prototype of our ICRA system (ICRA = ICA + IRA) on the updated 321M DBLP data is available at http://xmldb.ddns.comp.nus.edu.sg/.
- ItemObject Semantics for XML Keyword Search(2013-05-21T01:18:54Z) LE, Thuy Ngoc; LING, Tok Wang; JAGADISH, H. V.; LIN, Chunbin; LU, JiahengWe know that some XML elements correspond to objects (in the sense of object-orientation) and others do not. The question we consider in this paper is what benefits we can derive from paying attention to such object semantics, particularly for the problem of keyword queries. Keyword queries against XML data have been studied extensively in recent years, with several lowest-common-ancestor based schemes proposed for this purpose, including SLCA, MLCA, VLCA, and ELCA. It is easy to see that identifying objects can help each of these techniques return more meaningful answers than just the LCA node (or subtree). It is more interesting to see that object semantics can also be used to benefit the search itself. For this purpose, we introduce a novel nearest common object node semantics (NCON), which includes not just common ancestors but also common descendants and referenced objects in evaluating a query. We have developed XComplete, a system for our NCON-based approach, and used it in our extensive experimental evaluation. The experimental results show that our proposed approach outperforms the existing LCA-based approaches in terms of both effectiveness and efficiency.
- ItemTJFast: Efficient Processing of XML Twig Pattern Matching(2004-11-29T06:46:05Z) LU, Jiaheng; CHEN, Ting; LING, Tok WangFinding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. Previous twig join algorithms were proposed as an I/O optimal solution when the twig pattern only involves ancestor-descendant relationships. In this paper, we propose a new efficient holistic twig join algorithm, namely TJFast, which is based on a variation of Dewey ID labeling scheme. In order to answer a twig query, TJFast only needs to access the labels of elements whose tags appear in the leaf nodes of query. TJFast guarantees the I/O optimality for queries with only ancestor-descendant edges connecting branching nodes and their child nodes. In other words, unlike previous algorithms, the optimality of TJFast allows parent-child edges to appear in some edges. Besides TJFast, we also present a novel streaming strategy, called tag+level streaming, where two elements appear in the same stream if and only if they have the same tag name and level number. We develop a twig join algorithm, called TJFast+ based on tag+level streaming. We show that TJFast+ can identify even larger query class to guarantee I/O optimality than TJFast. Therefore, in this paper, by proposing TJFast/TJFast, we significantly enlarge the optimal query class compared to previous algorithms. Finally, we report our experimental results to show that our algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results and query performance.