Browsing by Author "LING, Tok Wang"
Now showing 1 - 14 of 14
Results Per Page
Sort Options
- ItemAlgebra and the Formal Semantics of GLASS(2005-09-13) NI, Wei; LING, Tok WangIn database world, it is common to translate a query language into an algebra for the purpose of precisely defining the formal semantics of a query language and doing query optimization later. In this paper, we examine the scenario of graphical XML query languages, focus on their expressive power and present the underlying algebra of our graphical XML query language. Compared with various previous works on XML algebra, our algebra supports not only traditional select, project and join operators but also swap and SQL-like group operators. To achieve the exactness in query representation, we use ORA-SS (Object-Relationship-Attribute model for Semi-Structured data), a semantic rich data model for XML including the information such as Key constraints, Functional dependencies and Relationship types which are lacked in DTD. With examples, we show how our graphical language solves the difficult points in representation and how it is translated into our algebra. Based on the translation, we use the algebra to define the formal semantics of GLASS.
- ItemAnalyzing Temporal Keyword Queries for Interactive Search over Temporal DatabasesGAO, Qiao; LEE, Mong Li; LING, Tok Wang; DOBBIE, Gillian; ZENG, ZhongQuerying temporal relational databases is a challenge for non-expert database users, since it requires users to understand the semantics of the database and apply temporal joins as well as temporal conditions correctly in SQL statements. Traditional keyword search approaches are not directly applicable to temporal relational databases since they treat time-related keywords as tuple values and do not consider the temporal joins between relations, which leads to missing answers, incorrect answers and missing query interpretations. In this work, we extend keyword queries to allow the temporal predicates, and design a schema graph approach based on the Object-Relationship-Attribute (ORA) semantics. This approach enables us to identify temporal attributes of objects/relationships and infer the target temporal data of temporal predicates, thus improving the completeness and correctness of temporal keyword search and capturing the various possible interpretations of temporal keyword queries.
- ItemAutomatic XML Keyword Query Refinement(2009-06-23T02:36:06Z) BAO, Zhifeng; LU, Jiaheng; LING, Tok Wang; MENG, XiaofengExisting XML keyword search methods focus on how to find relevant and meaningful data fragments for a keyword query, assuming each keyword is intended as part of it. However, user's queries usually contain irrelevant or mismatched terms, spelling errors etc, which causes the search results to be either empty or meaningless. In this paper, we introduce the problem of automatic XML keyword query refinement, where automatic means the search engine should be able to adaptively decide whether a query Q needs to be refined during the processing of Q, and at the same time find a list of promising refined query candidates and their matching results over XML data, without any user interaction or a second try. In order to achieve this goal, we build a primary framework which consists of two core parts: (1) we build a novel query ranking model to evaluate the quality of a refined query RQ, which takes into account of the relevance of RQ w.r.t Q over XML data, the morphological/semantical similarity between Q and RQ, and the dependence of keywords of RQ in XML data. (2) We integrate the exploration of RQ candidates and the generation of their matching results as a single problem at the same time of processing Q, which is fulfilled within a one-time scan of related keyword inverted lists optimally. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.
- ItemDiscovering Semantics from Data-Centric XML(2013-06-17) LI, Luochen; LE, Thuy Ngoc; WU, Huayu; LING, Tok Wang; BRESSAN, StephaneIn database applications in general, and in applications using XML in particular, the availability of a conceptual schema or of elements of semantics constitute invaluable leverage for improving the effectiveness, and sometimes the efficiency, of many tasks including query processing, keyword search and schema and data integration. The Object-Relationship-Attribute model for Semi-Structured data (ORA-SS) model is a conceptual model designed to capture the semantics of the important constructs in a variety of data models in general and in semi-structured data models in particular. It is specifically intended to capture the semantics of object classes, object identifiers, relationship types, object attributes and relationship attributes underlying XML schemas and data. For a given application, we refer to the set of instances of these semantic concepts as the ORA-semantics of the application. While ORA-SS can be used a priori for the design of new applications, we are interested, in this work, in the automatically discovering of the ORA-semantics from existing XML data and XML schemas. In this paper, we present a novel approach to automatically discover the ORA-semantics from XML schemas and XML data. The approach we proposed is based on a set of techniques and heuristics that identify the different semantic concepts. We empirically and comparatively evaluate the effectiveness of the approach. We show by experiments that the semantics discovered by our approach has more than 90% accuracy.
- ItemDiscovering Semantics from XML(2012-03-26T01:54:23Z) LI, Luochen; LE, Thuy Ngoc; LING, Tok Wang; WU, Huayu; BRESSAN, StephaneIn database applications in general, and in applications using XML in particular, the availability of a conceptual schema or of elements of semantics constitute invaluable leverage for improving the effectiveness, and sometimes the efficiency, of many tasks including query processing, keyword search and schema and data integration. The Object-Relationship-Attribute model for Semi-Structured data (ORA-SS) model is a conceptual model designed to capture the semantics of the important constructs in a variety of data models in general and in semi-structured data models in particular. It is specifically intended to capture the semantics of object classes, object identifiers, relationship types, object attributes and relationship attributes underlying XML schemas and data. For a given application, we refer to the set of instances of these semantic concepts as the ORA-semantics of the application. While ORA-SS can be used a priori for the design of new applications, we are interested, in this work, in the automatically discovering of the ORA-semantics from existing XML data and schemas. In this paper, we present a novel approach to automatically discover the ORA-semantics from XML schemas and XML data. The approach we proposed is based on a set of techniques and heuristics that identify the different semantic concepts. We empirically and comparatively evaluate the effectiveness of the approach.
- ItemEffective XML Keyword Search with Nearest Common Object Node Semantics(2012-09-20) LE, Thuy Ngoc; LING, Tok Wang; LIN, Chunbin; LU, JiahengLowest Common Ancestor (LCA) semantics and its extensions such as SLCA, MLCA, VLCA and ELCA. However, these approaches commonly do not return a complete answer set for a query because they can only find the common ancestors of a set of keywords but cannot find their common information appearing at their descendants in an XML document. In this paper, we introduce a new semantics, called Nearest Common Objects Node (NCON), which guarantees that both common ancestors and common descendants are included in the answer set for a query and therefore enables us to answer a query more completely. We also propose an NCON-based approach for XML keyword search, which exploits not only the index of the original XML document, but also the index of its reversed XML document, and devise optimization techniques to facilitate the process of finding NCONs. We have developed XComplete, a system for our NCON-based approach, which essentially uses the NCON semantics and post-processing techniques, altogether enable XComplete to return an answer set with completeness, meaningfulness, no irrelevance, no duplicate and comprehension to users. The results of our extensive experiments show that our proposed approach outperforms the existing LCA-based approaches in terms of both effectiveness and efficiency.
- ItemExpressing and Processing Path-centric XML Queries(2012-04-02T09:09:42Z) WU, Huayu; TANG, Ruiming; LING, Tok Wang; BRESSAN, StephaneThe support for navigation and browsing in XML query languages is based on XPath. XQuery, a W3C standardized XML query language, introduces FLOWR constructs on top of XPath, to express more complex query purposes. However, a family of very practical queries, which aim to return or manipulate paths as first class objects, cannot be expressed by XPath or XQuery FLOWR expressions. We call them path-centric queries. Of course, a user can program an XQuery user-defined recursive function to meet pathcentric queries, but the concern is the convenience and practicality of requiring normal database users to be equipped with programming skills. In this paper, we analyze the root cause of the limited expressivity of XPath and XQuery for path-centric queries. Based on the analysis, we propose seamless extension to XQuery FLOWR to elegantly express path-centric queries. We further investigate intra-path aggregation, an analytical operation in pathcentric queries, and show how they can be easily expressed and efficiently processed.
- ItemFrom region encoding to extended Dewey: on efficient processing of XML twig pattern matching(2005-07-04T09:27:43Z) LU, Jiaheng; LING, Tok Wang; CHAN, Chee-Yong; CHEN, TingFinding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. A number of algorithms have been proposed to process a twig query based on region encoding labeling scheme. While region encoding supports efficient determination of ancestor-descendant (or parent-child) relationship between two elements, we observe that the information within a single label is very limited. In this paper, we propose a new labeling scheme, called extended Dewey. This is a powerful labeling scheme, since from the label of an element alone, we can derive all the elements names along the path from the root to the element. Based on extended Dewey, we design a novel holistic twig join algorithm, called TJFast. Unlike all previous algorithms based on region encoding, to answer a twig query, TJFast only needs to access the labels of the leaf query nodes. Through this, not only do we reduce disk access, but we also support the efficient evaluation of queries with wildcards in branching nodes, which is very difficult to be answered by algorithms based on region encoding. Finally, we report our experimental results to show that our algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results and query performance.
- ItemICRA: Effective Semantics for Ranked XML Keyword Search(2007-05-30) CHEN, Bo; LU, Jiaheng; LING, Tok WangKeyword search is a user-friendly way to query XML databases. Most previous efforts in this area focus on keyword proximity search in XML based on either tree data model or graph (or digraph) data model. Tree data model cannot capture connections such as ID references in XML databases. In the contrast, techniques based on graph (or digraph) data model capture connections, but are generally inefficient to compute. In this paper, we propose interconnected object trees model for keyword search to achieve the efficiency of tree model and meanwhile to capture the connections such as ID references in XML by fully exploiting the property and schema information of XML databases. In particular, we propose ICA (Interested Common Ancestor) semantics to find all predefined interested objects that contain all query keywords. We also introduce novel IRA (Interested Related Ancestors) semantics to capture the conceptual connections between interested objects and include more objects that only contain some query keywords. Then, a novel ranking metric, RelevanceRank, is studied to dynamically assign higher ranks to objects that are more relevant to a given keyword query according to the conceptual connections in IRAs. We design and analyze efficient algorithms for keyword search based on our data model; and experiment results show our approach is efficient and outperforms most existing systems in terms of result quality. A prototype of our ICRA system (ICRA = ICA + IRA) on the updated 321M DBLP data is available at http://xmldb.ddns.comp.nus.edu.sg/.
- ItemObject Semantics for XML Keyword Search(2013-05-21T01:18:54Z) LE, Thuy Ngoc; LING, Tok Wang; JAGADISH, H. V.; LIN, Chunbin; LU, JiahengWe know that some XML elements correspond to objects (in the sense of object-orientation) and others do not. The question we consider in this paper is what benefits we can derive from paying attention to such object semantics, particularly for the problem of keyword queries. Keyword queries against XML data have been studied extensively in recent years, with several lowest-common-ancestor based schemes proposed for this purpose, including SLCA, MLCA, VLCA, and ELCA. It is easy to see that identifying objects can help each of these techniques return more meaningful answers than just the LCA node (or subtree). It is more interesting to see that object semantics can also be used to benefit the search itself. For this purpose, we introduce a novel nearest common object node semantics (NCON), which includes not just common ancestors but also common descendants and referenced objects in evaluating a query. We have developed XComplete, a system for our NCON-based approach, and used it in our extensive experimental evaluation. The experimental results show that our proposed approach outperforms the existing LCA-based approaches in terms of both effectiveness and efficiency.
- ItemObject-Oriented XML Keyword Search(2010-07-22T02:03:58Z) WU, Huayu; LING, Tok Wang; Bao, Zhifeng; XU, LiangKeyword search is a user-friendly way to query XML data. Existing LCA-based XML keyword search approaches assign a Dewey ID to every document node and find the relevant LCA nodes of the query keywords based on the Dewey IDs. Despite many improved LCA-based semantics proposed, these approaches are still not as effective as expected. We observe that object is an important concept for returning meaningful information in database queries. In this paper, we propose an object-oriented approach for XML keyword search. Our approach only assigns different Dewey IDs to object nodes in the document, and introduces relational tables to organize data values. By considering the semantic relationship among object, property and value during keyword query processing, our approach significantly improves the search efficiency and quality, compared to existing work. Furthermore, after finding any object as a return node, our approach only outputs the useful information about that object, instead of the whole subtree rooted at the object node as in many other approaches. Finally we design an algorithm to rank the possible interpretations of an ambiguous query, and the search results are returned separately based on different query interpretations. The efficiency and effectiveness of our approach are demonstrated with a comprehensive experimental study.
- ItemProblems of LCA and Impact of ORA-semantics in XML Keyword Search(2012-03-26T02:13:57Z) LE, Thuy Ngoc; WU, Huayu; LING, Tok Wang; LI, LuochenMost keyword search approaches for data-centric XML documents are based on the computation of Lowest Common Ancestors (LCA). However, LCA-based search methods depend much on hierarchical structures of XML data. Therefore it may not be able to find desired answers for many keyword queries since a relationship among objects in XML data can be represented in different hierarchical structures. In this paper, we first point out serious problems of the LCA-based approach, due to its unawareness of semantics of object, relationship and attribute, referred to as ORA-semantics. Through detailed analysis of these problems, we show the impact of ORA-semantics in XML keyword search. We then propose an ORA-semantics based approach with rules to infer expected answers for XML keyword queries. Experimental results show that our ORA-semantics based approach can resolve the problems of the LCA-based approach, and thus can be a promising research direction for XML keyword search.
- ItemSchema-independence in XML Keyword Search(2014-06-24) LE, Thuy Ngoc; BAO, Zhifeng; LING, Tok WangXML keyword search has attracted a lot of interests with typical search based on lowest common ancestor (LCA). However, in this paper, we show that meaningful answers can be found beyond LCA and should be independent from schema designs of the same data content. Therefore, we propose a new semantics, called CR (Common Relative), which not only can find more answers beyond LCA, but the returned answers are independent from schema designs as well. To find answers based on the CR semantics, we propose an approach, in which we have new strategies for indexing and processing. Experimental results show that the CR semantics can improve the recall significantly and the answer set is independent from the schema designs.
- ItemTJFast: Efficient Processing of XML Twig Pattern Matching(2004-11-29T06:46:05Z) LU, Jiaheng; CHEN, Ting; LING, Tok WangFinding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. Previous twig join algorithms were proposed as an I/O optimal solution when the twig pattern only involves ancestor-descendant relationships. In this paper, we propose a new efficient holistic twig join algorithm, namely TJFast, which is based on a variation of Dewey ID labeling scheme. In order to answer a twig query, TJFast only needs to access the labels of elements whose tags appear in the leaf nodes of query. TJFast guarantees the I/O optimality for queries with only ancestor-descendant edges connecting branching nodes and their child nodes. In other words, unlike previous algorithms, the optimality of TJFast allows parent-child edges to appear in some edges. Besides TJFast, we also present a novel streaming strategy, called tag+level streaming, where two elements appear in the same stream if and only if they have the same tag name and level number. We develop a twig join algorithm, called TJFast+ based on tag+level streaming. We show that TJFast+ can identify even larger query class to guarantee I/O optimality than TJFast. Therefore, in this paper, by proposing TJFast/TJFast, we significantly enlarge the optimal query class compared to previous algorithms. Finally, we report our experimental results to show that our algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results and query performance.