Browsing by Author "LE, Thuy Ngoc"
- Discovering Semantics from Data-Centric XML (2013-06-17)
  LI, Luochen; LE, Thuy Ngoc; WU, Huayu; LING, Tok Wang; BRESSAN, Stephane
  In database applications in general, and in applications using XML in particular, the availability of a conceptual schema or of elements of semantics constitutes invaluable leverage for improving the effectiveness, and sometimes the efficiency, of many tasks including query processing, keyword search, and schema and data integration. The Object-Relationship-Attribute model for Semi-Structured data (ORA-SS) is a conceptual model designed to capture the semantics of the important constructs in a variety of data models in general and in semi-structured data models in particular. It is specifically intended to capture the semantics of object classes, object identifiers, relationship types, object attributes and relationship attributes underlying XML schemas and data. For a given application, we refer to the set of instances of these semantic concepts as the ORA-semantics of the application. While ORA-SS can be used a priori for the design of new applications, we are interested, in this work, in automatically discovering the ORA-semantics from existing XML data and XML schemas. In this paper, we present a novel approach to automatically discover the ORA-semantics from XML schemas and XML data. The approach we propose is based on a set of techniques and heuristics that identify the different semantic concepts. We empirically and comparatively evaluate the effectiveness of the approach. We show by experiments that the semantics discovered by our approach are more than 90% accurate.
- Discovering Semantics from XML (2012-03-26T01:54:23Z)
  LI, Luochen; LE, Thuy Ngoc; LING, Tok Wang; WU, Huayu; BRESSAN, Stephane
  In database applications in general, and in applications using XML in particular, the availability of a conceptual schema or of elements of semantics constitutes invaluable leverage for improving the effectiveness, and sometimes the efficiency, of many tasks including query processing, keyword search, and schema and data integration. The Object-Relationship-Attribute model for Semi-Structured data (ORA-SS) is a conceptual model designed to capture the semantics of the important constructs in a variety of data models in general and in semi-structured data models in particular. It is specifically intended to capture the semantics of object classes, object identifiers, relationship types, object attributes and relationship attributes underlying XML schemas and data. For a given application, we refer to the set of instances of these semantic concepts as the ORA-semantics of the application. While ORA-SS can be used a priori for the design of new applications, we are interested, in this work, in automatically discovering the ORA-semantics from existing XML data and schemas. In this paper, we present a novel approach to automatically discover the ORA-semantics from XML schemas and XML data. The approach we propose is based on a set of techniques and heuristics that identify the different semantic concepts. We empirically and comparatively evaluate the effectiveness of the approach.
- Effective XML Keyword Search with Nearest Common Object Node Semantics (2012-09-20)
  LE, Thuy Ngoc; LING, Tok Wang; LIN, Chunbin; LU, Jiaheng
  Most existing approaches to XML keyword search are based on the Lowest Common Ancestor (LCA) semantics and its extensions such as SLCA, MLCA, VLCA and ELCA. However, these approaches commonly do not return a complete answer set for a query because they can only find the common ancestors of a set of keywords but cannot find their common information appearing at their descendants in an XML document. In this paper, we introduce a new semantics, called Nearest Common Object Node (NCON), which guarantees that both common ancestors and common descendants are included in the answer set for a query and therefore enables us to answer a query more completely. We also propose an NCON-based approach for XML keyword search, which exploits not only the index of the original XML document but also the index of its reversed XML document, and devise optimization techniques to facilitate the process of finding NCONs. We have developed XComplete, a system for our NCON-based approach. Its combination of the NCON semantics and post-processing techniques enables XComplete to return answer sets that are complete, meaningful, free of irrelevant and duplicate results, and comprehensible to users. The results of our extensive experiments show that our proposed approach outperforms the existing LCA-based approaches in terms of both effectiveness and efficiency.
- From Revisiting the LCA-based Approach to a New Semantics-based Approach for XML Keyword Search (2011-05-30)
  LE, Thuy Ngoc; WU, Huayu; LING, Tok Wang; LI, Luochen
  Most keyword search approaches for data-centric XML documents are based on the computation of Lowest Common Ancestors (LCA), such as SLCA and MLCA. In this paper, we show that the LCA is not always a correct search model for processing keyword queries over general XML data. In particular, when an XML database contains relationships among objects, which is quite common in practical data, LCA-based search may not be able to find the desired answers for many keyword queries. We propose to use semantics instead of the structure of XML data to perform keyword search, and show that semantics-based search can solve the problems of the LCA-based approach. To the best of our knowledge, this is the first work to point out serious problems of the LCA-based XML keyword search approach and to propose an approach that performs XML keyword search based on semantics rather than the hierarchical structure of XML data to address those problems.
- Object Semantics for XML Keyword Search (2013-05-21T01:18:54Z)
  LE, Thuy Ngoc; LING, Tok Wang; JAGADISH, H. V.; LIN, Chunbin; LU, Jiaheng
  We know that some XML elements correspond to objects (in the sense of object-orientation) and others do not. The question we consider in this paper is what benefits we can derive from paying attention to such object semantics, particularly for the problem of keyword queries. Keyword queries against XML data have been studied extensively in recent years, with several lowest-common-ancestor based schemes proposed for this purpose, including SLCA, MLCA, VLCA, and ELCA. It is easy to see that identifying objects can help each of these techniques return more meaningful answers than just the LCA node (or subtree). It is more interesting to see that object semantics can also be used to benefit the search itself. For this purpose, we introduce a novel nearest common object node semantics (NCON), which includes not just common ancestors but also common descendants and referenced objects in evaluating a query. We have developed XComplete, a system for our NCON-based approach, and used it in our extensive experimental evaluation. The experimental results show that our proposed approach outperforms the existing LCA-based approaches in terms of both effectiveness and efficiency.
- Problems of LCA and Impact of ORA-semantics in XML Keyword Search (2012-03-26T02:13:57Z)
  LE, Thuy Ngoc; WU, Huayu; LING, Tok Wang; LI, Luochen
  Most keyword search approaches for data-centric XML documents are based on the computation of Lowest Common Ancestors (LCA). However, LCA-based search methods depend heavily on the hierarchical structure of XML data. Therefore, they may not be able to find the desired answers for many keyword queries, since a relationship among objects in XML data can be represented in different hierarchical structures. In this paper, we first point out serious problems of the LCA-based approach, which arise from its unawareness of the semantics of objects, relationships and attributes, referred to as ORA-semantics. Through detailed analysis of these problems, we show the impact of ORA-semantics in XML keyword search. We then propose an ORA-semantics based approach with rules to infer expected answers for XML keyword queries. Experimental results show that our ORA-semantics based approach can resolve the problems of the LCA-based approach, and thus can be a promising research direction for XML keyword search.
- Schema-independence in XML Keyword Search (2014-06-24)
  LE, Thuy Ngoc; BAO, Zhifeng; LING, Tok Wang
  XML keyword search has attracted a lot of interest, with typical search based on the lowest common ancestor (LCA). However, in this paper, we show that meaningful answers can be found beyond LCA and should be independent of the schema design chosen for the same data content. Therefore, we propose a new semantics, called CR (Common Relative), which not only finds more answers beyond LCA but also returns answers that are independent of schema design. To find answers based on the CR semantics, we propose an approach with new strategies for indexing and processing. Experimental results show that the CR semantics can improve recall significantly and that the answer set is independent of schema design.