Browsing by Author "WU, Huayu"
Now showing 1 - 11 of 11
Results Per Page
Sort Options
- ItemConditioning Probabilistic Relational Data with Referential Constraints(2014-03-12T08:46:48Z) TANG, Ruiming; SHAO, Dongxu; BA, M. Lamine; WU, HuayuA probabilistic relational database is a compact form of a set of deterministic relational databases (namely, possible worlds), each of which has a probability. In our framework, the existence of tuples is determined by associated Boolean formulae based on elementary events. An estimation, within such a setting, of the probabilities of possible worlds uses a prior probability distribution speci ed over the elementary events. Direct observations and general knowledge, in the form of constraints, help re ning these probabilities, possibly ruling out some possible worlds. More precisely, new constraints can translate the observation of the existence or non-existence of a tuple, the knowledge of a well-de ned rule, such as primary key constraint, foreign key constraint, referential constraint, etc. Informally, the process of enforcing knowledge on a probabilistic database, which consists of computing a new subset of valid possible worlds together with their new (conditional) probabilities, is called conditioning. In this paper, we are interested in nding a new probabilistic relational database after conditioning with referential constraints involved. In the most general case, conditioning is intractable. As a result, we restricted our study to probabilistic relational databases in which formulae of tuples are independent events in order to achieve some tractability results. We devise and present polynomial algorithms for conditioning probabilistic relational databases with referential constraints.
- ItemDiscovering Semantics from Data-Centric XML(2013-06-17) LI, Luochen; LE, Thuy Ngoc; WU, Huayu; LING, Tok Wang; BRESSAN, StephaneIn database applications in general, and in applications using XML in particular, the availability of a conceptual schema or of elements of semantics constitute invaluable leverage for improving the effectiveness, and sometimes the efficiency, of many tasks including query processing, keyword search and schema and data integration. The Object-Relationship-Attribute model for Semi-Structured data (ORA-SS) model is a conceptual model designed to capture the semantics of the important constructs in a variety of data models in general and in semi-structured data models in particular. It is specifically intended to capture the semantics of object classes, object identifiers, relationship types, object attributes and relationship attributes underlying XML schemas and data. For a given application, we refer to the set of instances of these semantic concepts as the ORA-semantics of the application. While ORA-SS can be used a priori for the design of new applications, we are interested, in this work, in the automatically discovering of the ORA-semantics from existing XML data and XML schemas. In this paper, we present a novel approach to automatically discover the ORA-semantics from XML schemas and XML data. The approach we proposed is based on a set of techniques and heuristics that identify the different semantic concepts. We empirically and comparatively evaluate the effectiveness of the approach. We show by experiments that the semantics discovered by our approach has more than 90% accuracy.
- ItemDiscovering Semantics from XML(2012-03-26T01:54:23Z) LI, Luochen; LE, Thuy Ngoc; LING, Tok Wang; WU, Huayu; BRESSAN, StephaneIn database applications in general, and in applications using XML in particular, the availability of a conceptual schema or of elements of semantics constitute invaluable leverage for improving the effectiveness, and sometimes the efficiency, of many tasks including query processing, keyword search and schema and data integration. The Object-Relationship-Attribute model for Semi-Structured data (ORA-SS) model is a conceptual model designed to capture the semantics of the important constructs in a variety of data models in general and in semi-structured data models in particular. It is specifically intended to capture the semantics of object classes, object identifiers, relationship types, object attributes and relationship attributes underlying XML schemas and data. For a given application, we refer to the set of instances of these semantic concepts as the ORA-semantics of the application. While ORA-SS can be used a priori for the design of new applications, we are interested, in this work, in the automatically discovering of the ORA-semantics from existing XML data and schemas. In this paper, we present a novel approach to automatically discover the ORA-semantics from XML schemas and XML data. The approach we proposed is based on a set of techniques and heuristics that identify the different semantic concepts. We empirically and comparatively evaluate the effectiveness of the approach.
- ItemEdit Distance between XML and Probabilistic XML Documents(2011-06-03) TANG, Ruiming; WU, Huayu; NOBARI, Sadegh; BRESSAN, StephaneWe propose an efficient algorithm for computing of the edit distance between an XML document and a probabilistic XML document. Probabilistic XML is a hierarchical data model capturing uncertainty of both value and structure. It is suitable to many modern applications such as information extraction, scientific data management and data integration. The computation of similarity is an essential building block for the comparison, alignment, clustering and classification of data in these applications. Several algorithms exist for measuring the structural similarity between XML documents among themselves or XML documents and XML document type definitions and schemas. The new challenge in efficiently computing the similarity between an XML document and a probabilistic XML document is the multiplicity of the possible worlds that a probabilistic XML document represents. In this paper, we devise and discuss algorithms for computing the similarity between an XML document and a probabilistic XML document. We empirically and comparatively evaluate their performance. In the absence of established corpora and benchmarks for probabilistic XML, we also propose and use random probabilistic XML models together with the associated random generation algorithms.
- ItemExpressing and Processing Path-centric XML Queries(2012-04-02T09:09:42Z) WU, Huayu; TANG, Ruiming; LING, Tok Wang; BRESSAN, StephaneThe support for navigation and browsing in XML query languages is based on XPath. XQuery, a W3C standardized XML query language, introduces FLOWR constructs on top of XPath, to express more complex query purposes. However, a family of very practical queries, which aim to return or manipulate paths as first class objects, cannot be expressed by XPath or XQuery FLOWR expressions. We call them path-centric queries. Of course, a user can program an XQuery user-defined recursive function to meet pathcentric queries, but the concern is the convenience and practicality of requiring normal database users to be equipped with programming skills. In this paper, we analyze the root cause of the limited expressivity of XPath and XQuery for path-centric queries. Based on the analysis, we propose seamless extension to XQuery FLOWR to elegantly express path-centric queries. We further investigate intra-path aggregation, an analytical operation in pathcentric queries, and show how they can be easily expressed and efficiently processed.
- ItemA Framework for Conditioning Uncertain(2012-01-25) TANG, Ruiming; CHENG, Reynold; WU, Huayu; BRESSAN, StephaneWe propose a framework for representing conditioned probabilistic relational data. In this framework the existence of tuples in possible worlds is determined by Boolean expressions composed from elementary events. The probability of a possible world is computed from the probabilities associated with these elementary events. In addition, a set of global constraints conditions the database. Conditioning is the formalization of the process of adding knowledge to a database. Some worlds may be impossible given the constraints and the probabilities of possible worlds are accordingly re-defined. The new constraints can come from the observation of the existence or non-existence of a tuple, from the knowledge of a specific rule, such as the existence of an exclusive set of tuples, or from the knowledge of a general rule, such as a functional dependency. We are therefore interested in computing a concise representation of the possible worlds and their respective probabilities after the addition of new constraints, namely an equivalent probabilistic database instance without constraints after conditioning. We devise and present a general algorithm for this computation. Unfortunately, the general problem involves the simplification of general Boolean expressions and is NP-hard. We therefore identify specific practical families of constraints for which we devise and present efficient algorithms.
- ItemFrom Revisiting the LCA-based Approach to a New Semantics-based Approach for XML Keyword Search(2011-05-30) LE, Thuy Ngoc; WU, Huayu; LING, Tok wang; LI, LuochenMost keyword search approaches for data-centric XML documents are based on the computation of Lowest Common Ancestors (LCA), such as SLCA and MLCA. In this paper, we show that the LCA is not always a correct search model for processing keyword queries over general XML data. In particular, when an XML database contains relationships among objects, which is quite common in practical data, LCA-based search may not be able to find desired answers for many keyword queries. We propose to use semantics instead of the structure of XML data to perform keyword search, and show that the semantics-based search can solve the problems of the LCA-based approach. To the best of our knowledge, this is the first work to point out serious problems of the LCA-based XML keyword search approach, and propose an approach to perform XML keyword search based on semantics rather than the hierarchical structure of XML data to address those problems.
- ItemMeasuring XML Structured-ness with Entropy(2011-06-03) TANG, Ruiming; WU, Huayu; BRESSAN, StephaneXML is semi-structured. It can be used to annotate unstructured data, to represent structured data and almost anything in-between. Yet, it is unclear how to formally characterize, yet to quantify, structuredness of XML. In this paper we propose and evaluate entropy-based metrics for XML structured-ness. The metrics measure the structural uniformity of path and subtrees, respectively. We empirically study the correlation of these metrics with real and synthetic data sets.
- ItemObject-Oriented XML Keyword Search(2010-07-22T02:03:58Z) WU, Huayu; LING, Tok Wang; Bao, Zhifeng; XU, LiangKeyword search is a user-friendly way to query XML data. Existing LCA-based XML keyword search approaches assign a Dewey ID to every document node and find the relevant LCA nodes of the query keywords based on the Dewey IDs. Despite many improved LCA-based semantics proposed, these approaches are still not as effective as expected. We observe that object is an important concept for returning meaningful information in database queries. In this paper, we propose an object-oriented approach for XML keyword search. Our approach only assigns different Dewey IDs to object nodes in the document, and introduces relational tables to organize data values. By considering the semantic relationship among object, property and value during keyword query processing, our approach significantly improves the search efficiency and quality, compared to existing work. Furthermore, after finding any object as a return node, our approach only outputs the useful information about that object, instead of the whole subtree rooted at the object node as in many other approaches. Finally we design an algorithm to rank the possible interpretations of an ambiguous query, and the search results are returned separately based on different query interpretations. The efficiency and effectiveness of our approach are demonstrated with a comprehensive experimental study.
- ItemThe Price is Right(2012-05-28T08:07:59Z) TANG, Ruiming; WU, Huayu; KISTER, Thomas; BRESSAN, StephaneData is a modern commodity. Electronic data market places are being designed and deployed. Yet current pricing models either focus on processing or are proprietary, opaque and not conducive of a healthy commodity market dynamics. In this paper we propose a generic data pricing model that is based on minimal lineages, i.e. sets of tuples contributing to the result of a query. We show that the proposed model fulfils desirable properties such as contribution monotonicity, bounded price and arbitrage-freedom. We extend the model to the case of probabilistic databases in the X-tuple model. We propose algorithms to the compute the price of a query based on our pricing models.
- ItemProblems of LCA and Impact of ORA-semantics in XML Keyword Search(2012-03-26T02:13:57Z) LE, Thuy Ngoc; WU, Huayu; LING, Tok Wang; LI, LuochenMost keyword search approaches for data-centric XML documents are based on the computation of Lowest Common Ancestors (LCA). However, LCA-based search methods depend much on hierarchical structures of XML data. Therefore it may not be able to find desired answers for many keyword queries since a relationship among objects in XML data can be represented in different hierarchical structures. In this paper, we first point out serious problems of the LCA-based approach, due to its unawareness of semantics of object, relationship and attribute, referred to as ORA-semantics. Through detailed analysis of these problems, we show the impact of ORA-semantics in XML keyword search. We then propose an ORA-semantics based approach with rules to infer expected answers for XML keyword queries. Experimental results show that our ORA-semantics based approach can resolve the problems of the LCA-based approach, and thus can be a promising research direction for XML keyword search.