Browsing by Author "TANG, Ruiming"
Now showing 1 - 9 of 9
Results Per Page
Sort Options
- ItemConditioning Probabilistic Relational Data with Referential Constraints(2014-03-12T08:46:48Z) TANG, Ruiming; SHAO, Dongxu; BA, M. Lamine; WU, HuayuA probabilistic relational database is a compact form of a set of deterministic relational databases (namely, possible worlds), each of which has a probability. In our framework, the existence of tuples is determined by associated Boolean formulae based on elementary events. An estimation, within such a setting, of the probabilities of possible worlds uses a prior probability distribution speci ed over the elementary events. Direct observations and general knowledge, in the form of constraints, help re ning these probabilities, possibly ruling out some possible worlds. More precisely, new constraints can translate the observation of the existence or non-existence of a tuple, the knowledge of a well-de ned rule, such as primary key constraint, foreign key constraint, referential constraint, etc. Informally, the process of enforcing knowledge on a probabilistic database, which consists of computing a new subset of valid possible worlds together with their new (conditional) probabilities, is called conditioning. In this paper, we are interested in nding a new probabilistic relational database after conditioning with referential constraints involved. In the most general case, conditioning is intractable. As a result, we restricted our study to probabilistic relational databases in which formulae of tuples are independent events in order to achieve some tractability results. We devise and present polynomial algorithms for conditioning probabilistic relational databases with referential constraints.
- ItemEdit Distance between XML and Probabilistic XML Documents(2011-06-03) TANG, Ruiming; WU, Huayu; NOBARI, Sadegh; BRESSAN, StephaneWe propose an efficient algorithm for computing of the edit distance between an XML document and a probabilistic XML document. Probabilistic XML is a hierarchical data model capturing uncertainty of both value and structure. It is suitable to many modern applications such as information extraction, scientific data management and data integration. The computation of similarity is an essential building block for the comparison, alignment, clustering and classification of data in these applications. Several algorithms exist for measuring the structural similarity between XML documents among themselves or XML documents and XML document type definitions and schemas. The new challenge in efficiently computing the similarity between an XML document and a probabilistic XML document is the multiplicity of the possible worlds that a probabilistic XML document represents. In this paper, we devise and discuss algorithms for computing the similarity between an XML document and a probabilistic XML document. We empirically and comparatively evaluate their performance. In the absence of established corpora and benchmarks for probabilistic XML, we also propose and use random probabilistic XML models together with the associated random generation algorithms.
- ItemAn Efficient and Truthful Pricing Mechanism for Team Formation in Crowdsourcing Markets(2014-10-20T08:36:51Z) LIU, Qing; LUO, Tie; TANG, Ruiming; BRESSAN, StephaneIn a crowdsourcing market, a requester is looking to form a team of workers to perform a complex task that requires a variety of skills. Candidate workers advertise their certified skills and bid prices for their participation. We design four incentive mechanisms for selecting workers to form a valid team (that can complete the task) and determining each individual worker’s payment.We examine the profitability, individually rationality, computationally efficiency, and truthfulness for each of the four mechanisms. Our study analytically shows that one of the mechanisms, called TruTeam, is superior to the others particularly due to its computationally efficiency and truthfulness. Furthermore, our extensive simulations confirm the analysis and demonstrate that TruTeam is an efficient and truthful pricing mechanism for team formation in crowdsourcing markets.
- ItemExpressing and Processing Path-centric XML Queries(2012-04-02T09:09:42Z) WU, Huayu; TANG, Ruiming; LING, Tok Wang; BRESSAN, StephaneThe support for navigation and browsing in XML query languages is based on XPath. XQuery, a W3C standardized XML query language, introduces FLOWR constructs on top of XPath, to express more complex query purposes. However, a family of very practical queries, which aim to return or manipulate paths as first class objects, cannot be expressed by XPath or XQuery FLOWR expressions. We call them path-centric queries. Of course, a user can program an XQuery user-defined recursive function to meet pathcentric queries, but the concern is the convenience and practicality of requiring normal database users to be equipped with programming skills. In this paper, we analyze the root cause of the limited expressivity of XPath and XQuery for path-centric queries. Based on the analysis, we propose seamless extension to XQuery FLOWR to elegantly express path-centric queries. We further investigate intra-path aggregation, an analytical operation in pathcentric queries, and show how they can be easily expressed and efficiently processed.
- ItemA Framework for Conditioning Uncertain(2012-01-25) TANG, Ruiming; CHENG, Reynold; WU, Huayu; BRESSAN, StephaneWe propose a framework for representing conditioned probabilistic relational data. In this framework the existence of tuples in possible worlds is determined by Boolean expressions composed from elementary events. The probability of a possible world is computed from the probabilities associated with these elementary events. In addition, a set of global constraints conditions the database. Conditioning is the formalization of the process of adding knowledge to a database. Some worlds may be impossible given the constraints and the probabilities of possible worlds are accordingly re-defined. The new constraints can come from the observation of the existence or non-existence of a tuple, from the knowledge of a specific rule, such as the existence of an exclusive set of tuples, or from the knowledge of a general rule, such as a functional dependency. We are therefore interested in computing a concise representation of the possible worlds and their respective probabilities after the addition of new constraints, namely an equivalent probabilistic database instance without constraints after conditioning. We devise and present a general algorithm for this computation. Unfortunately, the general problem involves the simplification of general Boolean expressions and is NP-hard. We therefore identify specific practical families of constraints for which we devise and present efficient algorithms.
- ItemGet a Sample for a Discount Sampling-Based XML Data Pricing(2014-03-12) TANG, Ruiming; AMARILLI, Antoine; SENELLART, Pierre; BRESSAN, StéphaneWhile price and data quality should define the major tradeoff for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its potential worth. The data consumer proposes a price. If the proposed price is lower than that of the entire document, then the data consumer receives a sample, i.e., a random rooted subtree of the document whose selection depends on the discounted price and the weight of nodes. By requesting several samples, the data consumer can iteratively explore the data in the document. We show that the uniform random sampling of a rooted subtree with prescribed weight is unfortunately intractable. However, we are able to identify several practical cases that are tractable. The first case is uniform random sampling of a rooted subtree with prescribed size; the second case restricts to binary weights. For both these practical cases we present polynomial-time algorithms and explain how they can be integrated into an iterative exploratory sampling approach.
- ItemMeasuring XML Structured-ness with Entropy(2011-06-03) TANG, Ruiming; WU, Huayu; BRESSAN, StephaneXML is semi-structured. It can be used to annotate unstructured data, to represent structured data and almost anything in-between. Yet, it is unclear how to formally characterize, yet to quantify, structuredness of XML. In this paper we propose and evaluate entropy-based metrics for XML structured-ness. The metrics measure the structural uniformity of path and subtrees, respectively. We empirically study the correlation of these metrics with real and synthetic data sets.
- ItemThe Price is Right(2012-05-28T08:07:59Z) TANG, Ruiming; WU, Huayu; KISTER, Thomas; BRESSAN, StephaneData is a modern commodity. Electronic data market places are being designed and deployed. Yet current pricing models either focus on processing or are proprietary, opaque and not conducive of a healthy commodity market dynamics. In this paper we propose a generic data pricing model that is based on minimal lineages, i.e. sets of tuples contributing to the result of a query. We show that the proposed model fulfils desirable properties such as contribution monotonicity, bounded price and arbitrage-freedom. We extend the model to the case of probabilistic databases in the X-tuple model. We propose algorithms to the compute the price of a query based on our pricing models.
- ItemWhat you Pay for is What you Get(2013-04-16T03:16:07Z) TANG, Ruiming; SHAO, Dongxu; BRESSAN, Stephane; VALDURIEZ, PatrickIn most data markets, prices are prescribed and accuracy is determined by the data. Instead, we consider a model in which accuracy can be traded for discounted prices: \what you pay for is what you get". The data market model consists of data consumers, data providers and data market owners. The data market owners are brokers between the data providers and data consumers. A data consumer proposes a price for the data that she requests. If the price is less than the price set by the data provider, then she gets an approximate value. The data market owners negotiate the pricing schemes with the data providers. They implement these schemes for the computation of the discounted approximate values. We propose a theoretical and practical pricing framework with its algorithms for the above mechanism. In this framework, the value published is randomly determined from a probability distribution. The distribution is computed such that its distance to the actual value is commensurate to the discount. The published value comes with a guarantee on the probability to be the exact value. The probability is also commensurate to the discount. We present and formalize the principles that a healthy data market should meet for such a transaction. We defi ne two ancillary functions and describe the algorithms that compute the approximate value from the proposed price using these functions. We prove that the functions and the algorithm meet the required principles.