A Stratified Approach to Progressive Approximate Joins
Date
2007-09-24
Abstract
Users often do not require a complete answer to their query but rather only a sample. They expect the sample to be either the largest possible or the most representative (or both) given the resources available. We call the query processing techniques that deliver such results approximate. Processing of queries over streams of data is said to be progressive when it can continuously produce results as data arrives. In this paper, we are interested in the progressive and approximate processing of queries over data streams when processing is limited to main memory. In particular, we study one of the main building blocks of such processing: the progressive approximate join. We devise and present several novel progressive approximate join algorithms. We empirically evaluate the performance of our algorithms and compare it with that of algorithms based on existing techniques. In particular, we study the trade-off between maximizing throughput and maximizing the representativeness of the sample.