Efficient and Effective Similarity Search over Probabilistic Data based on Earth Mover's Distance
No Thumbnail Available
Files
Date
2010-05-03
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Probabilistic data is coming as a new deluge along with the technical advances on geographical
tracking, multimedia processing, sensor network and RFID. While similarity search is an important
functionality supporting the manipulation of probabilistic data, it raises new challenges to traditional relational database. The problem stems from the limited e®ectiveness of the distance metric supported by the existing database system. On the other hand, some complicated distance operators have been
proved their values for better distinguishing ability in probabilistic domain. In this paper, we discuss the similarity search problem with the Earth Mover's Distance, which is the most successful distance metric on probabilistic histograms and an expensive operator with cubic complexity. We present a new database approach to answer range queries and k-nearest neighbor queries on probabilistic data, on the basis of Earth Mover's Distance. Our solution utilizes the primal-dual theory in linear programming and
deploys B+ tree index structures for e®ective candidate pruning. Extensive experiments show that our proposal dramatically improves the scalability of probabilistic databases.