Browsing by Author "TUNG, Anthony K.H."
Now showing 1 - 4 of 4
Results Per Page
Sort Options
- ItemEdit Distance Evaluation on Graph Structures(2008-06-19) ZENG, Zhiping; TUNG, Anthony K.H.; WANG, Jianyong; ZHOU, LizhuGraph data has became ubiquitous and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measure to determine similarities between graphs and has extensive applications in the fields of pattern recognition, computer vision etc. Unfortunately, the problem of graph edit dis- tance computation is NP-Hard in general and is unlikely to be approximated within a polynomial time computable fac- tor f(n) in polynomial time. Accordingly, in this paper we introduce three novel methods to compute the upper and lower bounds for the edit distance between two graphs in polynomial time. Applying these methods, three algorithms AppFull, ExactSub and AppSub are introduced to per- form different kinds of graph search on graph databases. Comprehensive experimental studies are conducted on both real and synthetic datasets to examine various aspects of the methods for bounding graph edit distance. Result shows that these methods .....
- ItemEfficient and Effective Similarity Search over Probabilistic Data based on Earth Mover's Distance(2010-05-03) XU, Jia; ZHANG, Zhenjie; TUNG, Anthony K.H.; YU, GeProbabilistic data is coming as a new deluge along with the technical advances on geographical tracking, multimedia processing, sensor network and RFID. While similarity search is an important functionality supporting the manipulation of probabilistic data, it raises new challenges to traditional relational database. The problem stems from the limited e®ectiveness of the distance metric supported by the existing database system. On the other hand, some complicated distance operators have been proved their values for better distinguishing ability in probabilistic domain. In this paper, we discuss the similarity search problem with the Earth Mover's Distance, which is the most successful distance metric on probabilistic histograms and an expensive operator with cubic complexity. We present a new database approach to answer range queries and k-nearest neighbor queries on probabilistic data, on the basis of Earth Mover's Distance. Our solution utilizes the primal-dual theory in linear programming and deploys B+ tree index structures for e®ective candidate pruning. Extensive experiments show that our proposal dramatically improves the scalability of probabilistic databases.
- ItemMinimizing the Communication Cost for Continuous Skyline Maintenance(2008-05-29) ZHANG, Zhenjie; CHENG, Reynold; PAPADIAS, Dimitris; TUNG, Anthony K.H.Numerous algorithms in the recent database literature deal with variants of skyline queries in different problem settings. However, the existing work focuses on optimizing the processing cost. This paper aims at minimization of the communication overhead in client-server architectures, where a server continuously maintains the skyline of dynamic objects. Our first contribution is a Filter method that avoids transmission of updates from objects that cannot influence the skyline. Specifically, each object is assigned a filter so that it needs to issue an update only if it violates its filter. The Filter method achieves significant savings over the naive approach of transmitting all updates. Going one step further, we introduce the concept of frequent skyline query over a sliding window (FSQW). The motivation is that snapshot skylines are not very useful in streaming environments because they keep changing over time. Instead, FSQW reports the objects that appear in the skylines of at least ? of the s most recent times- tamps. The Filter method can be easily adapted to FSQW processing, however, with potentially high overhead for large and frequently updated datasets. To further reduce the communication cost, we propose a Sampling method, which returns approximate FSQW results without computing each snapshot skyline. Finally, we integrate the Filter and Sampling methods in a Hybrid approach that combines their individual advantages. We evaluate our techniques with extensive experiments.
- ItemSINGS: Similarity Search in Large Graph Databases(2010-06-29T01:55:49Z) WANG, Xiaoli; DING, Xiaofeng; TUNG, Anthony K.H.; YING, Shanshan; JIN, HaiGraphs are popular models for representing complex structured data and similarity search for graphs has become a fundamental research problem. Many techniques have been proposed to support similarity search on graphs based on the graph edit distance. However, they all suffer from certain drawbacks: high computational complexity, poor scalability in terms of the number of graphs and the size of the graph, or not taking full advantage of indexes. To address these problems, in this paper, we propose SINGS, an indexing and query processing framework for graph similarity search. An effective two-level index based on star structures is constructed and a novel search strategy based on the new indexing structure is also proposed. Two algorithms adapted from the TA and CA algorithms are seamlessly integrated into this proposed strategy for speeding up graph search. Extensive experiments are conducted on both real and synthetic datasets to evaluate the effectiveness and scalability of our approaches.