Browsing by Author "LEE, Mong Li"
- Analyzing Temporal Keyword Queries for Interactive Search over Temporal Databases
  GAO, Qiao; LEE, Mong Li; LING, Tok Wang; DOBBIE, Gillian; ZENG, Zhong
  Querying temporal relational databases is a challenge for non-expert database users, since it requires users to understand the semantics of the database and to apply temporal joins and temporal conditions correctly in SQL statements. Traditional keyword search approaches are not directly applicable to temporal relational databases, since they treat time-related keywords as tuple values and do not consider the temporal joins between relations, which leads to missing answers, incorrect answers and missing query interpretations. In this work, we extend keyword queries to allow temporal predicates, and design a schema graph approach based on the Object-Relationship-Attribute (ORA) semantics. This approach enables us to identify temporal attributes of objects/relationships and infer the target temporal data of temporal predicates, thus improving the completeness and correctness of temporal keyword search and capturing the various possible interpretations of temporal keyword queries.
- Correlation-based Attribute Outlier Detection in XML
  (2007-06-25) KOH, Judice L. Y.; LEE, Mong Li; HSU, Wynne; ANG, Wee Tiong
  Outlier detection, the problem of identifying deviating patterns in data sets, has important applications in data cleaning, fraud detection, and stock market analysis. Compared to relational data models, the hierarchical structure of semi-structured data such as XML provides semantically meaningful neighborhoods that can advance existing outlier detection methods. In this paper, we propose a systematic framework, XODDS (XML Outlier Detection from Data Subspace), for outlier detection in XML data. XODDS utilizes the correlation between attributes to adaptively identify outliers. XODDS consists of four key steps: (1) attribute aggregation defines summarizing elements in the hierarchical XML structures, (2) subspace identification determines contextually informative neighborhoods for outlier detection, (3) outlier scoring computes the extent of outlier-ness using correlation-based metrics, and (4) outlier identification adaptively determines the optimal thresholds distinguishing the outliers from non-outliers. Experimental results on both synthetic and real-world data sets indicate that XODDS is effective in detecting outliers in XML data.
- Discovering Spatial Interaction Patterns
  (2007-06-29) SHENG, Chang; HSU, Wynne; LEE, Mong Li
  Advances in sensing and satellite technologies and the growth of the Internet have resulted in a vast amount of spatial data that are easily accessible. Extracting useful knowledge from these data has remained an important and challenging task. Among the various spatial analysis tasks, finding interactions among spatial features is one of the most important problems. Existing works typically adopt a grid-like approach to transform the continuous space to a discrete space. This may lead to some meaningful knowledge being missed. In this paper, we propose to model the spatial features in a continuous space through the use of influence functions. For each feature type, we build an influence map that captures the distribution of the feature instances. Superimposing the influence maps allows the interaction of the feature types to be quickly determined. Experiments on both synthetic and real-world datasets indicate that the proposed approach is scalable and is able to discover patterns that have ...
- Efficient Mining of Dense Periodic Patterns in Time Series
  (2005-10-31) SHENG, Chang; HSU, Wynne; LEE, Mong Li
  Existing techniques to mine periodic patterns in time series data are focused on discovering full-cycle periodic patterns from an entire time series. However, many useful partial periodic patterns are hidden in long and complex time series data. In this paper, we aim to discover the partial periodicity in local segments of the time series data. We introduce the notion of character density to partition the time series into variable-length fragments and to determine the lower bound of each character's period. We propose a novel algorithm, called DPMiner, to find the dense periodic patterns in time series data. The algorithm makes use of an Apriori-like property to prune the search space. Experimental results on both synthetic and real-life datasets demonstrate that the proposed algorithm is effective and efficient in revealing interesting dense periodic patterns.
- ERkNN: Efficient Reverse k-Nearest Neighbors Retrieval with Local kNN-Distance Estimation
  (2006-07-31) XIA, Chenyi; HSU, Wynne; LEE, Mong Li
  Reverse k-Nearest Neighbors (RkNN) queries are important in profile-based marketing, information retrieval, decision support and data mining systems. However, they are very expensive, and existing algorithms do not scale to queries in high-dimensional spaces or with large values of k. This paper describes an efficient estimation-based RkNN search algorithm (ERkNN) which answers RkNN queries based on local kNN-distance estimation methods. The proposed approach utilizes an estimation-based filtering strategy to lower the computation cost of RkNN queries. The results of extensive experiments on both synthetic and real-life datasets demonstrate that the ERkNN algorithm answers RkNN queries efficiently and is scalable with respect to data dimensionality, k, and data size.
- Mining Progressive Confident Rules
  (2006-06-09) ZHANG, Minghua; HSU, Wynne; LEE, Mong Li
  Many real-world objects have states that change over time. By tracking the state sequences of these objects, we can study their behavior and take preventive measures before they reach some undesirable states. In this paper, we propose a new kind of pattern, called progressive confident rules, to describe sequences of states with an increasing confidence that lead to a particular end state. We give a formal definition of progressive confident rules and their concise set. We propose new pruning strategies and employ the concise set analysis of rules in the mining process to reduce the enormous search space. Experimental results show that the proposed algorithm is efficient and scalable. We also demonstrate the application of progressive confident rules in classification.
- Non-blocking Spatial Join
  (2007-07-25) TOK, Wee Hyong; BRESSAN, Stephane; LEE, Mong Li
  We propose and study sequential non-blocking algorithms for the processing of spatial joins on continuous data streams with unpredictable arrival rates or on large collections of spatial data that are not indexed. Given two sets of spatial data represented by their bounding boxes, the algorithms immediately and continuously compute and output the pairs of data from each set whose bounding boxes intersect. The different algorithms we propose take advantage of different possible characteristics of the data such as clustering of the input to build indexes or synopses to accelerate the production of results. We comparatively analyze the performance of the proposed algorithms using several synthetic and realistic data sets.
- A Semantic Framework for Designing Temporal SQL Databases
  GAO, Qiao; LEE, Mong Li; DOBBIE, Gillian; ZHONG, Zeng
  Many real-world applications need to capture a mix of temporal and non-temporal entities, relationships and attributes. These concepts add complexity when designing database schemas, and existing works are unable to capture the temporal semantics precisely. We propose a new framework for designing SQL databases that distinguishes between temporal and non-temporal concepts while also distinguishing between entities, relationships and attributes at every step. The framework first utilizes an Entity-Relationship (ER) diagram to capture the real-world semantics. Temporal constructs in the ER diagram are then annotated. Finally, we map the temporal ER diagram to a normal form database schema that reduces redundant data by separating current data from historical data. We also describe how data consistency is maintained during updates. Experimental results show that we can generate database schemas that support efficient access to both current and historical information, and enable better management of temporal data.