Browsing by Author "HSU, Wynne"
- Correlation-based Attribute Outlier Detection in XML (2007-06-25) KOH, Judice L. Y.; LEE, Mong Li; HSU, Wynne; ANG, Wee Tiong. Outlier detection, the problem of identifying deviating patterns in data sets, has important applications in data cleaning, fraud detection, and stock market analysis. Compared to relational data models, the hierarchical structure of semi-structured data such as XML provides semantically meaningful neighborhoods that advance existing outlier detection methods. In this paper, we propose a systematic framework, XODDS (XML Outlier Detection from Data Subspace), for outlier detection in XML data. XODDS utilizes the correlation between attributes to adaptively identify outliers. XODDS consists of four key steps: (1) attribute aggregation defines summarizing elements in the hierarchical XML structures, (2) subspace identification determines contextually informative neighborhoods for outlier detection, (3) outlier scoring computes the extent of outlier-ness using correlation-based metrics, and (4) outlier identification adaptively determines the optimal thresholds distinguishing the outliers from non-outliers. Experimental results on both synthetic and real-world data sets indicate that XODDS is effective in detecting outliers in XML data.
- Discovering Spatial Interaction Patterns (2007-06-29) SHENG, Chang; HSU, Wynne; LEE, Mong Li. Advances in sensing and satellite technologies and the growth of the Internet have resulted in a vast amount of easily accessible spatial data. Extracting useful knowledge from these data remains an important and challenging task. Among the various spatial analysis tasks, finding interactions among spatial features is one of the most important problems. Existing works typically adopt a grid-like approach to transform the continuous space into a discrete space. This may lead to some meaningful knowledge being missed. In this paper, we propose to model the spatial features in a continuous space through the use of influence functions. For each feature type, we build an influence map that captures the distribution of the feature instances. Superimposing the influence maps allows the interaction of the feature types to be quickly determined. Experiments on both synthetic and real-world datasets indicate that the proposed approach is scalable and is able to discover patterns that have ...
- Efficient Mining of Dense Periodic Patterns in Time Series (2005-10-31) SHENG, Chang; HSU, Wynne; LEE, Mong Li. Existing techniques to mine periodic patterns in time series data focus on discovering full-cycle periodic patterns from an entire time series. However, many useful partial periodic patterns are hidden in long and complex time series data. In this paper, we aim to discover the partial periodicity in local segments of the time series data. We introduce the notion of character density to partition the time series into variable-length fragments and to determine the lower bound of each character's period. We propose a novel algorithm, called DPMiner, to find the dense periodic patterns in time series data. The algorithm makes use of an Apriori-like property to prune the search space. Experimental results on both synthetic and real-life datasets demonstrate that the proposed algorithm is effective and efficient in revealing interesting dense periodic patterns.
- ERkNN: Efficient Reverse k-Nearest Neighbors Retrieval with Local kNN-Distance Estimation (2006-07-31T09:01:26Z) XIA, Chenyi; HSU, Wynne; LEE, Mong Li. Reverse k-Nearest Neighbors (RkNN) queries are important in profile-based marketing, information retrieval, decision support, and data mining systems. However, they are very expensive, and existing algorithms do not scale to queries in high-dimensional spaces or with large values of k. This paper describes an efficient estimation-based RkNN search algorithm (ERkNN) which answers RkNN queries based on local kNN-distance estimation methods. The proposed approach uses an estimation-based filtering strategy to lower the computation cost of RkNN queries. The results of extensive experiments on both synthetic and real-life datasets demonstrate that the ERkNN algorithm retrieves RkNN efficiently and is scalable with respect to data dimensionality, k, and data size.
- Mining Progressive Confident Rules (2006-06-09) ZHANG, Minghua; HSU, Wynne; LEE, Mong Li. Many real-world objects have states that change over time. By tracking the state sequences of these objects, we can study their behavior and take preventive measures before they reach some undesirable states. In this paper, we propose a new kind of pattern, called progressive confident rules, to describe sequences of states with an increasing confidence that lead to a particular end state. We give a formal definition of progressive confident rules and their concise set. We propose new pruning strategies and employ the concise set analysis of rules in the mining process to reduce the enormous search space. Experimental results show that the proposed algorithm is efficient and scalable. We also demonstrate the application of progressive confident rules in classification.