Correlation-based Attribute Outlier Detection in XML

Date
2007-06-25
Abstract
Outlier detection, the problem of identifying deviating patterns in data sets, has important applications in data cleaning, fraud detection, and stock market analysis. Compared to relational data models, the hierarchical structure of semi-structured data such as XML provides semantically meaningful neighborhoods that can advance existing outlier detection methods. In this paper, we propose XODDS (XML Outlier Detection from Data Subspace), a systematic framework for outlier detection in XML data. XODDS utilizes the correlation between attributes to adaptively identify outliers. XODDS consists of four key steps: (1) attribute aggregation defines summarizing elements in the hierarchical XML structures, (2) subspace identification determines contextually informative neighborhoods for outlier detection, (3) outlier scoring computes the extent of outlier-ness using correlation-based metrics, and (4) outlier identification adaptively determines the optimal thresholds that distinguish outliers from non-outliers. Experimental results on both synthetic and real-world data sets indicate that XODDS is effective in detecting outliers in XML data.
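The four steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's actual aggregation rules, correlation metrics, or thresholding procedure, so everything below is an assumption made for illustration: the function names (records, score, flag), the demo data, the co-occurrence ratio used as a stand-in score, and the largest-gap cut used as a stand-in threshold are all hypothetical and are not the XODDS definitions.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

def records(xml_text, record_tag):
    """Steps 1-2, simplified (assumption): every <record_tag> element is one
    member of a single neighborhood; its leaf children are its attributes."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in rec if len(child) == 0}
        for rec in root.iter(record_tag)
    ]

def score(recs):
    """Step 3, stand-in metric (assumption): for each record, the minimum over
    its attribute pairs of P(a, b) / (P(a) * P(b)). A low value means the
    record contains a pair of values that rarely occur together."""
    n = len(recs)
    single, pair = Counter(), Counter()
    for r in recs:
        items = sorted(r.items())
        single.update(items)
        pair.update(combinations(items, 2))
    scores = []
    for r in recs:
        items = sorted(r.items())
        ratios = [
            (pair[(a, b)] / n) / ((single[a] / n) * (single[b] / n))
            for a, b in combinations(items, 2)
        ]
        scores.append(min(ratios) if ratios else 1.0)
    return scores

def flag(scores):
    """Step 4, stand-in threshold (assumption): cut at the largest gap between
    consecutive sorted scores and flag everything at or below the cut."""
    ordered = sorted(scores)
    if len(ordered) < 2:
        return [False] * len(scores)
    gap, i = max((ordered[k + 1] - ordered[k], k) for k in range(len(ordered) - 1))
    if gap == 0:
        return [False] * len(scores)
    return [s <= ordered[i] for s in scores]

# Hypothetical demo data: eight consistent city/country records and one mismatch.
DEMO_XML = (
    "<people>"
    + 4 * "<person><city>Paris</city><country>France</country></person>"
    + 4 * "<person><city>Rome</city><country>Italy</country></person>"
    + "<person><city>Paris</city><country>Italy</country></person>"
    + "</people>"
)

if __name__ == "__main__":
    recs = records(DEMO_XML, "person")
    for rec, is_outlier in zip(recs, flag(score(recs))):
        if is_outlier:
            print("outlier:", rec)
```

On the demo data, only the Paris/Italy record is flagged: each value is common on its own, but the pair occurs together far less often than the individual frequencies would suggest. That is the general intuition behind correlation-based attribute outliers, shown here with a much cruder score than a real system would use.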
Keywords
Outlier Detection, XML, Data Mining