Browsing by Author "Wai Lup LOW"
- Towards Cleaning XML Databases: Experience and Performance Evaluation (2003-01-01)
  Wai Lup LOW; Wee Hyong TOK; Mong Li LEE; Tok Wang LING
  With the increasing popularity of data-centric XML, data warehousing and mining applications are being developed for the rapidly growing XML data repositories. Data quality will no doubt be a critical factor for the success of such applications. Data cleaning, which refers to the processes used to improve data quality, has been well researched in the context of traditional databases. In this work, we present a novel attempt to clean XML databases. We discuss the new challenges that arise in XML data cleaning and propose solutions to overcome these problems. Our experimental dataset is the DBLP database, a popular online XML bibliography database used by many researchers. The DBLP database is a large collection of small XML documents. Our study shows the performance gains, flexibility, and maintainability that can be achieved by using a relational database management system to clean XML data. We also investigate the conventional practice of using XML parsers when the structure of the XML data is simple and static, and compare their performance against string matching approaches.
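
  The contrast between XML parsers and string matching mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record, its element names, and both helper functions are hypothetical, chosen only to resemble a small bibliography entry with a simple, static structure.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical bibliography-style record; element names are illustrative only.
RECORD = (
    '<article key="journals/example/1">'
    "<author>A. Author</author>"
    "<author>B. Author</author>"
    "<title>An Example Title</title>"
    "<year>2003</year>"
    "</article>"
)


def authors_with_parser(xml_text):
    """Extract author names by fully parsing the record into a tree."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall("author")]


# Assumes <author> elements carry no attributes and are never nested,
# which only holds when the structure is simple and static.
AUTHOR_PATTERN = re.compile(r"<author>(.*?)</author>")


def authors_with_string_matching(xml_text):
    """Extract author names by pattern matching on the raw text."""
    return AUTHOR_PATTERN.findall(xml_text)


if __name__ == "__main__":
    print(authors_with_parser(RECORD))
    print(authors_with_string_matching(RECORD))
```

  Both functions return the same names for this record. The string matching version skips building a tree, but it does not decode entity references or tolerate structural variation, so it is only a reasonable candidate under the simple, static structure the abstract describes.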