Towards Cleaning XML Databases: Experience and Performance Evaluation

dc.contributor.author: Wai Lup LOW (en_US)
dc.contributor.author: Wee Hyong TOK (en_US)
dc.contributor.author: Mong Li LEE (en_US)
dc.contributor.author: Tok Wang LING (en_US)
dc.date.accessioned: 2004-10-21T14:28:52Z (en_US)
dc.date.accessioned: 2017-01-23T07:00:03Z
dc.date.available: 2004-10-21T14:28:52Z (en_US)
dc.date.available: 2017-01-23T07:00:03Z
dc.date.issued: 2003-01-01T00:00:00Z (en_US)
dc.description.abstract: With the increasing popularity of data-centric XML, data warehousing and mining applications are being developed for the rapidly burgeoning XML data repositories. Data quality will no doubt be a critical factor for the success of such applications. Data cleaning, which refers to the processes used to improve data quality, has been well researched in the context of traditional databases. In this work, we present a novel attempt to clean XML databases. We discuss the new challenges that arise in XML data cleaning and propose solutions to overcome these problems. Our experimental dataset is the DBLP database, a popular online XML bibliography database used by many researchers. The DBLP database is a large collection of small XML documents. Our study shows the benefits of performance gains, flexibility and maintainability that can be achieved by leveraging on the use of a relational database management system to clean XML data. We also investigate the conventional practice of using XML parsers when the structure of the XML data is simple and static, and compare their performance against string matching approaches. (en_US)
dc.format.extent: 608943 bytes (en_US)
dc.format.extent: 530633 bytes (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.format.mimetype: application/postscript (en_US)
dc.identifier.uri: https://dl.comp.nus.edu.sg/xmlui/handle/1900.100/1428 (en_US)
dc.language.iso: en (en_US)
dc.relation.ispartofseries: TRA1/03 (en_US)
dc.title: Towards Cleaning XML Databases: Experience and Performance Evaluation (en_US)
dc.type: Technical Report (en_US)
Files
Original bundle
Name: report.ps
Size: 518.2 KB
Format: Postscript Files

Name: report.pdf
Size: 594.67 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.52 KB
Format: Plain Text