Sampling and Its Application in Data Mining: A Survey

dc.contributor.author: Baohua GU
dc.contributor.author: Feifang HU
dc.contributor.author: Huan LIU
dc.date.accessioned: 2004-10-21T14:28:52Z
dc.date.accessioned: 2017-01-23T06:59:45Z
dc.date.available: 2004-10-21T14:28:52Z
dc.date.available: 2017-01-23T06:59:45Z
dc.date.issued: 2000-06-01T00:00:00Z
dc.description.abstract: Large data sets are becoming an obstacle to efficient data mining. Sampling, a well-established technique in statistics, is expected to help overcome this obstacle. The statistics community has provided many sampling strategies that are generally believed to be applicable to data mining as well. However, because the data mining community has different starting points and requirements from the statistics community, some of these strategies may need to be reexamined when applied to data mining, and it is also desirable to devise novel strategies for specific data mining tasks on specific data. This paper summarizes the basic ideas and general considerations of sampling and categorizes the sampling strategies existing in statistics, so as to identify potentially useful sampling strategies for data mining. The state-of-the-art ways of applying sampling in data mining are then reviewed. By analyzing the strategies used in different data mining tasks and relating them to their precedents in statistics, we show how traditional strategies are applied, directly or indirectly. We discuss general considerations and research issues of sampling in data mining, and show that these issues are either usually not considered in statistics or not yet well studied, but are essential to data mining. We believe that extensive studies of sampling will contribute more to data mining, especially when dealing with large data sets.
dc.format.extent: 226605 bytes
dc.format.extent: 1121027 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: application/postscript
dc.identifier.uri: https://dl.comp.nus.edu.sg/xmlui/handle/1900.100/1408
dc.language.iso: en
dc.relation.ispartofseries: TRA6/00
dc.title: Sampling and Its Application in Data Mining: A Survey
dc.type: Technical Report
Files
Original bundle
Name: report.ps
Size: 1.07 MB
Format: Postscript Files
Name: report.pdf
Size: 221.29 KB
Format: Adobe Portable Document Format