An Empirical Study of Fitting Learning Curves

dc.contributor.author: Baohua GU (en_US)
dc.contributor.author: Feifang HU (en_US)
dc.contributor.author: Huan LIU (en_US)
dc.date.accessioned: 2004-10-21T14:28:52Z (en_US)
dc.date.accessioned: 2017-01-23T06:59:39Z
dc.date.available: 2004-10-21T14:28:52Z (en_US)
dc.date.available: 2017-01-23T06:59:39Z
dc.date.issued: 2001-04-01T00:00:00Z (en_US)
dc.description.abstract: It is well known that many learning algorithms show diminishing returns as training data size increases. This paper empirically studies fitting learning curves of large data sets in search of a principled stopping criterion. Such a criterion is particularly useful when the data size is huge, as in most data mining applications. Learning curves are obtained by running the decision tree algorithm C4.5 and the logistic discrimination algorithm LOG on eight large UCI data sets and are then fitted with six competing models. The models are compared and ranked by how well they fit full-size learning curves and by how well curves fitted to the early portion of a learning curve, at small data sizes, predict its late portion. In the experiments, the three-parameter power law is overall close to the best in fitting and the best in predicting. Although the fit ranking of the models is almost consistent across all eight data sets and both algorithms, their prediction ranking varies more for LOG than for C4.5 across the eight data sets and the amount of data used in fitting. These findings can be used for effective data mining with large data. (en_US)
dc.format.extent: 233474 bytes (en_US)
dc.format.extent: 515101 bytes (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.format.mimetype: application/postscript (en_US)
dc.identifier.uri: https://dl.comp.nus.edu.sg/xmlui/handle/1900.100/1415 (en_US)
dc.language.iso: en (en_US)
dc.relation.ispartofseries: TRA4/01 (en_US)
dc.title: An Empirical Study of Fitting Learning Curves (en_US)
dc.type: Technical Report (en_US)
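
The abstract describes fitting a model to the early portion of a learning curve and using it to predict the late portion. The sketch below illustrates that idea in Python for a three-parameter power law. The functional form acc(n) = a - b * n^(-c) is an assumption, since the abstract does not state the formula; the names fit_power_law and predict are hypothetical; the fitting method (a grid scan over c with closed-form least squares for a and b) is a simple stand-in, not necessarily the procedure used in the report; and the data points are synthetic, not results from the report.

```python
def fit_power_law(sizes, accs, c_grid=None):
    """Fit acc(n) = a - b * n**(-c) (assumed form) to a learning curve.

    For a fixed c the model is linear in a and b, so we scan a grid of
    c values and solve ordinary least squares for a and b at each one.
    """
    if c_grid is None:
        c_grid = [k / 100 for k in range(1, 301)]  # c in 0.01 .. 3.00
    m = len(sizes)
    y_mean = sum(accs) / m
    best = None
    for c in c_grid:
        z = [n ** (-c) for n in sizes]          # transformed regressor
        z_mean = sum(z) / m
        szz = sum((zi - z_mean) ** 2 for zi in z)
        if szz == 0:
            continue
        slope = sum((zi - z_mean) * (yi - y_mean)
                    for zi, yi in zip(z, accs)) / szz
        a = y_mean - slope * z_mean
        b = -slope
        sse = sum((a - b * zi - yi) ** 2 for zi, yi in zip(z, accs))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def predict(params, n):
    """Predicted accuracy at training size n under the fitted model."""
    a, b, c = params
    return a - b * n ** (-c)


# Synthetic, noise-free curve for illustration only (not data from the report).
true_params = (0.90, 1.5, 0.5)
sizes = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
accs = [predict(true_params, n) for n in sizes]

# Fit on the early portion, then check predictions on the late portion.
early = 5
params = fit_power_law(sizes[:early], accs[:early])
late_errors = [abs(predict(params, n) - y)
               for n, y in zip(sizes[early:], accs[early:])]
```

Under this assumed form, the fitted a is the accuracy the curve approaches as the training size grows, so a stopping rule could compare the current accuracy against it. With noisy real curves the grid scan would only give a rough estimate, and a proper nonlinear least-squares routine would likely be preferable.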
Files
Original bundle
Name: report.ps
Size: 503.03 KB
Format: Postscript Files
Name: report.pdf
Size: 228 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.52 KB
Format: Plain Text