An Empirical Study of Fitting Learning Curves
Date: 2001-04-01
Abstract
It is well known that many learning algorithms show diminishing returns as training-set size increases. This paper empirically studies the fitting of learning curves on large data sets in search of a principled stopping criterion, which is particularly useful when the data size is huge, as in most data mining applications. Learning curves are obtained by running the decision tree algorithm C4.5 and the logistic discrimination algorithm LOG on eight large UCI data sets, then fitted with six competing models. The models are compared and ranked on two tasks: fitting full-size learning curves, and predicting the late portion of a curve from a fit to its early portion, where data sizes are small. The three-parameter power law is found to be overall close to the best at fitting and the best at predicting. It is also found that, although the fit ranking of the models is almost consistent across all eight data sets for both algorithms, their prediction ranking varies more for LOG than for C4.5, both across the eight data sets and across the amount of data used in fitting. These findings can be used for effective data mining with large data.
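As a minimal sketch of the kind of fit the abstract describes, the following Python snippet fits a three-parameter power law, acc(n) = a - b * n^(-c), to learning-curve points using SciPy, then extrapolates from an early portion of the curve to a later point, mirroring the paper's prediction setup. The functional form is the standard three-parameter power law for learning curves; the sample data points, starting values, and the exact fitting procedure are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Three-parameter power law: a is the asymptotic accuracy,
    # b the scale, and c the decay rate of the learning curve.
    return a - b * np.power(n, -c)

# Hypothetical learning-curve points: training-set sizes and test accuracies.
sizes = np.array([100, 500, 1000, 5000, 10000, 50000], dtype=float)
accs = np.array([0.71, 0.80, 0.83, 0.87, 0.88, 0.895])

# Fit on the early portion of the curve only (small data sizes), then
# predict the late portion, as in the paper's prediction-ranking task.
params, _ = curve_fit(power_law, sizes[:4], accs[:4],
                      p0=[0.9, 1.0, 0.5], maxfev=10000)
a, b, c = params
print(f"fitted: a={a:.3f}, b={b:.3f}, c={c:.3f}")
print("predicted accuracy at n=50000:", power_law(50000.0, a, b, c))
```

A stopping criterion of the sort the paper searches for could then, for example, halt data collection once the fitted curve's predicted gain from additional data falls below a chosen threshold.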