Comparison Of Missing Value Estimation In Cell-Cycle-Regulated Genes Prediction
Date
2006-09-08T05:30:51Z
Abstract
Missing values in microarray data are problematic in many biological applications. Different methods have been proposed in the literature to estimate the missing values, and their performance is usually evaluated with the normalized root mean square error (NRMSE) on simulated missing data. Although NRMSE is indicative of a method's performance, it does not tell us the method's effect on a real application. Moreover, NRMSE values reported in different papers generally cannot be compared directly, since each paper simulates its missing data differently. In this paper, we examine six missing value estimation methods on the real application of cell-cycle-regulated gene prediction, as well as on simulated missing data. The experiments show that, in terms of NRMSE, our improved knn-based method performs better than the naïve knn-based method, and that BPCA and LLSimpute have the smallest NRMSE. In the real application, most of the methods perform similarly in terms of the accuracy of the predicted cell-cycle-regulated genes. Surprisingly, the simple row-mean method achieves accuracy as good as that of the more sophisticated methods, whereas LLSimpute performs considerably worse, probably due to overfitting. The results for LLSimpute suggest that performance in NRMSE and performance in the real application are not directly correlated. Hence, research on missing value estimation needs to compare methods not only in terms of NRMSE but also in terms of criteria relevant to the real application. The data sets, the improved knn-based method, and the results of our experiments are available online at http://www.comp.nus.edu.sg/~ligl/missing_values/.
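To make the NRMSE evaluation concrete, the following is a minimal Python sketch of the row-mean baseline and of NRMSE computed on simulated missing entries. It is an illustration under stated assumptions, not the code used in the paper: the function names `row_mean_impute` and `nrmse` are hypothetical, and the normalization used here (dividing the RMS error by the standard deviation of the true values at the masked entries) is one common convention; the abstract does not say which normalization the authors used.

```python
import numpy as np


def row_mean_impute(x):
    """Fill each NaN with the mean of the observed values in its row (gene).

    Assumes every row has at least one observed value.
    """
    filled = x.copy()
    row_means = np.nanmean(x, axis=1)      # per-gene mean, ignoring NaNs
    rows, cols = np.where(np.isnan(x))     # positions of the missing entries
    filled[rows, cols] = row_means[rows]
    return filled


def nrmse(truth, estimate, mask):
    """RMS error over the masked entries, normalized by the standard
    deviation of the true values at those entries (assumed convention)."""
    diff = estimate[mask] - truth[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(truth[mask])


# Toy complete expression matrix (rows = genes, columns = experiments).
truth = np.array([[1.0, 2.0, 3.0],
                  [4.0, 6.0, 8.0]])

# Simulate missing data by hiding two known entries.
mask = np.array([[False, False, True],
                 [True, False, False]])
observed = truth.copy()
observed[mask] = np.nan

filled = row_mean_impute(observed)
score = nrmse(truth, filled, mask)
print(filled)
print(score)
```

Because the score depends on which entries are hidden and on how the error is normalized, values computed this way are only comparable across methods when the same mask and the same normalization are used, which is the comparability issue the abstract raises.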