Browsing by Author "LI, Guoliang"
- Active Learning for Causal Bayesian Network Structure with Non-symmetrical Entropy (2008-06-26)
  LI, Guoliang; LEONG, Tze-Yun
  Causal knowledge is crucial for facilitating comprehension, diagnosis, prediction, and control in automated reasoning. Active learning in Bayesian networks involves intervening on specific variables or their interactions and observing the patterns of change in the other variables to derive causal relationships for knowledge discovery. In this paper, we propose a new active learning approach that supports interventions with node selection. Our method uses a node selection criterion based on non-symmetrical information entropy and a stopping criterion based on minimizing the structure entropy of the resulting networks. We examine the technical challenges and practical issues in developing effective node selection and stopping criteria for our method. Experimental results on a set of benchmark Bayesian networks are promising. The proposed method is applicable in many real-life applications where multiple instances are sampled simultaneously as a data set in each active learning step.
- Comparison Of Missing Value Estimation In Cell-Cycle-Regulated Genes Prediction (2006-09-08T05:30:51Z)
  LI, Guoliang; LEONG, Tze-Yun; ZHANG, Louxin
  Missing values in microarray data are problematic in many biological applications. In the literature, different methods have been proposed to estimate the missing values, and their performance is evaluated with the normalized root mean square error (NRMSE) on simulated missing data. Although NRMSE is indicative of the performance of the proposed methods, it does not tell us the methods' effects on the real application. Moreover, NRMSE values reported in different papers generally cannot be compared directly, since the simulated missing data differ from paper to paper. In this paper, we examined six different missing value estimation methods on the real application of cell-cycle-regulated genes prediction, as well as on simulated missing data. The experiments show that, in terms of NRMSE, our improved knn-based method performs better than the naïve knn-based method, and BPCA and LLSimpute have the smallest NRMSE. In the real application, most of the methods performed similarly in terms of the accuracy of the cell-cycle-regulated genes predicted. Surprisingly, the simple row-mean method achieves accuracy as good as that of the more sophisticated methods, while the LLSimpute method performs considerably worse in terms of the accuracy of the cell-cycle-regulated genes prediction, probably due to overfitting. The results of LLSimpute suggest that performance in NRMSE and performance in the real application are not directly correlated. Hence, in research on missing value estimation, we need to compare the methods' performance not only in terms of NRMSE, but also in terms of the relevant criteria in the real application. The data sets, the improved knn-based method, and the results of our experiments are available online at http://www.comp.nus.edu.sg/~ligl/missing_values/.
- Coronary Artery Disease Prediction with Bayesian Networks and Constraint Elicitation (2006-09-12)
  CHEN, Qiongyu; LI, Guoliang; HAN, Bin; HENG, Chew Kiat; LEONG, Tze-Yun
  Coronary artery disease (CAD) is one of the major causes of death in the world. Finding cost-effective methods to predict CAD is a major challenge for public health. In this paper, we propose a Bayesian network learning approach with a constraint elicitation mechanism to predict the risk of CAD. The underlying causal assumption and interpretability make Bayesian networks a good tool for medical applications, in this case CAD risk prediction involving both genetic and environmental factors. The constraint elicitation process improves model accuracy by incorporating relevant domain knowledge. We performed experiments to compare our results with those from other machine learning methods, such as naive Bayes, support vector machines, K nearest neighbors, neural networks and decision trees. Our method is shown to be comparable to these methods in terms of prediction accuracy, while also offering an intuitive representation of the relationships among variables in the problem domain. Conforming to the domain knowledge, the results identified the important environmental factors for CAD prediction and the relevant groups of gene markers contributing to the risk of CAD. The results also indicated that some gene markers that are relevant to CAD risk in Western populations may not be relevant in the Chinese, Indian and Malay populations local to Singapore.
- Experimental Analysis on Severe Head Injury Outcome Prediction – A Preliminary Study (2006-09-12T01:09:44Z)
  YIN, Hongli; LI, Guoliang; LEONG, Tze-Yun; KURALMANI, Vellaisamy; PANG, Boon Chuan; ANG, Beng Ti; LEE, Kah Keow; NG, Ivan
  Severe head injury management is a very costly and labor-intensive process. There has been growing interest in building outcome analysis models using existing patient records to facilitate decision making and resource planning. However, traditional methods and results in the literature are often inconsistent in variable discretization, accuracy evaluation and class label assignment. In this paper, we examined the effectiveness of applying different outcome analysis methods in head injury management in a uniform manner, based on a set of actual patient records. We conducted a set of experiments using sound statistical techniques to derive the results. Besides a comparative analysis that highlights the strengths and limitations of different outcome analysis methods, the experiments also show that the Minimal-Description-Length (MDL)-based discretization method can help improve prediction accuracy substantially, and that class label assignments in the classification techniques play a very important role in prediction accuracy.
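
As an illustration of the NRMSE evaluation and the row-mean baseline mentioned in the missing value estimation abstract above, the following is a minimal Python sketch. It assumes one common definition of NRMSE (root mean square error over the simulated-missing entries, divided by the standard deviation of the true values at those entries); the abstract does not state which normalization the paper uses. The function names `row_mean_impute` and `nrmse` and the toy matrix are hypothetical and are not taken from the paper.

```python
import math
from statistics import pstdev


def row_mean_impute(matrix):
    """Fill None entries with the mean of the observed values in the same row.

    Assumption: a row with no observed values is filled with 0.0.
    """
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def nrmse(true_values, estimates):
    """RMSE divided by the population standard deviation of the true values.

    Assumption: this normalization is one common convention, not necessarily
    the one used in the listed paper.
    """
    if len(true_values) != len(estimates) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - e) ** 2 for t, e in zip(true_values, estimates)) / len(true_values)
    spread = pstdev(true_values)
    if spread == 0:
        raise ValueError("true values have zero variance; NRMSE is undefined")
    return math.sqrt(mse) / spread


# Toy example: hide two known entries, impute them, and score the estimates.
complete = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
hidden = [(0, 2), (1, 0)]  # (row, column) positions treated as missing
observed = [
    [None if (i, j) in hidden else v for j, v in enumerate(row)]
    for i, row in enumerate(complete)
]
filled = row_mean_impute(observed)
truth = [complete[i][j] for i, j in hidden]
guess = [filled[i][j] for i, j in hidden]
score = nrmse(truth, guess)
```

Because the score depends on which entries are hidden, two NRMSE values are only comparable when they are computed on the same simulated missing positions, which is the comparability issue the abstract raises.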