Reproducibility of parameter learning with missing observations in naive Wnt Bayesian network trained on colorectal cancer samples and doxycycline-treated cell lines†‡
Abstract
This manuscript tests the reproducibility of parameter learning with missing observations in a naive Bayesian network and its effect on the prediction of Wnt signaling activation in colorectal cancer. The network is trained separately on doxycycline-treated LS174T cell lines (GSE18560) and on normal and adenoma samples (GSE8671). A computational framework for testing the reproducibility of the parameters is designed in order to check the veracity of the prediction results. Detailed experimental analysis suggests that the prediction results are accurate and reproducible with negligible deviations. Anomalies in the estimated parameters are attributed to representation issues of the Bayesian network model. High prediction accuracies are reported for normal (N) and colon-related adenomas (AD), colorectal cancer (CRC), carcinomas (C), adenocarcinomas (ADC) and replication error colorectal cancer (RER CRC) test samples. Test samples from inflammatory bowel diseases (IBD) do not fare well in the prediction test. An interesting case regarding hypothesis testing also arose while establishing the statistical significance of the different design setups of the Bayesian network model. Hypothesis testing may not be the correct way to check the significance between design setups, especially when the structure of the model is the same, given that the model is trained on a single piece of test data. The significance test does have value when the datasets are independent. Finally, in comparison to biologically inspired models, the naive Bayesian model may give accurate results, but this accuracy comes at the cost of losing crucial biological knowledge that might help reveal hidden relations among intra/extracellular factors affecting the Wnt pathway.