Abstract
Disease status can be predicted from biomedical data by machine learning. However, many available biomedical data sets are high dimensional or imbalanced, and these properties lead to high false positive or false negative rates in prediction. Constructing a suitable machine learning model is therefore the key to prediction performance.
This article describes a novel approach that applies ensembles of lazy learners, boosted by particle swarm optimization (PSO), to biomedical data mining. The learned model is evaluated on three published data sets using 10-fold cross-validation. The experimental results show that the proposed model can cope with data interference and performs better than other available rule learning methods.
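To make the idea concrete, here is a minimal sketch of one way a PSO-tuned ensemble of lazy learners could be evaluated with 10-fold cross-validation. It is not the authors' implementation: the abstract does not specify the ensemble members, the fitness function, or the PSO settings. The choice of k-nearest-neighbour members that differ only in k, the use of validation accuracy as fitness, the inertia and acceleration constants, the synthetic two-class data, and every function name below are assumptions made for illustration.

```python
import random

def knn_predict(train, query, k):
    """Lazy learner: no training step; majority vote of the k nearest rows (labels 0/1)."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def ensemble_predict(train, query, ks, weights):
    """Weighted vote over k-NN members (assumed here to differ only in k)."""
    score = sum(w * (1 if knn_predict(train, query, k) == 1 else -1) for k, w in zip(ks, weights))
    return 1 if score > 0 else 0

def accuracy(train, test, ks, weights):
    hits = sum(ensemble_predict(train, x, ks, weights) == y for x, y in test)
    return hits / len(test)

def pso_weights(train, valid, ks, rng, particles=6, iters=10):
    """Global-best PSO over member weights in [0, 1]; fitness is validation accuracy (assumed)."""
    dim = len(ks)
    pos = [[rng.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [accuracy(train, valid, ks, p) for p in pos]
    g = max(range(particles), key=lambda i: best_fit[i])
    gbest, gfit = best_pos[g][:], best_fit[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                # Inertia plus pulls toward the personal best and the global best.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = accuracy(train, valid, ks, pos[i])
            if fit > best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
                if fit > gfit:
                    gfit, gbest = fit, pos[i][:]
    return gbest

def cross_validate(data, ks, folds=10, seed=0):
    """Mean test accuracy over the folds; weights are tuned inside each fold only."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    scores = []
    for f in range(folds):
        test = data[f::folds]
        rest = [row for i, row in enumerate(data) if i % folds != f]
        cut = max(1, len(rest) // 4)
        valid, train = rest[:cut], rest[cut:]
        weights = pso_weights(train, valid, ks, rng)
        # At test time the lazy learners may use every non-test row as memory.
        scores.append(accuracy(valid + train, test, ks, weights))
    return sum(scores) / folds

def make_toy_data(n=60, seed=1):
    """Synthetic two-class data (two Gaussian blobs); a stand-in, not biomedical data."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        centre = 2.0 * label
        rows.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return rows

if __name__ == "__main__":
    print(cross_validate(make_toy_data(), ks=[1, 3, 5]))
```

Tuning the weights on a validation split inside each fold keeps the held-out fold unseen during optimization. For imbalanced data, plain accuracy would be a poor fitness function, and a measure that accounts for both false positives and false negatives would be a more natural choice.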
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pengfei, L., Wulei, T. (2011). Apply Ensemble of Lazy Learners to Biomedical Data Mining. In: Chen, R. (eds) Intelligent Computing and Information Science. ICICIS 2011. Communications in Computer and Information Science, vol 134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18129-0_24
DOI: https://doi.org/10.1007/978-3-642-18129-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18128-3
Online ISBN: 978-3-642-18129-0
eBook Packages: Computer Science, Computer Science (R0)