Better Interpretable Models for Proteomics Data Analysis Using Rule-Based Mining

Jayrannejad, Fahrnaz; Conrad, Tim O. F.

doi:10.1007/978-3-319-69775-8_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10344)

Abstract

Recent advances in -omics technologies have yielded large data sets in many areas of biology, such as mass spectrometry-based proteomics. However, analyzing these data remains challenging, mainly because of their very high dimensionality and high noise content. One of the main objectives of the analysis is to identify relevant patterns (or features) that can be used to classify new samples as healthy or diseased. A method is therefore required that derives easily interpretable models from such data.

To achieve this goal, we have adapted the disjunctive association rule mining algorithm TitanicOR to identify emerging patterns in our mass spectrometry proteomics data sets. A comparison with five state-of-the-art methods shows that our method has advantages over them in identifying the inter-dependencies between features and in the TP-rate and precision of the selected features. We further demonstrate the applicability of our algorithm to one previously published clinical data set.
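To make the notion of a disjunctive emerging pattern concrete, the following minimal sketch scores one candidate pattern on toy data. It is not TitanicOR and not the authors' pipeline. It assumes that each sample has already been reduced to a set of discretized peak labels, that a disjunctive pattern matches a sample when at least one of its items is present, and that the pattern is scored by the usual growth rate (ratio of its support in the target class to its support in the background class). The peak names and helper functions are hypothetical.

```python
from typing import List, Set


def disjunctive_support(pattern: Set[str], samples: List[Set[str]]) -> float:
    """Fraction of samples that contain at least one item of the pattern."""
    if not samples:
        return 0.0
    hits = sum(1 for sample in samples if pattern & sample)
    return hits / len(samples)


def growth_rate(pattern: Set[str],
                background: List[Set[str]],
                target: List[Set[str]]) -> float:
    """Support in the target class divided by support in the background class.

    Returns 0.0 if the pattern occurs in neither class and infinity if it
    occurs only in the target class.
    """
    sup_background = disjunctive_support(pattern, background)
    sup_target = disjunctive_support(pattern, target)
    if sup_background == 0.0:
        return float("inf") if sup_target > 0.0 else 0.0
    return sup_target / sup_background


# Toy data (hypothetical): each sample is the set of peaks flagged as present.
healthy = [{"peak_A"}, {"peak_B", "peak_D"}, {"peak_A", "peak_D"}, {"peak_D"}]
diseased = [{"peak_B"}, {"peak_C"}, {"peak_B", "peak_C"}, {"peak_A", "peak_B"}]

# Candidate disjunctive pattern: "peak_B OR peak_C".
pattern = {"peak_B", "peak_C"}

rate = growth_rate(pattern, background=healthy, target=diseased)
print(rate)  # 4.0: support 1.0 in diseased versus 0.25 in healthy
```

A pattern whose growth rate exceeds a chosen threshold would be reported as emerging for the diseased class. Because the pattern is a readable disjunction of peaks, it can be inspected directly, which is the interpretability argument made in the abstract.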

Author information

Authors and Affiliations

Zuse Institute Berlin, Takustr. 7, 14195, Berlin, Germany
Fahrnaz Jayrannejad & Tim O. F. Conrad
Department of Mathematics, Freie Universität Berlin, Arnimallee 6, Berlin, Germany
Tim O. F. Conrad

Corresponding author

Correspondence to Fahrnaz Jayrannejad.

Editor information

Editors and Affiliations

Medical University Graz, Graz, Austria
Andreas Holzinger
University of Alberta, Edmonton, Alberta, Canada
Randy Goebel
Bologna University, Bologna, Italy
Massimo Ferri
Coventry University, Coventry, United Kingdom
Vasile Palade

About this paper

Cite this paper

Jayrannejad, F., Conrad, T.O.F. (2017). Better Interpretable Models for Proteomics Data Analysis Using Rule-Based Mining. In: Holzinger, A., Goebel, R., Ferri, M., Palade, V. (eds) Towards Integrative Machine Learning and Knowledge Extraction. Lecture Notes in Computer Science, vol 10344. Springer, Cham. https://doi.org/10.1007/978-3-319-69775-8_4

DOI: https://doi.org/10.1007/978-3-319-69775-8_4
Published: 29 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69774-1
Online ISBN: 978-3-319-69775-8
eBook Packages: Computer Science, Computer Science (R0)
