Improving the Predictive Power of Business Performance Measurement Systems by Constructed Data Quality Features? Five Cases

Vattulainen, Markus

doi:10.1007/978-3-319-20910-4_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9165)

Included in the following conference series: Industrial Conference on Data Mining

Abstract

Predictive power is an important objective for current business performance measurement systems, and it depends on metrics design, data collection and preprocessing, and predictive modeling. A promising but less-studied preprocessing activity is to construct additional features that can be interpreted as expressing the quality of the data, and thus to give predictive models not only the data points but also their quality characteristics. The research problem addressed in this study is: can we improve the predictive power of business performance measurement systems by constructing additional data quality features? Unsupervised, supervised and domain knowledge approaches were used to operationalize eight features based on elementary data quality dimensions. In case studies of five corporate datasets (Toyota Material Handling Finland, Innolink Group, 3StepIt, Papua Merchandising and Lempesti), data with constructed data quality features performed better than minimally processed datasets in 29/38 tests and equally well in 9/38 tests. A comparison with a competing method, preprocessing combinations, on the first two datasets showed that constructed features had slightly lower prediction performance but were clearly better in execution time and ease of use. Additionally, constructed data quality features helped to visually explore high-dimensional data quality patterns. Further research is needed to expand the range of constructed features and to map the findings systematically to data quality concepts and practices.
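
To make the idea of a constructed data quality feature concrete, the following is a minimal Python sketch. It is not the paper's method: the abstract does not name the eight features, so the two features here (a per-row count of missing values and a per-row maximum absolute z-score as a simple outlier indicator) are assumptions chosen for illustration, and the function name `add_quality_features` is hypothetical.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def add_quality_features(rows: Sequence[Row]) -> list[list[Optional[float]]]:
    """Append two illustrative data quality features to each row.

    Hypothetical features (assumptions, not the paper's eight):
      - missing_count: number of missing (None) values in the row
      - max_abs_z: largest absolute z-score among the row's observed values
    Original values, including None, are kept unchanged.
    """
    n_cols = len(rows[0])

    # Column mean and population standard deviation over observed values only.
    stats = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        mu = mean(observed) if observed else 0.0
        sigma = pstdev(observed) if len(observed) > 1 else 0.0
        stats.append((mu, sigma))

    out = []
    for r in rows:
        missing_count = sum(v is None for v in r)
        # Skip missing values and constant columns (sigma == 0).
        z_scores = [
            abs(v - mu) / sigma
            for v, (mu, sigma) in zip(r, stats)
            if v is not None and sigma > 0
        ]
        max_abs_z = max(z_scores, default=0.0)
        out.append(list(r) + [float(missing_count), max_abs_z])
    return out
```

Each returned row keeps its original values and gains two extra columns, so a predictive model trained on the output sees the data points together with simple indicators of their quality. Any imputation of the remaining missing values would be a separate preprocessing step.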

Acknowledgements

The author thanks Professor Emeritus Pertti Järvinen, Professor Martti Juhola and Dr. Kati Iltanen of the University of Tampere, Finland, as well as After Sales Director Jarmo Laamanen (Toyota Material Handling Finland), Managing Director Marko Kukkola (Innolink Group), Sales Director Mika Karjalainen (3StepIt), Managing Director Olli Vaaranen (Papua Merchandising) and Managing Director Sirpa Kauppila (Lempesti).

Author information

Authors and Affiliations

University of Tampere, Tampere, Finland
Markus Vattulainen

Corresponding author

Correspondence to Markus Vattulainen.

Editor information

Editors and Affiliations

IBaI, Leipzig, Germany
Petra Perner

Cite this paper

Vattulainen, M. (2015). Improving the Predictive Power of Business Performance Measurement Systems by Constructed Data Quality Features? Five Cases. In: Perner, P. (ed.) Advances in Data Mining: Applications and Theoretical Aspects. ICDM 2015. Lecture Notes in Computer Science, vol 9165. Springer, Cham. https://doi.org/10.1007/978-3-319-20910-4_1

DOI: https://doi.org/10.1007/978-3-319-20910-4_1
Published: 20 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20909-8
Online ISBN: 978-3-319-20910-4
eBook Packages: Computer Science, Computer Science (R0)