Information Systems Frontiers, Volume 17, Issue 6, pp 1195–1208

An intelligent approach to data extraction and task identification for process mining

  • Jiexun Li
  • Harry Jiannan Wang
  • Xue Bai


Business process mining has received increasing attention in recent years because it can provide process insights by analyzing the event logs generated by various enterprise information systems. A key challenge in business process mining projects is extracting process-related data from massive event log databases, a task that requires rich domain knowledge and advanced database skills and can be very labor-intensive and overwhelming. In this paper, we propose an intelligent approach to data extraction and task identification that leverages relevant process documents. In particular, we analyze these process documents using text mining techniques and use the results to identify the database tables most relevant to process mining. The novelty of our approach lies in formalizing data extraction and task identification as the problem of extracting attributes as process components, and relations among process components, using sequence kernel techniques. Our approach can reduce the effort and increase the accuracy of data extraction and task identification for process mining. A business expense reimbursement case is used to illustrate our approach.
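The abstract names sequence kernels as the core technique but does not say which kernel is used. Purely as an illustration, the sketch below implements the well-known gap-weighted subsequence kernel (the string kernel of Lodhi and colleagues) over token sequences, the kind of similarity a kernel-based classifier could use to compare sentences from process documents. The function names `subsequence_kernel` and `normalized_kernel`, the subsequence length `n`, the decay factor `lam`, and the example tokens are all hypothetical and are not taken from the paper.

```python
import math
from functools import lru_cache
from typing import Hashable, Sequence


def subsequence_kernel(s: Sequence[Hashable], t: Sequence[Hashable], n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel K_n(s, t) via the standard recursion.

    Each shared, possibly non-contiguous subsequence u of length n contributes
    lam ** (span of u in s) * lam ** (span of u in t), so gaps are penalized.
    Illustrative sketch only; intended for short token sequences.
    """
    if n < 1:
        raise ValueError("subsequence length n must be >= 1")
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k_prime(i: int, ls: int, lt: int) -> float:
        # Auxiliary K'_i on prefixes s[:ls] and t[:lt]; it also weights the
        # distance from each matched subsequence to the end of the prefixes.
        if i == 0:
            return 1.0
        if min(ls, lt) < i:
            return 0.0
        x = s[ls - 1]  # last symbol of the s-prefix
        total = lam * k_prime(i, ls - 1, lt)
        for j in range(lt):
            if t[j] == x:
                total += k_prime(i - 1, ls - 1, j) * lam ** (lt - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(ls: int, lt: int) -> float:
        # K_n on prefixes s[:ls] and t[:lt].
        if min(ls, lt) < n:
            return 0.0
        x = s[ls - 1]
        total = k(ls - 1, lt)
        for j in range(lt):
            if t[j] == x:
                total += k_prime(n - 1, ls - 1, j) * lam ** 2
        return total

    return k(len(s), len(t))


def normalized_kernel(s, t, n: int, lam: float) -> float:
    """Cosine-normalized kernel, so identical sequences score 1.0."""
    kss = subsequence_kernel(s, s, n, lam)
    ktt = subsequence_kernel(t, t, n, lam)
    if kss == 0.0 or ktt == 0.0:
        return 0.0
    return subsequence_kernel(s, t, n, lam) / math.sqrt(kss * ktt)


if __name__ == "__main__":
    # Hypothetical tokenized sentences from an expense-reimbursement policy.
    a = ["employee", "submits", "expense", "report", "to", "manager"]
    b = ["manager", "approves", "expense", "report"]
    print(normalized_kernel(a, b, n=2, lam=0.5))
```

As a sanity check, with `n = 2` the strings "cat" and "car" share only the subsequence "ca", so their kernel value is λ^4. Normalizing keeps long sentences from dominating merely because they contain more subsequences; a kernel matrix built this way could be passed to any SVM that accepts precomputed kernels.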


Keywords: Business process management; Computational experiments; Data extraction; Process mining; Task identification; Text mining



This research was partially supported by a JPMorgan Chase Fellowship from the Institute of Financial Services Analytics at the University of Delaware.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Business Information Systems, College of Business, Oregon State University, Corvallis, USA
  2. Department of Accounting and Management Information Systems, Lerner College of Business and Economics, University of Delaware, Newark, USA
  3. Department of Operations and Information Management, School of Business, University of Connecticut, Storrs, USA
