Responsible Process Mining - A Data Quality Perspective

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11675)

Abstract

Modern organisations consider data to be their lifeblood. The potential benefits of data-driven analyses include a better understanding of business performance and better-informed decision making for business growth. A key roadblock to realising this vision is the lack of transparency surrounding the quality of data. A process mining study that uses low-quality, unrepresentative data as input has little or no value for the organisation and becomes a catalyst for erroneous conclusions (‘Garbage-in-Garbage-out’). Many process mining techniques do not take into account inherent inaccuracies in the data, or how the data might have been manipulated or pre-processed. It is thus impossible to ascertain the degree to which analysis outcomes can be relied upon. This tutorial paper outlines foundational concepts of data quality, with a special focus on typical data quality issues found in event data used for process mining analyses. It also elaborates on key challenges and possible approaches to tackling these data quality problems.

Acknowledgements

The authors would like to acknowledge the input from QUT researchers (Professor ter Hofstede, Dr Andrews, Dr Suriadi and Dr Poppe) who work on this topic. This work is partly supported by ARC Discovery Project DP190102141 on Building Crowd Sourced Data Curation Processes.

Author information

Corresponding author

Correspondence to Moe Thandar Wynn.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wynn, M.T., Sadiq, S. (2019). Responsible Process Mining - A Data Quality Perspective. In: Hildebrandt, T., van Dongen, B., Röglinger, M., Mendling, J. (eds) Business Process Management. BPM 2019. Lecture Notes in Computer Science, vol 11675. Springer, Cham. https://doi.org/10.1007/978-3-030-26619-6_2

  • DOI: https://doi.org/10.1007/978-3-030-26619-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26618-9

  • Online ISBN: 978-3-030-26619-6

  • eBook Packages: Computer Science, Computer Science (R0)
