A Sequential Data Preprocessing Tool for Data Mining

  • Conference paper
Computational Science and Its Applications – ICCSA 2014 (ICCSA 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8581)

Abstract

A sequential dataset is a collection of records written and read in sequential order. Information from a sequential dataset is useful for understanding sequential patterns and, ultimately, for making appropriate decisions. However, generating a sequential dataset from a log file is complicated and difficult. In this study, we therefore propose a sequential preprocessing model (SPM) and a sequential preprocessing tool (SPT) as an attempt to generate the sequential dataset. The results show that SPT can be used to generate the sequential dataset. We evaluated the performance of the developed model against the log activities captured from UMT’s e-Learning system, myLearn. With minimal modification, the dataset can be used by other data mining tools for further sequential pattern analysis.
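The abstract does not describe how SPM or SPT work internally, so the following is only a minimal sketch of the general idea of turning a log file into a sequential dataset, not the authors' method. It assumes a hypothetical CSV log with `timestamp`, `user`, and `action` columns; the column names, the sample rows, and the function `build_sequences` are all illustrative assumptions.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical log excerpt; the real myLearn log format is not given in the abstract.
SAMPLE_LOG = """timestamp,user,action
2014-03-01 09:05:00,s001,view_forum
2014-03-01 09:00:00,s001,login
2014-03-01 09:02:00,s002,login
2014-03-01 09:10:00,s001,download_notes
2014-03-01 09:07:00,s002,view_quiz
"""


def build_sequences(log_text):
    """Group log rows by user and order each user's actions by timestamp."""
    per_user = defaultdict(list)
    for row in csv.DictReader(io.StringIO(log_text)):
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        per_user[row["user"]].append((ts, row["action"]))
    # Sort each user's events by time and keep only the action names.
    return {
        user: [action for _, action in sorted(events, key=lambda e: e[0])]
        for user, events in per_user.items()
    }


if __name__ == "__main__":
    for user, actions in build_sequences(SAMPLE_LOG).items():
        print(user, " ".join(actions))
```

Each resulting per-user list is one sequence; a sequential pattern mining tool would typically take one such sequence per line as input, possibly after the minor reformatting the abstract alludes to.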

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Abdullah, Z., Herawan, T., Chiroma, H., Deris, M.M. (2014). A Sequential Data Preprocessing Tool for Data Mining. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8581. Springer, Cham. https://doi.org/10.1007/978-3-319-09150-1_54

  • DOI: https://doi.org/10.1007/978-3-319-09150-1_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09149-5

  • Online ISBN: 978-3-319-09150-1

  • eBook Packages: Computer Science, Computer Science (R0)
