Infrequent pattern mining in smart healthcare environment using data summarization

Article

Abstract

A summarization technique creates a concise version of a large amount of data (big data), which reduces the computational cost of analysis and decision-making. Some interesting data patterns, such as rare anomalies, occur far less often than other data instances. For example, in a smart healthcare environment, the proportion of infrequent patterns in the underlying cyber physical system (CPS) is very low. Existing summarization techniques overlook the issue of representing such interesting infrequent patterns in a summary. In this paper, a novel clustering-based technique is proposed which uses an information theoretic measure to identify the infrequent patterns for inclusion in a summary. Experiments conducted on seven benchmark CPS datasets show substantially better results than existing techniques in terms of including the infrequent patterns in summaries.
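
The abstract names neither the clustering algorithm nor the exact information theoretic measure, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes cluster labels are already available, uses self-information (-log2 of a cluster's share of the data) as a stand-in measure, and treats clusters above a chosen threshold as infrequent. The function names, the `budget` and `threshold_bits` parameters, and the evenly spaced sampling of the frequent data are all hypothetical.

```python
import math
from collections import Counter


def surprisal_by_cluster(labels):
    """Self-information, in bits, of each cluster: -log2(cluster share of the data)."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: -math.log2(n / total) for cluster, n in counts.items()}


def summarize(points, labels, budget, threshold_bits):
    """Hypothetical summary builder (an assumption, not the authors' algorithm).

    Keeps every point from clusters whose surprisal is at least threshold_bits,
    then fills the remaining budget with evenly spaced points from the rest.
    If the rare points alone exceed the budget, all of them are still kept.
    """
    info = surprisal_by_cluster(labels)
    rare = [p for p, c in zip(points, labels) if info[c] >= threshold_bits]
    common = [p for p, c in zip(points, labels) if info[c] < threshold_bits]

    remaining = max(budget - len(rare), 0)
    if remaining == 0 or not common:
        return rare

    # Evenly spaced pick from the frequent data, capped at the remaining budget.
    step = max(len(common) // remaining, 1)
    return rare + common[::step][:remaining]


# Toy data: 97 "normal" readings and 3 "anomaly" readings.
points = list(range(100))
labels = ["normal"] * 97 + ["anomaly"] * 3
summary = summarize(points, labels, budget=10, threshold_bits=3.0)
```

In this toy setting the three anomalous readings make up 3% of the data, so plain uniform sampling of ten points would often miss all of them, whereas the surprisal threshold keeps them in the summary by construction.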

Keywords

Summarization; Clustering; Information theory; Cyber physical systems; Smart health care; Big data

Notes

Acknowledgements

The authors would like to thank Mr. Munir Ahmad Saeed for proofreading the manuscript.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Centre for Cyber Security and Games, Canberra Institute of Technology, Canberra, Australia
