
Banded Pattern Mining Algorithms in Multi-dimensional Zero-One Data

  • Fatimah B. Abdullahi
  • Frans Coenen
  • Russell Martin
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9670)

Abstract

A high-dimensional zero-one data set is said to be banded if the elements of every dimension can be reorganised so that the non-zero entries lie along the leading diagonal across all the dimensions. Our goal is to develop effective algorithms that identify banded patterns in multi-dimensional zero-one data by automatically rearranging the ordering of every dimension. Rearranging zero-one data so that it exhibits bandedness allows hidden information to be identified, and it enhances the operation of many data mining algorithms (and other algorithms) that work with zero-one data. In this paper two N-Dimensional Banded Pattern Mining (NDBPM) algorithms are presented: an approximate algorithm (NDBPM\(_{APPROX}\)) and an exact algorithm (NDBPM\(_{EXACT}\)). Two variations of NDBPM\(_{EXACT}\) are presented, a Euclidean variant and a Manhattan variant. Both algorithms are fully described, together with evaluations of their operation.
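The abstract does not say how NDBPM scores or searches for bandedness, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' NDBPM\(_{APPROX}\) or NDBPM\(_{EXACT}\). It assumes a sparse representation (a set of index tuples marking the one-entries), normalises every index to [0, 1] so that the leading diagonal is the line on which all coordinates are equal, and scores a data set by the total Euclidean or Manhattan distance of its one-entries from that line (lower means more banded). The reordering step is a swapped-in barycentre-style heuristic, similar in spirit to the classic barycenter heuristic for reorderable matrices. The names `diagonal_distance`, `banding_penalty` and `reorder_once` are hypothetical.

```python
from statistics import median


def diagonal_distance(index, shape, metric="euclidean"):
    """Distance of one cell from the leading diagonal of the index space.

    Assumption: each coordinate is normalised to [0, 1] so that dimensions of
    different sizes are comparable; the diagonal is where all coordinates match.
    """
    coords = [i / (n - 1) if n > 1 else 0.0 for i, n in zip(index, shape)]
    if metric == "euclidean":
        # The orthogonal projection onto the diagonal sits at the mean coordinate.
        centre = sum(coords) / len(coords)
        return sum((c - centre) ** 2 for c in coords) ** 0.5
    if metric == "manhattan":
        # The L1-nearest point on the diagonal sits at the median coordinate.
        centre = median(coords)
        return sum(abs(c - centre) for c in coords)
    raise ValueError(f"unknown metric: {metric!r}")


def banding_penalty(ones, shape, metric="euclidean"):
    """Total distance of all one-entries from the diagonal (0 = all on the diagonal)."""
    return sum(diagonal_distance(idx, shape, metric) for idx in ones)


def reorder_once(ones, shape):
    """One barycentre-style pass over every dimension (hypothetical heuristic).

    For each dimension d, each index value is keyed by the mean normalised
    position of the other coordinates of its one-entries, then the index values
    are renumbered in key order. Empty slices are moved to the end.
    """
    if len(shape) < 2:
        raise ValueError("need at least two dimensions")
    ones = set(ones)
    for d, size in enumerate(shape):
        totals = [0.0] * size
        counts = [0] * size
        for idx in ones:
            others = [idx[k] / (shape[k] - 1) if shape[k] > 1 else 0.0
                      for k in range(len(shape)) if k != d]
            totals[idx[d]] += sum(others) / len(others)
            counts[idx[d]] += 1
        order = sorted(range(size),
                       key=lambda i: totals[i] / counts[i] if counts[i] else float("inf"))
        new_pos = {old: new for new, old in enumerate(order)}
        # Renumber dimension d according to the new ordering.
        ones = {idx[:d] + (new_pos[idx[d]],) + idx[d + 1:] for idx in ones}
    return ones


# Usage: an anti-diagonal 3x3 zero-one matrix becomes diagonal after one pass.
ones = {(0, 2), (1, 1), (2, 0)}
shape = (3, 3)
print(banding_penalty(ones, shape))    # about 1.414
banded = reorder_once(ones, shape)
print(sorted(banded))                  # [(0, 0), (1, 1), (2, 2)]
print(banding_penalty(banded, shape))  # 0.0
```

The same functions accept index tuples of any length, so a three-dimensional data set is handled by passing a three-element `shape`. A single greedy pass like this offers no optimality guarantee; it only illustrates what "rearranging the ordering of every dimension" toward the diagonal means.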

Keywords

Banded patterns · Zero-one data · Banded Pattern Mining


Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Fatimah B. Abdullahi (1)
  • Frans Coenen (1)
  • Russell Martin (1)

  1. The Department of Computer Science, The University of Liverpool, Liverpool, UK
