A Robust Approach for Multivariate Binary Vectors Clustering and Feature Selection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7063)

Abstract

Given a set of binary vectors drawn from a finite multiple Bernoulli mixture model, an important problem is to determine which vectors are outliers and which features are relevant. This paper proposes a model for clustering binary vectors that accommodates outliers and simultaneously incorporates feature selection into the clustering process. We derive an EM algorithm to fit the proposed model. Through simulation studies and experiments on handwritten digit recognition and visual scene categorization, we demonstrate the usefulness and effectiveness of our method.
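The abstract does not reproduce the derivation, so the following is only a minimal sketch of EM for a plain finite mixture of independent Bernoulli features. It is not the proposed model: it has no outlier component and no feature-relevance weights. The function name `fit_bernoulli_mixture`, its parameters, the random initialization, and the small additive smoothing constant `eps` are all assumptions made for illustration.

```python
import math
import random


def fit_bernoulli_mixture(X, n_components, n_iter=50, seed=0, eps=1e-6):
    """EM for a plain mixture of independent Bernoulli features (illustrative).

    X is a list of equal-length 0/1 lists.
    Returns (weights, probs, resp), where resp[i][k] is the posterior
    probability that vector i was generated by component k.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0 / n_components] * n_components
    # Assumed initialization: random parameters kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)]
             for _ in range(n_components)]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in X:
            log_post = []
            for k in range(n_components):
                ll = math.log(weights[k])
                for xj, pj in zip(x, probs[k]):
                    ll += math.log(pj) if xj else math.log(1.0 - pj)
                log_post.append(ll)
            m = max(log_post)
            unnorm = [math.exp(v - m) for v in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: weighted relative frequencies, with additive smoothing
        # (an assumption) so no parameter reaches exactly 0 or 1.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = (nk + eps) / (n + n_components * eps)
            for j in range(d):
                s = sum(r[k] * x[j] for r, x in zip(resp, X))
                probs[k][j] = (s + eps) / (nk + 2.0 * eps)
    return weights, probs, resp
```

Calling `fit_bernoulli_mixture(X, 2)` on a list of 0/1 vectors returns the mixing weights, the per-component Bernoulli parameters, and the responsibilities; taking the largest responsibility in each row gives a hard cluster assignment.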




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mashrgy, M.A., Bouguila, N., Daoudi, K. (2011). A Robust Approach for Multivariate Binary Vectors Clustering and Feature Selection. In: Lu, BL., Zhang, L., Kwok, J. (eds) Neural Information Processing. ICONIP 2011. Lecture Notes in Computer Science, vol 7063. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24958-7_15

  • DOI: https://doi.org/10.1007/978-3-642-24958-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24957-0

  • Online ISBN: 978-3-642-24958-7

  • eBook Packages: Computer Science, Computer Science (R0)
