A Hierarchical Method for Clustering Binary Text Image

  • Yiguo Pu
  • Jinqiao Shi
  • Li Guo
Part of the Communications in Computer and Information Science book series (CCIS, volume 320)


Image clustering is a crucial task in image retrieval, filtering, and organization. Most recent work deals with color or grayscale images, using features extracted from text content, annotations, or image content. This paper targets binary text images and proposes a novel clustering method that can be used for automatic image processing in digital libraries and office automation. The method has three main steps. First, images are preprocessed to remove noise, correct orientation, and produce coarse classes. Second, features are extracted and similar images are grouped into new classes with a hierarchical clustering algorithm. Finally, each new class is merged into the nearest existing class, provided a distance condition is met. To speed up clustering, a Locality-Sensitive Hashing (LSH) algorithm is introduced to accelerate the merging procedure. Experiments show that this method is faster and more efficient than the basic clustering method.
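The grouping and merging steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the linkage rule, or the LSH family, so the sketch assumes fixed-length binary feature vectors, Hamming distance, majority-vote centroids, and bit-sampling LSH (the paper's reference list points to E2LSH, which targets Euclidean distance). All function names and the `max_dist`, `n_bits`, and `n_tables` parameters are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def centroid(vectors):
    """Per-bit strict majority vote over a non-empty list of bit tuples."""
    n = len(vectors)
    return tuple(int(2 * sum(col) > n) for col in zip(*vectors))


def agglomerate(vectors, max_dist):
    """Hierarchical grouping: repeatedly merge the two clusters with the
    closest centroids until the closest pair is farther than max_dist."""
    clusters = [[v] for v in vectors]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: hamming(centroid(clusters[p[0]]),
                                  centroid(clusters[p[1]])),
        )
        if hamming(centroid(clusters[i]), centroid(clusters[j])) > max_dist:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def build_lsh(old_centroids, n_bits, n_tables, seed=0):
    """Bit-sampling LSH: each table hashes a centroid by a random subset
    of its bit positions (an assumed LSH family, chosen for Hamming space)."""
    rng = random.Random(seed)
    dim = len(old_centroids[0])
    tables = []
    for _ in range(n_tables):
        positions = rng.sample(range(dim), n_bits)
        buckets = {}
        for idx, c in enumerate(old_centroids):
            key = tuple(c[p] for p in positions)
            buckets.setdefault(key, []).append(idx)
        tables.append((positions, buckets))
    return tables


def merge_into_old(new_clusters, old_centroids, tables, max_dist):
    """For each new cluster, return the index of the nearest old class among
    the LSH candidates, or None if no candidate is within max_dist."""
    result = []
    for cluster in new_clusters:
        c = centroid(cluster)
        candidates = set()
        for positions, buckets in tables:
            key = tuple(c[p] for p in positions)
            candidates.update(buckets.get(key, []))
        best = min(candidates,
                   key=lambda k: hamming(c, old_centroids[k]),
                   default=None)
        if best is not None and hamming(c, old_centroids[best]) > max_dist:
            best = None
        result.append(best)
    return result
```

Only the old classes that share a hash bucket with a new class are compared exactly, which is where the speed-up over comparing against every old class would come from. The trade-off is that LSH is approximate: a near neighbor that collides in no table is missed, so more tables raise recall at the cost of more comparisons.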


binary text image, hierarchical cluster, LSH





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yiguo Pu (1, 2, 3)
  • Jinqiao Shi (1, 3)
  • Li Guo (1, 3)
  1. Institute of Information Engineering, CAS, China
  2. Graduate University, CAS, China
  3. Chinese National Engineering Laboratory for Information Security Technologies, China
