A Hierarchical Method for Clustering Binary Text Image
Image clustering is a crucial task in image retrieval, filtering, and organization. Most recent work deals with color or grayscale images, using features extracted from text content, annotations, or image content. This paper addresses binary text images and proposes a novel clustering method that can be used for automatic image processing in digital libraries and office automation. The method has three main steps. First, images are preprocessed to remove noise, correct orientation, and produce coarse classes. Second, features are extracted and similar images are grouped into new classes with a hierarchical clustering algorithm. Finally, each new class is merged into the nearest existing class if a distance condition is satisfied. To speed up clustering, the Locality-Sensitive Hashing (LSH) algorithm is introduced to accelerate the merging procedure. Experiments show that this method is faster and more efficient than the basic clustering method.
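The second and third steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the linkage rule, or the distance condition, so the sketch assumes that each image is already a numeric feature vector, uses centroid linkage with Euclidean distance, and treats the distance condition as a single threshold. All function names (`agglomerate`, `merge_into_old`) are hypothetical, and a brute-force nearest-class search stands in for the LSH lookup.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def agglomerate(points, threshold):
    """Step 2 (assumed form): centroid-linkage agglomerative clustering.

    Repeatedly merges the two clusters whose centroids are closest and
    stops once the closest pair is farther apart than `threshold`.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def merge_into_old(old_classes, new_clusters, threshold):
    """Step 3 (assumed form): attach each new cluster to the nearest old class.

    `old_classes` maps a label to its member vectors. A new cluster joins
    the old class with the nearest centroid if that distance is within
    `threshold`; otherwise it is kept as a separate class "new_<k>".
    """
    result = {label: list(members) for label, members in old_classes.items()}
    old_centroids = {label: centroid(m) for label, m in old_classes.items()}
    next_id = 0
    for cluster in new_clusters:
        c = centroid(cluster)
        # Brute-force search; the paper uses LSH to speed up this lookup.
        nearest = min(
            old_centroids,
            key=lambda label: math.dist(c, old_centroids[label]),
            default=None,
        )
        if nearest is not None and math.dist(c, old_centroids[nearest]) <= threshold:
            result[nearest].extend(cluster)
        else:
            result[f"new_{next_id}"] = list(cluster)
            next_id += 1
    return result
```

Calling `agglomerate` on the feature vectors of the incoming images and passing its result to `merge_into_old` together with the existing classes yields the updated class assignment. The pairwise search in `agglomerate` is kept simple for readability and is only practical for small batches.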
Keywords: binary text image, hierarchical cluster, LSH