Abstract
This paper addresses the problem of filtering image spam, a fast-spreading type of spam in which the text is embedded in images to evade text-based spam filters. A common detection approach uses an optical character recognition (OCR) system to detect and recognize the embedded text, followed by a classifier that distinguishes spam from ham. In response, spammers have begun obscuring the text in images to prevent OCR from detecting spam. To compensate for the shortcomings of the OCR system, a method based on a text region detection algorithm is proposed. To evaluate the proposed system, the method was applied to the publicly available Dredze spam image dataset, where it outperformed the baseline OCR system in practical use on spam with complex backgrounds. The test results indicate that the new method detects text regions well even in noisy images.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dhahi, E.H., Ali, S.A., Naser, M.A. (2020). Text Region Extraction for Noisy Spam Image. In: Mallick, P., Balas, V., Bhoi, A., Chae, GS. (eds) Cognitive Informatics and Soft Computing. Advances in Intelligent Systems and Computing, vol 1040. Springer, Singapore. https://doi.org/10.1007/978-981-15-1451-7_25
DOI: https://doi.org/10.1007/978-981-15-1451-7_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1450-0
Online ISBN: 978-981-15-1451-7
eBook Packages: Intelligent Technologies and Robotics (R0)