Text Region Extraction for Noisy Spam Image

  • Conference paper
Cognitive Informatics and Soft Computing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1040)

Abstract

This paper addresses the problem of filtering image spam, a fast-spreading type of spam in which the text is embedded in images to evade text-based spam filters. One common detection method uses an optical character recognition (OCR) system to detect and recognize the embedded text, followed by a classifier that distinguishes spam from ham. Spammers, however, have begun obscuring the text in images to prevent OCR from detecting it. To compensate for the shortcomings of the OCR system, a method based on a text region detection algorithm is proposed. To evaluate the proposed system, the method was applied to the publicly available Dredze image spam dataset; it outperforms the baseline OCR system in practical use on spam with complex backgrounds. The test results indicate that the new method detects text regions well even in noisy images.
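
The two-stage baseline described in the abstract (OCR, then a spam/ham classifier) can be illustrated with a minimal sketch. This is not the authors' method or any particular library's API: the keyword list, the threshold, and the names `classify_text`, `filter_image`, and `fake_ocr` are all hypothetical, and the OCR stage is a stand-in function so that the example runs without an OCR engine.

```python
from typing import Callable, Iterable

# Hypothetical keyword list; a real filter would learn its vocabulary from labeled data.
SPAM_KEYWORDS = frozenset({"viagra", "lottery", "winner", "free", "loan"})


def classify_text(text: str, keywords: Iterable[str] = SPAM_KEYWORDS, threshold: int = 2) -> str:
    """Label text 'spam' when at least `threshold` distinct keywords occur, else 'ham'."""
    # Normalize tokens: strip common punctuation and lowercase.
    tokens = {token.strip(".,!?:;").lower() for token in text.split()}
    hits = len(tokens & set(keywords))
    return "spam" if hits >= threshold else "ham"


def filter_image(image: object, ocr: Callable[[object], str]) -> str:
    """Two-stage pipeline: recognize the embedded text, then classify it."""
    return classify_text(ocr(image))


def fake_ocr(image: object) -> str:
    """Stand-in OCR (assumption): treats the 'image' as already-recognized text."""
    return str(image)
```

If the OCR stage returns no text, as can happen when the embedded text is obscured, the classifier has nothing to match and labels the image ham. That failure mode is the gap a text region detection step is meant to close.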

References

  1. Gupta, Y., Sharma, S.H., Bedwal, T.: Text extraction techniques. Int. J. Comput. Appl. NSFTICE, 10–12 (2015)

  2. Natei, K.N., Viradiya, J., Sasikumar, S.: Extracting text from image document and displaying its related information. J. Eng. Res. Appl. 8(5), 27–33 (Part-V) (2018). ISSN: 2248-9622

  3. Mathur, G., Rikhari, S.: Text detection in document images: highlight on using FAST algorithm. Int. J. Adv. Eng. Res. Sci. (IJAERS) 4(3) (2017). ISSN: 2349-6495(P)|2456-1908(O)

  4. Kulkarni, C.R., Barbadekar, A.B.: Text detection and recognition: a review. Int. Res. J. Eng. Technol. (IRJET) (2017). e-ISSN: 2395-0056, p-ISSN: 2395-0072

  5. Dai, J., Wang, Z., Zhao, X., Shao, S.: Scene text detection based on enhanced multi-channels MSER and a fast text grouping process. Int. J. Comput. Linguist. Res. 9(2) (2018)

  6. Lee, H.: Wavelet analysis for image processing. Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan. On http://disp.ee.ntu.edu.tw/henry/wavelet_analysis.pdf

  7. Javed, M., Nagabhushan, P., Chaudhuri, B.B.: Extraction of projection profile, run-histogram and entropy features straight from run-length compressed text documents. In: IAPR Asian conference on Pattern Recognition, IEEE proceedings, pp. 813–817 (2013)

  8. Burger, W., Burge, M.J.: Principles of digital image processing. Core Algorithms. Springer Publishing Company (2009)

  9. González, A., Bergasa, L.M., Yebes, J.J., Bron, S.: Text location in complex images. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), pp. 617–620, Tsukuba, 11–15 Nov 2012

  10. From https://www.cs.jhu.edu/~mdredze/datasets/image_spam/

Author information

Corresponding author

Correspondence to Suhad A. Ali.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

Cite this paper

Dhahi, E.H., Ali, S.A., Naser, M.A. (2020). Text Region Extraction for Noisy Spam Image. In: Mallick, P., Balas, V., Bhoi, A., Chae, GS. (eds) Cognitive Informatics and Soft Computing. Advances in Intelligent Systems and Computing, vol 1040. Springer, Singapore. https://doi.org/10.1007/978-981-15-1451-7_25
