
Research of Spam Filtering System Based on LSA and SHA

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5264)

Abstract

Spam has become a widespread concern, yet current spam filtering systems suffer from two problems: they make little use of semantic information, and they perform poorly on multi-send spam (the same message sent in bulk to many recipients). This paper proposes a spam filtering model based on latent semantic analysis (LSA) and the Secure Hash Algorithm (SHA). LSA is used to mark the latent feature phrases in spam, which brings semantic analysis into spam filtering. On top of this LSA analysis, SHA is used to generate an "e-mail fingerprint" for multi-send spam, which addresses the poor performance of existing filters on such messages. We designed a spam filtering system based on this model and evaluated it on a selected dataset. Compared with a KNN-based filter, the system based on LSA and SHA performed better. The experimental results matched expectations and confirm the feasibility and advantages of the new spam filtering method.
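
The abstract does not specify the exact feature-selection rule or fingerprint format, so the following is only a minimal sketch of the general idea under stated assumptions: the term-document matrix holds raw counts; the rank-k LSA decomposition is approximated by power iteration on A A^T; each term is scored by the norm of its row in U_k Σ_k; the top-n terms stand in for the "latent feature phrases" (single words here, not phrases); and the fingerprint is a SHA-1 digest of the sorted feature terms a message contains. SHA-1 is an assumed choice of SHA variant, and all function names and the toy corpus are hypothetical.

```python
import hashlib
import math
import random
import re


def tokenize(text):
    """Lower-case alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def term_doc_matrix(docs):
    """Raw term counts: one row per vocabulary term, one column per document."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    row = {t: i for i, t in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for t in tokenize(d):
            A[row[t]][j] += 1.0
    return vocab, A


def lsa_term_scores(A, k, iters=200, seed=0):
    """Hypothetical scoring: norm of each term's row in U_k * Sigma_k.

    The top-k eigenpairs of G = A A^T (eigenvalue sigma^2, eigenvector = left
    singular vector u) are found by power iteration, projecting out the
    vectors already found (deflation).
    """
    m, n = len(A), len(A[0])
    G = [[sum(A[i][c] * A[j][c] for c in range(n)) for j in range(m)]
         for i in range(m)]

    def matvec(v):
        return [sum(g * x for g, x in zip(g_row, v)) for g_row in G]

    rng = random.Random(seed)
    found = []  # (eigenvalue, unit eigenvector) pairs
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]  # positive random start
        for _ in range(iters):
            w = matvec(v)
            for _lam, u in found:  # remove components along earlier vectors
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # k exceeds the rank of A
                v = [0.0] * m
                break
            v = [x / norm for x in w]
        lam = max(0.0, sum(a * b for a, b in zip(v, matvec(v))))  # Rayleigh quotient
        found.append((lam, v))
    # score(t) = sqrt(sum_i sigma_i^2 * u_i[t]^2)
    return [math.sqrt(sum(lam * u[t] ** 2 for lam, u in found))
            for t in range(m)]


def select_feature_terms(vocab, scores, top_n):
    """Hypothetical rule: keep the top_n terms with the highest LSA score."""
    ranked = sorted(range(len(vocab)), key=lambda t: scores[t], reverse=True)
    return {vocab[t] for t in ranked[:top_n]}


def email_fingerprint(text, features):
    """SHA-1 over the sorted feature terms present; other words are ignored."""
    present = sorted(set(tokenize(text)) & features)
    return hashlib.sha1(" ".join(present).encode("utf-8")).hexdigest()


corpus = [
    "buy cheap pills now buy cheap pills now",
    "buy cheap pills now buy cheap pills now",
    "buy cheap pills now zork",
    "meeting agenda for monday",
    "monday meeting moved",
]
vocab, A = term_doc_matrix(corpus)
features = select_feature_terms(vocab, lsa_term_scores(A, k=2), top_n=6)
fp1 = email_fingerprint("Buy cheap pills NOW!!! qwerty", features)
fp2 = email_fingerprint("Dear friend, buy now: cheap pills", features)
fp3 = email_fingerprint("Monday meeting moved", features)
print(fp1 == fp2, fp1 == fp3)  # True False
```

Because words outside the feature set are ignored, the two spam variants produce the same fingerprint despite different filler words, while the meeting note does not. The trade-off of this design is that any two messages containing exactly the same feature terms share a fingerprint, including legitimate ones.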

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sun, J., Zhang, Q., Yuan, Z., Huang, W., Yan, X., Dong, J. (2008). Research of Spam Filtering System Based on LSA and SHA. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87734-9_38

  • DOI: https://doi.org/10.1007/978-3-540-87734-9_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87733-2

  • Online ISBN: 978-3-540-87734-9

  • eBook Packages: Computer Science, Computer Science (R0)
