Abstract
To address the problem of e-mail filtering, this paper analyzes the defects of traditional mutual information and presents a feature selection approach based on quadratic TF*IDF mutual information. The importance of characteristic words that appear in only one class is then measured again, which solves the problem that feature selection is ineffective when mutual information values are equal. Finally, experiments with a Bayesian classifier show that, compared with the original method, the presented approach achieves higher accuracy and greater classification efficiency in text classification.
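The equal-value defect mentioned above can be illustrated with a minimal sketch. It assumes the traditional mutual information is the usual pointwise form, log(P(t, c) / (P(t) P(c))), estimated from document counts. The toy corpus, the labels, and the function name `pointwise_mi` are hypothetical and are not taken from the paper. The paper's quadratic TF*IDF weighting is not reproduced here, because the abstract does not give its formula.

```python
import math

def pointwise_mi(docs, labels, term, target):
    """Estimate log(P(term, target) / (P(term) * P(target))) from document counts."""
    n = len(docs)
    n_class = sum(1 for y in labels if y == target)
    n_term = sum(1 for d in docs if term in d)
    n_both = sum(1 for d, y in zip(docs, labels) if y == target and term in d)
    if n_both == 0:
        # The term never co-occurs with the class, so the log is undefined.
        return float("-inf")
    return math.log((n_both * n) / (n_term * n_class))

# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"free", "prize", "win"},
    {"free", "offer"},
    {"free", "click"},
    {"meeting", "agenda"},
    {"report", "agenda"},
    {"lunch", "friday"},
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# "free" occurs in every spam message, "prize" in only one,
# yet both occur exclusively in the spam class.
mi_free = pointwise_mi(docs, labels, "free", "spam")
mi_prize = pointwise_mi(docs, labels, "prize", "spam")
print(mi_free, mi_prize)
```

Under this estimate, any word that occurs in only one class receives the score log(N / N_c), where N is the number of documents and N_c the number in that class, regardless of how often the word occurs. Such words therefore cannot be ranked against each other by mutual information alone, which is why an additional importance measure is needed for them.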
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gong, S., Gong, X., Wang, Y. (2012). Study of E-mail Filtering Based on Mutual Information Text Feature Selection Method. In: Zhang, T. (eds) Instrumentation, Measurement, Circuits and Systems. Advances in Intelligent and Soft Computing, vol 127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27334-6_5
DOI: https://doi.org/10.1007/978-3-642-27334-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27333-9
Online ISBN: 978-3-642-27334-6
eBook Packages: Engineering (R0)