Abstract
Information obtained nowadays often contains malicious content. Such content, for example profane words, has to be censored because it can influence young minds and create hatred among people. To censor profane words, this paper introduces a hybrid text censoring method based on Bayesian filtering and approximate string matching. Bayesian filtering is used to detect the malicious content (profane words), while approximate string matching is used to make that detection more effective. The performance of the proposed system was evaluated using Precision, Recall, F-measure and MAE. The results show that Bayesian filtering can be used to filter profane words.
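The abstract does not give implementation details, so the following is only a minimal sketch of how the two techniques might be combined, not the authors' method. It assumes a naive Bayes classifier with Laplace smoothing as the Bayesian filter and uses the similarity ratio from Python's `difflib.SequenceMatcher` as a stand-in for the approximate string matching step. The block list, the training messages, the 0.7 threshold and all function names are hypothetical, and mild placeholder words stand in for a real profanity list.

```python
from collections import Counter
from difflib import SequenceMatcher
import math

# Hypothetical toy data: placeholder words instead of a real profanity list.
BLOCKLIST = ["darn", "heck"]
TRAINING = [
    ("you darn fool", True),
    ("what the heck is this", True),
    ("have a nice day", False),
    ("this is a good book", False),
]
THRESHOLD = 0.7  # assumed similarity cut-off, not taken from the paper


def normalize(token):
    """Map an obfuscated token (e.g. 'd@rn') to the closest block-list word."""
    best_word, best_score = token, 0.0
    for word in BLOCKLIST:
        score = SequenceMatcher(None, token, word).ratio()
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score >= THRESHOLD else token


def tokens_of(text):
    """Lowercase, split on whitespace, and apply approximate matching."""
    return [normalize(t) for t in text.lower().split()]


# Train per-class token counts for the naive Bayes filter.
counts = {True: Counter(), False: Counter()}
docs = Counter()
for message, label in TRAINING:
    counts[label].update(tokens_of(message))
    docs[label] += 1
vocab = set(counts[True]) | set(counts[False])


def profane_probability(text):
    """Posterior probability that a message is profane (Laplace smoothing)."""
    log_score = {}
    for label in (True, False):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for tok in tokens_of(text):
            score += math.log((counts[label][tok] + 1) / (total + len(vocab)))
        log_score[label] = score
    return 1.0 / (1.0 + math.exp(log_score[False] - log_score[True]))


def censor(text):
    """Mask block-list matches, but only in messages classified as profane."""
    if profane_probability(text) < 0.5:
        return text
    return " ".join(
        "*" * len(t) if normalize(t.lower()) in BLOCKLIST else t
        for t in text.split()
    )
```

With this toy data, `censor("you d@rn fool")` returns `"you **** fool"`, while `censor("have a nice day")` returns the message unchanged. Normalizing tokens before counting is one possible design choice: it lets an obfuscated spelling share the statistics of the word it imitates, which is one way the matching step could support the Bayesian filter.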
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ghauth, K.I., Sukhur, M.S. (2015). Text Censoring System for Filtering Malicious Content Using Approximate String Matching and Bayesian Filtering. In: Phon-Amnuaisuk, S., Au, T. (eds) Computational Intelligence in Information Systems. Advances in Intelligent Systems and Computing, vol 331. Springer, Cham. https://doi.org/10.1007/978-3-319-13153-5_15
DOI: https://doi.org/10.1007/978-3-319-13153-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13152-8
Online ISBN: 978-3-319-13153-5
eBook Packages: Engineering, Engineering (R0)