Abstract
Normalizing SMS text is an important task for the computational community, given the rapid growth of mobile-device services that rely on this kind of message. The particular writing style of SMS limits how well such texts can be processed automatically. Beyond the difficulties these texts already pose, we are also interested in tasks that require understanding the meaning of documents in different languages, which increases the complexity further. We propose to normalize SMS texts using machine translation techniques. To this end, we use a statistical bilingual dictionary, computed with the IBM-4 model, to determine the best translation for a given SMS term. We compared this approach with a traditional probabilistic information retrieval method and observed that the proposed normalization model substantially improves the performance of the probabilistic one.
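As a rough illustration of the dictionary lookup step described in the abstract, the sketch below picks the most probable translation for each SMS term. It is not the authors' implementation: the table contents, the probabilities, and the names `TRANSLATION_TABLE`, `normalize_term`, and `normalize_sms` are all hypothetical, and a real table would be estimated from a parallel corpus with an IBM-4 alignment model rather than written by hand.

```python
from typing import Dict

# Hypothetical toy dictionary: P(normalized word | SMS term).
# These values are invented for illustration only; in practice they would be
# estimated from aligned SMS/normalized text pairs.
TRANSLATION_TABLE: Dict[str, Dict[str, float]] = {
    "u": {"you": 0.90, "your": 0.10},
    "2moro": {"tomorrow": 0.95, "today": 0.05},
    "gr8": {"great": 0.97, "grate": 0.03},
}


def normalize_term(term: str, table: Dict[str, Dict[str, float]]) -> str:
    """Return the most probable translation of one SMS term.

    Terms missing from the table are returned unchanged, on the assumption
    that they are already in standard form.
    """
    candidates = table.get(term.lower())
    if not candidates:
        return term
    # Choose the candidate with the highest translation probability.
    return max(candidates, key=candidates.get)


def normalize_sms(message: str, table: Dict[str, Dict[str, float]]) -> str:
    """Normalize a whitespace-tokenized SMS message term by term."""
    return " ".join(normalize_term(tok, table) for tok in message.split())


if __name__ == "__main__":
    print(normalize_sms("see u 2moro", TRANSLATION_TABLE))
```

This term-by-term lookup ignores context, so it would always map an ambiguous term such as "u" to the same word; it is meant only to show the shape of the best-translation selection.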
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vilariño, D., Pinto, D., Beltrán, B., León, S., Castillo, E., Tovar, M. (2012). A Machine-Translation Method for Normalization of SMS. In: Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera López, J.A., Boyer, K.L. (eds) Pattern Recognition. MCPR 2012. Lecture Notes in Computer Science, vol 7329. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31149-9_30
DOI: https://doi.org/10.1007/978-3-642-31149-9_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31148-2
Online ISBN: 978-3-642-31149-9
eBook Packages: Computer Science, Computer Science (R0)