A Machine-Translation Method for Normalization of SMS

  • Darnes Vilariño
  • David Pinto
  • Beatriz Beltrán
  • Saul León
  • Esteban Castillo
  • Mireya Tovar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7329)

Abstract

Normalization of SMS text is an important task for the computational community because of the rapid growth of mobile-device services that rely on this kind of message. The particular writing style of SMS imposes many limitations on its automatic treatment. Beyond the problems inherent to this kind of text, we are also interested in tasks that require understanding the meaning of documents in different languages, which further increases their complexity. We propose to normalize SMS texts using machine translation techniques. To this end, we use a statistical bilingual dictionary, computed with the IBM-4 model, to determine the best translation for a given SMS term. We compared this approach with a traditional probabilistic information retrieval method and observed that the proposed normalization model considerably improves the performance of the probabilistic one.
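The dictionary lookup described in the abstract can be illustrated with a minimal sketch. It assumes the statistical bilingual dictionary is available as a mapping from each SMS term to candidate normalized words with translation probabilities, and that normalization picks the most probable candidate term by term. The entries and probabilities in `TOY_DICTIONARY`, and the function names `best_translation` and `normalize`, are hypothetical and chosen only for illustration; in practice the probabilities would be estimated from a parallel corpus rather than written by hand, and the abstract does not specify the authors' actual procedure at this level of detail.

```python
from typing import Dict

# Hypothetical toy dictionary: SMS term -> {candidate word: P(candidate | term)}.
# These numbers are invented for illustration, not taken from the paper.
TOY_DICTIONARY: Dict[str, Dict[str, float]] = {
    "u": {"you": 0.92, "your": 0.08},
    "2": {"to": 0.55, "too": 0.25, "two": 0.20},
    "gr8": {"great": 0.97, "grate": 0.03},
    "wat": {"what": 0.90, "wait": 0.10},
}


def best_translation(term: str, dictionary: Dict[str, Dict[str, float]]) -> str:
    """Return the most probable normalized form of an SMS term.

    Terms missing from the dictionary are returned unchanged.
    """
    candidates = dictionary.get(term.lower())
    if not candidates:
        return term
    # Pick the candidate with the highest translation probability.
    return max(candidates, key=candidates.get)


def normalize(message: str, dictionary: Dict[str, Dict[str, float]]) -> str:
    """Normalize a whitespace-tokenized SMS message term by term."""
    return " ".join(best_translation(tok, dictionary) for tok in message.split())


if __name__ == "__main__":
    print(normalize("wat r u going 2 do", TOY_DICTIONARY))
```

Choosing the single most probable candidate per term ignores context, so an ambiguous term such as "2" always maps to the same word; that simplification belongs to this sketch and is not a claim about the paper's method.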

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Darnes Vilariño (1)
  • David Pinto (1)
  • Beatriz Beltrán (1)
  • Saul León (1)
  • Esteban Castillo (1)
  • Mireya Tovar (1)
  1. Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, Mexico
