Fuzzy String Matching Using Sentence Embedding Algorithms

  • Conference paper
Neural Information Processing (ICONIP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9949)

Abstract

Fuzzy string matching has many applications. Traditional approaches mainly use the appearance of characters or words but not their semantic meanings. We postulate that semantic information may also be important for this task. To validate this hypothesis, we build a pipeline in which approximate string matching pre-selects candidates and sentence embedding algorithms select the final results from these candidates. The aim of sentence embedding is to represent the semantic meaning of the words. Two sentence embedding algorithms are tested: a convolutional neural network (CNN) and averaging word2vec. Experiments show that the proposed pipeline can significantly improve accuracy and that averaging word2vec works slightly better than the CNN.
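The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pre-selection step here uses Python's `difflib.SequenceMatcher` as a stand-in for whichever approximate string matching method the paper uses, and `TOY_VECTORS` is a hypothetical hand-made table standing in for trained word2vec vectors. The function names and the parameter `k` are likewise assumptions made for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical 2-D word vectors; a real system would load trained word2vec vectors.
TOY_VECTORS = {
    "cheap": [1.0, 0.0],
    "inexpensive": [0.9, 0.1],
    "hotel": [0.0, 1.0],
    "chess": [-1.0, 0.0],
}


def surface_similarity(a, b):
    """Character-level similarity in [0, 1] (stand-in for approximate matching)."""
    return SequenceMatcher(None, a, b).ratio()


def average_embedding(sentence, vectors):
    """Average the vectors of the known words; None if no word is known."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuzzy_match(query, corpus, vectors, k=3):
    """Stage 1: keep the k strings closest in surface form.
    Stage 2: return the candidate whose averaged embedding is closest to the query's."""
    candidates = sorted(
        corpus, key=lambda s: surface_similarity(query, s), reverse=True
    )[:k]
    query_vec = average_embedding(query, vectors)
    if query_vec is None:
        # No semantic information available; fall back to the best surface match.
        return candidates[0]

    def semantic_score(s):
        vec = average_embedding(s, vectors)
        return cosine(query_vec, vec) if vec is not None else -1.0

    return max(candidates, key=semantic_score)


if __name__ == "__main__":
    corpus = ["chess hotel", "inexpensive hotel"]
    print(fuzzy_match("cheap hotel", corpus, TOY_VECTORS, k=2))
```

In this toy setting the string closest in surface form to the query is not the one closest in meaning, which is the situation the second stage is meant to handle. With `k=1` the sketch degenerates to plain surface matching.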

Notes

  1. http://word2vec.googlecode.com/svn/trunk/.

Acknowledgements

This work was supported in part by the National Basic Research Program (973 Program) of China under Grant 2012CB316301 and Grant 2013CB329403, and in part by the National Natural Science Foundation of China under Grant 61273023, Grant 91420201, and Grant 61332007.

Author information

Corresponding author

Correspondence to Xiaolin Hu.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Rong, Y., Hu, X. (2016). Fuzzy String Matching Using Sentence Embedding Algorithms. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_69

  • DOI: https://doi.org/10.1007/978-3-319-46675-0_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46674-3

  • Online ISBN: 978-3-319-46675-0

  • eBook Packages: Computer Science, Computer Science (R0)
