Can We Neglect Function Words in Word Embedding?

  • Conference paper
  • In: Natural Language Understanding and Intelligent Applications (ICCPOL 2016, NLPCC 2016)

Abstract

Distributed representation has recently become the most popular way to capture semantic and syntactic features, and it has been widely used in various natural language processing tasks. Function words express grammatical or structural relationships with the other words in a sentence. However, previous work either treated function words the same as content words or discarded them outright; there has been no experimental analysis of their effect. In this paper, we explore the effect of function words on word embeddings with a word analogy reasoning task and a paraphrase identification task. The results show that neglecting function words affects syntax-related and semantics-related tasks differently, increasing accuracy on some and decreasing it on others; moreover, the model used to train the word embeddings also matters.
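
To make the setup concrete, the following is a minimal sketch, not the authors' code, of the core manipulation: function words are identified by their Penn Treebank POS tags and stripped from the corpus before training, so that embeddings trained with and without them can be compared on the two tasks. NLTK's POS tagger and gensim's Word2Vec stand in here for the Stanford tagger and word2vec tools the paper's footnotes point to; all names and parameters are illustrative.

    # Minimal sketch (assumed tooling, not the authors' code): remove
    # function words, identified by Penn Treebank POS tags, from a
    # tokenized corpus before training word embeddings.
    # Requires: pip install nltk gensim
    #           nltk.download("averaged_perceptron_tagger")
    import nltk
    from gensim.models import Word2Vec

    # Closed-class Penn Treebank tags covering function words:
    # conjunctions, determiners, prepositions, modals, pronouns,
    # particles, wh-words, and so on.
    FUNCTION_TAGS = {"CC", "DT", "EX", "IN", "MD", "PDT", "POS",
                     "PRP", "PRP$", "RP", "TO", "WDT", "WP", "WP$", "WRB"}

    def strip_function_words(sentences):
        """Yield each tokenized sentence with function words removed."""
        for tokens in sentences:
            yield [w for w, tag in nltk.pos_tag(tokens)
                   if tag not in FUNCTION_TAGS]

    # Toy corpus; the paper's footnotes point to an English Wikipedia dump.
    corpus = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["a", "dog", "ran", "in", "the", "park"],
    ]

    # Train one embedding space on the full corpus and one on the
    # filtered corpus, then compare the two on downstream tasks.
    full_model = Word2Vec(corpus, vector_size=100, window=5, min_count=1)
    filtered_model = Word2Vec(list(strip_function_words(corpus)),
                              vector_size=100, window=5, min_count=1)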

Notes

  1. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
  2. https://dumps.wikimedia.org/enwiki/
  3. http://nlp.stanford.edu/software/tagger.html
  4. https://code.google.com/archive/p/word2vec/
  5. http://nlp.stanford.edu/projects/glove/
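
The word analogy reasoning task referenced in the abstract is conventionally scored with vector arithmetic (3CosAdd): "a is to b as c is to ?" is answered by the vocabulary word whose vector lies closest to b - a + c. Below is a hedged sketch of that evaluation using gensim's built-in routine and Google's questions-words.txt analogy set; the vector file name is hypothetical, and the paper's exact evaluation setup may differ.

    from gensim.models import KeyedVectors

    # "embeddings.kv" is a hypothetical file of previously trained vectors.
    vectors = KeyedVectors.load("embeddings.kv")

    # 3CosAdd: king - man + woman should land near "queen".
    print(vectors.most_similar(positive=["woman", "king"],
                               negative=["man"], topn=1))

    # Benchmark accuracy over the analogy set; gensim reports results
    # per section, so semantic and syntactic accuracy can be contrasted,
    # which is the comparison the paper draws.
    score, sections = vectors.evaluate_word_analogies("questions-words.txt")
    print(f"overall analogy accuracy: {score:.3f}")

Because the analogy set is split into semantic sections (e.g. capital-country pairs) and syntactic sections (e.g. verb tense), dropping function words can move the two scores in opposite directions, which is the contrast the paper investigates.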


Acknowledgements

This work was partially funded by the Natural Science Foundation of China (No. 61300081) and the National High Technology Research and Development Program of China (No. 2015AA015409).

Author information

Correspondence to Dong Yu.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Tang, G., Rao, G., Yu, D., Xun, E. (2016). Can We Neglect Function Words in Word Embedding? In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds.) Natural Language Understanding and Intelligent Applications (ICCPOL 2016, NLPCC 2016). Lecture Notes in Computer Science, vol. 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_47

  • DOI: https://doi.org/10.1007/978-3-319-50496-4_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50495-7

  • Online ISBN: 978-3-319-50496-4

  • eBook Packages: Computer Science (R0)
