Statement Generation Based on Big Data for Keyword Search

  • Conference paper
Machine Learning and Intelligent Communications (MLICOM 2019)

Abstract

Natural language generation (NLG) is the process of automatically producing high-quality natural language text from key information through a planning process. Conventional NLG generates sentences by analyzing grammar and semantics, deriving rules, and then organizing elements according to those rules and heuristics. However, sentences generated this way are rigid, the methods scale poorly, and they adapt with difficulty to the changing style of everyday human language. Our goal is to generate smooth, personalized, multi-sentence text for end users. This paper introduces a new NLG system that generates distinctive statements without the knowledge of semantics, syntax, and so on that rule-based generation requires. The system turns out to be simple and efficient. We obtain the required corpus from the web, and then use the idea of a search engine to find, in a large amount of data, sentences that match the meaning of the keywords provided by users. Sentences generated this way are closer to the language people use in daily life. Finally, we apply our system to the web commentary domain and evaluate it on three criteria. The results show that the system works well in this domain and leaves room for further refinement.
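
The retrieval idea in the abstract, finding stored sentences that match user keywords, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how indexing or ranking is done, so the inverted index, the TF-IDF scoring, the `SentenceIndex` class, and the three-sentence corpus below are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SentenceIndex:
    """Toy inverted index over sentences (hypothetical, for illustration)."""

    def __init__(self, sentences):
        self.sentences = list(sentences)
        # term -> {sentence id: term frequency in that sentence}
        self.postings = defaultdict(dict)
        for sid, sentence in enumerate(self.sentences):
            for term, tf in Counter(tokenize(sentence)).items():
                self.postings[term][sid] = tf

    def search(self, keywords, top_k=3):
        """Return up to top_k (score, sentence) pairs, best match first."""
        n = len(self.sentences)
        scores = defaultdict(float)
        for term in tokenize(keywords):
            hits = self.postings.get(term, {})
            if not hits:
                continue  # keyword does not occur anywhere in the corpus
            # Assumed weighting: rarer terms count more (smoothed IDF).
            idf = math.log(1 + n / len(hits))
            for sid, tf in hits.items():
                scores[sid] += tf * idf
        # Sort by descending score; break ties by original sentence order.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(score, self.sentences[sid]) for sid, score in ranked[:top_k]]


# Made-up comment-style corpus standing in for text collected from the web.
corpus = [
    "The camera on this phone is excellent in low light.",
    "Battery life is poor and the phone heats up quickly.",
    "Delivery was fast and the packaging was intact.",
]
index = SentenceIndex(corpus)
results = index.search("phone camera")
for score, sentence in results:
    print(f"{score:.2f}  {sentence}")
```

Here the first sentence ranks highest because it contains both keywords, while the second matches only `phone`. A production system would more plausibly rely on a full-text search library and a far larger corpus; the sketch only shows the shape of keyword-to-sentence matching.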

Author information

Corresponding author

Correspondence to Zhengyou Xia.

Copyright information

© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Liu, Q., Xia, Z. (2019). Statement Generation Based on Big Data for Keyword Search. In: Zhai, X., Chen, B., Zhu, K. (eds) Machine Learning and Intelligent Communications. MLICOM 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-030-32388-2_41

  • DOI: https://doi.org/10.1007/978-3-030-32388-2_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32387-5

  • Online ISBN: 978-3-030-32388-2

  • eBook Packages: Computer Science, Computer Science (R0)
