Performance Evaluation of Similar Sentences Extraction

Gu, Yanhui; Yang, Zhenglu; Nakano, Miyuki; Kitsuregawa, Masaru

doi:10.1007/978-3-642-37134-9_7


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7813)

Included in the following conference series:

International Workshop on Databases in Networked Information Systems


Abstract

Similar sentence extraction is an important problem because it underlies many applications. In this paper, we conduct comprehensive experiments to evaluate the performance of similar sentence extraction within a general framework. We examine both effectiveness and efficiency on three real datasets while varying two factors, namely the data size and the top-k value. Moreover, we incorporate WordNet into the framework as an additional semantic resource and thoroughly evaluate the performance of the updated framework.
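
To make the task concrete, the following is a minimal sketch of top-k similar sentence extraction. It is not the framework evaluated in the paper: the Jaccard word-overlap measure, the whitespace tokenizer, and the function names `jaccard` and `top_k_similar` are all assumptions chosen for illustration, and no semantic resource such as WordNet is used.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity between two token sets (illustrative measure only)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def top_k_similar(query, sentences, k):
    """Return the k (score, sentence) pairs with the highest similarity to query.

    Tokenization is a plain lowercase whitespace split, which is an
    assumption made for this sketch.
    """
    query_tokens = set(query.lower().split())
    scored = (
        (jaccard(query_tokens, set(s.lower().split())), s) for s in sentences
    )
    # heapq.nlargest keeps only k candidates in memory while scanning.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


corpus = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply",
]
result = top_k_similar("the cat sat on the mat", corpus, k=2)
```

A purely lexical measure like this one scores sentences by shared words only, so paraphrases with different vocabulary score low; that gap is the usual motivation for adding a semantic resource.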



Author information

Authors and Affiliations

Institute of Industrial Science, University of Tokyo, Japan, 4-6-1 Komaba, Meguro-ku, Tokyo, 153-8505, Japan
Yanhui Gu, Zhenglu Yang, Miyuki Nakano & Masaru Kitsuregawa


Editor information

Editors and Affiliations

Graduate Department of Computer and Information Systems, University of Aizu, Ikki Machi, 965-8580, Aizu-Wakamatsu, Fukushima, Japan
Aastha Madaan, Shinji Kikuchi & Subhash Bhalla


About this paper

Cite this paper

Gu, Y., Yang, Z., Nakano, M., Kitsuregawa, M. (2013). Performance Evaluation of Similar Sentences Extraction. In: Madaan, A., Kikuchi, S., Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2013. Lecture Notes in Computer Science, vol 7813. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37134-9_7


DOI: https://doi.org/10.1007/978-3-642-37134-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37133-2
Online ISBN: 978-3-642-37134-9
eBook Packages: Computer Science, Computer Science (R0)
