Abstract
Similar sentences extraction is an essential issue for many applications, such as natural language processing, Web page retrieval, question-answer model, and so forth. Although there are many studies exploring on this issue, most of them focus on how to improve the effectiveness aspect. In this paper, we address the efficiency issue, i.e., for a given sentence collection, how to efficiently discover the top-k semantic similar sentences to a query. The issue is very important for real applications because the data becomes huge and the existing state-of-the-art strategies cannot satisfy the users’ performance requirement. We propose efficient strategies to tackle the problem based on a general framework. Extensive experimental evaluations demonstrate that the efficiency of our proposal outperforms the state-of-the-art approach.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Burgess, C., Livesay, K., Lund, K.: Explorations in context space: words, sentences, discourse. Discourse Processes (1998)
Ceccarelli, D., Lucchese, C., Orlando, S., Perego, R., Silvestri, F.: Caching query-biased snippets for efficient retrieval. In: EDBT (2011)
Cui, H., Sun, R., Li, K., Kan, M.-Y., Chua, T.-S.: Question answering passage retrieval using dependency relations. In: SIGIR (2005)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS (2001)
Hellerstein, J.M., Avnur, R., Chou, A., Hidber, C., Olston, C., Raman, V., Roth, T., Haas, P.J.: Interactive data analysis: The control project. Computer 32 (1999)
Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM (1975)
Islam, A., Inkpen, D.: Semantic text similarity using corpus-based word similarity and string similarity. ACM Transactions on Knowledge Discovery from Data (2008)
Landauer, T., Dumais, S.: A solution to plato’s problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review (1997)
Landauer, T.K., Foltz, P.W., Laham, D.: An introduction to latent semantic analysis. Discourse Processes (1998)
Li, Y., McLean, D., Bandar, Z.A., O’Shea, J.D., Crockett, K.: Sentence similarity based on semantic nets and corpus statistics. IEEE Transaction on Knowledge and Data Engineering (2006)
Metzler, D., Dumais, S.T., Meek, C.: Similarity Measures for Short Segments of Text. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECiR 2007. LNCS, vol. 4425, pp. 16–27. Springer, Heidelberg (2007)
Mihalcea, R., Corley, C., Strapparava, C.: Corpus-based and knowledge-based measures of text semantic similarity. In: AAAI (2006)
Radlinski, F., Broder, A., Ciccolo, P., Gabrilovich, E., Josifovski, V., Riedel, L.: Optimizing relevance and revenue in ad search: a query substitution approach. In: SIGIR (2008)
Sahami, M., Heilman, T.D.: A web-based kernel function for measuring the similarity of short text snippets. In: WWW (2006)
Tsatsaronis, G., Varlamis, I., Vazirgiannis, M.: Text relatedness based on a word thesaurus. Journal of Artificial Intelligence Research (2010)
Turney, P.D.: Mining the web for synonyms: Pmi-ir versus lsa on toefl. In: EMCL (2001)
Ukkonen, E.: Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science 92(1) (1992)
Wei, F., Li, W., Lu, Q., He, Y.: Query-sensitive mutual reinforcement chain and its application in query-oriented multi-document summarization. In: SIGIR (2008)
Yang, Z., Kitsuregawa, M.: Efficient searching top-k semantic similar words. In: IJCAI (2011)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gu, Y., Yang, Z., Nakano, M., Kitsuregawa, M. (2012). Towards Efficient Similar Sentences Extraction. In: Yin, H., Costa, J.A.F., Barreto, G. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2012. IDEAL 2012. Lecture Notes in Computer Science, vol 7435. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32639-4_33
Download citation
DOI: https://doi.org/10.1007/978-3-642-32639-4_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32638-7
Online ISBN: 978-3-642-32639-4
eBook Packages: Computer ScienceComputer Science (R0)