Learning Discourse-Level Structures for Question Answering

Galitsky, Boris

doi:10.1007/978-3-030-04299-8_7


Abstract

Traditional parse trees are combined and enriched with anaphora and rhetorical information to form a unified representation for a paragraph of text. We refer to these representations as parse thickets. They are introduced to support answering complex, multi-sentence questions and to satisfy as many of the constraints expressed in the question as possible. The question answering system is designed so that an initial set of answers, obtained by TF*IDF or another keyword search model, is re-ranked. Passages are re-ranked by matching the parse thickets of the answers against the parse thicket of the question. To do that, a graph representation and a matching technique for the parse structures of paragraphs of text have been developed. We define the generalization of two parse thickets as their maximal common sub-thicket and use it as a measure of semantic similarity between paragraphs of text.

The improvement in passage re-ranking due to parse thickets is evaluated in a variety of chatbot question-answering domains with long questions. Using parse thickets improves search accuracy compared with the bag-of-words approach, pairwise matching of parse trees for sentences, and tree kernel approaches. As a baseline, we use a web search engine API, which provides much more accurate search results than the majority of search benchmarks, such as TREC. A comparative analysis of the impact of various sources of discourse information on search accuracy is conducted. An open-source plug-in for SOLR is developed so that the proposed technology can be easily integrated with industrial search engines.
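To make the re-ranking idea concrete, here is a minimal sketch in Python. It is not the chapter's implementation. It assumes a parse thicket can be flattened into a set of labeled edges (syntactic, anaphoric, or rhetorical), and it replaces the maximal common sub-thicket with a much simpler stand-in: the set of edges two thickets share. The names `generalize`, `similarity`, and `rerank`, the edge labels, and the example data are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Assumption: an edge is (source_label, relation, target_label). The relation
# may be syntactic ("dobj"), anaphoric ("coref"), or rhetorical ("rst:purpose").
Edge = Tuple[str, str, str]
Thicket = FrozenSet[Edge]


def generalize(a: Thicket, b: Thicket) -> Thicket:
    """Toy generalization: the labeled edges present in both thickets.

    This is a simplification of the maximal common sub-thicket; it ignores
    connectivity and any generalization of node labels.
    """
    return a & b


def similarity(a: Thicket, b: Thicket) -> int:
    """Score two thickets by the size of their toy generalization."""
    return len(generalize(a, b))


def rerank(question: Thicket, candidates: List[Tuple[str, Thicket]]) -> List[str]:
    """Reorder keyword-search candidates by similarity to the question.

    `candidates` is assumed to arrive in keyword-search order; Python's sort
    is stable, so candidates with equal scores keep that order.
    """
    ordered = sorted(candidates, key=lambda c: -similarity(question, c[1]))
    return [candidate_id for candidate_id, _ in ordered]


# Hypothetical question thicket and candidate answers (in keyword-search order).
question: Thicket = frozenset({
    ("buy", "dobj", "camera"),
    ("camera", "coref", "it"),
    ("use", "rst:purpose", "buy"),
})
candidates: List[Tuple[str, Thicket]] = [
    ("a1", frozenset({("buy", "dobj", "camera")})),
    ("a2", frozenset({("buy", "dobj", "camera"), ("camera", "coref", "it")})),
    ("a3", frozenset({("sell", "dobj", "phone")})),
]

print(rerank(question, candidates))
```

In this toy data, the candidate that shares both a syntactic and a coreference edge with the question moves ahead of the one that shares only the syntactic edge, which mirrors the abstract's claim that discourse-level links add signal beyond sentence-level matching.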



Author information

Authors and Affiliations

Oracle (United States), San Jose, CA, USA
Boris Galitsky


About this chapter

Cite this chapter

Galitsky, B. (2019). Learning Discourse-Level Structures for Question Answering. In: Developing Enterprise Chatbots. Springer, Cham. https://doi.org/10.1007/978-3-030-04299-8_7


DOI: https://doi.org/10.1007/978-3-030-04299-8_7
Published: 05 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04298-1
Online ISBN: 978-3-030-04299-8
eBook Packages: Computer Science (R0)
