Towards portable natural language interfaces based on case-based reasoning

Journal of Intelligent Information Systems

Abstract

Natural Language Interfaces (NLIs) allow non-technical people to access information stored in Knowledge Bases without having to know the particular structure of the model or the underlying formal query language. Early research in the field was devoted to improving the performance of a particular system for a given Knowledge Base. Since adapting the system to new domains usually entailed considerable effort, investigating how to bring portability to NLIs became a new challenge. In this article, we investigate how Case-Based Reasoning can assist the expert in porting the system so as to improve its retrieval performance. Our method, HITS, is based on a novel grammar learning algorithm combined with language acquisition techniques that exploit structural analogies. The learner (the system) can engage the teacher (the expert) in clarification dialogues to validate conjectures (hypotheses and deductions) about the language. Our method has the following advantages: (i) the customization is naturally defined in the case-based cycle, (ii) the types of questions the system can deal with are not delimited in advance, and (iii) the system ‘reasons’ about precedent cases to deal with unseen questions.
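
The case-based cycle mentioned in the abstract can be pictured with a generic retrieve, reuse, revise, retain loop in the sense of Aamodt and Plaza (1994). The sketch below is not the HITS algorithm: the `Case` structure, the token-overlap similarity, and the `teacher` callback that stands in for a clarification dialogue are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Case:
    question: str        # a question seen before
    interpretation: str  # the formal query the expert accepted for it


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (a stand-in similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def retrieve(case_base: List[Case], question: str) -> Optional[Case]:
    """Retrieve: the most similar precedent, or None for an empty case base."""
    return max(case_base, key=lambda c: similarity(c.question, question), default=None)


def solve(case_base: List[Case], question: str,
          teacher: Callable[[str, str], str]) -> str:
    precedent = retrieve(case_base, question)
    # Reuse: conjecture that the precedent's interpretation carries over.
    conjecture = precedent.interpretation if precedent else ""
    # Revise: the teacher confirms the conjecture or returns a correction.
    confirmed = teacher(question, conjecture)
    # Retain: store the validated pair so later questions can build on it.
    case_base.append(Case(question, confirmed))
    return confirmed
```

A teacher that accepts any non-empty conjecture, for instance `lambda q, c: c or "Q-new"`, lets the loop run unattended; in an interactive setting the callback would ask the expert instead.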

Notes

  1. http://www.w3.org/standards/semanticweb/ontology

  2. A question is considered to be tractable in Precise if none of its tokens is ambiguous in the Knowledge Base.

  3. For simplicity, these examples are shown as a human-like dialogue. Formal representations are presented later.

  4. Note that even if the answer is not present in the database, interpreting the question and informing the user accordingly is still valuable: it spares the user from reformulating the same query again and again in an unfruitful and frustrating way.

  5. http://msdn.microsoft.com/en-us/library/aa198281(v=sql.80).aspx

  6. http://start.csail.mit.edu/

  7. Our method is not restricted to any particular DB or DBMS. It was, however, validated on MySQL.

  8. http://decsai.ugr.es/moreo/publico/NLIDB_dataSets/Datasets_NLIDB.html

  9. ErelatedToV is the Query Model (QM) devoted to retrieving all elements related to another element specified by a value. For example, “show all rivers in Alaska”.

  10. In the first case, the lexicon contains the entries (Book.Author, ‘author’) and (Book.Author, ‘write’). In the second case, the lexicon contains the entries (Author, ‘author’), (Author, ‘who’), and (Wrote, ‘write’). (Recall that the stemming process unifies ‘write’, ‘writes’, ‘wrote’, etc.)

  11. EmostRelated retrieves the element in an entity that is most related to another entity type. For example, “which river traverses most states?”

  12. Shallow parsing, or shallow analysis, usually refers to light parsers based on the identification of constituents, usually using named-entity techniques and regardless of further syntactic or semantic considerations. They are considered to be less reliable but faster than other techniques such as formal grammar parsers.

  13. In this case, p_i acts as a pivot. When more than one pivot is found, the one that produces the highest number of alignments is taken. For example, join(a b c a, b d a) pivots first on b instead of on a.

  14. Even if the improvement is quite limited in this example, note that the coverage could improve significantly after several iterations.

  15. We have limited the search to a maximum of two simultaneous hypotheses.

  16. Deletions do not actually represent a problem, since interpretations regarding deleted data will operate properly, returning no data.

  17. We used the Exalead search engine (http://www.exalead.com/search/) to implement this method. Whereas Turney’s notation calls this operator NEAR, Exalead calls it NEXT. For example: http://www.exalead.com/search/web/results/?q=Microsoft+NEXT+Company

  18. Geobase presents a rich structure to test the addition of new Query Models and to interpret compound questions. The Jobdata test questions contain several unseen terms to test the hypotheses. Finally, Restbase presents a relatively simple structure, but its questions show a considerable grammatical variability that helped us to test the productions refinement method.

  19. http://decsai.ugr.es/moreo/publico/NLIDB_dataSets/Datasets_NLIDB.html

  20. AoperationE resolves queries requesting the elements of an entity whose values satisfy a certain operation. For example, “what is the longest river”, where the operation Greatest is applied to the numerical attribute Length in entity River.

  21. An Intel(R) Core(TM)2 Quad Q8200 at 2.33 GHz with 6 GB of RAM was used to carry out the tests.

  22. http://msdn.microsoft.com/en-us/library/aa198281(v=sql.80).aspx

  23. Results for Microsoft English Query were taken from the experimental validation reported in Popescu et al. (2003).
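
The two lexicon configurations in note 10 can be made concrete with a small lookup sketch. It assumes the lexicon is a set of (Knowledge Base element, term) pairs indexed by stem; the `STEMS` table is a hand-written stand-in for a real stemmer, and the function names are hypothetical rather than taken from the article.

```python
from collections import defaultdict

# Hand-written normalisation table standing in for a real stemmer (assumption).
STEMS = {"writes": "write", "wrote": "write", "written": "write",
         "authors": "author"}


def stem(token):
    """Lowercase the token and map known inflections to a common stem."""
    token = token.lower()
    return STEMS.get(token, token)


def build_lexicon(entries):
    """Index (KB element, term) pairs by the stem of the term."""
    index = defaultdict(set)
    for element, term in entries:
        index[stem(term)].add(element)
    return index


def lookup(lexicon, token):
    """Return the KB elements a question token may refer to (possibly none)."""
    return lexicon.get(stem(token), set())


# Entries copied from note 10.
FIRST_CASE = [("Book.Author", "author"), ("Book.Author", "write")]
SECOND_CASE = [("Author", "author"), ("Author", "who"), ("Wrote", "write")]
```

Under these assumptions, `lookup(build_lexicon(FIRST_CASE), "wrote")` resolves to the attribute `Book.Author`, whereas the same token resolves to the relation `Wrote` under `SECOND_CASE`.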

References

  • Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI communications, 7(1), 39–59.

  • Acorn, T.L., & Walden, S.H. (1992). Smart: Support management automated reasoning technology for Compaq customer service. In Proceedings of the fourth conference on Innovative applications of artificial intelligence (pp. 3–18): AAAI Press.

  • Adriaans, P., & Vervoort, M. (2002). The emile 4.1 grammar induction toolbox. In Grammatical Inference: Algorithms and Applications, volume 2484 of Lecture Notes in Artificial Intelligence (pp. 293–295).

  • Aha, D.W., Breslow, L.A., & Muñoz-Avila, H. (2001). Conversational case-based reasoning. Applied Intelligence, 14(1), 9–32.

  • Aha, D.W., McSherry, D., & Yang, Q. (2005). Advances in conversational case-based reasoning. The knowledge engineering review, 20(03), 247–254.

  • Alshawi, H., Carter, D., Crouch, R., & Pulman, S. (1994). Clare: A contextual reasoning and cooperative response framework for the core language engine. Technical report CRC-028.

  • Androutsopoulos, I., Ritchie, G., & Thanish, P. (1993). Masque/sql, an efficient and portable natural language query interface for relational databases. In Proceedings 6th International Conference on Industrial & Engineering Applications of Artificial Intelligence and Expert Systems (pp. 327–330). Edinburgh, UK.

  • Androutsopoulos, I., Ritchie, G.D., & Thanish, P. (1995). Natural language interfaces to databases - an introduction. Natural Language Engineering, 1(1), 29–81.

  • Angluin, D. (1987). Learning regular sets from queries and counterexamples. Information and Computation, 75(2), 87–106.

  • Bernstein, A., Kaufmann, E., & Kaiser, C. (2005). Querying the semantic web with Ginseng: A guided input natural language search engine. In 15th Workshop on Information Technologies and Systems, Las Vegas, NV (pp. 112–126).

  • Carrick, C., Yang, Q., Abi-Zeid, I., & Lamontagne, L. (1999). Activating cbr systems through autonomous information gathering. In International Conference on Case-Based Reasoning (pp. 74–88): Springer.

  • Chu, W., Yang, H., Chiang, K., Minock, M., Chow, G., & Larson, C. (1996). Cobase: A scalable and extensible cooperative information system. Journal of Intelligent Information Systems, 6, 223–259.

  • Cimiano, P., Haase, P., & Heizmann, J. (2007). Porting natural language interfaces between domains: an experimental user study with the orakel system. In Proceedings of the 12th international conference on Intelligent user interfaces, IUI ’07, pages 180–189, New York, NY, USA: ACM.

  • Cordier, A., Fuchs, B., Lieber, J., & Mille, A. (2007). Interactive knowledge acquisition in case based reasoning. In Wilson, D.C., & Khemani, D. (Eds.) Workshop on Knowledge Discovery and Similarity, a workshop of the seventh International Conference on Case-Based Reasoning (ICCBR-07): (volume editors).

  • Cullingford, E.R. (1978). Script application: Computer understanding of newspaper stories. Technical report, DTIC Document.

  • Cullot, N., Ghawi, R., & Kokou, Y. (2007). DB2OWL : A Tool For Automatic Database-to-Ontology Mapping. In SEBD (pp. 491–494).

  • Damljanović, D., & Bontcheva, K. (2009). Towards enhanced usability of natural language interfaces to knowledge bases. In Web 2.0 and Semantic Web, volume 6 of Annals of Information Systems (pp. 105–133). US: Springer.

  • Damljanovic, D., Agatonovic, M., & Cunningham, H. (2012). Freya: An interactive way of querying linked data using natural language. In Proceedings of the 8th international conference on The Semantic Web, ESWC’11, (pp. 125–138). Berlin, Heidelberg: Springer-Verlag.

  • Ferrucci, D., Levas, A., Bagchi, S., Gondek, D., & Mueller, E.T. (2013). Watson: beyond jeopardy!. Artificial Intelligence, 199, 93–105.

  • Frank, A., Krieger, H.-U., Feiyu, X., Uszkoreit, H., Crysmann, B., Jörg, B., & Schäfer, U. (2007). Question answering from structured knowledge sources. Journal of Applied Logic, 5(1), 20–48.

  • Gold, E.M. (1967). Language identification in the limit. Information and Control, 10(5), 447–474.

  • Grosz, B.J., Appelt, D.E., Martin, P.A., & Pereira, F.C.N. (1987). Team: an experiment in the design of transportable natural-language interfaces. Artificial Intelligence, 32(2), 173–243.

  • Hallett, C., Power, R., & Scott, D. (2007). Composing questions through conceptual authoring. Computational Linguistics, 33, 105–133.

  • Harris, Z.S. (1951). Structural linguistics. Chicago, IL, USA and London, UK: University of Chicago Press. 7th (1966) edition.

  • Hendrix, G., Sacerdoti, E., Sagalowicz, D., & Slocum, J. (1978). Developing a natural language interface to complex data. ACM Transactions on Database Systems, 3(2), 105–147.

  • Jones, K.S. (2005). Some points in a time. Computational Linguistics, 31(1), 1–14.

  • Kaplan, S.J. (1984). Designing a portable natural language database query system. ACM Transactions on Database Systems, 9, 1–19.

  • Kate, R.J., & Mooney, R.J. (2006). Using string-kernels for learning semantic parsers. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics.

  • Kate, R.J., Wong, Y.W., & Mooney, R.J. (2005). Learning to transform natural to formal languages. In Proceedings of the National Conference on Artificial Intelligence.

  • Kaufmann, E., & Bernstein, A. (2007). How useful are natural language interfaces to the semantic web for casual end-users?. In Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference, ISWC’07/ASWC’07 (pp. 281–294). Berlin, Heidelberg: Springer-Verlag.

  • Kaufmann, E., Bernstein, A., & Zumstein, R. (2006). Querix: A natural language interface to query ontologies based on clarification dialogs. In 5th ISWC (pp. 980–981): Springer.

  • Kaufmann, E., Bernstein, A., & Fischer, L. (2007). NLP-Reduce: A naive but domain-independent natural language interface for querying ontologies.

  • Kittredge, R. (1982). Variation and homogeneity of sublanguages. Sublanguage: Studies of Language in Restricted Semantic Domains, pages 107–137.

  • Kolodner, J.L. (1983). Reconstructive memory: A computer model. Cognitive science, 7(4), 281–328.

  • Kolodner, J. (2014). Case-based reasoning. Morgan Kaufmann.

  • Leake, D.B., Kinley, A., & Wilson, D. (1996). Acquiring case adaptation knowledge: A hybrid approach. In Proceedings of the 13th National Conference on Artificial Intelligence, Menlo Park, CA (pp. 684–689): AAAI Press.

  • Lopez, V., Nikolov, A., Sabou, M., Uren, V., Motta, E., & D’Aquin, M. (2010). Scaling up question-answering to linked data. In Proceedings of the 17th international conference on Knowledge engineering and management by the masses, EKAW’10 (pp. 193–210). Berlin, Heidelberg: Springer-Verlag.

  • Lopez, V., Uren, V., Sabou, M., & Motta, E. (2011). Is question answering fit for the semantic web?: a survey. Semantic Web, 2(2), 125–155.

  • Lu, W., Ng, H.T., Lee, W.S., & Zettlemoyer, L.S. (2008). A generative model for parsing natural language to meaning representations. In The Conference on Empirical Methods in Natural Language Processing.

  • McCord, M.C. (1990). Slot grammar: a system for simpler construction of practical natural language grammars. Technical report rc15582(d69261) IBM.

  • McCrae, J., & Spohr, D. (2011). Linking lexical resources and Ontologies on the semantic web with lemon. In Proceedings of the 8th extended semantic web conference on The semantic web: research and applications - Volume Part I, ESWC’11 (pp. 245–259). Berlin, Heidelberg: Springer-Verlag.

  • McSherry, D. (2014). An algorithm for conversational case-based reasoning in classification tasks. In International Conference on Case-Based Reasoning (pp. 289–304): Springer.

  • Miller, G., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1990). Introduction to wordnet: an on-line lexical database. International Journal of Lexicography (special issue), 3(4), 235–312.

  • Moreo, A., Navarro, M., Castro, J.L., & Zurita, J.M. (2012). A high-performance faq retrieval method using minimal differentiator expressions. Knowledge-Based Systems, 36(0), 9–20.

  • Moreo, A., Eisman, E.M., Castro, J.L., & Zurita, J.M. (2013). Learning regular expressions to template-based faq retrieval systems. Knowledge-Based Systems, 53, 108–128.

  • Ogden, W., Mcdonald, J., Bernick, P., & Chadwick, R. (2006). Habitability in question-answering systems. In Advances in Open Domain Question Answering, volume 32 of Text, Speech and Language Technology (pp. 457–473). Netherlands: Springer.

  • Ott, N. (1992). Aspects of the automatic generation of sql statements in a natural language query interface. Information Systems, (2):147–159.

  • Owei, V. (2000). Natural language querying of databases: an information extraction approach in the conceptual query language. International Journal of Human - Computer Studies, 53, 439–492.

  • Pazos, R.A., Pérez, J., González, J.J., Gelbukh, A., Sidorov, G., & Rodríguez, M.J. (2005). A domain independent natural language interface to databases capable of processing complex queries. In MICAI 2005 (pp. 833–842).

  • Popescu, A.M., Etzioni, O., & Kautz, H. (2003). Towards a theory of natural language interfaces to databases. In 8th Intl. Conf. on Intelligent User Interfaces (pp. 149–157). Miami, FL.

  • Pazos R., R.A., González B., J.J., Aguirre L., M.A., Martínez F., J.A., & Fraire H., H.J. (2013). Natural language interfaces to databases: An analysis of the state of the art. In Recent Advances on Hybrid Intelligent Systems, volume 451 of Studies in Computational Intelligence (pp. 463–480). Berlin, Heidelberg: Springer.

  • Sakakibara, Y. (1990). Learning context-free grammars from structural data in polynomial-time. Theoretical Computer Science, 76(2-3), 223–242.

  • Schank, R.C. (1983). Dynamic memory: A theory of reminding and learning in computers and people. Cambridge University Press.

  • Schank, R.C., & Abelson, R. (1977). Scripts, plans, goals and understanding: an inquiry into human knowledge structures.

  • Simazu, H., Shibata, A., & Nihei, K. (2001). Expertguide: A conversational case-based reasoning tool for developing mentors in knowledge spaces. Applied Intelligence, 14(1), 33–48.

  • Spoerri, A. (1993). Infocrystal: A visual tool for information retrieval & management. In Proceedings of the second international conference on Information and knowledge management, CIKM ’93, pages 11–20, New York, NY, USA: ACM.

  • Tang, L., & Mooney, R.J. (2001). Using multiple clause constructors in inductive logic programming for semantic parsing. In Proceedings of the 12th European Conference on Machine Learning (ECML-2001) (pp. 466–477). Freiburg, Germany.

  • Turney, P.D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Philadelphia, Pennsylvania.

  • Unger, C., Hieber, F., & Cimiano, P. (2010). Generating ltag grammars from a lexicon-ontology interface. In Proceedings of the 10th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+10) (pp. 61–68). Yale University, 06/2010.

  • Waltz, D.L. (1978). An english language question answering system for a large relational database. Communications of the ACM, 21(7), 526–539.

  • Wang, C., Xiong, M., Zhou, Q., & Yu, Y. (2007). Panto – a portable natural language interface to ontologies. In 4th ESWC, Innsbruck (pp. 473–487): Springer-Verlag.

  • Weber, R.O., Ashley, K.D., & Brüninghaus, S. (2005). Textual case-based reasoning. The Knowledge Engineering Review, 20(03), 255–260.

  • Wong, Y.W., & Mooney, R.J. (2006). Learning for semantic parsing with statistical machine translation. In Proceedings of Human Language Technology Conference / North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT-NAACL-06) (pp. 439–446). New York City, NY.

  • Woods, W.A., Kaplan, R.M., & Webber, B.N. (1972). The lunar sciences natural language information system: Final report. In BBN Report 2378. Cambridge, Massachusetts: Bolt Beranek and Newman Inc.

  • Zaanen, M.V. (2001). Bootstrapping Structure into Language: Alignment-based Learning. PhD thesis, School of Computing, University of Leeds, UK.

  • Zhang, D.-M., Sheng, H.-Y., Li, F., & Yao, T.-F. (2002). The model design of a case-based reasoning multilingual natural language interface for database. In Proceedings International Conference on Machine Learning and Cybernetics, 2002, (Vol. 3 pp. 1474–1478): IEEE.

  • Zhou, L. (2007). Natural language interface for information management on mobile devices. Behaviour & Information Technology, 26(3), 197–207.

Acknowledgements

The authors would like to thank the Spanish ‘Ministerio de Educación y Ciencia’ and ‘Junta de Andalucía’ for supporting this research through project P09TIC5011. We would also like to thank the anonymous reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to A. Moreo.

Appendix A: Entity relationship diagram of dataset domains

Figures 10, 11 and 12 show the Entity-Relationship diagrams for the Restbase, Geobase, and Jobdata domains, respectively.

Fig. 10 Entity relationship diagram of Restbase domain

Fig. 11 Entity relationship diagram of Geobase domain

Fig. 12 Entity relationship diagram of Jobdata domain

Cite this article

Moreo, A., Castro, J.L. & Zurita, J.M. Towards portable natural language interfaces based on case-based reasoning. J Intell Inf Syst 49, 281–314 (2017). https://doi.org/10.1007/s10844-017-0453-8

  • DOI: https://doi.org/10.1007/s10844-017-0453-8
