A Content Management System for Chatbots

  • Boris Galitsky


In this chapter we describe the industrial applications of our linguistic-based relevance technology for processing, classifying and delivering a stream of texts as data sources for chatbots. We present the content pipeline for the eBay entertainment domain that employs this technology, and show that text processing relevance is the main bottleneck for its performance. A number of components of the chatbot content pipeline, such as content mining, thesaurus formation, aggregation from multiple sources, validation, de-duplication, opinion mining and integrity enforcement, need to rely on domain-independent, efficient text classification, entity extraction and relevance assessment operations.

Text relevance assessment is based on the operation of syntactic generalization (SG, Chap. 5), which finds a maximum common sub-tree for a pair of parse trees for sentences. The relevance of two portions of text is then defined as the cardinality of this sub-tree. SG is intended to replace keyword-based analysis with a more accurate assessment of relevance that takes phrase-level and sentence-level information into account. In the special case of SG where short expressions are commonly used terms, such as Facebook likes, SG ascends to the level of categories, and a reasoning technique is required to map these categories in the course of relevance assessment.
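To make the idea of scoring relevance by the size of a common sub-tree concrete, the following is a minimal sketch and not the SG algorithm of Chap. 5 or the OpenNLP.Similarity code. It assumes parse trees are hand-built nested tuples of the form (label, child, ...), that nodes match only when their labels are identical, and that child order is preserved. The names rooted_common, subtrees and relevance are hypothetical and introduced only for illustration.

```python
# Simplified illustration: relevance as the node count of the largest
# common ordered sub-tree of two toy parse trees.
# A tree is a tuple (label, child1, child2, ...); a leaf is (label,).

def rooted_common(a, b):
    """Size of the largest common ordered sub-tree that maps root a to root b."""
    if a[0] != b[0]:
        return 0
    ca, cb = a[1:], b[1:]
    # LCS-style dynamic program over the two child sequences, where each
    # aligned pair of children contributes the size of its common sub-tree.
    dp = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1] + rooted_common(ca[i - 1], cb[j - 1]),
            )
    return 1 + dp[len(ca)][len(cb)]

def subtrees(t):
    """Yield the tree itself and every sub-tree below it."""
    yield t
    for child in t[1:]:
        yield from subtrees(child)

def relevance(a, b):
    """Cardinality of the best common sub-tree over all pairs of roots."""
    return max(rooted_common(x, y) for x in subtrees(a) for y in subtrees(b))

# Hand-built toy parse trees (assumed, not produced by a real parser).
t1 = ("NP", ("JJ", ("digital",)), ("NN", ("camera",)))
t2 = ("NP", ("JJ", ("cheap",)), ("NN", ("camera",)))
t3 = ("VP", ("VB", ("buy",)))

print(relevance(t1, t2))  # shared nodes: NP, JJ, NN, camera
print(relevance(t1, t3))  # no shared labels
```

In this toy setting the two noun phrases share the phrase structure and the head noun but not the adjective, so they score higher than a keyword overlap on "camera" alone would suggest, while the unrelated verb phrase scores zero.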

A number of content pipeline components employ web mining, which needs SG to compare web search results. We describe how SG works in a number of components in the content pipeline, including personalization and recommendation, and provide the evaluation results for the eBay deployment. Content pipeline support is implemented as an open source contribution, OpenNLP.Similarity.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Boris Galitsky, Oracle (United States), San Jose, USA
