LiveDoc: Showing Contextual Information Using Topic Modeling Techniques

Deshmukh, Jayati; Annervaz, K. M.; Sengupta, Shubhashis; Pathak, Neetu

doi:10.1007/978-3-319-41920-6_37

Conference paper
First Online: 28 June 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9729)

Abstract

We present a solution named LiveDoc, which augments natural language text documents with relevant contextual background information. This background information, fetched from other sources such as Wikipedia, helps readers better understand the context of the discourse. Readers often lack the background and supplementary information needed to comprehend the purport of a narrative such as a news op-ed article. At the same time, authors cannot provide all contextual information while addressing a particular topic. LiveDoc works in three steps: it processes the information in a document; it uses the extracted entities to fetch background information relevant to the context of the document from various sources (as defined by the user), using semantic matching and topic modeling techniques such as Latent Dirichlet Allocation and Hierarchical Dirichlet Process; and it presents the background information to the user by augmenting the original document with the fetched information. With this additional background information, the reader is better equipped to understand the document. We demonstrate the effectiveness of our solution through extensive experimentation and the associated results.
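The abstract names the ingredients of the matching step but gives no implementation details. The sketch below shows one way such a step could look. It is an assumption, not the authors' method: candidate background passages are taken as already fetched, a small collapsed Gibbs sampler fits Latent Dirichlet Allocation over the document and the candidates together, and candidates are ranked by cosine similarity between their topic distribution and the document's. The names `tokenise`, `lda_gibbs`, and `rank_background`, along with all parameter values, are hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenise(text):
    """Lowercase word tokens; words of 3 letters or fewer are dropped as a crude stopword filter."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]


def lda_gibbs(docs, num_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA. Returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # count of tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w assigned to topic k
    topic_total = [0] * num_topics                       # total tokens assigned to topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            assign[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment, then resample it.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions.
    return [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]


def cosine(p, q):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def rank_background(document, candidates, num_topics=2):
    """Return (score, candidate_index) pairs, most similar to the document first."""
    docs = [tokenise(document)] + [tokenise(c) for c in candidates]
    theta = lda_gibbs(docs, num_topics=num_topics)
    scored = [(cosine(theta[0], theta[i + 1]), i) for i in range(len(candidates))]
    return sorted(scored, reverse=True)
```

Calling `rank_background(article_text, passages)` returns the candidate indices ordered by topical similarity; the top entries would be the ones attached to the document. Because the sampler is stochastic, the ranking on a small corpus can vary with `seed` and `iters`. A Hierarchical Dirichlet Process model, which the abstract also mentions, would remove the need to fix `num_topics` in advance.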

Author information

Authors and Affiliations

Accenture Technology Labs, Bangalore, India
Jayati Deshmukh, K. M. Annervaz, Shubhashis Sengupta & Neetu Pathak

Corresponding author

Correspondence to Jayati Deshmukh.

Editor information

Editors and Affiliations

IBaI, Inst of Comp Vision and applied Comp Sci, Leipzig, Sachsen, Germany
Petra Perner

About this paper

Cite this paper

Deshmukh, J., Annervaz, K.M., Sengupta, S., Pathak, N. (2016). LiveDoc: Showing Contextual Information Using Topic Modeling Techniques. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2016. Lecture Notes in Computer Science, vol 9729. Springer, Cham. https://doi.org/10.1007/978-3-319-41920-6_37

DOI: https://doi.org/10.1007/978-3-319-41920-6_37
Published: 28 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41919-0
Online ISBN: 978-3-319-41920-6
eBook Packages: Computer Science, Computer Science (R0)