Injecting Structured Data to Generative Topic Model in Enterprise Settings

Conference paper published in: Advances in Machine Learning (ACML 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5828)

Included in the conference series: Asian Conference on Machine Learning (ACML)

Abstract

Enterprises have steadily accumulated both structured and unstructured data as computing resources have improved. However, previous research on enterprise data mining often treats these two kinds of data independently and overlooks their mutual benefits. We explore an approach that incorporates a common type of structured data, the organigram, into a generative topic model. Our approach, the Partially Observed Topic model (POT), considers not only the unstructured words but also the structured information in its generative process. By integrating the structured data implicitly, the topic mixture of each document is partially observed during the Gibbs sampling procedure. This allows POT to learn topics in a targeted, directed way, which makes the model easy to tune and suitable for end-user applications. We evaluate the proposed model on a real-world dataset and show improved expressiveness over traditional LDA. On the task of document classification, POT also demonstrates greater discriminative power than LDA.
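The mechanism the abstract describes, a topic mixture that is only partially latent because structured metadata (here, an organigram entry) constrains which topics a document may use during Gibbs sampling, can be illustrated with a small collapsed Gibbs sketch. The sketch below is an assumption-laden illustration in the spirit of the abstract, not the authors' POT implementation: the per-document list of allowed topic ids, the function name, and all hyperparameters are hypothetical.

# Illustrative sketch only, not the authors' code: collapsed Gibbs sampling for
# an LDA-style model in which documents carrying structured metadata (e.g. a
# unit from an organigram) are restricted to an "allowed" subset of topics, so
# their topic mixture is partially observed. Names and priors are hypothetical.
import numpy as np

def gibbs_partially_observed(docs, allowed_topics, n_topics, vocab_size,
                             alpha=0.1, beta=0.01, n_iter=200, seed=0):
    # docs: list of word-id lists; allowed_topics: per-document list of topic
    # ids fixed by the structured data (None means fully latent, as in LDA).
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))      # document-topic counts
    nkw = np.zeros((n_topics, vocab_size))     # topic-word counts
    nk = np.zeros(n_topics)                    # tokens assigned to each topic
    z = []                                     # topic assignment per token

    def allowed(d):
        return allowed_topics[d] if allowed_topics[d] else list(range(n_topics))

    for d, doc in enumerate(docs):             # random initialisation
        zd = rng.choice(allowed(d), size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            ks = allowed(d)
            for i, w in enumerate(doc):
                k = z[d][i]                    # remove the current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed Gibbs conditional, restricted to the allowed topics.
                p = (ndk[d, ks] + alpha) * (nkw[ks, w] + beta) / (nk[ks] + beta * vocab_size)
                k = ks[rng.choice(len(ks), p=p / p.sum())]
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw                            # doc-topic and topic-word counts

A call might pass, say, allowed_topics=[[0, 3], None, ...] to pin the first document's mixture to the two topics its organisational unit maps to while leaving the second document fully latent; restricting the sampler this way is one plausible reading of "partially observed" topics, in the spirit of label-constrained models such as Labeled LDA.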

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xiao, H., Wang, X., Du, C. (2009). Injecting Structured Data to Generative Topic Model in Enterprise Settings. In: Zhou, Z.-H., Washio, T. (eds.) Advances in Machine Learning. ACML 2009. Lecture Notes in Computer Science (LNAI), vol. 5828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05224-8_29

  • DOI: https://doi.org/10.1007/978-3-642-05224-8_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05223-1

  • Online ISBN: 978-3-642-05224-8

  • eBook Packages: Computer Science, Computer Science (R0)
