A Hierarchical Statistical Framework for the Extraction of Semantically Related Words in Textual Documents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8171)

Abstract

A large number of documents, such as daily news and blog articles, are now available in electronic format on the Internet. Most of them are related to one another and are organized and archived into categories according to their themes. In this paper, we propose a statistical technique for analyzing hierarchically structured collections of documents in order to extract the information hidden in them. Our approach is based on an extension of the log-bilinear model. Experimental results on real data illustrate the merits and efficiency of the proposed hierarchical statistical model.
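For readers unfamiliar with the log-bilinear model mentioned in the abstract, the following is a minimal sketch of the generic form only, in which the probability of a word is proportional to the exponential of the inner product between a context vector and the word's vector, plus a bias. It is not the authors' hierarchical extension, which the abstract does not specify. The function name `log_bilinear_probs`, the toy vocabulary, the vectors, and the biases are all hypothetical values chosen for illustration.

```python
import math


def log_bilinear_probs(theta, word_vectors, biases):
    """Return p(w | theta) proportional to exp(theta . phi_w + b_w) for each word w."""
    # Bilinear score: inner product of context vector and word vector, plus bias.
    scores = {
        w: sum(t * p for t, p in zip(theta, phi)) + biases[w]
        for w, phi in word_vectors.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores.values())
    exp_scores = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exp_scores.values())  # normalizing constant
    return {w: e / z for w, e in exp_scores.items()}


# Hypothetical 2-dimensional word vectors and biases (made up for illustration).
word_vectors = {
    "election": [1.0, 0.0],
    "vote": [0.9, 0.1],
    "goal": [0.0, 1.0],
}
biases = {"election": 0.0, "vote": 0.0, "goal": 0.0}

# Hypothetical context vector, e.g. standing in for a politics-themed document.
theta = [2.0, 0.0]
probs = log_bilinear_probs(theta, word_vectors, biases)
```

In this toy setting, words whose vectors point in a direction similar to the context vector receive higher probability, which is the intuition behind using such a model to surface semantically related words.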

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Su, W., Ziou, D., Bouguila, N. (2013). A Hierarchical Statistical Framework for the Extraction of Semantically Related Words in Textual Documents. In: Lingras, P., Wolski, M., Cornelis, C., Mitra, S., Wasilewski, P. (eds) Rough Sets and Knowledge Technology. RSKT 2013. Lecture Notes in Computer Science, vol 8171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41299-8_34

  • DOI: https://doi.org/10.1007/978-3-642-41299-8_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41298-1

  • Online ISBN: 978-3-642-41299-8

  • eBook Packages: Computer Science, Computer Science (R0)
