A Hierarchical Statistical Framework for the Extraction of Semantically Related Words in Textual Documents

Su, Weijia; Ziou, Djemel; Bouguila, Nizar

doi:10.1007/978-3-642-41299-8_34

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8171)

Abstract

A large number of documents, such as daily news and blog articles, are now available in electronic format on the Internet. Most of them are related to one another and are organized and archived into categories according to their themes. In this paper, we propose a statistical technique for analyzing document collections that have such a hierarchical structure, in order to extract the information hidden in them. Our approach is based on an extension of the log-bilinear model. Experimental results on real data illustrate the merits and efficiency of the proposed hierarchical statistical model.
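For readers unfamiliar with the log-bilinear model named in the abstract, the following is a minimal sketch of its generic form only, in which the probability of a word is proportional to exp(θ · φ_w + b_w) for a context vector θ, a word vector φ_w and a word bias b_w. It does not reproduce the hierarchical extension proposed in the paper, whose details are not given on this page. The function name, the toy vocabulary and all numeric values are hypothetical and chosen purely for illustration.

```python
import math

def log_bilinear_probs(theta, word_vectors, biases):
    """Return p(w | theta) proportional to exp(theta . phi_w + b_w) for each word w."""
    # Bilinear score of each word against the context vector, plus its bias.
    scores = [sum(t * p for t, p in zip(theta, phi)) + b
              for phi, b in zip(word_vectors, biases)]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy data (not from the paper): three words, two dimensions.
vocab = ["goal", "match", "election"]
word_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
biases = [0.0, 0.0, 0.0]
theta = [2.0, 0.0]  # an assumed context vector aligned with the first dimension

probs = log_bilinear_probs(theta, word_vectors, biases)
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
```

In this toy setting, words whose vectors align with the context vector receive higher probability, which is the sense in which such a model can group semantically related words.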

Author information

Authors and Affiliations

Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada
Weijia Su & Nizar Bouguila
Department of Computer Science, University of Sherbrooke, Sherbrooke, QC, Canada
Djemel Ziou

Editor information

Editors and Affiliations

Saint Mary’s University, B3H 3C3, Halifax, NS, Canada
Pawan Lingras
Maria Curie-Skłodowska University, Lublin, Poland
Marcin Wolski
University of Granada, Spain
Chris Cornelis
Indian Statistical Institute, 700108, Kolkata, India
Sushmita Mitra
University of Warsaw, 02-097, Warsaw, Poland
Piotr Wasilewski

About this paper

Cite this paper

Su, W., Ziou, D., Bouguila, N. (2013). A Hierarchical Statistical Framework for the Extraction of Semantically Related Words in Textual Documents. In: Lingras, P., Wolski, M., Cornelis, C., Mitra, S., Wasilewski, P. (eds) Rough Sets and Knowledge Technology. RSKT 2013. Lecture Notes in Computer Science, vol 8171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41299-8_34

DOI: https://doi.org/10.1007/978-3-642-41299-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41298-1
Online ISBN: 978-3-642-41299-8
eBook Packages: Computer Science, Computer Science (R0)
