Abstract
A large number of electronic documents, such as daily news and blog articles, are available on the Internet. Many of them are related to one another and are organized and archived into categories according to their themes. In this paper, we propose a statistical technique for analyzing hierarchically structured collections of documents in order to extract the information hidden in them. Our approach is based on an extension of the log-bilinear model. Experimental results on real data illustrate the merits of the proposed statistical hierarchical model and its efficiency.
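The abstract does not spell out the hierarchical extension, so the sketch below shows only the base log-bilinear word model it builds on, as commonly formulated (e.g. by Maas and Ng): a document vector `theta` scores each word vector `R[w]`, and a softmax with per-word bias `b[w]` turns the scores into p(w | theta). Semantically related words are then read off as nearest neighbours of a word vector under cosine similarity. The function names, the toy vocabulary and the 2-dimensional vectors are hypothetical illustrations, not the authors' model, parameters or data.

```python
import math

def word_distribution(theta, R, b):
    """Base log-bilinear model: p(w | theta) = softmax_w(theta . R[w] + b[w]).

    theta: document vector of length d (hypothetical input)
    R: one word vector of length d per vocabulary word
    b: one bias per vocabulary word
    """
    scores = [sum(t * r for t, r in zip(theta, r_w)) + b_w for r_w, b_w in zip(R, b)]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def related_words(query, vocab, R, k=2):
    """Return the k words whose vectors are most cosine-similar to the query's."""
    q = R[vocab.index(query)]
    others = [w for w in vocab if w != query]
    others.sort(key=lambda w: cosine(q, R[vocab.index(w)]), reverse=True)
    return others[:k]

# Hypothetical toy vocabulary with hand-picked 2-d word vectors.
vocab = ["attack", "malware", "virus", "election", "vote"]
R = [[1.0, 0.1], [0.9, 0.2], [0.95, 0.05], [0.1, 1.0], [0.2, 0.9]]
b = [0.0] * len(vocab)

p = word_distribution([1.0, 0.0], R, b)  # document vector leaning toward the first dimension
print(related_words("attack", vocab, R))  # → ['virus', 'malware']
```

In a hierarchical setting one would expect the document or category vectors to share information along the category tree, but how the paper does this is not stated in the abstract and is not modelled here.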
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Su, W., Ziou, D., Bouguila, N. (2013). A Hierarchical Statistical Framework for the Extraction of Semantically Related Words in Textual Documents. In: Lingras, P., Wolski, M., Cornelis, C., Mitra, S., Wasilewski, P. (eds) Rough Sets and Knowledge Technology. RSKT 2013. Lecture Notes in Computer Science, vol 8171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41299-8_34
DOI: https://doi.org/10.1007/978-3-642-41299-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41298-1
Online ISBN: 978-3-642-41299-8
eBook Packages: Computer Science, Computer Science (R0)