Person Retrieval on XML Documents by Coreference Analysis Utilizing Structural Features

Yonei, Yumi; Iwaihara, Mizuho; Yoshikawa, Masatoshi

doi:10.1007/978-3-540-85654-2_47

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5181)

Abstract

Present-day keyword retrieval exploits the frequencies and positions of search keywords in target documents. For retrieval by two or more keywords, the semantic relation between the keywords is also important. To retrieve information about a person, it is common to search with a pair of keywords consisting of the person’s name and an attribute of interest. Dependency analysis and coreference analysis make it possible to retrieve correct occurrences of such pairs of a person and his or her attributes. However, existing natural language analysis does not take into account that the logical structure of a document strongly influences the probabilistic patterns of coreference. In this paper, we propose a new method of person retrieval that computes a maximum entropy model from linguistic features and structural features, where the structural features are learned from the probability distribution of coreference over XML document structures. Our method exploits the strong correlation between XML document structure and coreference, and thus achieves higher accuracy than existing methods.
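
The idea of combining linguistic and structural features in one maximum entropy model can be illustrated with a small sketch. This is not the authors' implementation: the feature names (`sent_dist`, `xml_rel`), the toy data, and the training settings are all hypothetical, chosen only to show how a structural feature can shift the estimated probability of coreference. For a binary decision, a maximum entropy model is equivalent to logistic regression, which is what the code trains.

```python
import math

# Hypothetical toy candidates: (person mention, attribute mention) pairs.
# "sent_dist" stands in for a linguistic feature, "xml_rel" for a
# structural feature describing how the two XML elements are related.
# Label 1 means the pair is coreferent, 0 means it is not.
TOY = [
    ({"sent_dist": 0, "xml_rel": "same_element"}, 1),
    ({"sent_dist": 1, "xml_rel": "same_element"}, 1),
    ({"sent_dist": 1, "xml_rel": "heading_of_section"}, 1),
    ({"sent_dist": 3, "xml_rel": "heading_of_section"}, 1),
    ({"sent_dist": 0, "xml_rel": "sibling_section"}, 0),
    ({"sent_dist": 1, "xml_rel": "sibling_section"}, 0),
    ({"sent_dist": 3, "xml_rel": "sibling_section"}, 0),
    ({"sent_dist": 3, "xml_rel": "same_element"}, 0),
]


def featurize(pair):
    """Turn a candidate pair into binary indicator features."""
    return {
        "bias": 1.0,
        "sent_dist=%d" % min(pair["sent_dist"], 3): 1.0,  # linguistic (assumed)
        "xml_rel=%s" % pair["xml_rel"]: 1.0,              # structural (assumed)
    }


def prob(weights, feats):
    """P(coreferent | features) under the binary maximum entropy model."""
    z = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=200, lr=0.5, l2=0.01):
    """Fit weights by stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for pair, label in data:
            feats = featurize(pair)
            err = label - prob(weights, feats)
            for name, value in feats.items():
                w = weights.get(name, 0.0)
                weights[name] = w + lr * (err * value - l2 * w)
    return weights


weights = train(TOY)
# Same linguistic feature, different structural relation.
p_heading = prob(weights, featurize({"sent_dist": 1, "xml_rel": "heading_of_section"}))
p_sibling = prob(weights, featurize({"sent_dist": 1, "xml_rel": "sibling_section"}))
print(p_heading, p_sibling)
```

In this toy setting the two queried pairs share the same sentence distance, so any difference between `p_heading` and `p_sibling` comes from the structural feature alone, which is the kind of correlation the abstract describes.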

Author information

Authors and Affiliations

Department of Social Informatics, Graduate School of Informatics, Kyoto University, Yoshidahonmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Yumi Yonei, Mizuho Iwaihara & Masatoshi Yoshikawa

Editor information

Sourav S. Bhowmick, Josef Küng, Roland Wagner

About this paper

Cite this paper

Yonei, Y., Iwaihara, M., Yoshikawa, M. (2008). Person Retrieval on XML Documents by Coreference Analysis Utilizing Structural Features. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2008. Lecture Notes in Computer Science, vol 5181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85654-2_47

DOI: https://doi.org/10.1007/978-3-540-85654-2_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85653-5
Online ISBN: 978-3-540-85654-2
eBook Packages: Computer Science, Computer Science (R0)
