Abstract
Motivated by the related entity finding problem, in this paper we introduce a novel approach to query answering called “NMiner.” NMiner uses heuristics to find answers to complex semantic queries. It combines three sources: natural language processing techniques, which parse sentences and extract entities; the hypertext structure of the documents, from which relational information is derived; and Semantic Web data, from which relevant entities are extracted as search result candidates. A bimodal network of sentences and entities is then built from these candidates. Two measures, Content Centric Ranking (CCR) and Cumulative Structural Similarity (CSS), are proposed to rank the candidate entities. Our empirical study on the ClueWeb09 corpus (approximately 25 terabytes of web documents) shows that both CSS and CCR outperform PageRank and HITS. Moreover, NMiner proved effective at answering complex queries posed against a largely unstructured corpus of text documents.
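The abstract does not define CCR or CSS, so the sketch below does not attempt to implement them. It only illustrates, under stated assumptions, what a bimodal sentence-entity network looks like and how one of the baselines named above (HITS) can rank entities on it: sentences act as hubs and entities as authorities. The input format (a dict from hypothetical sentence IDs to the entity names extracted from each sentence), the function names, and the sample data are all illustrative assumptions, not NMiner's actual data structures.

```python
import math
from collections import defaultdict


def build_bimodal_network(sentences):
    """Invert sentence -> entities into entity -> sentences (the other side of the bipartite graph).

    `sentences` is a hypothetical input: dict of sentence_id -> set of entity names,
    assumed to come from an upstream entity-extraction step.
    """
    entity_to_sentences = defaultdict(set)
    for sid, entities in sentences.items():
        for entity in entities:
            entity_to_sentences[entity].add(sid)
    return entity_to_sentences


def hits_rank_entities(sentences, iterations=50):
    """Rank entities with HITS on the bipartite graph (sentences = hubs, entities = authorities).

    This is the HITS baseline only; it is not CCR or CSS.
    """
    entity_to_sentences = build_bimodal_network(sentences)
    hub = {sid: 1.0 for sid in sentences}
    auth = {}
    for _ in range(iterations):
        # Authority of an entity = sum of hub scores of the sentences that mention it.
        auth = {e: sum(hub[s] for s in sids) for e, sids in entity_to_sentences.items()}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {e: v / norm for e, v in auth.items()}
        # Hub score of a sentence = sum of authority scores of the entities it mentions.
        hub = {s: sum(auth[e] for e in ents) for s, ents in sentences.items()}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {s: v / norm for s, v in hub.items()}
    # Highest-authority entities first.
    return sorted(auth.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical candidate sentences and the entities extracted from each.
    sample = {
        "s1": {"Apple", "Steve Jobs"},
        "s2": {"Apple", "Tim Cook"},
        "s3": {"Apple", "iPhone"},
        "s4": {"Tim Cook"},
    }
    for entity, score in hits_rank_entities(sample):
        print(f"{entity}: {score:.3f}")
```

On this toy input, "Apple" ranks first because it co-occurs with the most sentences, and "Tim Cook" ranks second because it appears in two sentences. A content- or structure-aware measure such as the paper's CCR or CSS would presumably use more information than raw co-occurrence, which is why the paper compares against baselines like this one.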
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Martha, V., Wallace, S., Bisgin, H., Xu, X., Agarwal, N., Joshi, H. (2012). NMiner: A System for Finding Related Entities by Mining a Bimodal Network. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_30
DOI: https://doi.org/10.1007/978-3-642-35341-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)