NMiner: A System for Finding Related Entities by Mining a Bimodal Network

Martha, VenkataSwamy; Wallace, Stephen; Bisgin, Halil; Xu, Xiaowei; Agarwal, Nitin; Joshi, Hemant

doi:10.1007/978-3-642-35341-3_30

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7675)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Motivated by the related entity finding problem, this paper introduces a novel approach to query answering called “NMiner.” NMiner uses heuristics to find answers to complex semantic queries. It combines natural language processing techniques to parse sentences and extract entities, the hypertext structure of the documents to derive relational information, and semantic web data to extract relevant entities as search result candidates. A bimodal network of sentences and entities is then created from the search result candidates. Two ranking methods, Content Centric Ranking (CCR) and Cumulative Structural Similarity (CSS), are proposed to rank the candidate entities. Our empirical study on the ClueWeb09 corpus (approximately 25 terabytes of web documents) shows that both CSS and CCR outperform PageRank and HITS. Moreover, NMiner proved effective at answering complex queries posed against a largely unstructured corpus of text documents.
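
To make the idea of a bimodal network of sentences and entities concrete, the following is a minimal sketch of such a network together with HITS, one of the baselines named in the abstract. It is not the authors' implementation, and it does not reproduce CCR or CSS, whose definitions are not given in the abstract. The function names, the toy sentences, and the choice to treat sentences as hubs and entities as authorities are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def build_bimodal_network(sentence_entities):
    """Build both directions of a sentence-entity bipartite graph.

    sentence_entities maps a (hypothetical) sentence id to the entities
    extracted from that sentence.
    """
    sent_to_ents = {s: set(ents) for s, ents in sentence_entities.items()}
    ent_to_sents = defaultdict(set)
    for s, ents in sent_to_ents.items():
        for e in ents:
            ent_to_sents[e].add(s)
    return sent_to_ents, dict(ent_to_sents)


def hits_entity_scores(sent_to_ents, ent_to_sents, iterations=50):
    """Rank entities with HITS, assuming sentences are hubs and entities authorities."""
    hub = {s: 1.0 for s in sent_to_ents}
    auth = {e: 1.0 for e in ent_to_sents}
    for _ in range(iterations):
        # Authority update: sum the hub scores of sentences mentioning the entity.
        auth = {e: sum(hub[s] for s in sents) for e, sents in ent_to_sents.items()}
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {e: v / norm for e, v in auth.items()}
        # Hub update: sum the authority scores of entities the sentence mentions.
        hub = {s: sum(auth[e] for e in ents) for s, ents in sent_to_ents.items()}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {s: v / norm for s, v in hub.items()}
    # Highest authority score first.
    return sorted(auth.items(), key=lambda kv: kv[1], reverse=True)


# Toy input (invented for illustration): three sentences and their entities.
toy_sentences = {
    "s1": ["Entity A", "Entity B"],
    "s2": ["Entity A", "Entity C"],
    "s3": ["Entity A"],
}
sent_to_ents, ent_to_sents = build_bimodal_network(toy_sentences)
ranking = hits_entity_scores(sent_to_ents, ent_to_sents)
for entity, score in ranking:
    print(f"{entity}: {score:.3f}")
```

In this toy network the entity mentioned by the most sentences receives the highest authority score. A ranking method such as CCR or CSS would replace `hits_entity_scores` while reusing the same bipartite structure.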

Author information

Authors and Affiliations

University of Arkansas at Little Rock, Little Rock, AR, USA
VenkataSwamy Martha, Stephen Wallace, Halil Bisgin, Xiaowei Xu & Nitin Agarwal
DataMinr Inc., NY, USA
Hemant Joshi

Editor information

Editors and Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin, 300072, China
Yuexian Hou
DIRO, University of Montreal, CP. 6128, succursale Centre-ville, H3C 3J7, Montreal, QC, Canada
Jian-Yun Nie
Institute of Software, Storage & Information Retrieval Laboratory, Chinese Academy of Sciences, 100190, Beijing, China
Le Sun
School of Computer Science and Technology, Tianjin University, 300072, Tianjin, China
Bo Wang
School of Computing, Robert Gordon University, St Andrew Street, AB25 1HG, Aberdeen, UK
Peng Zhang

About this paper

Cite this paper

Martha, V., Wallace, S., Bisgin, H., Xu, X., Agarwal, N., Joshi, H. (2012). NMiner: A System for Finding Related Entities by Mining a Bimodal Network. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_30

DOI: https://doi.org/10.1007/978-3-642-35341-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)
