NMiner: A System for Finding Related Entities by Mining a Bimodal Network

  • Conference paper
Information Retrieval Technology (AIRS 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7675))

Abstract

Motivated by the related entity finding problem, this paper introduces a novel approach to query answering called “NMiner.” NMiner uses heuristics to find answers to complex semantic queries. It combines natural language processing techniques to parse sentences and extract entities, the hypertext structure of the documents to derive relational information, and semantic web data to extract relevant entities as search result candidates. A bimodal network of sentences and entities is then created from the search result candidates. Two measures, Content Centric Ranking (CCR) and Cumulative Structural Similarity (CSS), are proposed to rank the candidate entities. Our empirical study on the ClueWeb09 corpus (approximately 25 terabytes of web documents) shows that both CSS and CCR outperform PageRank and HITS. Moreover, NMiner proved effective at answering complex queries posed against a largely unstructured corpus of text documents.
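To make the idea of a bimodal network of sentences and entities concrete, the following is a minimal sketch in Python. It is an illustration only and not the paper's method: the abstract does not define CCR or CSS, so the score used here (a cosine-style overlap of the sentence sets in which two entities occur) is an assumed stand-in, and the function names `build_bimodal_network`, `neighborhood_similarity`, and `rank_candidates` as well as the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_bimodal_network(sentence_entities):
    """Map each entity to the set of sentence ids that mention it.

    The input is a dict {sentence_id: set_of_entities}; together with the
    returned mapping it describes a two-mode (sentence/entity) network.
    """
    entity_to_sentences = defaultdict(set)
    for sentence_id, entities in sentence_entities.items():
        for entity in entities:
            entity_to_sentences[entity].add(sentence_id)
    return dict(entity_to_sentences)


def neighborhood_similarity(a, b):
    """Cosine-style overlap of two neighbor sets (an assumed stand-in score)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def rank_candidates(sentence_entities, query_entity):
    """Rank all other entities by how much their sentences overlap the query's."""
    network = build_bimodal_network(sentence_entities)
    query_sentences = network.get(query_entity, set())
    scores = {
        entity: neighborhood_similarity(query_sentences, sentences)
        for entity, sentences in network.items()
        if entity != query_entity
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data (hypothetical): three sentences and the entities found in each.
toy_sentences = {
    "s1": {"Boeing", "Airbus"},
    "s2": {"Boeing", "Airbus", "Seattle"},
    "s3": {"Seattle", "Starbucks"},
}
ranking = rank_candidates(toy_sentences, "Boeing")
```

In this toy example, entities that share more sentences with the query entity rank higher. A real system would replace the hand-written sentence-to-entity mapping with the output of an entity extractor and would use the ranking measures defined in the paper.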

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martha, V., Wallace, S., Bisgin, H., Xu, X., Agarwal, N., Joshi, H. (2012). NMiner: A System for Finding Related Entities by Mining a Bimodal Network. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_30

  • DOI: https://doi.org/10.1007/978-3-642-35341-3_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35340-6

  • Online ISBN: 978-3-642-35341-3

  • eBook Packages: Computer Science, Computer Science (R0)
