A Pattern-Based Voting Approach for Concept Discovery on the Web

  • Conference paper
Web Technologies Research and Development - APWeb 2005 (APWeb 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3399)

Abstract

Automatically discovering concepts is not only a fundamental task in knowledge capture and ontology engineering, but also a key step in many information retrieval applications. Pattern-based and statistics-based approaches are both widely used for this task, and the former have generally proved more precise. However, the effective patterns in such approaches are usually defined manually, which takes considerable time and human labor and covers only a limited set of patterns. In our research, we obtain patterns automatically through frequent sequence mining. We then present a voting approach that determines whether a sentence contains a concept and accurately identifies it. Our algorithm consists of three steps: pattern mining, pattern refining, and concept discovery. In our experimental study, we evaluate the approach with three traditional measures: precision, recall, and F1 value. The experimental results not only verify the validity of the approach, but also illustrate the relationship between its performance and the parameters of the algorithm.
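
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that discovered concepts and gold-standard concepts are compared as sets of exact strings; the abstract does not specify the matching criterion, and the function name `precision_recall_f1` and the sample concepts are hypothetical, not taken from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of concept strings.

    Assumption: a predicted concept counts as correct only if it matches a
    gold concept exactly. The paper may use a different matching rule.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, for illustration only.
predicted = {"ontology", "data mining", "the web"}
gold = {"ontology", "data mining", "information retrieval", "sequence mining"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Here two of the three predicted concepts are correct and two of the four gold concepts are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.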

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, J., Zhang, Z., Li, Q., Li, X. (2005). A Pattern-Based Voting Approach for Concept Discovery on the Web. In: Zhang, Y., Tanaka, K., Yu, J.X., Wang, S., Li, M. (eds) Web Technologies Research and Development - APWeb 2005. APWeb 2005. Lecture Notes in Computer Science, vol 3399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31849-1_12

  • DOI: https://doi.org/10.1007/978-3-540-31849-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25207-8

  • Online ISBN: 978-3-540-31849-1

  • eBook Packages: Computer Science, Computer Science (R0)
