
PKU at INEX 2010 XML Mining Track

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6932)

Abstract

This paper presents our participation in the INEX 2010 XML Mining track. Our classification and clustering solutions for XML documents use both structure and content information: frequent subtrees serve as the structural units from which content is extracted from each XML document. In addition, we use WordNet and link information to improve performance, and apply the structured link vector model for classification.
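
As a rough illustration of combining structure and content, the following Python sketch groups a document's terms by structural unit, giving one term-count vector per unit. It is only a sketch under stated assumptions, not the authors' implementation: root-to-element tag paths stand in for the mined frequent subtrees, tokenization is a plain lowercase letter match, and the function name `structure_term_counts` and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def structure_term_counts(xml_text):
    """Map each structural unit to the term counts of the text it contains.

    Assumption: a structural unit is a root-to-element tag path, used here
    as a simple stand-in for frequent subtrees mined from a collection.
    """
    root = ET.fromstring(xml_text)
    counts = defaultdict(Counter)

    def walk(node, path):
        unit = f"{path}/{node.tag}"
        # Only the element's own leading text is counted for this unit;
        # tail text that follows a child element is ignored in this sketch.
        terms = re.findall(r"[a-z]+", (node.text or "").lower())
        if terms:
            counts[unit].update(terms)
        for child in node:
            walk(child, unit)

    walk(root, "")
    return dict(counts)


# Hypothetical sample document, not taken from the INEX collection.
doc = (
    "<article><title>XML mining</title>"
    "<body><p>mining XML trees</p><p>link mining</p></body></article>"
)
matrix = structure_term_counts(doc)
for unit, terms in matrix.items():
    print(unit, dict(terms))
```

Each row of the resulting mapping could then be weighted (for example with TF-IDF) and passed to a classifier or clustering algorithm; the weighting scheme and the learner are left open here because the abstract does not specify them.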




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, S., Liang, F., Yang, J. (2011). PKU at INEX 2010 XML Mining Track. In: Geva, S., Kamps, J., Schenkel, R., Trotman, A. (eds) Comparative Evaluation of Focused Retrieval. INEX 2010. Lecture Notes in Computer Science, vol 6932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23577-1_37


  • DOI: https://doi.org/10.1007/978-3-642-23577-1_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23576-4

  • Online ISBN: 978-3-642-23577-1

  • eBook Packages: Computer Science, Computer Science (R0)
