Data Mining Using Links in Open Hypermedia

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2641))

Abstract

We present a survey of data mining using web links. The XLink standard provides new possibilities for mining the web but also poses complex new problems. In this paper, we analyze the challenges posed by the XLink standard and propose a model for mining XLink information on the web. Our model combines local and global information in a distributed web environment with a dynamic approach to XLink paths that span separate documents.
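
To make concrete what "XLink information" might look like as mining input, the following is a minimal sketch that extracts link edges from an XML document. It is an illustration only, not the model proposed in the paper. The sample document, the `extract_edges` helper, and the `this-document` source label are hypothetical; only the XLink namespace URI and the `type`, `href`, `label`, `from`, and `to` attributes come from the XLink specification. The sketch also simplifies the standard: arcs that omit `from` or `to` are skipped rather than expanded to all labels.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the XLink specification.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical sample document with one simple and one extended link.
SAMPLE = """<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <ref xlink:type="simple" xlink:href="a.xml"/>
  <group xlink:type="extended">
    <loc xlink:type="locator" xlink:href="b.xml" xlink:label="b"/>
    <loc xlink:type="locator" xlink:href="c.xml" xlink:label="c"/>
    <arc xlink:type="arc" xlink:from="b" xlink:to="c"/>
  </group>
</doc>"""


def xattr(elem, name):
    """Return the value of an XLink attribute, or None if absent."""
    return elem.get(f"{{{XLINK_NS}}}{name}")


def extract_edges(xml_text, source="this-document"):
    """Return (source, target) pairs for simple links and labelled arcs."""
    root = ET.fromstring(xml_text)
    edges = []
    for elem in root.iter():
        link_type = xattr(elem, "type")
        if link_type == "simple":
            # A simple link points from the containing document to href.
            href = xattr(elem, "href")
            if href:
                edges.append((source, href))
        elif link_type == "extended":
            # Map each label to the remote resources that carry it.
            labels = {}
            for child in elem:
                if xattr(child, "type") == "locator" and xattr(child, "label"):
                    labels.setdefault(xattr(child, "label"), []).append(
                        xattr(child, "href")
                    )
            # Each arc connects every 'from' resource to every 'to' resource.
            for child in elem:
                if xattr(child, "type") != "arc":
                    continue
                src_label, dst_label = xattr(child, "from"), xattr(child, "to")
                if src_label is None or dst_label is None:
                    continue  # simplification: unlabelled arcs are ignored
                for src in labels.get(src_label, []):
                    for dst in labels.get(dst_label, []):
                        edges.append((src, dst))
    return edges


if __name__ == "__main__":
    for edge in extract_edges(SAMPLE):
        print(edge)
```

Note that the extended link yields an edge between two documents other than the one containing the link. This is the kind of out-of-line linking that a plain HTML anchor cannot express.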



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Arotaritei, D., Nürnberg, P.J. (2003). Data Mining Using Links in Open Hypermedia. In: Nürnberg, P.J. (eds) Metainformatics. MIS 2002. Lecture Notes in Computer Science, vol 2641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44872-1_15

  • DOI: https://doi.org/10.1007/3-540-44872-1_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40218-3

  • Online ISBN: 978-3-540-44872-3

  • eBook Packages: Springer Book Archive
