Abstract
We present a survey of data mining using web links. The XLink standard provides new possibilities to mine the web but also poses complex new problems. In this paper, we analyze the new challenges posed by the XLink standard and propose a model for mining XLink information on the web. Our model combines local and global information in a distributed web environment with a dynamic approach for XLink paths in separate documents.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arotaritei, D., Nürnberg, P.J. (2003). Data Mining Using Links in Open Hypermedia. In: Nürnberg, P.J. (eds) Metainformatics. MIS 2002. Lecture Notes in Computer Science, vol 2641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44872-1_15
DOI: https://doi.org/10.1007/3-540-44872-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40218-3
Online ISBN: 978-3-540-44872-3
eBook Packages: Springer Book Archive