Data Mining pp 589–617

Mining Web Data

Abstract

The Web is a unique phenomenon in many ways: its scale, the distributed and uncoordinated nature of its creation, the openness of the underlying platform, and the resulting diversity of applications it has enabled. Examples of such applications include e-commerce, user collaboration, and social network analysis. Because of the distributed and uncoordinated way in which the Web is both created and used, it is a rich treasure trove of diverse types of data. This data can be either a source of knowledge about various subjects or personal information about users.

Notes

  1.

    A formal mathematical treatment characterizes this in terms of the ergodicity of the underlying Markov chains. In ergodic Markov chains, a necessary requirement is that it is possible to reach any state from any other state using a sequence of one or more transitions. This condition is referred to as strong connectivity. An informal description is provided here to facilitate understanding.

  2.

    In some applications, such as bibliographic networks, the edge \((i, j)\) may have a weight denoted by \(w_{ij}\). In such cases, the transition probability \(p_{ij}\) is defined as \( \frac {w_{ij}}{ \sum _{k \in Out(i)} w_{ik}}\).

  3.

    An alternative way to achieve this goal is to modify \(G\) by multiplying the existing edge transition probabilities by the factor \((1- \alpha )\) and then adding \(\alpha /n\) to the transition probability between each pair of nodes in \(G\). As a result, \(G\) becomes a directed clique with bidirectional edges between each pair of nodes. Such strongly connected Markov chains have unique steady-state probabilities. The resulting graph can then be treated as a Markov chain without having to account separately for the teleportation component. This model is equivalent to the one discussed in the chapter.

  4.

    The left eigenvector \(\overline {X}\) of \(P\) is a row vector satisfying \(\overline {X} P = \lambda \overline {X}\). The right eigenvector \(\overline {Y}\) is a column vector satisfying \(P \overline {Y}= \lambda \overline {Y}\). For asymmetric matrices, the left and right eigenvectors are not the same. However, the eigenvalues are always the same. The unqualified term “eigenvector” refers to the right eigenvector by default.

  5.

    http://www.dmoz.org.
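
The constructions in Notes 2 to 4 can be tied together in a small sketch. The following is an illustrative example only, not the chapter's implementation: the function names, the three-node graph, and the value alpha = 0.15 are assumptions made for the illustration. It row-normalizes edge weights into transition probabilities (Note 2), folds teleportation into the matrix so that every pair of nodes is connected (Note 3), and approximates the left eigenvector with eigenvalue 1 by repeated multiplication (Note 4). It assumes every node has at least one outgoing edge.

```python
def transition_matrix(weights, n):
    """Row-normalize edge weights: p_ij = w_ij / sum over k in Out(i) of w_ik."""
    P = [[0.0] * n for _ in range(n)]
    for i, out_edges in weights.items():
        total = sum(out_edges.values())
        for j, w in out_edges.items():
            P[i][j] = w / total
    return P


def add_teleportation(P, alpha):
    """Scale every entry by (1 - alpha), then add alpha / n to every pair."""
    n = len(P)
    return [[(1 - alpha) * p + alpha / n for p in row] for row in P]


def steady_state(P, iterations=200):
    """Power iteration x <- x P, which approximates the left eigenvector
    of P for eigenvalue 1 when P is a strongly connected chain."""
    n = len(P)
    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return x


# Hypothetical weighted graph: weights[i][j] is the weight w_ij of edge (i, j).
weights = {0: {1: 2.0, 2: 1.0}, 1: {2: 1.0}, 2: {0: 1.0}}

P = transition_matrix(weights, 3)      # Note 2
G = add_teleportation(P, alpha=0.15)   # Note 3: every entry is now positive
pi = steady_state(G)                   # Note 4: pi satisfies pi G = pi
```

Because every entry of the modified matrix is positive, the chain is strongly connected, so the iteration settles on a single distribution regardless of the starting vector.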

Author information

Corresponding author

Correspondence to Charu C. Aggarwal.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Aggarwal, C. (2015). Mining Web Data. In: Data Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-14142-8_18

  • DOI: https://doi.org/10.1007/978-3-319-14142-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14141-1

  • Online ISBN: 978-3-319-14142-8

  • eBook Packages: Computer Science (R0)
