Abstract
The Web is a unique phenomenon in many ways: in its scale, the distributed and uncoordinated nature of its creation, the openness of the underlying platform, and the resulting diversity of applications it has enabled. Examples of such applications include e-commerce, user collaboration, and social network analysis. Because of the distributed and uncoordinated way in which the Web is both created and used, it is a rich treasure trove of diverse types of data. This data can be a source of knowledge about various subjects, or a source of personal information about users.
Notes
1. A formal mathematical treatment characterizes this in terms of the ergodicity of the underlying Markov chains. In ergodic Markov chains, a necessary requirement is that it be possible to reach any state from any other state using a sequence of one or more transitions. This condition is referred to as strong connectivity. An informal description is provided here to facilitate understanding.
2. In some applications, such as bibliographic networks, the edge \((i, j)\) may have a weight denoted by \(w_{ij}\). In such cases, the transition probability \(p_{ij}\) is defined as \(\frac{w_{ij}}{\sum_{k \in Out(i)} w_{ik}}\).
3. An alternative way to achieve this goal is to modify \(G\) by multiplying the existing edge transition probabilities by the factor \((1 - \alpha)\) and then adding \(\alpha/n\) to the transition probability between each pair of nodes in \(G\). As a result, \(G\) becomes a directed clique with bidirectional edges between each pair of nodes. Such strongly connected Markov chains have unique steady-state probabilities. The resulting graph can then be treated as a Markov chain without having to account separately for the teleportation component. This model is equivalent to the one discussed in the chapter.
4. The left eigenvector \(\overline{X}\) of \(P\) is a row vector satisfying \(\overline{X} P = \lambda \overline{X}\). The right eigenvector \(\overline{Y}\) is a column vector satisfying \(P \overline{Y} = \lambda \overline{Y}\). For asymmetric matrices, the left and right eigenvectors are not the same; the eigenvalues, however, are always the same. The unqualified term “eigenvector” refers to the right eigenvector by default.
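The weighted transition probabilities of note 2 and the teleportation-modified matrix of note 3 can be sketched together in a few lines of Python. The 3-node graph, its edge weights, and the value \(\alpha = 0.15\) below are hypothetical choices for illustration, not values from the chapter:

```python
# Weighted adjacency: w[i][j] is the weight of edge (i, j) (hypothetical graph).
w = [
    [0.0, 2.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
n = len(w)
alpha = 0.15  # teleportation probability (illustrative value)

# Note 2: p_ij = w_ij / sum_k w_ik, then note 3: fold teleportation
# directly into the matrix as (1 - alpha) * p_ij + alpha / n.
P = []
for i in range(n):
    out = sum(w[i])
    P.append([(1 - alpha) * w[i][j] / out + alpha / n for j in range(n)])

# Every entry of P is now strictly positive, so the chain is strongly
# connected and has a unique steady state. Power iteration: pi <- pi P.
pi = [1.0 / n] * n
for _ in range(100):
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# At convergence, pi P = pi, i.e. pi is a left eigenvector of P with
# eigenvalue 1 (note 4); residual measures how far we are from that.
residual = max(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j])
               for j in range(n))
```

Folding teleportation into the matrix, as in note 3, trades the sparse graph for a dense one, which is why large-scale implementations instead keep the teleportation term separate.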
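The distinction in note 4 between left and right eigenvectors can be checked on a small example. The \(2 \times 2\) stochastic matrix below is a hypothetical one chosen so the eigenvalue-1 eigenvectors work out by hand:

```python
# Hypothetical asymmetric row-stochastic matrix.
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Right eigenvector for eigenvalue 1: P y = y. For any row-stochastic
# matrix, the all-ones column vector works.
y = [1.0, 1.0]
Py = [sum(P[i][j] * y[j] for j in range(2)) for i in range(2)]

# Left eigenvector for eigenvalue 1: x P = x. Solving 0.1 * x0 = 0.5 * x1
# with x0 + x1 = 1 gives x = (5/6, 1/6) — a different direction than y.
x = [5 / 6, 1 / 6]
xP = [sum(x[i] * P[i][j] for i in range(2)) for j in range(2)]
```

Both vectors share the eigenvalue 1, yet they point in different directions because \(P\) is asymmetric; for a symmetric matrix the two would coincide.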
© 2015 Springer International Publishing Switzerland
Aggarwal, C. (2015). Mining Web Data. In: Data Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-14142-8_18
Print ISBN: 978-3-319-14141-1
Online ISBN: 978-3-319-14142-8