Mining Web Data

Aggarwal, Charu C.

doi:10.1007/978-3-319-14142-8_18

Chapter
First Online: 01 January 2015

Abstract

The Web is a unique phenomenon in many ways: in its scale, in the distributed and uncoordinated nature of its creation, in the openness of the underlying platform, and in the resulting diversity of applications it has enabled. Examples of such applications include e-commerce, user collaboration, and social network analysis. Because of the distributed and uncoordinated nature in which the Web is both created and used, it is a rich treasure trove of diverse types of data. This data can be either a source of knowledge about various subjects or personal information about users.

Notes

1.
A formal mathematical treatment characterizes this in terms of the ergodicity of the underlying Markov chains. In ergodic Markov chains, a necessary requirement is that it is possible to reach any state from any other state using a sequence of one or more transitions. This condition is referred to as strong connectivity. An informal description is provided here to facilitate understanding.
2.
In some applications such as bibliographic networks, the edge \((i, j)\) may have a weight denoted by \(w_{ij}\). The transition probability \(p_{ij}\) is defined in such cases by \( \frac {w_{ij}}{ \sum _{k \in Out(i)} w_{ik}}\), where the sum runs over all nodes \(k\) that \(i\) points to.
3.
An alternative way to achieve this goal is to modify \(G\) by multiplying existing edge transition probabilities by the factor \((1- \alpha )\) and then adding \(\alpha /n\) to the transition probability between each pair of nodes in \(G\). As a result \(G\) will become a directed clique with bidirectional edges between each pair of nodes. Such strongly connected Markov chains have unique steady-state probabilities. The resulting graph can then be treated as a Markov chain without having to separately account for the teleportation component. This model is equivalent to that discussed in the chapter.
4.
The left eigenvector \(\overline {X}\) of \(P\) is a row vector satisfying \(\overline {X} P = \lambda \overline {X}\). The right eigenvector \(\overline {Y}\) is a column vector satisfying \(P \overline {Y}= \lambda \overline {Y}\). For asymmetric matrices, the left and right eigenvectors are not the same. However, the eigenvalues are always the same. The unqualified term “eigenvector” refers to the right eigenvector by default.
5.
http://www.dmoz.org.
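
Notes 2 to 4 can be tied together in a small sketch. The Python below is illustrative only and is not taken from the chapter: the function names, the toy edge list, the value alpha = 0.15, and the fixed iteration count are all assumptions. It builds weighted transition probabilities as in Note 2, applies the edge modification described in Note 3, and then approximates the steady-state probabilities as the left eigenvector with eigenvalue 1 (Note 4) by repeated multiplication. Nodes without outgoing edges are rejected, because the notes do not say how to handle them.

```python
def transition_matrix(n, weighted_edges):
    """Row-stochastic matrix with p_ij = w_ij / (sum of w_ik over k in Out(i))."""
    P = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        P[i][j] += w
    for i in range(n):
        total = sum(P[i])
        if total == 0:
            # Assumption: dead ends are out of scope for this sketch.
            raise ValueError(f"node {i} has no outgoing edges")
        P[i] = [w / total for w in P[i]]
    return P


def add_teleportation(P, alpha):
    """Scale every entry by (1 - alpha), then add alpha / n to every node pair."""
    n = len(P)
    return [[(1.0 - alpha) * p + alpha / n for p in row] for row in P]


def steady_state(P, iterations=200):
    """Approximate the row vector x with x P = x by power iteration."""
    n = len(P)
    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return x


# Hypothetical 3-node weighted graph: (source, target, weight).
edges = [(0, 1, 2.0), (0, 2, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
P = transition_matrix(3, edges)
G = add_teleportation(P, alpha=0.15)
pi = steady_state(G)
print(pi)
```

After the modification every entry of G is positive, so the chain is strongly connected in the sense of Note 1, and the iteration settles on a single probability vector regardless of the starting distribution.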

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Charu C. Aggarwal

Corresponding author

Correspondence to Charu C. Aggarwal.

About this chapter

Cite this chapter

Aggarwal, C. (2015). Mining Web Data. In: Data Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-14142-8_18

DOI: https://doi.org/10.1007/978-3-319-14142-8_18
Published: 14 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14141-1
Online ISBN: 978-3-319-14142-8
eBook Packages: Computer Science, Computer Science (R0)
