Extracting Local Web Communities Using Lexical Similarity

Zhang, Xianchao; Xu, Wen; Liang, Wenxin

doi:10.1007/978-3-642-14589-6_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6193)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

The World Wide Web contains rich textual content interconnected by complex hyperlinks. Most studies on web community extraction focus only on graph structure. Consequently, web communities are discovered purely from explicit link information, without considering the textual properties of web pages. This paper proposes an improved algorithm based on Flake’s method, which uses the maximum flow algorithm. The improved algorithm accounts for differences in importance between edges and assigns each edge a well-designed capacity derived from the lexical similarity of the web pages it connects. Given a specific query, it also lends itself to a new and efficient ranking scheme for members of the extracted community. The experimental results indicate that our approach efficiently handles a variety of data sets by means of a novel optimization strategy for similarity computation.
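
The general idea of max-flow community extraction with similarity-weighted edges can be illustrated with a minimal sketch. The abstract does not give the paper's capacity formula, so everything specific here is an assumption made for illustration: cosine similarity over bag-of-words term counts stands in for lexical similarity, a plain Edmonds-Karp loop stands in for the maximum flow algorithm, and the names `cosine_similarity`, `extract_community`, and `sink_capacity` are hypothetical. Following the usual Flake-style construction, a virtual source feeds the seed pages, every non-seed page drains to a virtual sink, and the community is the set of pages still reachable from the source once the flow is maximal.

```python
import math
from collections import Counter, deque


def cosine_similarity(text_a, text_b):
    """Cosine similarity of bag-of-words term counts (an assumed stand-in
    for the paper's lexical similarity measure)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_community(pages, links, seeds, sink_capacity=0.5):
    """Return the pages on the source side of a minimum cut.

    pages: dict mapping page id -> text; links: iterable of (u, v) pairs;
    seeds: non-empty set of page ids. sink_capacity is a hypothetical
    tuning parameter, not a value taken from the paper.
    """
    residual = {}

    def add_edge(u, v, cap):
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0.0) + cap
        residual[v].setdefault(u, 0.0)  # reverse edge for the residual graph

    source, sink = "_source", "_sink"
    for u, v in links:
        # Links are treated as bidirectional; capacity = lexical similarity.
        weight = cosine_similarity(pages[u], pages[v])
        add_edge(u, v, weight)
        add_edge(v, u, weight)
    for page in pages:
        if page in seeds:
            add_edge(source, page, math.inf)
        else:
            add_edge(page, sink, sink_capacity)

    while True:
        # Breadth-first search for a shortest augmenting path (Edmonds-Karp).
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # Flow is maximal: everything still reachable is the community.
            return set(parent) - {source}
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck


# Toy data, invented purely for illustration.
pages = {
    "a": "graph max flow community",
    "b": "max flow graph cut",
    "c": "community graph flow",
    "d": "cooking pasta graph",
    "e": "pasta sauce cooking",
}
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
community = extract_community(pages, links, seeds={"a"})
print(sorted(community))
```

In this toy graph the link from `c` to `d` exists but carries little capacity because the two pages share only one term, so the cut falls there and the cooking pages stay outside the community. With uniform capacities the same link would count as much as the links among `a`, `b`, and `c`.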

This work was partially supported by NSFC under grant No. 60873180, and by the start-up funding (#1600-893313) for newly appointed academic staff of Dalian University of Technology, China.

References

Andersen, R., Lang, K.J.: Communities from seed sets. In: 15th International Conference on WWW, New York, USA, pp. 223–232 (2006)
Angelova, R., Weikum, G.: Graph-based Text Classification: Learn from Your Neighbors. In: 29th ACM Conference on Research and Development in Information Retrieval, Seattle, Washington, pp. 485–492 (2006)
Asano, Y., Nishizeki, T., Toyoda, M., Kitsuregawa, M.: Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework. IEICE Trans. on Information and Systems (2006)
DeRose, P., Shen, W., Chen, F.: Building Structured Web Community Portals: A Top-down, Compositional, and Incremental Approach. In: 33rd International Conference on VLDB, Vienna, Austria, pp. 399–410 (2007)
Flake, G.W., Lawrence, S., Giles, C.L.: Efficient Identification of Web Communities. In: Sixth ACM International Conference on KDD, pp. 150–160. ACM Press, Boston (2000)
Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.M.: Self-Organization and Identification of Web Communities. Computer (2002)
Ford, L.R., Fulkerson, D.R.: Maximal Flow through a Network. Canadian Journal of Mathematics 8, 399–404 (1956)
Girvan, M., Newman, M.E.J.: Community Structure in Social and Biological Networks. Proc. Natl. Acad. Sci. 99, 7821–7826 (2002)
Imafuji, N., Kitsuregawa, M.: Finding Web Communities by Maximum Flow Algorithm Using Well Designed Edge Capacities. IEICE Trans. on Information and Systems (2004)
Kernighan, B.W., Lin, S.: Tech. J. 49, 291 (1970)
Lee, H.C., Borodin, A., Goldsmith, L.: Extracting and Ranking Viral Communities Using Seed and Content Similarity. In: 19th ACM Conference on Hypertext, Pittsburgh, PA, pp. 139–148 (2008)
Pothen, A., Simon, H., Liou, K.P.: Matrix Anal. Appl. 11, 430 (1990)
Scott, J.: Social Network Analysis: A Handbook, 2nd edn. Sage, London (2000)
Strehl, A.: Relationship-based Clustering and Cluster Ensembles for High-Dimensional Data Mining. PhD thesis, Univ. of Texas at Austin (2002)
Voorhees, E.M.: Using WordNet to Disambiguate Word Senses for Text Retrieval. In: 16th ACM Conference on Research and Development in Information Retrieval, New York, USA, pp. 171–180 (1993)
Xu, G., Ma, W.Y.: Building Implicit Links from Content for Forum Search. In: 29th ACM Conference on Research and Development in IR, Seattle, Washington, pp. 300–307 (2006)

Author information

Authors and Affiliations

School of Software, Dalian University of Technology, China
Xianchao Zhang, Wen Xu & Wenxin Liang

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo, 606-8501, Kyoto, Japan
Masatoshi Yoshikawa
Information School, Renmin University of China, 100872, Beijing, China
Xiaofeng Meng
Graduate School of Engineering, University of Hyogo, 2167 Shosha, Himeji, 671-2280, Hyogo, Japan
Takayuki Yumoto
Graduate School of Informatics, Kyoto University, Yoshidahonmachi, Sakyo, 606-8501, Kyoto, Japan
Qiang Ma
Institute of HCI and Media Integration, Tsinghua University, 100084, Beijing, China
Lifeng Sun
Department of Information Science, Ochanomizu University, 2-1-1, Otsuka, Bunkyo-ku, 112-8610, Tokyo, Japan
Chiemi Watanabe

About this paper

Cite this paper

Zhang, X., Xu, W., Liang, W. (2010). Extracting Local Web Communities Using Lexical Similarity. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_33

DOI: https://doi.org/10.1007/978-3-642-14589-6_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14588-9
Online ISBN: 978-3-642-14589-6
eBook Packages: Computer Science, Computer Science (R0)
