Skip to main content

Extracting Local Web Communities Using Lexical Similarity

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6193))

Abstract

The World Wide Web contains rich textual contents that are interconnected via complex hyperlinks. Most studies on web community extraction only focus on graph structures. Consequently, web communities are discovered purely in terms of explicit link information without considering textual properties of web pages. This paper proposes an improved algorithm based on Flake’s method using the maximum flow algorithm. The improved algorithm considers the differences between edges in terms of importance, and assigns a well-designed capacity to each edge via the lexical similarity of web pages. Given a specific query, it also lends itself to a new and efficient ranking scheme for members in the extracted community. The experimental results indicate that our approach efficiently handles a variety of data sets across a novel optimization strategy of similarity computation.

This work was partially supported by NSFC under grant No. 60873180, and by the start-up funding (#1600-893313) for newly appointed academic staff of Dalian University of Technology, China.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Andeson, R., Lang, K.J.: Community from seed sets. In: 15th International Conference on WWW, New York, USA, pp. 223–232 (2006)

    Google Scholar 

  2. Angelova, R., Weikum, G.: Graph-based Text classification: learn from your neighbors. In: 29th ACM Conference on Research and Development in Information Retrieval, Seattle, Washington, pp. 485–492 (2006)

    Google Scholar 

  3. Asano, Y., Nishizeki, T., Toyoda, M., Kitsuregaw, A.M.: Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework. IEICE Trans. on Information and Systems (2006)

    Google Scholar 

  4. DeRose, P., Shen, W., Chen, F.: Building Structured Web Community Portals: A Top-down, Compositional, and Incremental Approach. In: 33rd International Conference on VLDB, Vienna, Austria, pp. 399–410 (2007)

    Google Scholar 

  5. Flake, G.W., Lawrence, S., Giles, C.L.: Efficient Identification of Web Communities. In: sixth ACM International Conference on KDD, pp. 150–160. ACM Press, Boston (2000)

    Google Scholar 

  6. Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.M.: Self-Organization and Identification of Web Communities. Computer (2002)

    Google Scholar 

  7. Ford, L.R., Fulkson, D.R.: Maximal Flow through A Network. Canadian Journal of Mathematics 8, 399–404 (1956)

    MATH  Google Scholar 

  8. Girven, M., Newman, M.E.J.: Community Structure in Social and Biological Networks. Proc. Nati. Acad. 99, 7821–7826 (2002)

    Article  Google Scholar 

  9. Imafuji, N., Kitsuregawa, M.: Finding Web Communities by Maximum Flow Algorithm Using Well Desinged Edge Capacities. IEICE Trans. on Information and Systems (2004)

    Google Scholar 

  10. Kernighan, B.W., Lin, S.: Tech. J. 49, 291 (1970)

    Google Scholar 

  11. Lee, H.C., Borodin, A., Goldsmith, L.: Extracting and Ranking Viral Communities Using Seed and Content Similarity. In: 19th ACM Conference on Hypertext, Pittsburgh, PA, pp. 139–148 (2008)

    Google Scholar 

  12. Pothen, A., Simon, H., Liou, K.P.: Matrix Anal. Appl. 11, 430 (1990)

    MATH  MathSciNet  Google Scholar 

  13. Scott, J.: Social Network Analysis: A Handbook, 2nd edn. Sage, London (2000)

    Google Scholar 

  14. Strehl, A.: Relationship-based Clustering and Cluster Ensembles for High-Dimensional Data Mining. Phd thesis, Univ. of Texas at Austin (2002)

    Google Scholar 

  15. Voorhees, E.M.: Using WordNet to disambiguate word senses for text retrieval. In: 16th ACM Conference on Research and Development in Information Retrieval, New York, USA, pp. 171–180 (1993)

    Google Scholar 

  16. Xu, G., Ma, W.Y.: Building Implicit Links From Content For Forum Search. In: 29th ACM Conference on Research and Development in IR, Seattle, Washington, pp. 300–307 (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, X., Xu, W., Liang, W. (2010). Extracting Local Web Communities Using Lexical Similarity. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-14589-6_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14588-9

  • Online ISBN: 978-3-642-14589-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics