Refining the Results of Automatic e-Textbook Construction by Clustering

Chen, Jing; Li, Qing; Feng, Ling

doi:10.1007/11528043_31

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3583)

Included in the following conference series:

International Conference on Web-Based Learning

Abstract

The abundance of knowledge-rich information on the World Wide Web makes compiling an online e-textbook both possible and necessary. The authors of [7] proposed an approach to automatically generate an e-textbook by mining the ranking lists of the search engine. However, the performance of the approach was degraded by Web pages that were relevant but not actually discussing the desired concept. In this paper, we extend the work in [7] by applying a clustering approach before the mining process. The clustering approach serves as a post-processing stage to the original results retrieved by the search engine, and aims to reach an optimum state in which all Web pages assigned to a concept are discussing that exact concept.
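The abstract does not say which page features or clustering algorithm the authors use, so the following is only a minimal sketch of the general idea: group the pages a search engine returned for a concept before any mining step runs. It assumes bag-of-words term-frequency vectors, cosine similarity, and a toy k-means seeded with the first k pages. The names `tf_vector`, `cosine`, and `cluster_pages` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector for one page (assumed representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster_pages(pages, k, iterations=10):
    """Toy k-means over page texts using cosine similarity.

    Centroids are seeded with the first k pages, an arbitrary but
    deterministic choice. Returns one cluster label per page.
    """
    vectors = [tf_vector(p) for p in pages]
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each page to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the summed term counts of its members
        # (scaling does not affect cosine similarity).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = total
    return labels
```

Under these assumptions, a later mining step would run only on the cluster whose pages discuss the target concept, rather than on the full ranking list. How that cluster is identified is not specified in the abstract.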

References

1. Brin, S., Page, L.: The Anatomy of a Large-scale Hypertextual Web Search Engine. In: Proceedings of International Conference on World Wide Web (1998)
2. Chakrabarti, S.: Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers, San Francisco (2002)
3. Zamir, O., Etzioni, O.: Grouper: A Dynamic Clustering Interface to Web Search Results. Computer Networks 31(11-16), 1361–1374 (1999)
4. Zeng, H.-J., He, Q.-C., Chen, Z., Ma, W.-Y.: Learning To Cluster Web Search Results. In: Proceedings of the 27th annual international conference on research and development in information retrieval (SIGIR 2004), Sheffield, United Kingdom, pp. 210–217 (July 2004)
5. Ferragina, P., Gullí, A.: The Anatomy of a Hierarchical Clustering Engine for Web-page, News and Book Snippets. In: Perner, P. (ed.) ICDM 2004. LNCS (LNAI), vol. 3275, pp. 395–398. Springer, Heidelberg (2004)
6. Vivisimo, http://vivisimo.com/html/index
7. Chen, J., Li, Q., Wang, L., Jia, W.: Automatically Generating an e-Textbook on the Web. In: Liu, W., Shi, Y., Li, Q. (eds.) ICWL 2004. LNCS, vol. 3143, pp. 35–42. Springer, Heidelberg (2004)
8. Liu, B., Chin, C.-W., Ng, H.-T.: Mining Topic-specific Concepts and Definitions on the Web. In: Proceedings of International Conference on World Wide Web, pp. 251–260 (2003)
9. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw Hill, New York (1983)
10. Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing 18(6), 1245–1262 (1989)
11. Wang, Y., DeWitt, D.J., Cai, J.-y.: X-Diff: An Effective Change Detection Algorithm for XML Documents. In: ICDE 2003, pp. 519–530 (2003)
12. Nierman, A., Jagadish, H.V.: Evaluating Structural Similarity in XML Documents. In: WebDB 2002, pp. 61–66 (2002)
13. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An Efficient k-Means Clustering Algorithm: Analysis and Implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 881–892 (2002)
14. de Castro Reis, D., Golgher, P.B., da Silva, A.S., Laender, A.H.F.: Automatic web news extraction using tree edit distance. In: WWW 2004, pp. 502–511 (2004)

Author information

Authors and Affiliations

Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
Jing Chen & Qing Li
Department of Computer Science, University of Twente, PO Box 217, 7500, Enschede, The Netherlands
Ling Feng

Editor information

Editors and Affiliations

Department of Computer Science, University of Durham, South Road, DH1 3LE, Durham, UK
Rynson W. H. Lau
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Qing Li
Department of Computing, Hong Kong Polytechnic University, P.O. Box
Ronnie Cheung
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Wenyin Liu

About this paper

Cite this paper

Chen, J., Li, Q., Feng, L. (2005). Refining the Results of Automatic e-Textbook Construction by Clustering. In: Lau, R.W.H., Li, Q., Cheung, R., Liu, W. (eds) Advances in Web-Based Learning – ICWL 2005. ICWL 2005. Lecture Notes in Computer Science, vol 3583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11528043_31

DOI: https://doi.org/10.1007/11528043_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27895-5
Online ISBN: 978-3-540-31716-6
eBook Packages: Computer Science, Computer Science (R0)
