Abstract
Digital libraries make it possible to organize, classify and publish collections of electronic content available on computers or networks. They are also easy to use and configure, and they offer a graphical user interface for fast searching and browsing over a repository of documents. This article presents a digital library prototype for retrieving, indexing and clustering the documents published on a website. The website may include unstructured, semi-structured and structured documents, such as web pages, scientific papers, news items and documents in several formats that consist essentially of text. The proposed prototype includes a clustering process that uses a conceptual algorithm and an a priori process of cluster labeling. Preliminary results come from tests on different sets of documents published on a real website.
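To illustrate what an a priori cluster labeling step can look like, the following minimal Python sketch chooses the cluster labels first and only then assigns documents to them. It is a simplified illustration under stated assumptions, not the prototype's algorithm: the function names, the stopword list, the sample documents and the use of single frequent terms as labels are all hypothetical choices made for the example.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is", "are"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def label_first_clusters(docs, num_labels=2, min_docs=2):
    """Pick cluster labels first, then assign documents to them.

    A term qualifies as a label when it occurs in at least `min_docs`
    documents; the `num_labels` most widespread terms are kept.
    """
    token_sets = [set(tokenize(d)) for d in docs]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in token_sets for term in tokens)
    candidates = [(term, df) for term, df in doc_freq.items() if df >= min_docs]
    # Sort by document frequency (descending), then alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    labels = [term for term, _ in candidates[:num_labels]]
    # Assignment step: a document joins every cluster whose label it contains,
    # so clusters may overlap.
    return {
        label: [i for i, tokens in enumerate(token_sets) if label in tokens]
        for label in labels
    }


# Hypothetical sample documents.
docs = [
    "Clustering of web search results",
    "Document clustering for digital libraries",
    "Indexing web pages with a crawler",
    "Digital libraries and metadata",
]
clusters = label_first_clusters(docs)
print(clusters)
```

Because labels are fixed before any document is assigned, every cluster has a readable description by construction, and a document can appear under more than one label.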
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mahecha-Nieto, I., León, E. (2010). Digital Web Library of a Website with Document Clustering. In: Kuri-Morales, A., Simari, G.R. (eds) Advances in Artificial Intelligence – IBERAMIA 2010. IBERAMIA 2010. Lecture Notes in Computer Science, vol 6433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16952-6_22
DOI: https://doi.org/10.1007/978-3-642-16952-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16951-9
Online ISBN: 978-3-642-16952-6
eBook Packages: Computer Science (R0)