Abstract
We propose an efficient, adaptive approach to document classification and categorization for building personal files tailored to a user's specific needs and profile. Such an approach is needed because search engines are often too general to give a precise answer to a user's request. Since we cannot act directly on the search engines' methodology, we act instead on the retrieved documents, classifying and ranking them appropriately. We adopt a classifier combination approach, choosing highly complementary classifiers so that all aspects of the query are covered and the user is ultimately presented with a readable, comprehensible result. The application concerns law articles drawn from the European Union database. Legal texts are heavily entangled with cross-references and accompanied by updating files (for application dates and for new terms and formulations). Our approach finds a real application here, offering the specialist (jurist, lawyer, etc.) a synthetic view of the law related to the requested topic.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Belaïd, A., Alusse, A. (2006). Toward File Consolidation by Document Categorization. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_39
DOI: https://doi.org/10.1007/11669487_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)