Toward File Consolidation by Document Categorization

Belaïd, Abdel; Alusse, André

doi:10.1007/11669487_39

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3872)

Abstract

We propose an efficient, adaptive approach to document classification and categorization for building personal files that match a user's specific needs and profile. Such an approach is needed because search engines are often too general to give a precise answer to a user's request. Since we cannot act directly on the search engines' methodology, we act instead on the retrieved documents by classifying and ranking them. We use a combination of classifiers, chosen to be highly complementary so that together they cover all aspects of the query and present the user with a readable, comprehensible result. The application concerns law articles drawn from the European Union database. These legal texts are always entangled with cross-references and accompanied by update files (for application dates and for new terms and formulations). Our approach finds a real application here, offering the specialist (jurist, lawyer, etc.) a synthetic view of the law related to the requested topic.
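
The idea of combining complementary classifiers to rank retrieved documents can be illustrated with a minimal sketch. The abstract does not say which classifiers or which combination rule the authors use, so everything here is an assumption made for illustration: the two scorers (`term_overlap`, `jaccard`), the weighted-average rule, and the function `combine_and_rank` are hypothetical names, not the paper's method.

```python
from typing import Callable, List, Tuple

# A scorer maps (query, document) to a relevance score in [0, 1].
Scorer = Callable[[str, str], float]


def term_overlap(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document (assumed scorer)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def jaccard(query: str, doc: str) -> float:
    """Jaccard similarity between query and document term sets (assumed scorer)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def combine_and_rank(
    query: str,
    docs: List[str],
    scorers: List[Tuple[Scorer, float]],
) -> List[Tuple[str, float]]:
    """Rank documents by the weighted average of several scorers.

    The weighted average is one simple combination rule chosen for this
    sketch; the source does not specify the rule actually used.
    """
    total_weight = sum(weight for _, weight in scorers)
    if total_weight <= 0:
        raise ValueError("scorer weights must sum to a positive value")
    ranked = []
    for doc in docs:
        score = sum(weight * scorer(query, doc) for scorer, weight in scorers)
        ranked.append((doc, score / total_weight))
    # Highest combined score first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

Calling `combine_and_rank(query, docs, [(term_overlap, 1.0), (jaccard, 1.0)])` returns the documents ordered by combined score. In a setting like the one described, each scorer would presumably be replaced by a trained classifier that captures a different aspect of the query.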

Author information

Authors and Affiliations

LORIA, Campus Scientifique, B.P. 236, Vandoeuvre-Lès-Nancy, France
Abdel Belaïd & André Alusse

Editor information

Editors and Affiliations

Institute of Computer Science and Applied Mathematics, University of Bern, Neubrückstrasse 10, CH-3012, Bern, Switzerland
Horst Bunke
DocRec Ltd, 34 Strathaven Place, 7001, Atawhai, Nelson, New Zealand
A. Lawrence Spitz

About this paper

Cite this paper

Belaïd, A., Alusse, A. (2006). Toward File Consolidation by Document Categorization. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_39

DOI: https://doi.org/10.1007/11669487_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)