Abstract
Marking up documents that contain only a layout-oriented structure (e.g., documents created with an ordinary word processor) is becoming increasingly important for information management in modern companies. Only after a document has been marked up with logical elements can this additional information be used, for example, to implement single-source publishing or to enable content-oriented retrieval. Today, layout-oriented documents usually have to be marked up manually, which leads to high costs for companies.
In the project “Adaptive READ”, the Institute for Human Factors and Technology Management (IAT) of the University of Stuttgart has developed a semi-automatic approach to marking up documents that contain only a layout-oriented structure. The main issues of this development are discussed in this article.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Drawehn, J., Altenhofen, C., Stanišić-Petrović, M., Weisbecker, A. (2004). A Tool for Semi-automatic Document Reengineering. In: Dengel, A., Junker, M., Weisbecker, A. (eds) Reading and Learning. Lecture Notes in Computer Science, vol 2956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24642-8_13
DOI: https://doi.org/10.1007/978-3-540-24642-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21904-0
Online ISBN: 978-3-540-24642-8
eBook Packages: Springer Book Archive