
An automatic mark-up approach for structured document retrieval in engineering design


Information and knowledge retrieval has been recognized as a key issue in engineering design. A great deal of the design-related information used and generated within engineering companies is formally recorded in documents. These documents become more useful if they are structured consistently, so that they can be retrieved and their contents accessed more effectively. Achieving useful structure in electronic documents relies on embedding some form of computer-understandable mark-up or coding, but manual mark-up is time-consuming and costly. This paper proposes a knowledge engineering approach to automatic document mark-up that employs XML (the eXtensible Mark-up Language) to ‘tag’ structural information explicitly. The focus is on long and complex engineering documents. A three-level model is explored to achieve automatic semantic mark-up using a set of document decomposition schemes. The model comprises a strategic level, which identifies typographical features of the document based on such things as styles, inference or templates; a tactical level, which defines the rules that realize semantic mark-up according to those features; and an operational level, which performs the computational implementation of the mark-up rules. Making document structure explicit allows information retrieval to be more focused: it can return not just whole documents but the document components that are most relevant or of most interest to the engineering designer, and it can match the designer’s need with respect to both document structure and content rather than content alone. In addition, the human user’s interpretation of useful structure can be hardwired into documents, which moves us closer to true semantic-level retrieval.
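The three-level model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The input format (paragraphs paired with word-processor style names), the rule table `RULES`, the element names `document`, `section`, `title` and `para`, and the helper functions `mark_up` and `find_components` are all assumptions introduced here for illustration; only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Strategic level (assumed input): each paragraph arrives with the
# typographical style it carried in the source document.
paragraphs = [
    ("Title", "Gearbox Design Report"),
    ("Heading 1", "Requirements"),
    ("Normal", "The gearbox shall transmit 5 kW."),
    ("Heading 1", "Conclusions"),
    ("Normal", "The design meets the requirement."),
]

# Tactical level: hypothetical rules mapping a style to a semantic tag.
RULES = {
    "Title": "title",
    "Heading 1": "section",
    "Normal": "para",
}


def mark_up(paragraphs, rules):
    """Operational level: apply the rules and build an XML tree."""
    root = ET.Element("document")
    current = root  # element that receives the next paragraph
    for style, text in paragraphs:
        tag = rules.get(style, "para")  # unknown styles fall back to <para>
        if tag == "section":
            # A heading opens a new section; its text becomes an attribute.
            current = ET.SubElement(root, "section", {"heading": text})
        else:
            ET.SubElement(current, tag).text = text
    return root


def find_components(root, heading):
    """Return the paragraph texts of every section with the given heading."""
    return [
        para.text
        for section in root.findall("section")
        if section.get("heading") == heading
        for para in section.findall("para")
    ]


root = mark_up(paragraphs, RULES)
print(ET.tostring(root, encoding="unicode"))
print(find_components(root, "Conclusions"))
```

Once the structure is explicit, a query such as `find_components(root, "Conclusions")` returns only the matching document component rather than the whole document, which is the kind of focused retrieval the abstract argues for. A real decomposition scheme would need nested sections and rules based on inference or templates, which this sketch leaves out.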




Author information

Correspondence to S. Liu.


About this article

Cite this article

Liu, S., McMahon, C.A., Darlington, M.J. et al. An automatic mark-up approach for structured document retrieval in engineering design. Int J Adv Manuf Technol 38, 418–425 (2008).

Keywords


  • Knowledge engineering approach
  • Automatic mark-up
  • Structured document retrieval
  • Engineering design
  • Document decomposition
  • XML