A Semantics Enabled Intelligent Semi-structured Document Processor

  • Conference paper
Trustworthy Computing and Services (ISCTCS 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 426)

Abstract

In recent years, the number of semi-structured documents available electronically has increased dramatically. Semi-structured documents are usually difficult to reuse because they lack explicit metadata. To enable integration and retrieval over semi-structured documents, the essential aspects of the documents should be described explicitly by metadata. The metadata can be assigned to documents, and can represent part of their information content, using various information extraction (IE) techniques. This paper also provides a flexible user interaction mechanism that achieves better performance with fewer training sample documents. In semantic view extraction, similarity-based rule induction improves the rule learning procedure. Experimental results show that our approach can significantly outperform most existing wrapper methods. We make use of the semantics that reside in the document's logical structure to help find relations between semantic entities. After the documents are semantically annotated, TIPSI allows them to be indexed with respect to the extracted text entities. To answer a query, TIPSI applies semantic restrictions over the entities in the knowledge base (KB).
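
The last two steps of the abstract, indexing annotated documents by extracted entity and answering a query by restricting entities in the KB, can be illustrated with a minimal sketch. This is not the TIPSI implementation, which the abstract does not describe; the knowledge base layout, the annotation table, and the names `build_entity_index` and `query` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical knowledge base: entity id -> attributes (illustrative only).
KB = {
    "e1": {"type": "Person", "name": "Alice", "affiliation": "e3"},
    "e2": {"type": "Person", "name": "Bob", "affiliation": "e4"},
    "e3": {"type": "Organization", "name": "Example University"},
    "e4": {"type": "Organization", "name": "Example Labs"},
}

# Hypothetical annotation output: document id -> entity ids found in it.
ANNOTATIONS = {
    "doc1": ["e1", "e3"],
    "doc2": ["e2", "e4"],
    "doc3": ["e1"],
}


def build_entity_index(annotations):
    """Map each entity id to the set of documents annotated with it."""
    index = defaultdict(set)
    for doc_id, entity_ids in annotations.items():
        for entity_id in entity_ids:
            index[entity_id].add(doc_id)
    return index


def query(index, kb, **restrictions):
    """Return sorted ids of documents that mention at least one entity
    whose KB attributes satisfy every restriction."""
    matching = [
        entity_id
        for entity_id, attrs in kb.items()
        if all(attrs.get(key) == value for key, value in restrictions.items())
    ]
    docs = set()
    for entity_id in matching:
        docs |= index.get(entity_id, set())
    return sorted(docs)


index = build_entity_index(ANNOTATIONS)
# Documents that mention a Person affiliated with entity e3.
print(query(index, KB, type="Person", affiliation="e3"))
```

Here the restrictions are simple attribute equalities evaluated against the KB first, and the entity index is consulted only for the entities that pass; a real system would likely support richer restrictions, such as relations that span several entities.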

Notes

  1. http://www.isi.edu/info-agents/RISE/repository.html

Acknowledgement

We thank the anonymous reviewers for their valuable comments. This work was supported by the National High Technology Research and Development (863) Program (2011AA01A205).

Corresponding author

Correspondence to Kuo Zhang.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, K., Li, J., Hong, M., Yan, X., Song, Q. (2014). A Semantics Enabled Intelligent Semi-structured Document Processor. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2013. Communications in Computer and Information Science, vol 426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43908-1_41

  • DOI: https://doi.org/10.1007/978-3-662-43908-1_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43907-4

  • Online ISBN: 978-3-662-43908-1

  • eBook Packages: Computer Science (R0)