A Semantics Enabled Intelligent Semi-structured Document Processor

Zhang, Kuo; Li, JuanZi; Hong, MingCai; Yan, XueDong; Song, Qiang

doi:10.1007/978-3-662-43908-1_41

Part of the book series: Communications in Computer and Information Science (CCIS, volume 426)

Included in the following conference series:

International Conference on Trustworthy Computing and Services

Abstract

In recent years, the amount of semi-structured documents available electronically has increased dramatically. Semi-structured documents are usually difficult to reuse because they lack explicit metadata. To enable integration and retrieval over semi-structured documents, their essential aspects should be described explicitly by metadata. Such metadata can be assigned to documents, representing part of their information content, using various information extraction (IE) techniques. This paper also provides a flexible user interaction mechanism that achieves better performance with fewer training sample documents. In semantic view extraction, similarity-based rule induction improves the rule learning procedure. Experimental results show that our approach can significantly outperform most of the existing wrapper methods. We use the semantics that reside in the logical structure of a document to help find relations between semantic entities. After the documents are semantically annotated, TIPSI allows them to be indexed with respect to the extracted text entities. To answer a query, TIPSI applies semantic restrictions over the entities in the knowledge base (KB).

Notes

1. http://www.isi.edu/info-agents/RISE/repository.html

Acknowledgement

Thanks to the anonymous reviewers for their valuable comments. This work was supported by the National High Technology Research and Development (863) Program (2011AA01A205).

Author information

Authors and Affiliations

Beijing Sogou Technology Development Co, Ltd, Beijing, China
Kuo Zhang
Knowledge Engineering Lab, Department of Computer Science, Tsinghua University, Beijing, China
JuanZi Li, MingCai Hong, XueDong Yan & Qiang Song

Corresponding author

Correspondence to Kuo Zhang.

Editor information

Editors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, China
Yuyu Yuan
Beijing University of Posts and Telecommunications, Beijing, China
Xu Wu
Beijing University of Posts and Telecommunications, Beijing, China
Yueming Lu

About this paper

Cite this paper

Zhang, K., Li, J., Hong, M., Yan, X., Song, Q. (2014). A Semantics Enabled Intelligent Semi-structured Document Processor. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2013. Communications in Computer and Information Science, vol 426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43908-1_41

DOI: https://doi.org/10.1007/978-3-662-43908-1_41
Published: 27 June 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43907-4
Online ISBN: 978-3-662-43908-1
eBook Packages: Computer Science (R0)
