Analysis and Interpretation of Semantic HTML Tables

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5854)

Abstract

Tables are an effective way to present structured knowledge, and their semantic analysis is an important part of semantic document analysis. To interpret the structure and semantic relations of tables in HTML documents, this paper proposes definitions of a normalized table and a tabular coordinate system based on database relation theory. It classifies cells into normalized cells and visual cells, and shows that rows, columns, and their combined cells are the primary forms of semantic expression in a table, while a nested table is a further expansion of a particular table cell. Finally, a table analysis algorithm based on the tabular coordinate system is given. Practice shows that the algorithm is simple and fast and has practical value.
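The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of the idea: a cell that spans several rows or columns (a "visual cell") is expanded so that every (row, column) coordinate of the grid holds one value (a "normalized cell"). The names `_CellCollector` and `normalize_table` are hypothetical, the sketch uses Python's standard `html.parser`, and it assumes a single well-formed table with no nesting. It is not the authors' method.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collect [text, rowspan, colspan] for each cell of one non-nested table."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cells per <tr>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            rowspan = int(a.get("rowspan") or 1)
            colspan = int(a.get("colspan") or 1)
            self.rows[-1].append(["", rowspan, colspan])
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1][0] += data.strip()


def normalize_table(html):
    """Map every (row, col) coordinate to the text of the cell covering it.

    Assumption (not from the paper): a spanning cell is simply copied
    into each coordinate it covers.
    """
    parser = _CellCollector()
    parser.feed(html)
    grid = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip coordinates already filled by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    return grid


SAMPLE = """
<table>
  <tr><th rowspan="2">Name</th><th colspan="2">Score</th></tr>
  <tr><th>Math</th><th>Art</th></tr>
  <tr><td>Ann</td><td>90</td><td>85</td></tr>
</table>
"""

if __name__ == "__main__":
    grid = normalize_table(SAMPLE)
    for r in range(3):
        print([grid[(r, c)] for c in range(3)])
```

With the coordinates in place, the header path of a data cell can be read off by walking up its column: here the value at (2, 1) sits under "Math" at (1, 1) and "Score" at (0, 1). Handling nested tables, which the abstract treats as an expansion of a single cell, would need a recursive step that this sketch leaves out.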




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yin, W., Guo, F., Xu, F., Chen, X. (2009). Analysis and Interpretation of Semantic HTML Tables. In: Liu, W., Luo, X., Wang, F.L., Lei, J. (eds) Web Information Systems and Mining. WISM 2009. Lecture Notes in Computer Science, vol 5854. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05250-7_8


  • DOI: https://doi.org/10.1007/978-3-642-05250-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05249-1

  • Online ISBN: 978-3-642-05250-7

  • eBook Packages: Computer Science, Computer Science (R0)
