Abstract
Tables are an effective way of expressing structured knowledge, and their semantic analysis is an important part of semantic document analysis. To interpret the structure and semantic relations of HTML documents, this paper proposes definitions of a normalized table and a tabular coordinate system based on database relation theory. It classifies cells into normalized cells and visual cells, and shows that rows, columns, and their combined cells are the primary forms of semantic expression in a table, while a nested table is a further expansion of a particular table cell. Finally, a table analysis algorithm based on the tabular coordinate system is given. Practice shows that the algorithm is simple and fast, and that it has some practical significance.
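The abstract does not spell out the coordinate system or the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that a "visual cell" is a `td` or `th` element as written in the markup (possibly with `rowspan` or `colspan`), that a "normalized cell" is a single (row, column) position in a rectangular grid, and that the table contains no nested tables. The names `VisualCellParser`, `normalize`, `SAMPLE`, and `GRID` are hypothetical.

```python
from html.parser import HTMLParser


class VisualCellParser(HTMLParser):
    """Collect visual cells (td/th) row by row, together with their spans."""

    def __init__(self):
        super().__init__()
        self.rows = []     # each row is a list of [text, rowspan, colspan]
        self._cell = None  # cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            rowspan = max(1, int(a.get("rowspan") or 1))
            colspan = max(1, int(a.get("colspan") or 1))
            self._cell = ["", rowspan, colspan]
            self.rows[-1].append(self._cell)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._cell[0] = self._cell[0].strip()
            self._cell = None


def normalize(html):
    """Map every (row, column) coordinate to the text of the visual cell covering it."""
    parser = VisualCellParser()
    parser.feed(html)
    grid = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip coordinates claimed by earlier spans
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    return grid


# Assumed sample input: a two-level header over one data row.
SAMPLE = """
<table>
<tr><th rowspan="2">Name</th><th colspan="2">Score</th></tr>
<tr><th>Math</th><th>Art</th></tr>
<tr><td>Ann</td><td>90</td><td>85</td></tr>
</table>
"""
GRID = normalize(SAMPLE)
```

In this sketch the spanning header cells are copied into every coordinate they cover, so the value at coordinate (2, 2) can be read together with the column headers at (0, 2) and (1, 2) and the row header at (2, 0).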
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yin, W., Guo, F., Xu, F., Chen, X. (2009). Analysis and Interpretation of Semantic HTML Tables. In: Liu, W., Luo, X., Wang, F.L., Lei, J. (eds) Web Information Systems and Mining. WISM 2009. Lecture Notes in Computer Science, vol 5854. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05250-7_8
DOI: https://doi.org/10.1007/978-3-642-05250-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05249-1
Online ISBN: 978-3-642-05250-7
eBook Packages: Computer Science, Computer Science (R0)