Analysis and Interpretation of Semantic HTML Tables

Yin, Wensheng; Guo, Feifei; Xu, Fan; Chen, Xiuguo

doi:10.1007/978-3-642-05250-7_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5854)

Abstract

Tables are an effective way to express structured knowledge, and their semantic analysis is an important part of semantic document analysis. To interpret the structure and semantic relations of HTML documents, this paper proposes definitions of a normalized table and a tabular coordinate system based on database relation theory. It classifies cells into normalized cells and visual cells, and argues that rows, columns, and their combined cells are the primary forms of semantic expression in a table, while nested tables are a further expansion of a particular table cell. Finally, a table analysis algorithm based on the tabular coordinate system is given. Practice shows that the algorithm is simple and fast, and has some practical value.
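The paper's own algorithm is not reproduced in this preview, so the following is only a rough illustration of the general idea of placing HTML table cells on a coordinate grid. It is a minimal sketch, assuming a single non-nested table and using Python's standard `html.parser`. The names `TableGridParser` and `to_grid` are hypothetical and do not come from the paper. Cells with `rowspan` or `colspan` are expanded so that every (row, column) coordinate they cover maps to the same text.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collects the <td>/<th> cells of one non-nested table with their spans.

    Hypothetical helper for illustration; not the algorithm from the paper.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # each row is a list of (text, rowspan, colspan)
        self._cell = None   # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            # Missing or empty span attributes default to 1.
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:          # tolerate a cell that appears before any <tr>
                self.rows.append([])
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def to_grid(html):
    """Map every (row, col) coordinate covered by a cell to that cell's text."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip coordinates already occupied by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    return grid


if __name__ == "__main__":
    sample = (
        "<table>"
        "<tr><th rowspan='2'>Name</th><th colspan='2'>Score</th></tr>"
        "<tr><td>Math</td><td>Art</td></tr>"
        "<tr><td>Ann</td><td>90</td><td>85</td></tr>"
        "</table>"
    )
    for coord, text in sorted(to_grid(sample).items()):
        print(coord, text)
```

In this sketch a spanning cell is simply repeated at each coordinate it covers, which makes the grid rectangular and easy to query by row or column. How this relates to the paper's distinction between normalized and visual cells is an assumption, since the definitions are not given in the abstract.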

Author information

Authors and Affiliations

School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, 430074, China
Wensheng Yin, Feifei Guo, Fan Xu & Xiuguo Chen

Editor information

Editors and Affiliations

Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong, China
Wenyin Liu & Fu Lee Wang
Key Laboratory of Grid Technology, Digital Content Analysis and Semantic Grid Group, Shanghai University, 200072, Shanghai, China
Xiangfeng Luo
College of Information Science and Technology, Hainan University, 570228, Haikou, China
Jingsheng Lei

About this paper

Cite this paper

Yin, W., Guo, F., Xu, F., Chen, X. (2009). Analysis and Interpretation of Semantic HTML Tables. In: Liu, W., Luo, X., Wang, F.L., Lei, J. (eds) Web Information Systems and Mining. WISM 2009. Lecture Notes in Computer Science, vol 5854. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05250-7_8

DOI: https://doi.org/10.1007/978-3-642-05250-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05249-1
Online ISBN: 978-3-642-05250-7
eBook Packages: Computer Science, Computer Science (R0)
