Spatial Information Recognition in Web Documents Using a Semi-supervised Machine Learning Method

Lie, Hendi; Nayak, Richi; Wyeth, Gordon

doi:10.1007/978-3-319-68783-4_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10569)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Web documents are a promising source of spatial information. Once recognized and extracted, this information can support applications such as building semantic maps and indoor robotic navigation. In this paper, we present a novel methodology for identifying spatial information in web documents using machine learning classifiers trained in a semi-supervised manner. Semi-supervised models trained on half of the available data yield F-scores only 4% and 9% lower than those of supervised models trained on the complete data when classifying spatial entities and relationships, respectively.
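
The abstract does not say which semi-supervised procedure is used, so the following is only a generic illustration of one common approach, self-training, and should not be read as the authors' method. A classifier is fitted on a small labeled set, its confident predictions on unlabeled examples are added as pseudo-labels, and the classifier is refitted. The toy bag-of-words model, the label names `SPATIAL` and `OTHER`, the threshold, and the example tokens are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def train(labeled):
    """Count token frequencies per label (a tiny bag-of-words model)."""
    counts = defaultdict(Counter)
    for tokens, label in labeled:
        counts[label].update(tokens)
    return counts


def predict(counts, tokens):
    """Return (label, confidence) from add-one smoothed token likelihoods."""
    vocab = {t for c in counts.values() for t in c}
    scores = {}
    for label, c in counts.items():
        total = sum(c.values()) + len(vocab) + 1  # +1 reserves mass for unseen tokens
        scores[label] = sum(math.log((c[t] + 1) / total) for t in tokens)
    # Normalise the log scores into probabilities (softmax).
    top = max(scores.values())
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())


def self_train(labeled, unlabeled, threshold=0.6, max_rounds=5):
    """Grow the labeled set with confident pseudo-labels, then refit."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for tokens in pool:
            label, conf = predict(model, tokens)
            if conf >= threshold:
                confident.append((tokens, label))
            else:
                remaining.append(tokens)
        if not confident:  # nothing new to learn from, so stop early
            break
        labeled.extend(confident)
        pool = remaining
    return train(labeled), labeled


# Hypothetical seed data: two labeled and two unlabeled token lists.
seed = [(["room", "building"], "SPATIAL"), (["email", "phone"], "OTHER")]
pool = [["building", "floor"], ["phone", "fax"]]
model, grown = self_train(seed, pool)
floor_label, floor_conf = predict(model, ["floor"])
```

In this toy run both unlabeled examples clear the threshold in the first round, so the token "floor", which never appears in the seed data, ends up associated with the `SPATIAL` label. A real system for spatial entities would more likely use a sequence labeler and a held-out set to tune the threshold.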

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia
Hendi Lie, Richi Nayak & Gordon Wyeth

Corresponding author

Correspondence to Hendi Lie.

Editor information

Editors and Affiliations

University of Sydney, Darlington, NSW, Australia
Athman Bouguettaya
Zhejiang University, Hangzhou, China
Yunjun Gao
Institute of Computing for Physics and Technology, Protvino, Russia
Andrey Klimenko
Nanyang Technological University, Singapore, Singapore
Lu Chen
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Xiangliang Zhang
Institute of Computing for Physics and Technology, Protvino, Russia
Fedor Dzerzhinskiy
Shanghai Jiao Tong University, Minhang Qu, China
Weijia Jia
Institute of Computing for Physics and Technology, Protvino, Russia
Stanislav V. Klimenko
City University of Hong Kong, Kowloon, Hong Kong
Qing Li

About this paper

Cite this paper

Lie, H., Nayak, R., Wyeth, G. (2017). Spatial Information Recognition in Web Documents Using a Semi-supervised Machine Learning Method. In: Bouguettaya, A., et al. Web Information Systems Engineering – WISE 2017. WISE 2017. Lecture Notes in Computer Science, vol 10569. Springer, Cham. https://doi.org/10.1007/978-3-319-68783-4_11

DOI: https://doi.org/10.1007/978-3-319-68783-4_11
Published: 04 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68782-7
Online ISBN: 978-3-319-68783-4
eBook Packages: Computer Science, Computer Science (R0)