
Spatial Information Recognition in Web Documents Using a Semi-supervised Machine Learning Method

  • Conference paper
Web Information Systems Engineering – WISE 2017 (WISE 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10569)


Abstract

Web documents are a promising source of spatial information. Once recognised and extracted, this information can be used in applications such as building semantic maps and indoor robotic navigation. In this paper, we present a novel methodology for identifying spatial information in web documents using machine learning classifiers trained in a semi-supervised manner. Semi-supervised models trained with half of the available data yield F-scores only 4% and 9% lower than those of supervised models trained with the complete data, for classifying spatial entities and spatial relationships respectively.
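This preview does not describe the training procedure itself. As an illustration only, the sketch below shows self-training, one common semi-supervised scheme: a model trained on the labelled examples pseudo-labels the unlabelled examples it is confident about, and is then retrained on the enlarged set. It is not claimed to be the authors' method. The nearest-centroid classifier, its confidence measure, the 0.5 threshold, the feature vectors and the `O`/`LOC` labels are all hypothetical stand-ins for a real spatial-entity tagger.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration: a feature vector, a (features, label) pair,
# and a predictor returning (label, confidence in [0, 1]).
Features = Tuple[float, ...]
Example = Tuple[Features, str]
Predictor = Callable[[Features], Tuple[str, float]]


def train_nearest_centroid(data: Sequence[Example]) -> Predictor:
    """Toy stand-in classifier (hypothetical): one centroid per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Features) -> Tuple[str, float]:
        # Euclidean distance from x to every label centroid, nearest first.
        ranked = sorted(
            ((y, sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5) for y, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        best, d_best = ranked[0]
        if len(ranked) == 1:
            return best, 1.0
        d_second = ranked[1][1]
        # Confidence: 1 when x sits on the best centroid, 0 when it is
        # equidistant from the two nearest centroids.
        confidence = 1.0 - d_best / d_second if d_second > 0 else 0.0
        return best, confidence

    return predict


def self_train(
    labelled: Sequence[Example],
    unlabelled: Sequence[Features],
    train: Callable[[Sequence[Example]], Predictor],
    threshold: float = 0.5,
    max_rounds: int = 10,
) -> Tuple[Predictor, List[Example]]:
    """Generic self-training: repeatedly pseudo-label confident unlabelled examples."""
    train_set = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = train(train_set)
        confident: List[Example] = []
        remaining: List[Features] = []
        for x in pool:
            label, conf = model(x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing left is confident enough; stop early
        train_set.extend(confident)  # pseudo-labelled examples join the training data
        pool = remaining
    return train(train_set), train_set
```

For example, with seeds `((0.0, 0.0), "O")` and `((10.0, 10.0), "LOC")` and unlabelled points `(1.0, 0.5)`, `(9.0, 9.5)` and `(5.0, 5.0)`, the first two points are pseudo-labelled in the first round, while the point equidistant from both centroids never passes the threshold and stays unlabelled. The threshold trades coverage of the unlabelled pool against the risk of reinforcing early labelling mistakes.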


Notes

  1. http://lostoncampus.com.au.

  2. http://www.columbia.edu/content/self-guided-walking-tour.html.

  3. http://www.unimelb.edu.au/campustour/self-guided.

  4. http://www.bristol.ac.uk/university/visit/walking-tour.html.

  5. http://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize.punkt.

  6. https://www.cs.cornell.edu/people/tj/svm_light/svm_hmm.html.

  7. http://nlp.stanford.edu/projects/project-ner.shtml.


Author information

Correspondence to Hendi Lie.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lie, H., Nayak, R., Wyeth, G. (2017). Spatial Information Recognition in Web Documents Using a Semi-supervised Machine Learning Method. In: Bouguettaya, A., et al. Web Information Systems Engineering – WISE 2017. WISE 2017. Lecture Notes in Computer Science, vol 10569. Springer, Cham. https://doi.org/10.1007/978-3-319-68783-4_11


  • DOI: https://doi.org/10.1007/978-3-319-68783-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68782-7

  • Online ISBN: 978-3-319-68783-4

  • eBook Packages: Computer Science, Computer Science (R0)
