Unsupervised Named Entity Normalization for Supporting Information Fusion for Big Bridge Data Analytics

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10864)

Abstract

The large amount of multi-type and multi-source bridge data opens unprecedented opportunities for big data analytics to better predict bridge deterioration. Information fusion is needed prior to the analytics to transform the heterogeneous data from different sources into a unified representation. Resolving the ambiguities in the named entities extracted from bridge inspection reports is one of the most important fusion tasks. The ambiguity stems from the use of different and ambiguous surface forms for the same target named entity. There is, thus, a need for named entity normalization (NEN) methods that can map these ambiguous surface forms into their canonical form (an identifier concept). However, existing NEN methods are limited in this regard, because they mostly require pre-established knowledge (e.g., dictionaries or Wikipedia) and/or training data, and they mostly ignore the impact of the normalization on data analytics. To address this need, this paper proposes an unsupervised NEN method. It includes two main components: (1) candidate identifier concept generation, based on the multi-grams of each named entity set, and (2) candidate identifier concept ranking, based on a proposed ranking function. The function uses the TF-IDF (term frequency–inverse document frequency) weight and is further improved by considering the impacts of gram lengths and positions on the ranking. It aims to balance the abstractness and detailedness of the identifier concepts, so that the resulting data are neither too dense nor too sparse for the analytics. A set of experiments was conducted to evaluate the performance of the proposed method, which achieved an accuracy of 84.5%.
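The abstract describes the two components only at a high level, so the following is a minimal sketch of one possible reading of them, not the authors' ranking function. It assumes that each named entity set is a list of surface forms already linked to the same entity, and it treats each set as one "document" when computing IDF. The length factor (`n_tokens ** length_weight`), the position factor (`position_decay ** offset`, which favours grams ending at the last token, where the head noun of an English noun phrase usually sits), the smoothed IDF, and the names `multigrams`, `rank_candidates`, `length_weight`, and `position_decay` are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def multigrams(surface_form):
    """Return every contiguous n-gram of a surface form, mapped to its offset:
    the number of tokens after the gram's last token (0 = ends the phrase)."""
    tokens = surface_form.lower().split()
    grams = {}
    for n in range(1, len(tokens) + 1):
        for start in range(len(tokens) - n + 1):
            gram = " ".join(tokens[start:start + n])
            offset = len(tokens) - (start + n)
            # if a gram occurs more than once, keep its position closest to the end
            grams[gram] = min(offset, grams.get(gram, offset))
    return grams


def rank_candidates(entity_sets, length_weight=0.5, position_decay=0.5):
    """Rank candidate identifier concepts for each entity set using a TF-IDF
    score adjusted for gram length and position (hypothetical adjustment form)."""
    set_stats = []
    doc_freq = defaultdict(int)  # number of entity sets containing each gram
    for surface_forms in entity_sets:
        tf = defaultdict(int)  # number of surface forms in the set containing the gram
        offset = {}
        for form in surface_forms:
            for gram, off in multigrams(form).items():
                tf[gram] += 1
                offset[gram] = min(off, offset.get(gram, off))
        for gram in tf:
            doc_freq[gram] += 1
        set_stats.append((tf, offset))

    n_sets = len(entity_sets)
    rankings = []
    for tf, offset in set_stats:
        scores = {}
        for gram, freq in tf.items():
            # smoothed IDF: grams shared by every set still keep a nonzero weight
            idf = math.log(n_sets / doc_freq[gram]) + 1.0
            # longer grams are more detailed, so they receive a length bonus
            length_factor = len(gram.split()) ** length_weight
            # grams that end further from the phrase end are down-weighted
            position_factor = position_decay ** offset[gram]
            scores[gram] = freq * idf * length_factor * position_factor
        # highest score first; ties broken alphabetically for determinism
        rankings.append(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
    return rankings


entity_sets = [
    ["concrete deck", "deck", "bridge deck"],
    ["steel girder", "girder", "main girder"],
]
for ranking in rank_candidates(entity_sets):
    print(ranking[0][0])  # prints "deck", then "girder"
```

In this sketch, term frequency rewards grams shared by many surface forms (more abstract concepts), while the length factor rewards longer grams (more detailed concepts), so `length_weight` and `position_decay` are the knobs that trade abstractness against detailedness.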

Acknowledgements

This material is based upon work supported by the Strategic Research Initiatives (SRI) Program by the College of Engineering at the University of Illinois at Urbana-Champaign.

Author information

Corresponding author

Correspondence to Kaijian Liu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Liu, K., El-Gohary, N. (2018). Unsupervised Named Entity Normalization for Supporting Information Fusion for Big Bridge Data Analytics. In: Smith, I., Domer, B. (eds) Advanced Computing Strategies for Engineering. EG-ICE 2018. Lecture Notes in Computer Science, vol 10864. Springer, Cham. https://doi.org/10.1007/978-3-319-91638-5_7

  • DOI: https://doi.org/10.1007/978-3-319-91638-5_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91637-8

  • Online ISBN: 978-3-319-91638-5

  • eBook Packages: Computer Science, Computer Science (R0)
