Unsupervised Named Entity Normalization for Supporting Information Fusion for Big Bridge Data Analytics

Liu, Kaijian; El-Gohary, Nora

doi:10.1007/978-3-319-91638-5_7

Conference paper
First Online: 19 May 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10864)

Abstract

The large amount of multi-type and multi-source bridge data opens unprecedented opportunities for big data analytics aimed at better bridge deterioration prediction. Information fusion is needed prior to the analytics to transform the heterogeneous data from different sources into a unified representation. Resolving the ambiguities in the named entities extracted from bridge inspection reports is one of the most important fusion tasks. The ambiguity stems from the use of different and ambiguous surface forms for the same target named entity. There is, thus, a need for named entity normalization (NEN) methods that can map these ambiguous surface forms to their canonical form, an identifier concept. However, existing NEN methods are limited in this regard because they mostly require pre-established knowledge (e.g., dictionaries or Wikipedia) and/or training data, and mostly ignore the impact of the normalization on data analytics. To address this need, this paper proposes an unsupervised NEN method. The method includes two main components: candidate identifier concept generation based on multi-grams of each named entity set, and candidate identifier concept ranking based on a proposed ranking function. The ranking function uses the TF-IDF (term frequency–inverse document frequency) weight and is further improved by considering the impacts of gram lengths and positions on the ranking. It aims to balance the abstractness and detailedness of the identifier concepts, so as to ensure that the resulting data are neither too dense nor too sparse for the analytics. A set of experiments was conducted to evaluate the performance of the proposed method, which achieved an accuracy of 84.5%.
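
The two components described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual ranking function, so everything beyond "multi-grams plus a TF-IDF weight adjusted for gram length and position" is an assumption made for illustration: the function names, the smoothed IDF variant, the multiplicative form, the weights `alpha` and `beta`, and the choice to favor grams that end a surface form are all hypothetical and are not the authors' formulation.

```python
import math
from collections import Counter


def ngrams(tokens, max_n):
    """Yield (gram, start_index) for each contiguous n-gram of 1..max_n tokens."""
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield tuple(tokens[start:start + n]), start


def gram_set(forms, max_n):
    """Collect the distinct grams that occur in a set of surface forms."""
    return {g for form in forms for g, _ in ngrams(form.lower().split(), max_n)}


def rank_candidates(entity_set, all_sets, max_n=3, alpha=0.5, beta=0.5):
    """Rank candidate identifier concepts for one named entity set.

    entity_set: surface forms assumed to refer to the same entity.
    all_sets:   every entity set in the corpus, used for document frequency.
    alpha/beta: hypothetical weights for gram length and gram position.
    """
    tf = Counter()         # how often each gram occurs in this entity set
    tail_hits = Counter()  # how often the gram ends a surface form
    for form in entity_set:
        tokens = form.lower().split()
        for gram, start in ngrams(tokens, max_n):
            tf[gram] += 1
            if start + len(gram) == len(tokens):
                tail_hits[gram] += 1

    grams_per_set = [gram_set(s, max_n) for s in all_sets]
    n_sets = len(all_sets)
    scored = []
    for gram, freq in tf.items():
        df = sum(1 for grams in grams_per_set if gram in grams)
        # Smoothed IDF (assumed variant): grams shared by many sets score lower.
        idf = math.log((1 + n_sets) / (1 + df)) + 1.0
        # Assumed adjustments: longer grams are more detailed, and grams at the
        # end of a surface form are treated as more likely to carry its head.
        length_factor = 1.0 + alpha * (len(gram) - 1)
        position_factor = 1.0 + beta * tail_hits[gram] / freq
        scored.append((" ".join(gram), freq * idf * length_factor * position_factor))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


deck_forms = ["bridge deck", "concrete bridge deck", "rc bridge deck"]
all_sets = [deck_forms, ["deck joint", "joint"], ["steel girder", "girder"]]
ranked = rank_candidates(deck_forms, all_sets)
print(ranked[0][0])
```

With these made-up surface forms and weights, the sketch ranks "bridge deck" above both the more abstract "deck" and the more detailed trigrams, which is the kind of balance between abstractness and detailedness the abstract describes. Different weights would shift that balance.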

Acknowledgements

This material is based upon work supported by the Strategic Research Initiatives (SRI) Program by the College of Engineering at the University of Illinois at Urbana-Champaign.

Author information

Authors and Affiliations

Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Kaijian Liu & Nora El-Gohary

Authors

Kaijian Liu
Nora El-Gohary

Corresponding author

Correspondence to Kaijian Liu.

Editor information

Editors and Affiliations

Applied Computing and Mechanics Laboratory (IMAC), School of Architecture, Civil and Environmental Engineering (ENAC), Swiss Federal Institute of Technology, Lausanne (EPFL), Lausanne, Switzerland
Ian F. C. Smith
Institute for Landscape, Architecture, Construction and Territory (inPact) Construction and Environment Department (CED), University of Applied Sciences, Geneva (HEPIA), Geneva, Switzerland
Bernd Domer

About this paper

Cite this paper

Liu, K., El-Gohary, N. (2018). Unsupervised Named Entity Normalization for Supporting Information Fusion for Big Bridge Data Analytics. In: Smith, I., Domer, B. (eds) Advanced Computing Strategies for Engineering. EG-ICE 2018. Lecture Notes in Computer Science, vol 10864. Springer, Cham. https://doi.org/10.1007/978-3-319-91638-5_7

DOI: https://doi.org/10.1007/978-3-319-91638-5_7
Published: 19 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91637-8
Online ISBN: 978-3-319-91638-5
eBook Packages: Computer Science, Computer Science (R0)