A Machine-Learning Approach for Semantic Matching of Building Codes and Building Information Models (BIMs) for Supporting Automated Code Checking
Various automated code compliance checking (ACC) systems have been developed and used to check the compliance of building information models (BIMs) with building codes, to reduce the time, cost, and errors of the code compliance checking process. All these systems require some form of code-BIM matching – matching of the concept representations in the codes to those in the BIMs – which is a difficult task. Traditionally, semantic matching has been conducted in a highly manual manner. To address this problem, a limited number of recent efforts have proposed fully automated semantic matching methods, which mostly rely on matching annotations and/or rules developed by domain experts. Despite their relatively good performance, these methods are by nature difficult to generalize or scale up (e.g., the matching rules need to be updated, modified, or extended when switching from one type of code to another). There is, thus, a need for semantic matching approaches that are more generalizable and scalable. To address this need, this paper proposes a new, machine learning-based approach to automatically match building-code concepts and relations to their equivalent concepts and relations in the Industry Foundation Classes (IFC). The proposed approach consists of five primary tasks: (1) prepare and process the training and testing data; (2) automatically learn domain word embeddings from a large corpus of building-code text and generate the final semantic representations by combining the domain and general word embeddings; (3) match the building-code concepts to the IFC elements; (4) match the building-code relations to the IFC relations; and (5) evaluate the performance of the proposed approach in terms of accuracy. The proposed approach was implemented and tested on a number of chapters from the 2009 International Building Code (IBC) and the Champaign 2015 IBC Amendments.
The preliminary results show that the proposed approach achieved an accuracy of 77% for matching building-code concepts to IFC elements, and 78% for matching building-code relations to IFC relations, indicating promising semantic matching performance.
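To make tasks (2), (3), and (5) more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that the domain and general embeddings are combined by simple concatenation and that a building-code concept is matched to the IFC element with the highest cosine similarity; the abstract specifies neither choice. The vectors are made-up toy values, and the helper names (`combine`, `match_concept`, `accuracy`) and the word assigned to each IFC entity are hypothetical.

```python
import math

# Toy, made-up vectors standing in for general embeddings (e.g., GloVe)
# and domain embeddings learned from building-code text.
GENERAL = {
    "wall": [0.9, 0.1],
    "door": [0.1, 0.9],
    "partition": [0.8, 0.2],
    "exit": [0.2, 0.7],
}
DOMAIN = {
    "wall": [1.0, 0.0],
    "door": [0.0, 1.0],
    "partition": [0.9, 0.1],
    "exit": [0.1, 0.9],
}

# Hypothetical lookup: one descriptive word per IFC entity.
IFC_ELEMENTS = {"IfcWall": "wall", "IfcDoor": "door"}


def combine(word):
    """Concatenate general and domain vectors (an assumed combination rule)."""
    return GENERAL[word] + DOMAIN[word]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_concept(concept):
    """Return the IFC entity whose combined vector is closest to the concept's."""
    vec = combine(concept)
    return max(IFC_ELEMENTS, key=lambda e: cosine(vec, combine(IFC_ELEMENTS[e])))


def accuracy(predicted, gold):
    """Fraction of predictions that equal the gold-standard labels."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


concepts = ["partition", "exit"]   # building-code concepts (toy examples)
gold = ["IfcWall", "IfcDoor"]      # assumed gold-standard matches
predicted = [match_concept(c) for c in concepts]
print(predicted, accuracy(predicted, gold))
```

Concatenation keeps both views of a word side by side, so a term whose general-language sense is ambiguous can still be separated by its building-code usage. A trained classifier could replace the nearest-neighbor step without changing how accuracy is computed.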
The authors would like to thank the National Science Foundation (NSF). This material is based on work supported by the NSF under Grant No. 1827733. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.