Abstract
Conceptual models are the foundation of many modern intelligent systems and a theoretical basis for more in-depth scientific research. Such models can be created from various information sources (e.g., databases, spreadsheets, and text documents) by means of reverse engineering. In this paper, we propose an approach to support conceptual model engineering based on the analysis and transformation of tabular data from CSV files. Industrial safety inspection (ISI) reports serve as examples for spreadsheet data analysis and transformation. The automated conceptual model engineering involves five steps and employs the following software: TabbyXL, for extracting canonical (relational) tables from arbitrary spreadsheet data in CSV format; and Personal Knowledge Base Designer (PKBD), for generating conceptual model fragments by analyzing and transforming canonical tables and for aggregating these fragments into a domain model. The approach was verified on a corpus of 216 spreadsheets extracted from six ISI reports. The resulting conceptual models can be used in the design of knowledge bases.
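To make the last two stages more concrete, the following is a minimal Python sketch of how a canonical (relational) table in CSV form might be turned into a conceptual model fragment, and how fragments might be aggregated into a domain model. It does not reproduce TabbyXL or PKBD: the `Concept` class, the functions `fragment_from_canonical` and `aggregate`, and the sample tables are all hypothetical and chosen only for illustration. The sketch assumes that each canonical table describes one concept, that column headers become attributes, and that fragments with the same concept name are merged.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Concept:
    """Hypothetical model fragment: a named concept with attribute -> observed values."""
    name: str
    attributes: Dict[str, Set[str]] = field(default_factory=dict)


def fragment_from_canonical(name: str, csv_text: str) -> Concept:
    """Build one fragment from a canonical table (header row = attributes)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    concept = Concept(name, {column: set() for column in columns})
    for row in reader:
        for column in columns:
            value = row.get(column)
            if value:  # skip empty or missing cells
                concept.attributes[column].add(value)
    return concept


def aggregate(fragments: Iterable[Concept]) -> Dict[str, Concept]:
    """Merge fragments that share a concept name into a single domain model."""
    model: Dict[str, Concept] = {}
    for fragment in fragments:
        target = model.setdefault(fragment.name, Concept(fragment.name))
        for attribute, values in fragment.attributes.items():
            target.attributes.setdefault(attribute, set()).update(values)
    return model


# Two invented canonical tables describing the same (hypothetical) concept.
table_a = "name,material\nvessel,steel\npipe,steel\n"
table_b = "name,defect\nvessel,crack\n"

model = aggregate([
    fragment_from_canonical("Equipment", table_a),
    fragment_from_canonical("Equipment", table_b),
])
for concept in model.values():
    print(concept.name, sorted(concept.attributes))
```

Merging by concept name is the simplest possible aggregation rule; a real tool would likely also need to reconcile synonyms and infer relationships between concepts, which this sketch does not attempt.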
Acknowledgments
This work was supported by the Russian Science Foundation, grant number 18-71-10001.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Dorodnykh, N.O., Yurin, A.Y., Shigarov, A.O. (2020). Conceptual Model Engineering for Industrial Safety Inspection Based on Spreadsheet Data Analysis. In: Simian, D., Stoica, L. (eds) Modelling and Development of Intelligent Systems. MDIS 2019. Communications in Computer and Information Science, vol 1126. Springer, Cham. https://doi.org/10.1007/978-3-030-39237-6_4
DOI: https://doi.org/10.1007/978-3-030-39237-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39236-9
Online ISBN: 978-3-030-39237-6
eBook Packages: Computer Science, Computer Science (R0)