Abstract
Spreadsheets are widely adopted as “popular databases”, in which authors shape their solutions interactively. Although spreadsheets are easily adapted by their authors, their informal schemas cannot be automatically interpreted by machines to integrate data across independent spreadsheets. In biology, we observed a significant amount of biodiversity data stored in spreadsheets that are treated as isolated entities with different tabular organizations, yet have high potential for data articulation. To automatically interpret these spreadsheets, we exploit construction patterns followed by users in the biodiversity domain. This paper details evidence of such patterns and shows how they can be used to characterize the nature of a spreadsheet, as well as its fields, within a domain. It combines an automatic analysis of thousands of spreadsheets collected on the Web with the results of a survey conducted with biologists. We propose a representation model that captures these patterns, to be used in automatic interpretation systems.
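The abstract does not detail the representation model, so what follows is only a minimal Python sketch of the general idea of exploiting construction patterns. It assumes two hypothetical patterns:

- Field labels sit in the first row that has at least two non-empty, non-numeric cells, possibly below a title row.
- Those labels can be matched against a small domain vocabulary.

`FIELD_VOCABULARY`, `find_header_row` and `interpret` are illustrative names, and the vocabulary entries are invented examples. None of them come from the authors' system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical vocabulary mapping normalized header labels to domain concepts.
# Labels and concept names are illustrative assumptions, not the paper's model.
FIELD_VOCABULARY = {
    "scientific name": "taxon",
    "species": "taxon",
    "genus": "taxon",
    "family": "taxon",
    "locality": "location",
    "latitude": "location",
    "longitude": "location",
    "date": "event_date",
    "collector": "agent",
}


@dataclass
class FieldInterpretation:
    column: int             # zero-based column index in the sheet
    label: str              # header text exactly as written by the author
    concept: Optional[str]  # matched domain concept, or None if unrecognized


def normalize(label: str) -> str:
    """Lower-case a label and collapse internal whitespace."""
    return " ".join(label.strip().lower().split())


def is_number(text: str) -> bool:
    """True if the cell text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_header_row(rows: List[List[str]], max_scan: int = 5) -> Optional[int]:
    """Assumed pattern: the header is the first row (among the first
    max_scan rows) with at least two non-empty, non-numeric cells.
    Single-cell title rows above the table are therefore skipped."""
    for i, row in enumerate(rows[:max_scan]):
        cells = [c for c in row if c.strip()]
        if len(cells) >= 2 and not any(is_number(c) for c in cells):
            return i
    return None


def interpret(rows: List[List[str]]) -> Tuple[Optional[int], List[FieldInterpretation]]:
    """Locate the header row and map each labeled column to a concept."""
    header_idx = find_header_row(rows)
    if header_idx is None:
        return None, []
    fields = [
        FieldInterpretation(col, label, FIELD_VOCABULARY.get(normalize(label)))
        for col, label in enumerate(rows[header_idx])
        if label.strip()
    ]
    return header_idx, fields


# Usage: a title row followed by a header row and one data row.
sheet = [
    ["Herbarium survey 2013", "", ""],
    ["Scientific Name", "Locality", "Date"],
    ["Araucaria angustifolia", "Campinas", "2013-05-02"],
]
header_idx, fields = interpret(sheet)
for f in fields:
    print(f.column, f.label, "->", f.concept)
```

The header heuristic is deliberately simple. A purely textual data row would also satisfy it, which is one reason a practical interpreter would combine several patterns rather than rely on a single one.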
Acknowledgements
Work partially financed by FAPESP (2012/16159-6), the Microsoft Research FAPESP Virtual Institute (NavScales project), the Center for Computational Engineering and Sciences - Fapesp/Cepid 2013/08293-7, CNPq (grant 143483/2011-0, MuZOO Project and PRONEX-FAPESP), INCT in Web Science (CNPq 557.128/2009-9), CAPES, as well as individual grants from CNPq.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bernardo, I.R., Borges, M., Baranauskas, M.C.C., Santanchè, A. (2015). Interpretation of Construction Patterns for Biodiversity Spreadsheets. In: Cordeiro, J., Hammoudi, S., Maciaszek, L., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2014. Lecture Notes in Business Information Processing, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-319-22348-3_22
DOI: https://doi.org/10.1007/978-3-319-22348-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22347-6
Online ISBN: 978-3-319-22348-3
eBook Packages: Computer Science, Computer Science (R0)