Abstract
Metadata is an essential part of modern information systems because it helps people find relevant documents across disparate repositories. This paper proposes an innovative metadata extraction method for spreadsheets. Unlike traditional methods, which consider only content information, the proposed method considers both layout and content. It extracts metadata from spreadsheets in which the metadata is stored under certain conditions. The data types of metadata (such as date and number) are taken into account so that documents can be searched by metadata of various data types. Furthermore, the extracted metadata is semantically classified and hierarchically grouped so that end users can define complex search queries and specify the meanings of search keywords. The system implementation of the proposed method is also discussed.
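As a rough illustration of combining layout with content and recording data types, the sketch below may help. It is not the method proposed in the paper: the abstract does not state the layout conditions, so the rule used here (a cell ending in a colon is a label, and the cell to its right is its value) is purely an assumption, as are the function names `infer_type` and `extract_metadata`, the ISO date format, and the sample sheet.

```python
from datetime import datetime


def infer_type(text):
    """Classify a cell value as 'date', 'number', or 'string'."""
    text = text.strip()
    try:
        # Assumption: dates are written in ISO form (YYYY-MM-DD).
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        pass
    try:
        # Thousands separators are dropped before the numeric check.
        float(text.replace(",", ""))
        return "number"
    except ValueError:
        return "string"


def extract_metadata(grid):
    """Hypothetical layout rule: a cell ending in ':' is a label,
    and the non-empty cell immediately to its right is its value."""
    found = {}
    for row in grid:
        for col, cell in enumerate(row[:-1]):
            label = cell.strip()
            value = row[col + 1].strip()
            if label.endswith(":") and value:
                found[label[:-1].strip()] = (value, infer_type(value))
    return found


# A made-up sheet, represented as rows of cell strings.
sheet = [
    ["Invoice", "", ""],
    ["Date:", "2012-03-15", ""],
    ["Customer:", "ACME Ltd.", ""],
    ["Total:", "1,250.00", ""],
]
metadata = extract_metadata(sheet)
```

Keeping the inferred type next to each value is what would let a search distinguish, for example, a date-range query from a keyword match on the same field.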
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chatvichienchai, S. (2012). Spreadsheet Metadata Extraction: A Layout-Based Approach. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32600-4_12
DOI: https://doi.org/10.1007/978-3-642-32600-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32599-1
Online ISBN: 978-3-642-32600-4
eBook Packages: Computer Science, Computer Science (R0)