Spreadsheet Metadata Extraction: A Layout-Based Approach

Chatvichienchai, Somchai

doi:10.1007/978-3-642-32600-4_12


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7446)

Abstract

Metadata is an essential part of modern information systems because it helps people find relevant documents across disparate repositories. This paper proposes a new metadata extraction method for spreadsheets. Unlike traditional methods, which consider only content, the proposed method takes both layout and content into account. It extracts metadata from spreadsheets whose metadata is stored under certain conditions. The data types of metadata values (e.g., dates and numbers) are taken into account so that documents can be searched by metadata of various types. Furthermore, the extracted metadata is semantically classified and hierarchically grouped, allowing end users to define complex search queries and to specify the meanings of search keywords. The implementation of a system based on the proposed method is also discussed.
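The abstract does not spell out the extraction rules. As a hedged illustration of what layout-aware extraction with data typing can look like, the Python sketch below assumes a spreadsheet is given as a grid of cell strings keyed by (row, column), treats a cell whose text matches a known label as a metadata key, and takes the nearest non-empty cell to its right (or, failing that, directly below it) as the value. The grid representation, the label set, the right-then-below rule, and the ISO date format are all illustrative assumptions, not the author's published method; semantic classification and hierarchical grouping are not shown.

```python
from datetime import datetime


def infer_type(text):
    """Classify a raw cell value as a date, number, or string.

    Assumed rules for illustration: ISO dates (YYYY-MM-DD) become dates,
    values parseable as floats (after removing thousands separators)
    become numbers, and everything else stays a string.
    """
    s = text.strip()
    try:
        return "date", datetime.strptime(s, "%Y-%m-%d").date()
    except ValueError:
        pass
    try:
        return "number", float(s.replace(",", ""))
    except ValueError:
        return "string", s


def extract_metadata(grid, labels):
    """Pair each label cell with a neighbouring value cell (hypothetical layout rule).

    grid   -- dict mapping (row, col) to the cell's text
    labels -- set of label names to look for (a trailing ':' is ignored)
    Returns a dict: label -> (data type, typed value).
    """
    result = {}
    for (row, col), text in grid.items():
        key = text.strip().rstrip(":")
        if key not in labels:
            continue
        # Layout rule assumed here: check the cell to the right first, then the one below.
        for pos in ((row, col + 1), (row + 1, col)):
            value = grid.get(pos, "").strip()
            if value:
                result[key] = infer_type(value)
                break
    return result


# Usage: a tiny invoice-like sheet; "Total" has its value below it, not beside it.
sheet = {
    (0, 0): "Invoice No:", (0, 1): "A-1024",
    (1, 0): "Date:", (1, 1): "2012-03-15",
    (2, 0): "Total", (3, 0): "1,200.50",
}
print(extract_metadata(sheet, {"Invoice No", "Date", "Total"}))
```

Keeping the data type next to each value is what would let a search layer compare dates and numbers by value rather than as text, which is the motivation the abstract gives for type-aware metadata.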



Author information

Authors and Affiliations

Dept. of Information and Media Studies, University of Nagasaki, Japan
Somchai Chatvichienchai


Editor information

Editors and Affiliations

Marriott School, Brigham Young University, 784 TNRB, 84602, Provo, UT, USA
Stephen W. Liddle
Software Competence Center Hagenberg, Softwarepark 21, 4232, Hagenberg, Austria
Klaus-Dieter Schewe
Institute of Software Technology & Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Vienna, Austria
A Min Tjoa
School of Information Technology and Electrical Engineering, University of Queensland, 4072, Brisbane, QLD, Australia
Xiaofang Zhou


About this paper

Cite this paper

Chatvichienchai, S. (2012). Spreadsheet Metadata Extraction: A Layout-Based Approach. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32600-4_12


DOI: https://doi.org/10.1007/978-3-642-32600-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32599-1
Online ISBN: 978-3-642-32600-4
eBook Packages: Computer Science, Computer Science (R0)
