Spreadsheet Metadata Extraction: A Layout-Based Approach

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7446)

Abstract

Metadata is an essential part of modern information systems because it helps people find relevant documents in disparate repositories. This paper proposes an innovative metadata extraction method for spreadsheets. Unlike traditional methods, which consider only content information, the proposed method considers both layout and content. It extracts metadata from spreadsheets whose metadata is stored under certain conditions. The data types of the metadata (such as date and number) are taken into account in order to support document search based on metadata of various data types. Furthermore, the extracted metadata is semantically classified and hierarchically grouped so that end users can define complex search queries and specify the meanings of search keywords. The system implementation of the proposed method is also discussed.
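
The abstract does not state the layout conditions under which metadata is recognised, so the following is only a rough illustration of the general idea of combining layout with content, not the paper's method. It assumes a hypothetical rule (a label cell ending in a colon with its value in the cell immediately to the right) and a hypothetical three-way type classification (date, number, text); the function names `infer_type` and `extract_metadata` are invented for this sketch.

```python
from datetime import datetime


def infer_type(value):
    """Classify a cell value as 'number', 'date', or 'text' (assumed categories)."""
    text = str(value).strip()
    try:
        # Treat values such as "1,250.50" as numbers by dropping thousands separators.
        float(text.replace(",", ""))
        return "number"
    except ValueError:
        pass
    # Only two date formats are tried here; a real system would need many more.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            datetime.strptime(text, fmt)
            return "date"
        except ValueError:
            continue
    return "text"


def extract_metadata(grid):
    """Pair each label cell ending in ':' with the non-empty cell to its right.

    `grid` is a list of rows, each row a list of cell values.
    The colon-and-right-neighbour rule is an assumption made for illustration.
    """
    metadata = {}
    for row in grid:
        for col in range(len(row) - 1):
            label, value = row[col], row[col + 1]
            is_label = isinstance(label, str) and label.strip().endswith(":")
            if is_label and value not in (None, ""):
                name = label.strip().rstrip(":").strip()
                metadata[name] = {"value": value, "type": infer_type(value)}
    return metadata


sheet = [
    ["Title:", "Budget 2012", None],
    ["Date:", "2012-09-03", None],
    ["Total:", "1,250.50", ""],
    ["Item", "Cost", None],  # table header row: no colon, so not treated as metadata
]
result = extract_metadata(sheet)
```

Recording a type alongside each value is what would let a search layer compare dates as dates and numbers as numbers rather than as plain strings.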




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chatvichienchai, S. (2012). Spreadsheet Metadata Extraction: A Layout-Based Approach. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32600-4_12

  • DOI: https://doi.org/10.1007/978-3-642-32600-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32599-1

  • Online ISBN: 978-3-642-32600-4

  • eBook Packages: Computer Science, Computer Science (R0)
