Interactive Conversion of Web Tables

  • Raghav Krishna Padmanabhan
  • Ramana Chakradhar Jandhyala
  • Mukkai Krishnamoorthy
  • George Nagy
  • Sharad Seth
  • William Silversmith
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6020)

Abstract

Two hundred web tables from ten sites were imported into Excel. The tables were edited as needed, then converted into layout-independent Wang Notation using the Table Abstraction Tool (TAT). The output generated by TAT consists of XML files to be used for constructing narrow-domain ontologies. On average, each table required 104 seconds for editing. Augmentations such as aggregates, footnotes, table titles, captions, units, and notes were also extracted, in an average time of 93 seconds. Every user intervention was logged and audited. The logged interactions were analyzed to determine the relative influence of factors such as table size, number of categories, and various types of augmentations on the processing time. The analysis suggests which aspects of interactive table processing can be automated in the near term, and how much time such automation would save. The correlation coefficient between predicted and actual processing time was 0.66.
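The abstract reports a correlation coefficient between predicted and actual processing time. As a rough illustration of how such a figure can be computed, the sketch below fits a one-variable least-squares line and then correlates its predictions with the observed times. The helper names (`fit_line`, `pearson`), the choice of cell count as the single predictor, and all data values are assumptions made for this example; they are not the paper's model, its factors, or its measurements.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical data: table size in cells and logged processing time in seconds.
cells = [20, 45, 60, 120, 200]
seconds = [60.0, 95.0, 80.0, 150.0, 170.0]

intercept, slope = fit_line(cells, seconds)
predicted = [intercept + slope * c for c in cells]

# Correlation between predicted and actual processing time.
r = pearson(predicted, seconds)
print(f"slope={slope:.3f} s/cell, intercept={intercept:.1f} s, r={r:.2f}")
```

With a single predictor and a positive slope, this value equals the correlation between the predictor and the observed times. A model with several factors, such as the number of categories or the types of augmentations, would need a multiple regression instead.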

Keywords

Document Understanding; Interactive Table Interpretation; Performance Evaluation; Ontology Construction; Table Abstraction Tool

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Raghav Krishna Padmanabhan (1)
  • Ramana Chakradhar Jandhyala (1)
  • Mukkai Krishnamoorthy (1)
  • George Nagy (1)
  • Sharad Seth (2)
  • William Silversmith (1)

  1. ECSE, DocLab, Rensselaer Polytechnic Institute, Troy, USA
  2. CSE, University of Nebraska-Lincoln, Lincoln, USA
