
Distributed Representations for Words on Tables

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10234)

Abstract

We consider the problem of word embedding for tables: obtaining distributed representations for the words that appear in tables. We propose a table word-embedding method that considers both horizontal and vertical relations between cells to estimate appropriate embeddings for words in tables. We propose objective functions that make use of the horizontal and vertical relations, both individually and jointly.
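
To illustrate the idea, here is a minimal sketch of collecting horizontal (same-row) and vertical (same-column) context pairs from a row-wise table. The function and the example table are hypothetical illustrations of the setting, not the authors' code:

```python
# Hypothetical sketch: enumerate horizontal and vertical context pairs
# from a row-wise table (attribute names in the first row).

def context_pairs(table):
    """Return (word, context, relation) triples for every cell pair that
    shares a row ('horizontal') or a column ('vertical')."""
    pairs = []
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            # horizontal relation: other cells in the same row
            for k, other in enumerate(row):
                if k != j:
                    pairs.append((cell, other, "horizontal"))
            # vertical relation: same-column cells in other rows
            for m, other_row in enumerate(table):
                if m != i:
                    pairs.append((cell, other_row[j], "vertical"))
    return pairs

table = [
    ["country", "capital"],
    ["France", "Paris"],
    ["Japan", "Tokyo"],
]
pairs = context_pairs(table)
# e.g. ("France", "Paris", "horizontal") and ("France", "Japan", "vertical")
```

The two relation types yield the two kinds of contexts that the proposed objectives use individually and jointly.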


Notes

  1. The total number of tables found in the corpus was 255,039.

  2. In our data set, 266 (93.7%) out of 284 randomly sampled tables were row-wise.

  3. In this research, we ignore tables that have no attribute names. Although this strategy can introduce noise into the set of attribute vectors, the effect of such noise is small because values are of many types and each value occurs relatively less frequently than the attribute names.

  4. The original word2vec paper derived this objective by maximizing the probability of a word appearing in its given contexts; here we skip that derivation and treat the objective merely as a score function for obtaining word-embedding vectors.
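
For concreteness, the word2vec-style score function referred to here can be illustrated with the standard skip-gram negative-sampling score; this is a textbook formulation, not code or notation taken from this paper:

```python
# Standard skip-gram negative-sampling score for one (word, context) pair:
#   log sigma(w . c) + sum over negatives n of log sigma(-(w . n))
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgns_score(w_vec, c_vec, neg_vecs):
    """Score one positive (word, context) pair against sampled negatives."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    score = math.log(sigmoid(dot(w_vec, c_vec)))
    for n_vec in neg_vecs:
        score += math.log(sigmoid(-dot(w_vec, n_vec)))
    return score

pos = sgns_score([1.0, 0.0], [1.0, 0.0], [])
with_neg = sgns_score([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0]])
```

Maximizing this score over all pairs is what "using the objective merely as a score function" amounts to in practice.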

  5. In addition, note that only two of these four terms are used for each (w, z) pair, which makes the SGD implementation for this model nearly the same as that of word2vec.

  6. Although two of the terms (the first and second) involve the word w and its vertical context word c, each term can be differentiated independently because no vector appears in both terms; thus we can use an iteration method similar to that of word2vec.
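
A single per-pair SGD update in this word2vec-style iteration might be sketched as follows; the function, labels, and learning rate are illustrative assumptions, not the authors' implementation:

```python
# Illustrative word2vec-style SGD step on one (word, context) pair.
# label is 1 for an observed pair, 0 for a negative sample.
import math

def sgd_step(w_vec, c_vec, label, lr=0.025):
    """Update w_vec and c_vec in place using the logistic-loss gradient."""
    dot = sum(a * b for a, b in zip(w_vec, c_vec))
    g = lr * (label - 1.0 / (1.0 + math.exp(-dot)))  # shared gradient scale
    for i in range(len(w_vec)):
        w_i = w_vec[i]
        w_vec[i] += g * c_vec[i]
        c_vec[i] += g * w_i

w = [0.1, 0.2]
c = [0.3, 0.1]
dot_before = sum(a * b for a, b in zip(w, c))
sgd_step(w, c, label=1)
dot_after = sum(a * b for a, b in zip(w, c))
```

Because each term of the joint objective touches a disjoint set of vectors, updates like this can be applied term by term, exactly as in the word2vec iteration the note describes.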

  7. Note that as a result, the number of similarity and analogy task queries was reduced to 445 and 5,124, respectively.

References

  1. Bollegala, D., Alsuhaibani, M., Maehara, T., Kawarabayashi, K.I.: Joint word representation learning using a corpus and a semantic lexicon. In: Proceedings of AAAI 2016, pp. 2690–2696 (2016)

  2. Bollegala, D., Maehara, T., Yoshida, Y., Kawarabayashi, K.I.: Learning word representations from relational graphs. In: Proceedings of AAAI 2015, pp. 2146–2152 (2015)

  3. Cafarella, M.J., Halevy, A.Y., Wang, D.Z., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. Proc. VLDB Endowment 1(1), 538–549 (2008)

  4. Chakrabarti, S.: Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers, Burlington (2002)

  5. Embley, D., Hurst, M., Lopresti, D., Nagy, G.: Table-processing paradigms: a research survey. Int. J. Doc. Anal. Recogn. 8(2), 66–86 (2006)

  6. Ji, S., Satish, N., Li, S., Dubey, P.: Parallelizing word2vec in shared and distributed memory. CoRR abs/1604.04661 (2016)

  7. Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. Proc. VLDB Endowment 3(1), 1338–1347 (2010)

  8. Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: Proceedings of AAAI 2015, pp. 2181–2187 (2015)

  9. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS 2013, pp. 3111–3119 (2013)

  10. Munoz, E., Hogan, A., Mileo, A.: Triplifying Wikipedia’s tables. In: Proceedings of the ISWC 2013 Workshop on Linked Data for Information Extraction (2013)

  11. Neelakantan, A., Roth, B., McCallum, A.: Compositional vector space models for knowledge base completion. In: Proceedings of ACL 2015, pp. 156–166 (2015)

  12. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP 2014, pp. 1532–1543 (2014)

  13. Pimplikar, R., Sarawagi, S.: Answering table queries on the web using column keywords. Proc. VLDB Endowment 5(10), 908–919 (2012)

  14. Recht, B., Re, C., Wright, S.J., Niu, F.: Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In: Proceedings of NIPS 2011, pp. 693–701 (2011)

  15. Toutanova, K., Chen, D., Pantel, P., Poon, H., Choudhury, P., Gamon, M.: Representing text for joint embedding of text and knowledge bases. In: Proceedings of EMNLP 2015, pp. 1499–1509 (2015)

  16. Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: Proceedings of AAAI 2014, pp. 1112–1119 (2014)

  17. Yin, P., Lu, Z., Li, H., Kao, B.: Neural enquirer: learning to query tables in natural language. In: Proceedings of IJCAI 2016, pp. 2308–2314 (2016)

  18. Zanibbi, R., Blostein, D., Cordy, J.R.: A survey of table recognition. Int. J. Doc. Anal. Recogn. 7(1), 1–16 (2004)


Acknowledgement

This work was supported by JSPS KAKENHI Grant Numbers JP15K00309, JP15K00425, JP15K16077.

Author information

Corresponding author

Correspondence to Minoru Yoshida.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yoshida, M., Matsumoto, K., Kita, K. (2017). Distributed Representations for Words on Tables. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10234. Springer, Cham. https://doi.org/10.1007/978-3-319-57454-7_11

  • DOI: https://doi.org/10.1007/978-3-319-57454-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57453-0

  • Online ISBN: 978-3-319-57454-7

  • eBook Packages: Computer Science, Computer Science (R0)
