Building and Validating Hierarchical Lexicons with a Case Study on Personal Values

Wilson, Steven R.; Shen, Yiting; Mihalcea, Rada

doi:10.1007/978-3-030-01129-1_28

Steven R. Wilson¹⁶,
Yiting Shen¹⁶ &
Rada Mihalcea¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11185))

Included in the following conference series:

International Conference on Social Informatics

2441 Accesses
4 Citations

Abstract

We introduce a crowd-powered approach for the creation of a lexicon for any theme given a set of seed words that cover a variety of concepts within the theme. Terms are initially sorted by automatically clustering their embeddings and subsequently rearranged by crowd workers in order to create a tree structure. This type of organization captures hierarchical relationships between concepts and allows for a tunable level of specificity when using the lexicon to collect measurements from a piece of text. We use a lexicon expansion method to increase the overall coverage of the produced resource. Using our proposed approach, we create a hierarchical lexicon of personal values and evaluate its internal and external consistency. We release this novel resource to the community as a tool for measuring value content within text corpora.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
This new values lexicon, along with code that can be used to build an initial hierarchy, manage the human-powered sorting, and expand the sorted hierarchy can be found at: http://nlp.eecs.umich.edu/downloads.html.
2.
reddit.com.

References

Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)
MATH Google Scholar
Boyd, R.L., Wilson, S.R., Pennebaker, J.W., Kosinski, M., Stillwell, D.J., Mihalcea, R.: Values in words: using language to evaluate and understand personal values. In: ICWSM, pp. 31–40 (2015)
Google Scholar
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J.L., Blei, D.M.: Reading tea leaves: how humans interpret topic models. In: Advances in Neural Information Processing Systems, pp. 288–296 (2009)
Google Scholar
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Hoboken (2012)
MATH Google Scholar
Fast, E., Chen, B., Bernstein, M.S.: Empath: understanding topic signals in large-scale text. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 4647–4657. ACM (2016)
Google Scholar
Graham, J., Haidt, J., Nosek, B.A.: Liberals and conservatives rely on different sets of moral foundations. J. Pers. Soc. Psychol. 96(5), 1029 (2009)
Article Google Scholar
Igo, S.P., Riloff, E.: Corpus-based semantic lexicon induction with web-based corroboration. In: Proceedings of the Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics, pp. 18–26. Association for Computational Linguistics (2009)z
Google Scholar
Magnini, B., Cavaglia, G.: Integrating subject field codes into wordnet. In: LREC, pp. 1413–1418 (2000)
Google Scholar
Mohammad, S.M., Turney, P.D.: Crowdsourcing a word-emotion association lexicon. Comput. Intell. 29(3), 436–465 (2013)
Article MathSciNet Google Scholar
Morstatter, F., Liu, H.: A novel measure for coherence in statistical topic models. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Short Papers), vol. 2, pp. 543–548 (2016)
Google Scholar
Mrkšić, N., Séaghdha, D.O., Thomson, B., Gašić, M., Rojas-Barahona, L., Su, P.H., Vandyke, D., Wen, T.H., Young, S.: Counter-fitting word vectors to linguistic constraints. arXiv preprint arXiv:1603.00892 (2016)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
MathSciNet MATH Google Scholar
Pennebaker, J.W., Boyd, R.L., Jordan, K., Blackburn, K.: The development and psychometric properties of liwc2015. Technical report (2015)
Google Scholar
Rao, D., Ravichandran, D.: Semi-supervised polarity lexicon induction. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 675–682. Association for Computational Linguistics (2009)
Google Scholar
Stone, P.J., Bales, R.F., Namenwirth, J.Z., Ogilvie, D.M.: The general inquirer: a computer system for content analysis and retrieval based on the sentence as a unit of information. Syst. Res. Behav. Sci. 7(4), 484–498 (1962)
Article Google Scholar
Thelen, M., Riloff, E.: A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10, pp. 214–221. Association for Computational Linguistics (2002)
Google Scholar
Wieting, J., Bansal, M., Gimpel, K., Livescu, K.: Towards universal paraphrastic sentence embeddings. arXiv preprint arXiv:1511.08198 (2015)
Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347–354. Association for Computational Linguistics (2005)
Google Scholar

Download references

Acknowledgements

This material is based in part upon work supported by the Michigan Institute for Data Science, by the National Science Foundation (grant #1344257), and by the John Templeton Foundation (grant #48503). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the Michigan Institute for Data Science, the National Science Foundation, or the John Templeton Foundation.

Author information

Authors and Affiliations

University of Michigan, Ann Arbor, MI, USA
Steven R. Wilson, Yiting Shen & Rada Mihalcea

Authors

Steven R. Wilson
View author publications
You can also search for this author in PubMed Google Scholar
Yiting Shen
View author publications
You can also search for this author in PubMed Google Scholar
Rada Mihalcea
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Steven R. Wilson .

Editor information

Editors and Affiliations

University of Koblenz, Koblenz, Germany
Steffen Staab
National Research University Higher School of Economics, St. Petersburg, Russia
Olessia Koltsova
National Research University Higher School of Economics, Moscow, Russia
Dmitry I. Ignatov

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wilson, S.R., Shen, Y., Mihalcea, R. (2018). Building and Validating Hierarchical Lexicons with a Case Study on Personal Values. In: Staab, S., Koltsova, O., Ignatov, D. (eds) Social Informatics. SocInfo 2018. Lecture Notes in Computer Science(), vol 11185. Springer, Cham. https://doi.org/10.1007/978-3-030-01129-1_28

Download citation

DOI: https://doi.org/10.1007/978-3-030-01129-1_28
Published: 20 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01128-4
Online ISBN: 978-3-030-01129-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics