Toward Three-Stage Automation of Annotation for Human Values

Ishita, Emi; Fukuda, Satoshi; Oga, Toru; Oard, Douglas W.; Fleischmann, Kenneth R.; Tomiura, Yoichi; Cheng, An-Shou

doi:10.1007/978-3-030-15742-5_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11420)

Included in the following conference series:

International Conference on Information

Abstract

Prior work on automated annotation of human values has sought to train text classification techniques to label text spans with labels that reflect specific human values such as freedom, justice, or safety. This confounds three tasks: (1) selecting the documents to be labeled, (2) selecting the text spans that express or reflect human values, and (3) assigning labels to those spans. This paper proposes a three-stage model in which separate systems can be optimally trained for each of the three stages. Experiments from the first stage, document selection, indicate that annotation diversity trumps annotation quality, suggesting that when multiple annotators are available, the traditional practice of adjudicating conflicting annotations of the same documents is not as cost effective as an alternative in which each annotator labels different documents. Preliminary results for the second stage, selecting value sentences, indicate that high recall (94%) can be achieved on that task with levels of precision (above 80%) that seem suitable for use as part of a multi-stage annotation pipeline. The annotations created for these experiments are being made freely available, and the content that was annotated is available from commercial sources at modest cost.
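The three-stage decomposition described in the abstract can be sketched as a simple composition of three independent components. This is a minimal illustration, not the authors' implementation: the function names, the naive sentence splitter, and the keyword rules in the usage example are all hypothetical stand-ins for the separately trained systems the paper proposes.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interfaces: each stage is any callable with this shape,
# so each can be trained and swapped independently of the others.
DocSelector = Callable[[str], bool]        # stage 1: is this document worth labeling?
SentenceSelector = Callable[[str], bool]   # stage 2: does this sentence express a value?
ValueLabeler = Callable[[str], List[str]]  # stage 3: which values does it express?


def split_sentences(document: str) -> List[str]:
    """Naive splitter on '.', for illustration only (an assumption)."""
    return [s.strip() for s in document.split(".") if s.strip()]


def annotate(
    documents: Iterable[str],
    select_document: DocSelector,
    select_sentence: SentenceSelector,
    label_values: ValueLabeler,
) -> List[Tuple[str, List[str]]]:
    """Run the three stages in order and return (sentence, labels) pairs."""
    results = []
    for doc in documents:
        if not select_document(doc):        # stage 1: document selection
            continue
        for sentence in split_sentences(doc):
            if select_sentence(sentence):   # stage 2: value sentence selection
                results.append((sentence, label_values(sentence)))  # stage 3
    return results


# Toy usage with keyword rules standing in for trained classifiers.
docs = ["Nuclear safety matters. The plant opened in 1970.", "Sports results today."]
pairs = annotate(
    docs,
    select_document=lambda d: "nuclear" in d.lower(),
    select_sentence=lambda s: "safety" in s.lower(),
    label_values=lambda s: ["safety"],
)
print(pairs)
```

Keeping the stages behind separate interfaces reflects the abstract's point that a high-recall second stage can feed a later stage without the two being trained jointly.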

Notes

1. Two documents in Round 9 were discovered to be duplicates and removed.
2. As fastText parameters we selected: -dim 50; -loss negative sampling (ns); -epoch 1000; -wordNgrams 5; -minCount 2.
3. We have also generated learning curves using all of the adjudicated annotations, again plotting only at even values for the number of original annotations. This yields similar results.
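The fastText settings listed in note 2 could be passed to the fastText command-line tool roughly as follows. This is a sketch under assumptions: the notes do not say which fastText mode was used, so `supervised` (the text classification mode) is assumed, and the file names `train.txt` and `model` are placeholders. Only the five parameter values come from the note.

```python
from typing import List

# Values quoted from note 2; "negative sampling (ns)" corresponds to "-loss ns".
FASTTEXT_PARAMS = {
    "dim": 50,
    "loss": "ns",
    "epoch": 1000,
    "wordNgrams": 5,
    "minCount": 2,
}


def fasttext_command(train_path: str, model_prefix: str) -> List[str]:
    """Build an argv list for the fastText CLI (supervised mode is an assumption)."""
    argv = ["fasttext", "supervised", "-input", train_path, "-output", model_prefix]
    for name, value in FASTTEXT_PARAMS.items():
        argv += [f"-{name}", str(value)]
    return argv


# Placeholder paths; print the command rather than running it.
print(" ".join(fasttext_command("train.txt", "model")))
```

Building the argument list separately from running it makes the exact settings easy to log alongside experimental results.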

Acknowledgements

This work has been supported in part by JSPS KAKENHI Grant Number JP18H03495.

Author information

Authors and Affiliations

Kyushu University, Fukuoka, 819-0395, Japan
Emi Ishita, Satoshi Fukuda, Toru Oga & Yoichi Tomiura
University of Maryland, College Park, MD, 20742, USA
Douglas W. Oard
University of Texas at Austin, Austin, TX, 78705, USA
Kenneth R. Fleischmann
National Sun Yat-sen University, Kaohsiung, 804, Taiwan
An-Shou Cheng

Corresponding author

Correspondence to Emi Ishita.

Editor information

Editors and Affiliations

University of South Florida, Tampa, FL, USA
Natalie Greene Taylor
University of Maryland, College Park, MD, USA
Caitlin Christian-Lamb
University of Washington, Seattle, WA, USA
Michelle H. Martin
University of California, Irvine, Irvine, CA, USA
Bonnie Nardi

About this paper

Cite this paper

Ishita, E. et al. (2019). Toward Three-Stage Automation of Annotation for Human Values. In: Taylor, N., Christian-Lamb, C., Martin, M., Nardi, B. (eds) Information in Contemporary Society. iConference 2019. Lecture Notes in Computer Science, vol 11420. Springer, Cham. https://doi.org/10.1007/978-3-030-15742-5_18

DOI: https://doi.org/10.1007/978-3-030-15742-5_18
Published: 13 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15741-8
Online ISBN: 978-3-030-15742-5
eBook Packages: Computer Science, Computer Science (R0)
