Identifying incompleteness in privacy policy goals using semantic frames

  • Jaspreet Bhatia
  • Morgan C. Evans
  • Travis D. Breaux
RE 2018


Companies that collect personal information online often maintain privacy policies that are required to accurately reflect their data practices and privacy goals. To be comprehensive and flexible enough to cover future practices, policies contain ambiguity that summarizes practices across multiple types of products and business contexts. Ambiguity in data practice descriptions undermines policies both as an effective way to communicate system design choices to users and as a reliable regulatory mechanism. In this paper, we report an investigation that identifies incompleteness by representing data practice descriptions as semantic frames. The approach is a grounded analysis to discover which semantic roles corresponding to a data action are needed to construct complete data practice descriptions. Our results include 698 data action instances obtained from 949 manually annotated statements across 15 privacy policies in three domains: health, news and shopping. Therein, we identified 2316 instances of 17 types of semantic roles and found that the distribution of semantic roles was similar across the three domains. Incomplete data practice descriptions undermine user comprehension and can affect users' perceived privacy risk, which we measure using factorial vignette surveys. We observed that user risk perception decreases when two roles are present in a statement: the condition under which a data action is performed, and the purpose for which the user's information is used.
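To make the frame-based view of incompleteness concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' annotation procedure or tooling. The class `DataActionFrame`, the functions `missing_roles` and `vignette_variants`, and the `EXPECTED_ROLES` tuple are hypothetical. Only the condition and purpose roles are named in the abstract, so `information` is an assumed stand-in role, and the paper's full set of 17 role types is not reproduced. A statement is treated as incomplete when an expected role has no filler. The second function enumerates every combination of including or omitting the condition and purpose roles (a 2×2 full factorial), which mirrors the idea behind a factorial vignette design but is not the authors' survey instrument.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical, illustrative role inventory. The abstract names only
# "condition" and "purpose"; "information" is an assumed extra role.
EXPECTED_ROLES = ("action", "information", "purpose", "condition")


@dataclass
class DataActionFrame:
    """A data practice description as a semantic frame: a data action
    (e.g. "collect") plus the phrases that fill its semantic roles."""
    action: str
    roles: dict[str, str] = field(default_factory=dict)

    def missing_roles(self, expected=EXPECTED_ROLES):
        """Return the expected roles that have no filler, in the order of
        `expected`. The data action itself always counts as present."""
        filled = set(self.roles) | {"action"}
        return [r for r in expected if r not in filled]


def vignette_variants(frame, factors=("condition", "purpose")):
    """Yield (levels, variant) pairs, one per presence/absence combination
    of the factor roles (a full factorial design over those roles).
    Roles that are not factors are always kept; the input is not modified."""
    for levels in product((False, True), repeat=len(factors)):
        kept = {
            role: filler
            for role, filler in frame.roles.items()
            if role not in factors or levels[factors.index(role)]
        }
        yield dict(zip(factors, levels)), DataActionFrame(frame.action, kept)


if __name__ == "__main__":
    # Hypothetical statement: "When you create an account, we collect
    # your email address to send you order updates."
    full = DataActionFrame("collect", {
        "condition": "when you create an account",
        "information": "your email address",
        "purpose": "to send you order updates",
    })
    for levels, variant in vignette_variants(full):
        print(levels, variant.missing_roles())
```

Running the script prints four variants; the one with both factors set to `False` reports `['purpose', 'condition']` as missing, and the one with both set to `True` reports none. Keeping the role fillers in a plain dictionary means a different role inventory can be swapped in through `expected` and `factors` without changing the class.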


Keywords: Semantic frames · Semantic roles · Privacy risk · Natural language processing · Privacy



We thank the CMU RE Lab for their helpful feedback. This research was funded by NSF Frontier Award #1330596 and NSF CAREER Award #1453139.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Institute for Software Research, Carnegie Mellon University, Pittsburgh, USA
