User Studies of Principled Model Finder Output

  • Natasha Danas
  • Tim Nelson
  • Lane Harrison
  • Shriram Krishnamurthi
  • Daniel J. Dougherty
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10469)


Model-finders such as SAT-solvers are attractive for producing concrete models, either as sample instances or as counterexamples when properties fail. However, the generated model is arbitrary. To address this, several research efforts have proposed principled forms of output from model-finders. These include minimal and maximal models, unsat cores, and proof-based provenance of facts.
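To make the distinction concrete, here is a small illustrative sketch (not the paper's tooling) of a brute-force propositional model finder. The formula, atom names, and minimality criterion (fewest true atoms) are all invented for illustration; real model finders such as SAT solvers work the same way in principle but at vastly larger scale.

```python
from itertools import product

# Toy propositional "specification": (a or b) and ((not a) or c).
# A model assigns True/False to each atom so that the formula holds.
ATOMS = ["a", "b", "c"]

def satisfies(assignment):
    a, b, c = (assignment[x] for x in ATOMS)
    return (a or b) and ((not a) or c)

# Enumerate all satisfying assignments (a brute-force model finder).
models = [dict(zip(ATOMS, vals))
          for vals in product([True, False], repeat=len(ATOMS))
          if satisfies(dict(zip(ATOMS, vals)))]

# An "arbitrary" model is whichever one the search happens to hit first.
arbitrary = models[0]

# A minimal model makes as few atoms true as possible.
minimal = min(models, key=lambda m: sum(m.values()))

print("arbitrary:", arbitrary)
print("minimal:  ", minimal)
```

Here the search order happens to yield the all-true assignment first, while the minimal model sets only `b`; a user shown the arbitrary model might wrongly conclude that `a` and `c` are forced by the specification.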

While these methods enjoy elegant mathematical foundations, they have not been subjected to rigorous evaluation with users to assess their utility. This paper presents user studies of these three forms of output, conducted with advanced students. We find that most of the output forms fail to be effective, and in some cases even actively mislead users. To make such studies feasible to run frequently and at scale, we also show how to pose them on the crowdsourcing site Mechanical Turk.


Keywords: Models · User studies · HCI · Minimization · Provenance · Unsat core



This work is partially supported by the US National Science Foundation.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Brown University, Providence, USA
  2. Worcester Polytechnic Institute, Worcester, USA
