From Here to Provtopia

  • Thomas PasquierEmail author
  • David Eyers
  • Margo Seltzer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11721)


Valuable, sensitive, and regulated data flow freely through distributed systems. In such a world, how can systems plausibly comply with the regulations governing the collection, use, and management of such data? We claim that distributed data provenance, the directed acyclic graph documenting the origin and transformations of data holds the key. Provenance analysis has already been demonstrated in a wide range of applications: from intrusion detection to performance analysis. We describe how similar systems and analysis techniques are suitable both for implementing the complex policies that govern data and verifying compliance with regulatory mandates. We also highlight the challenges to be addressed to move provenance from research laboratories to production systems.


Provenance Distributed systems Compliance 


  1. 1.
    Alvaro, P., Tymon, S.: Abstracting the geniuses away from failure testing. ACM Queue 15, 29–53 (2017)CrossRefGoogle Scholar
  2. 2.
    Anderson, J.P.: Computer security technology planning study. Technical report. ESD-TR-73-51, ESD/AFSC, Hanscom AFB, Bedford, MA, October 1972Google Scholar
  3. 3.
    Bajikar, S.: Trusted Platform Module (TPM) based security on notebook PCs-white paper. Mobile Platforms Group Intel Corporation, pp. 1–20 (2002)Google Scholar
  4. 4.
    Bates, A., Butler, K., Moyer, T.: Take only what you need: leveraging mandatory access control policy to reduce provenance storage costs. In: Workshop on Theory and Practice of Provenance (TaPP 2015), p. 7. USENIX (2015)Google Scholar
  5. 5.
    Bates, A.M., Tian, D., Butler, K.R., Moyer, T.: Trustworthy whole-system provenance for the Linux kernel. In: USENIX Security, pp. 319–334 (2015)Google Scholar
  6. 6.
    Belhajjame, K., et al.: PROV-DM: the PROV data model. Technical report, World Wide Web Consortium (W3C) (2013).
  7. 7.
    Braun, U., Shinnar, A., Seltzer, M.I.: Securing provenance. In: HotSec (2008)Google Scholar
  8. 8.
    Buneman, P., Chapman, A., Cheney, J.: Provenance management in curated databases. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of data, pp. 539–550. ACM (2006)Google Scholar
  9. 9.
    Buneman, P., Khanna, S., Wang-Chiew, T.: Why and where: a characterization of data provenance. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 316–330. Springer, Heidelberg (2001). Scholar
  10. 10.
    CamFlow. Accessed 21 Sep 2019
  11. 11.
    Carata, L., et al.: A primer on provenance. Commun. ACM 57(5), 52–60 (2014)CrossRefGoogle Scholar
  12. 12.
    Cheney, J., Ahmed, A., Acar, U.A.: Provenance as dependency analysis. In: Arenas, M., Schwartzbach, M.I. (eds.) DBPL 2007. LNCS, vol. 4797, pp. 138–152. Springer, Heidelberg (2007). Scholar
  13. 13.
    Cheney, J., Chiticariu, L., Tan, W.C., et al.: Provenance in databases: why how and where. Found. Trends® Databases 1(4), 379–474 (2009)CrossRefGoogle Scholar
  14. 14.
    Coker, G.: Principles of remote attestation. Int. J. Inf. Secur. 10(2), 63–81 (2011)CrossRefGoogle Scholar
  15. 15.
    Edwards, A., Jaeger, T., Zhang, X.: Runtime verification of authorization hook placement for the Linux security modules framework. In: Conference on Computer and Communications Security (CCS 2002), pp. 225–234. ACM (2002)Google Scholar
  16. 16.
  17. 17.
    Fedorova, A., et al.: Performance comprehension at WiredTiger. In: Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2018, pp. 83–94. ACM (2018)Google Scholar
  18. 18.
    Freire, J., Koop, D., Santos, E., Silva, C.T.: Provenance for computational tasks: a survey. Comput. Sci. Eng. 10(3), 11–21 (2008)CrossRefGoogle Scholar
  19. 19.
    Ganapathy, V., Jaeger, T., Jha, S.: Automatic placement of authorization hooks in the Linux security modules framework. In: Conference on Computer and Communications Security (CCS 2005), pp. 330–339. ACM (2005)Google Scholar
  20. 20.
    Garfinkel, T., et al.: Traps and pitfalls: practical problems in system call interposition based security tools. NDSS 3, 163–176 (2003)Google Scholar
  21. 21.
    Georget, L., Jaume, M., Tronel, F., Piolle, G., Tong, V.V.T.: Verifying the reliability of operating system-level information flow control systems in Linux. In: International Workshop on Formal Methods in Software Engineering (FormaliSE 2017), pp. 10–16. IEEE/ACM (2017)Google Scholar
  22. 22.
    Goldman, K., Perez, R., Sailer, R.: Linking remote attestation to secure tunnel endpoints. In: Workshop on Scalable Trusted Computing, pp. 21–24. ACM (2006)Google Scholar
  23. 23.
    Han, X., Pasquier, T., Seltzer, M.: Provenance-based intrusion detection: opportunities and challenges. In: Workshop on the Theory and Practice of Provenance (TaPP 2018). USENIX (2018)Google Scholar
  24. 24.
    Han, X., Pasquier, T., Ranjan, T., Goldstein, M., Seltzer, M.: FRAPpuccino: fault-detection through runtime analysis of provenance. In: Workshop on Hot Topics in Cloud Computing (HotCloud 2017). USENIX (2017)Google Scholar
  25. 25.
    Hasan, R., Sion, R., Winslett, M.: The case of the fake picasso: preventing history forgery with secure provenance. In: Conference on File and Storage Technologies (FAST 2009). USENIX (2009)CrossRefGoogle Scholar
  26. 26.
    Hassan, W.U., Lemay, M., Aguse, N., Bates, A., Moyer, T.: Towards scalable cluster auditing through grammatical inference over provenance graphs. In: Network and Distributed Systems Security Symposium. Internet Society (2018)Google Scholar
  27. 27.
    Hossain, M.N., Wang, J., Sekar, R., Stoller, S.D.: Dependence-preserving data compaction for scalable forensic analysis. In: Security Symposium (USENIX Security 2018). USENIX Association (2018)Google Scholar
  28. 28.
    Huang, Y., Gottardo, R.: Comparability and reproducibility of biomedical data. Brief. Bioinform. 14(4), 391–401 (2012)CrossRefGoogle Scholar
  29. 29.
    Interlandi, M., et al.: Titian: data provenance support in Spark. Proc. VLDB Endowment 9(3), 216–227 (2015)CrossRefGoogle Scholar
  30. 30.
    Jaeger, T., Edwards, A., Zhang, X.: Consistency analysis of authorization hook placement in the Linux security modules framework. ACM Trans. Inf. Syst. Secur. (TISSEC) 7(2), 175–205 (2004)CrossRefGoogle Scholar
  31. 31.
    King, S.T., Chen, P.M.: Backtracking intrusions. ACM SIGOPS Oper. Syst. Rev. 37(5), 223–236 (2003)CrossRefGoogle Scholar
  32. 32.
    Lee, S., Niu, X., Ludäscher, B., Glavic, B.: Integrating approximate summarization with provenance capture. In: Workshop on the Theory and Practice of Provenance (TaPP 2017). USENIX (2017)Google Scholar
  33. 33.
    Lerner, B., Boose, E.: RDataTracker: collecting provenance in an interactive scripting environment. In: Workshop on the Theory and Practice of Provenance (TaPP 2014). USENIX (2014)Google Scholar
  34. 34.
    Li, J., et al.: PCatch: automatically detecting performance cascading bugs in cloud systems. In: EuroSys 2018, pp. 7:1–7:14. ACM (2018)Google Scholar
  35. 35.
    Moreau, L., et al.: The open provenance model core specification (v1.1). Future Gener. Comput. Syst. 27(6), 743–756 (2011)CrossRefGoogle Scholar
  36. 36.
    Moyer, T., Gadepally, V.: High-throughput ingest of data provenance records into Accumulo. In: High Performance Extreme Computing Conference (HPEC 2016), pp. 1–6. IEEE (2016)Google Scholar
  37. 37.
    Muniswamy-Reddy, K.K., et al.: Layering in provenance systems. In: USENIX Annual Technical Conference (ATC 2009) (2009)Google Scholar
  38. 38.
    Muniswamy-Reddy, K.K., Holland, D.A., Braun, U., Seltzer, M.I.: Provenance-aware storage systems. In: USENIX Annual Technical Conference (ATC 2006), pp. 43–56 (2006)Google Scholar
  39. 39.
    Oracle Corporation: Oracle Total Recall with Oracle Database 11g release 2 (2009).
  40. 40.
    Ou, X., Boyer, W.F., McQueen, M.A.: A scalable approach to attack graph generation. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, pp. 336–345. ACM (2006)Google Scholar
  41. 41.
    Pasquier, T., Eyers, D., Bacon, J.: Personal data and the Internet of Things. Commun. ACM 62(6), 32–34 (2019)CrossRefGoogle Scholar
  42. 42.
    Pasquier, T., et al.: Practical whole-system provenance capture. In: Symposium on Cloud Computing (SoCC 2017). ACM (2017)Google Scholar
  43. 43.
    Pasquier, T., et al.: Runtime analysis of whole-system provenance. In: Conference on Computer and Communications Security (CCS 2018). ACM (2018)Google Scholar
  44. 44.
    Pasquier, T., et al.: If these data could talk. Sci. Data 4 (2017). Article number: 170114.
  45. 45.
    Pasquier, T., Singh, J., Bacon, J., Eyers, D.: Information flow audit for PaaS clouds. In: IEEE International Conference on Cloud Engineering (IC2E), pp. 42–51. IEEE (2016)Google Scholar
  46. 46.
    Pasquier, T., Singh, J., Eyers, D., Bacon, J.: CamFlow: managed data-sharing for cloud services. IEEE Trans. Cloud Comput. 5, 472–484 (2015)CrossRefGoogle Scholar
  47. 47.
    Pasquier, T., Singh, J., Powles, J., Eyers, D., Seltzer, M., Bacon, J.: Data provenance to audit compliance with privacy policy in the Internet of Things. Pers. Ubiquit. Comput. 22, 333–344 (2018)CrossRefGoogle Scholar
  48. 48.
    Perez, R., Sailer, R., van Doorn, L., et al.: vTPM: virtualizing the trusted platform module. In: Proceedings of the 15th Conference on USENIX Security Symposium, pp. 305–320 (2006)Google Scholar
  49. 49.
    Pohly, D.J., McLaughlin, S., McDaniel, P., Butler, K.: Hi-Fi: collecting high-fidelity whole-system provenance. In: Annual Computer Security Applications Conference, pp. 259–268. ACM (2012)Google Scholar
  50. 50.
    Salem, A., Zhang, Y., Humbert, M., Berrang, P., Fritz, M., Backes, M.: ML-leaks: model and data independent membership inference attacks and defenses on machine learning models. arXiv preprint arXiv:1806.01246 (2018)
  51. 51.
    Schreiber, A., Struminski, R.: Tracing personal data using comics. In: Antona, M., Stephanidis, C. (eds.) UAHCI 2017. LNCS, vol. 10277, pp. 444–455. Springer, Cham (2017). Scholar
  52. 52.
    Sheyner, O., Haines, J., Jha, S., Lippmann, R., Wing, J.M.: Automated generation and analysis of attack graphs. In: Proceedings 2002 IEEE Symposium on Security and Privacy, pp. 273–284. IEEE (2002)Google Scholar
  53. 53.
    Simmhan, Y.L., Plale, B., Gannon, D.: A survey of data provenance techniques. Computer Science Department, Indiana University, 69 (2005)Google Scholar
  54. 54.
    Singh, G., et al.: A metadata catalog service for data intensive applications. In: ACM/IEEE Conference on Supercomputing, p. 33. IEEE (2003)Google Scholar
  55. 55.
    Swiler, L.P., Phillips, C., Ellis, D., Chakerian, S.: Computer-attack graph generation tool. In: Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX 2001, vol. 2, pp. 307–321. IEEE (2001)Google Scholar
  56. 56.
    Tariq, D., Ali, M., Gehani, A.: Towards automated collection of application-level data provenance. In: Workshop on Theory and Practice of Provenance, TaPP 2012, p. 16. USENIX (2012)Google Scholar
  57. 57.
    Wang, H., et al.: GRANO: Interactive graph-based root cause analysis for cloud-native distributed data platform. In: Proceedings of the 45th International Conference on Very Large Data Bases (VLDB) (2019, to appear)Google Scholar
  58. 58.
    Watson, R.N.: Exploiting concurrency vulnerabilities in system call wrappers. WOOT 7, 1–8 (2007)Google Scholar
  59. 59.
    Whittaker, M., Alvaro, P., Teodoropol, C., Hellerstein, J.: Debugging distributed systems with why-across-time provenance. In: Symposium on Cloud Computing, SoCC 2018, pp. 333–346. ACM (2018)Google Scholar
  60. 60.
    Wright, C., Cowan, C., Morris, J., Smalley, S., Kroah-Hartman, G.: Linux security module framework. In: Ottawa Linux Symposium, vol. 8032, pp. 6–16 (2002)Google Scholar
  61. 61.
    Xu, S.C., Rogers, T., Fairweather, E., Glenn, A.P., Curran, J.P., Curcin, V.: Application of data provenance in healthcare analytics software: information visualisation of user activities. AMIA Jt. Summits Transl. Sci. Proc. 2018, 263–272 (2018)Google Scholar
  62. 62.
    Zhou, W., Fei, Q., Narayan, A., Haeberlen, A., Loo, B.T., Sherr, M.: Secure network provenance. In: Symposium on Operating Systems Principles (SOSP 2011), pp. 295–310. ACM (2011)Google Scholar
  63. 63.
    Zhou, W., Sherr, M., Tao, T., Li, X., Loo, B.T., Mao, Y.: Efficient querying and maintenance of network provenance at internet-scale. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 615–626. ACM (2010)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.University of BristolBristolUK
  2. 2.University of OtagoDunedinNew Zealand
  3. 3.University of British ColumbiaVancouverCanada

Personalised recommendations