Abstract
With an increasing number of applications being migrated to the cloud, it has become evident that faults in these applications or in the underlying cloud platform can be costly. When a system fault occurs, administrators often need to answer attribution questions in order to perform managerial tasks such as system debugging, accountability enforcement, and attack analysis. In this chapter, we propose Secure Time-Aware Provenance (STAP), a data-centric approach that provides the fundamental functionality required to answer such attribution questions: the capability to “explain” the existence (or change) of a certain distributed system state at a given time in a potentially adversarial environment.
The proposed STAP model allows consistent and complete explanations of system state (and changes) in dynamic environments, and can be efficiently maintained and queried even in potentially adversarial environments. STAP incorporates tamper-evident properties and guarantees eventual detection of compromised nodes that lie or falsely implicate correct nodes.
Notes
1. After retrieving the recv entry based on VID and RTime, we use the STime (sender’s timestamp) attribute in recv to fetch the appropriate send entry on the sender’s side. This avoids explicit time synchronization.
2. MICROQUERY returns a single vertex; provenance queries must invoke it repeatedly to explore G_ν. Hence the name.
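The two notes above can be illustrated with a minimal sketch. It is not the chapter's implementation: the dictionary-based logs, the `(VID, STime)` key on the sender's side, the `Vertex` type, and the function names `match_send`, `microquery`, and `explore` are all assumptions made here for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """Hypothetical provenance vertex: an id plus ids of its direct predecessors."""
    vid: str
    preds: tuple = ()


def match_send(recv_log, send_log, vid, rtime):
    """Note 1 (assumed layout): find the recv entry by (VID, RTime), then use the
    STime stored in it to look up the send entry in the sender's log."""
    recv = recv_log[(vid, rtime)]
    return send_log[(vid, recv["stime"])]


def microquery(graph, vid):
    """Hypothetical stand-in for MICROQUERY: returns exactly one vertex."""
    return graph[vid]


def explore(graph, root_vid):
    """Note 2: a provenance query calls microquery repeatedly, one vertex at a
    time, until every vertex reachable from the root has been visited."""
    seen, order = set(), []
    frontier = deque([root_vid])
    while frontier:
        vid = frontier.popleft()
        if vid in seen:
            continue
        seen.add(vid)
        vertex = microquery(graph, vid)
        order.append(vertex.vid)
        frontier.extend(vertex.preds)
    return order
```

Because the receiver's entry records the sender's own timestamp, the lookup in `match_send` never compares clocks across nodes, which is the point of the first note.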
Acknowledgements
The research presented in this chapter was performed in collaboration with Boon Thau Loo, Andreas Haeberlen, and Zachary Ives from the University of Pennsylvania, and Micah Sherr from Georgetown University.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Zhou, W. (2014). Towards a Data-Centric Approach to Attribution in the Cloud. In: Jajodia, S., Kant, K., Samarati, P., Singhal, A., Swarup, V., Wang, C. (eds) Secure Cloud Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9278-8_13
DOI: https://doi.org/10.1007/978-1-4614-9278-8_13
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-9277-1
Online ISBN: 978-1-4614-9278-8
eBook Packages: Computer Science, Computer Science (R0)