Data Provenance for Big Data Security and Accountability

Synonyms

Lineage; Pedigree

Definitions

Provenance is the derivative history of data (Ko et al. 2015; Ko and Will 2014). In the context of Big Data security, provenance does not directly uphold or enforce the information security requirements (confidentiality, integrity, and availability). However, provenance and its sources (e.g., metadata, lineage, and data activities such as create, read, update, and delete) provide verification and historical evidence that support the analysis and forecasting needed for data security. For example, analyzing provenance helps in understanding and preventing outages (Ko et al. 2012), which in turn improves availability. Provenance also contributes strongly to data forensics, especially the study of data activity patterns triggered by software or human processes, such as ransomware (Ko et al. 2015). The lineage and metadata describing provenance also provide substantial evidence for transparency and data accountability (Ko 2014...
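
As a rough illustration of how data activities (create, read, update, and delete) can serve as a provenance source, the sketch below records activity events in an append-only log, reconstructs the history of one data object, and counts updates per actor as a very simple activity pattern. It is not taken from the cited work: the names `ActivityEvent`, `ProvenanceLog`, `history`, and `updates_per_actor`, the record fields, and the sample actors and paths are all hypothetical and chosen only for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical record layout, assumed for illustration only.
@dataclass(frozen=True)
class ActivityEvent:
    timestamp: int  # logical clock or epoch seconds
    actor: str      # process or user that performed the activity
    action: str     # one of "create", "read", "update", "delete"
    target: str     # identifier of the data object, e.g. a file path

VALID_ACTIONS = {"create", "read", "update", "delete"}


class ProvenanceLog:
    """Append-only list of data activity events (illustrative sketch)."""

    def __init__(self) -> None:
        self._events: List[ActivityEvent] = []

    def record(self, timestamp: int, actor: str, action: str, target: str) -> None:
        """Append one activity event, rejecting unknown action names."""
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self._events.append(ActivityEvent(timestamp, actor, action, target))

    def history(self, target: str) -> List[ActivityEvent]:
        """Return the events that touched `target`, oldest first."""
        matching = [e for e in self._events if e.target == target]
        return sorted(matching, key=lambda e: e.timestamp)

    def updates_per_actor(self) -> Counter:
        """Count update events per actor, a crude activity pattern."""
        return Counter(e.actor for e in self._events if e.action == "update")


# Sample data: made-up actors and paths.
log = ProvenanceLog()
log.record(1, "alice", "create", "/data/report.csv")
log.record(2, "etl_job", "read", "/data/report.csv")
log.record(3, "proc_4242", "update", "/data/report.csv")
log.record(4, "proc_4242", "update", "/data/notes.txt")

for event in log.history("/data/report.csv"):
    print(event.timestamp, event.actor, event.action)
print(log.updates_per_actor().most_common(1))
```

In a forensic setting, an actor that suddenly updates many unrelated objects might merit closer inspection, although a real detector would need far more context than a per-actor count.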

References

  • Agrawal D, Bernstein P, Bertino E, Davidson S, Dayal U, Franklin M, Gehrke J, Haas L, Halevy A, Han J, Jagadish HV, Labrinidis A, Madden S, Papakonstantinou Y, Patel JM, Ramakrishnan R, Ross K, Shahabi C, Suciu D, Vaithyanathan S, Widom J (2012) Challenges and opportunities with big data: a white paper prepared for the computing community consortium committee of the Computing Research Association. Technical report. https://cra.org/ccc/wp-content/uploads/sites/2/2015/05/bigdatawhitepaper.pdf

  • Fu X, Gao Y, Luo B, Du X, Guizani M (2017) Security threats to Hadoop: data leakage attacks and investigation. IEEE Netw 31(2):67–71

  • Ko RKL (2014) Data accountability in cloud systems. In: Nepal S, Pathan M (eds) Security, privacy and trust in cloud systems. Springer, Berlin, pp 211–238

  • Ko RKL, Phua TW (2017) The full provenance stack: five layers for complete and meaningful provenance. In: Proceedings of the security, privacy and anonymity in computation, communication and storage: SpaCCS 2017 international workshops, UbiSafe, ISSR, TrustData, TSP, SPIoT, NOPE, DependSys, SCS, WCSSC, MSCF and SPBD, 12–15 Dec 2017. Springer, Guangzhou

  • Ko RKL, Will MA (2014) Progger: an efficient, tamper-evident kernel-space logger for cloud data provenance tracking. In: Proceedings of the IEEE international conference on cloud computing, CLOUD ’14. IEEE Computer Society, Washington, DC, pp 881–889. https://doi.org/10.1109/CLOUD.2014.121

  • Ko RKL, Jagadpramana P, Lee BS (2011) Flogger: a file-centric logger for monitoring file access and transfers within cloud computing environments. In: Proceedings of the IEEE 10th international conference on trust, security and privacy in computing and communications, TRUSTCOM ’11. IEEE Computer Society, Washington, DC, pp 765–771. https://doi.org/10.1109/TrustCom.2011.100

  • Ko RKL, Lee SSG, Rajan V (2012) Understanding cloud failures. IEEE Spectr 49(12):84. https://doi.org/10.1109/MSPEC.2012.6361788

  • Ko RKL, Russello G, Nelson R, Pang S, Cheang A, Dobbie G, Sarrafzadeh A, Chaisiri S, Asghar MR, Holmes G (2015) Stratus: towards returning data control to cloud users. In: International conference on algorithms and architectures for parallel processing. Springer, pp 57–70

  • Muniswamy-Reddy KK, Holland DA, Braun U, Seltzer MI (2006) Provenance-aware storage systems. In: USENIX annual technical conference, general track, pp 43–56

  • Xie Y, Muniswamy-Reddy KK, Feng D, Li Y, Long DDE (2013) Evaluation of a hybrid approach for efficient provenance storage. Trans Storage 9(4):14:1–14:29. https://doi.org/10.1145/2501986

Author information

Corresponding author

Correspondence to Ryan K. L. Ko.

Copyright information

© 2019 Springer Nature Switzerland AG

About this entry

Cite this entry

Phua, T.W., Ko, R.K.L. (2019). Data Provenance for Big Data Security and Accountability. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_237
