Humble Data Management to Big Data Analytics/Science: A Retrospective Stroll

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11297)

Abstract

We are on the cusp of holistically analyzing the variety of data being collected, in diverse ways, in every walk of life, and of developing a science (Big Data Science) to benefit humanity at large in the best possible way. This warrants developing and using new approaches (technological, scientific, and systems) in addition to building upon and integrating with those developed so far. With this ambitious goal comes the accompanying risk of these advancements being misused or abused, as we have seen so many times with new technologies.

In this paper, we provide a retrospective bird's-eye view of the approaches that have emerged for managing and analyzing data over the last 40+ years. Since the advent of Database Management Systems (DBMSs), and especially Relational DBMSs (RDBMSs), data management and analysis have made several significant strides. Today, data has become an important tool (or even a weapon) in society, and its role and importance are unprecedented.

The goal of this paper is to give the reader an understanding of data management and analysis approaches: where we have come from, the motivations for developing them, and what this journey has been about in the short span of 40+ years. We hope this presentation offers a historical as well as a pedagogical perspective to those who are new to the field, and a useful perspective that those who have been working and contributing in the field can relate to and appreciate.

Notes

  1. Prof. Michael Stonebraker fondly refers to SQL as the inter-galactic data speak. Others may see it differently.

  2. This could be for a building, mall, check post, parking lot, etc.

Acknowledgment

We would like to thank Dr. Sanjukta Bhowmick for her collaboration with us on multilayer network analysis.

Author information

Corresponding author

Correspondence to Sharma Chakravarthy.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Chakravarthy, S., Santra, A., Komar, K.S. (2018). Humble Data Management to Big Data Analytics/Science: A Retrospective Stroll. In: Mondal, A., Gupta, H., Srivastava, J., Reddy, P., Somayajulu, D. (eds) Big Data Analytics. BDA 2018. Lecture Notes in Computer Science, vol 11297. Springer, Cham. https://doi.org/10.1007/978-3-030-04780-1_3

  • DOI: https://doi.org/10.1007/978-3-030-04780-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04779-5

  • Online ISBN: 978-3-030-04780-1

  • eBook Packages: Computer Science, Computer Science (R0)
