Humble Data Management to Big Data Analytics/Science: A Retrospective Stroll

  • Sharma Chakravarthy
  • Abhishek Santra
  • Kanthi Sannappa Komar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11297)

Abstract

We are on the cusp of holistically analyzing the variety of data being collected, in diverse ways, in every walk of life, and of developing a science (Big Data Science) to benefit humanity at large in the best possible way. This warrants developing and using new approaches (technological, scientific, and systems) in addition to building upon and integrating with those developed so far. This ambitious goal carries the accompanying risk that these advancements will be misused or abused, as we have seen so many times with new technologies.

In this paper, we provide a retrospective bird’s-eye view of the approaches that have emerged for managing and analyzing data over the last 40+ years. Since the advent of Database Management Systems (DBMSs), and especially Relational DBMSs (RDBMSs), data management and analysis have made several significant strides. Today, data has become an important tool (or even a weapon) in society, and its role and importance are unprecedented.

The goal of this paper is to give the reader an understanding of data management and analysis approaches: where we have come from, the motivations for developing them, and what this journey has been about in a short span of 40+ years. We sincerely hope this presentation provides a historical as well as pedagogical perspective for those who are new to the field, and a perspective that those who have been working in and contributing to the field can relate to and appreciate.

Keywords

Data management; Relational databases; Data warehouses; Event and stream data processing; Data mining; Video situation analysis; Big data analytics/science

Notes

Acknowledgment

We would like to thank Dr. Sanjukta Bhowmick for her collaboration with us on the multilayer network analysis.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Sharma Chakravarthy (1)
  • Abhishek Santra (1)
  • Kanthi Sannappa Komar (1)
  1. Information Technology Laboratory (IT Lab), Computer Science and Engineering Department, University of Texas at Arlington, Arlington, USA
