Humble Data Management to Big Data Analytics/Science: A Retrospective Stroll

Chakravarthy, Sharma; Santra, Abhishek; Komar, Kanthi Sannappa

doi:10.1007/978-3-030-04780-1_3

Sharma Chakravarthy,
Abhishek Santra &
Kanthi Sannappa Komar

Conference paper
First Online: 22 November 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11297)

Abstract

We are on the cusp of holistically analyzing the variety of data being collected, in diverse ways, in every walk of life, and of developing a science (Big Data Science) to benefit humanity at large in the best possible way. This warrants developing and using new approaches (technological, scientific, and systems) in addition to building upon and integrating with those developed so far. This ambitious goal carries the accompanying risk that these advancements will be misused or abused, as we have seen so many times with new technologies.

In this paper, we provide a retrospective bird’s-eye view of the approaches that have come about for managing and analyzing data over the last 40+ years. Since the advent of Database Management Systems (DBMSs), and especially Relational DBMSs (RDBMSs), data management and analysis have made several significant strides. Today, data has become an important tool (or even a weapon) in society, and its role and importance are unprecedented.

The goal of this paper is to give the reader an understanding of data management and analysis approaches: where we have come from, the motivations for developing them, and what this journey has been about in a short span of 40+ years. We sincerely hope this presentation provides a historical as well as a pedagogical perspective for those who are new to the field, and a useful perspective that those who have been working in and contributing to the field can relate to and appreciate.

Notes

1. Prof. Michael Stonebraker fondly refers to SQL as the inter-galactic data speak. Others may see it differently.
2. This could be for a building, mall, check post, parking lot, etc.

Acknowledgment

We would like to thank Dr. Sanjukta Bhowmick for her collaboration with us on multilayer network analysis.

Author information

Authors and Affiliations

Information Technology Laboratory (IT Lab), Computer Science and Engineering Department, University of Texas at Arlington, Arlington, TX, 76019, USA
Sharma Chakravarthy, Abhishek Santra & Kanthi Sannappa Komar

Corresponding author

Correspondence to Sharma Chakravarthy.

Editor information

Editors and Affiliations

Ashoka University, Sonepat, India
Anirban Mondal
IBM Research - India, New Delhi, India
Himanshu Gupta
University of Minnesota, Minneapolis, MN, USA
Jaideep Srivastava
IIIT, Hyderabad, India
P. Krishna Reddy
National Institute of Technology, Warangal, India
D.V.L.N. Somayajulu

About this paper

Cite this paper

Chakravarthy, S., Santra, A., Komar, K.S. (2018). Humble Data Management to Big Data Analytics/Science: A Retrospective Stroll. In: Mondal, A., Gupta, H., Srivastava, J., Reddy, P., Somayajulu, D. (eds) Big Data Analytics. BDA 2018. Lecture Notes in Computer Science, vol 11297. Springer, Cham. https://doi.org/10.1007/978-3-030-04780-1_3

DOI: https://doi.org/10.1007/978-3-030-04780-1_3
Published: 22 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04779-5
Online ISBN: 978-3-030-04780-1
eBook Packages: Computer Science, Computer Science (R0)