Abstract
Big Data, Data Science and MapReduce are three keywords that have flooded research papers and technical articles during the last two years. Moreover, because the computational infrastructures that support Data Science (such as Clouds and Grids) are inherently distributed, Distributed Intelligence can be viewed as the most natural underlying paradigm for novel Data Science challenges. Following this major trend, in this paper we provide background on these new terms and then discuss recent developments in the data mining and data warehousing areas in light of them. Finally, we offer our insights into the next stages of research and development in this area.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cuzzocrea, A., Gaber, M.M. (2013). Data Science and Distributed Intelligence: Recent Developments and Future Insights. In: Fortino, G., Badica, C., Malgeri, M., Unland, R. (eds) Intelligent Distributed Computing VI. Studies in Computational Intelligence, vol 446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32524-3_18
DOI: https://doi.org/10.1007/978-3-642-32524-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32523-6
Online ISBN: 978-3-642-32524-3
eBook Packages: Engineering, Engineering (R0)