Optimizing Intelligent Reduction Techniques for Big Data

Big Data Optimization: Recent Developments and Challenges

Part of the book series: Studies in Big Data (SBD, volume 18)

Abstract

Working with large volumes of data collected by many applications and held in multiple storage locations is both challenging and rewarding. Extracting valuable information from data requires combining qualitative and quantitative analysis techniques. One of the main promises of analytics is data reduction, whose primary function is to support decision-making. This chapter is motivated by a new generation of applications (social media, smart cities, cyber-infrastructures, environmental monitoring and control, healthcare, etc.) that produce big data and introduce many new mechanisms for data creation rather than new mechanisms for data storage. The goal of this chapter is to analyze existing techniques for data reduction at scale, in order to facilitate the optimization and understanding of Big Data processing. The chapter covers data manipulation, analytics, and Big Data reduction techniques, considering descriptive, predictive, and prescriptive analytics. The CyberWater case study is presented with reference to the optimization process and to the monitoring, analysis, and control of natural resources, especially water resources, with the aim of preserving water quality.
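As a rough illustration of what data reduction in the descriptive-analytics sense can look like, the following minimal sketch collapses raw sensor readings into one summary record per monitoring station. It is not taken from the chapter: the station names, the measured quantity, the values, and the `summarize` helper are all hypothetical and chosen only to make the idea concrete.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings: (station_id, dissolved_oxygen_mg_per_l).
# Station names and values are invented for illustration only.
readings = [
    ("station_a", 8.0),
    ("station_a", 7.0),
    ("station_a", 9.0),
    ("station_b", 4.0),
    ("station_b", 6.0),
]


def summarize(rows):
    """Reduce raw rows to one summary record per station."""
    grouped = defaultdict(list)
    for station, value in rows:
        grouped[station].append(value)
    # Each station's list of readings is replaced by four summary numbers.
    return {
        station: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for station, values in grouped.items()
    }


summary = summarize(readings)
for station, stats in sorted(summary.items()):
    print(station, stats)
```

The reduced output grows with the number of stations, not with the number of readings. That is the property that makes this kind of aggregation useful as a first step before decision support.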

Notes

  1. These solutions were grouped in the Special Issue on “Modern Dimension Reduction Methods for Big Data Problems in Ecology”, edited by Wikle, Holan and Hooten, in the Journal of Agricultural, Biological, and Environmental Statistics.

Acknowledgments

The research presented in this paper is supported by the following projects: CyberWater grant of the Romanian National Authority for Scientific Research, CNDI-UEFISCDI, project number 47/2012; CLUeFARM: Information system based on cloud services accessible through mobile devices, to increase product quality and business development farms (PN-II-PT-PCCA-2013-4-0870); DataWay: Real-time Data Processing Platform for Smart Cities: Making sense of Big Data (PN-II-RU-TE-2014-4-2731); MobiWay: Mobility Beyond Individualism: an Integrated Platform for Intelligent Transportation Systems of Tomorrow (PN-II-PT-PCCA-2013-4-0321).

Author information

Corresponding author

Correspondence to Florin Pop.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Pop, F., Negru, C., Ciolofan, S.N., Mocanu, M., Cristea, V. (2016). Optimizing Intelligent Reduction Techniques for Big Data. In: Emrouznejad, A. (eds) Big Data Optimization: Recent Developments and Challenges. Studies in Big Data, vol 18. Springer, Cham. https://doi.org/10.1007/978-3-319-30265-2_3

  • DOI: https://doi.org/10.1007/978-3-319-30265-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30263-8

  • Online ISBN: 978-3-319-30265-2

  • eBook Packages: Engineering, Engineering (R0)
