Abstract

Problems related to the Big Data phenomenon are currently among the ten hottest topics in information and communication technology. The Big Data phenomenon refers to the data explosion observed today. At present, the term is widely used by researchers and practitioners in many application domains. Big Data analysis can open up many new opportunities, motivating and stimulating industrial and commercial take-up of novel emerging technologies. An in-depth analysis of publications on Big Data processing and analytics shows that most of them write about “new opportunities” and “new challenges”. However, very few papers present solutions for predictive analytics that go beyond the limits of OLAP-like processing models and technologies. The goal of this paper is to outline in more detail not only the nature of the opportunities and particular challenges but also some original solutions for addressing them.

Notes

  1. Hereinafter, “Big Data” refers to the corresponding problem domain, whereas “big data” refers to big data samples.

  2. This application is used only to explain the essence of the big data analysis tasks arising in social networks.

  3. Unfortunately, many industrial practitioners still believe that unlimited computing resources can cope with any big data-related problem.

  4. See [6] for impressive graphical illustrations of the error accumulation and spurious correlation effects.

  5. At this step, the testing procedure has to be applied only to the data subset assigned the label \( \bar{\omega }_{k} \).

  6. In practice, causal analysis based on a Bayesian network model can be used only for data of dimensionality no greater than 20.

References

  1. Aliferis, C.F., Statnikov, A., Tsamardinos, I., Mani, S., Koutsoukos, X.D.: Local causal and Markov blanket induction for causal discovery and feature selection for classification Part I: Algorithms and empirical evaluation. J. Mach. Learn. Res. 11, 171–234 (2010)

  2. Bedini, I., Nguyen, B.: Automatic Ontology Generation: State of the Art. http://bivan.free.fr/Janus/Docs/Automatic_Ontology_Generation_State_of_Art.pdf

  3. Big Data: A New World of Opportunities. NESSI White Paper, December 2012. http://www.nessi-europe.com/Files/Private/NESSI_WhitePaper_BigData.pdf

  4. Bizer, C., Heath, T., Berners-Lee, T.: Linked data – the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009)

  5. Condorcet’s Theorem. http://en.wikipedia.org/wiki/Condorcet’s_jury_theorem

  6. Fan, J., Han, F., Liu, H.: Challenges of Big Data Analysis. Princeton University, Johns Hopkins University (2013). http://arxiv.org/pdf/1308.1479.pdf

  7. Fan, J., Guo, S., Hao, N.: Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 74(1), 37–65 (2012)

  8. Fan, J., Fan, Y.: High dimensional classification using features annealed independence rules. Ann. Stat. 36(6), 2605–2637 (2008)

  9. Gorodetsky, V., Samoylov, V., Tushkanova, O.: Agent-based customer profile learning in 3G recommending systems. In: Proceedings of 9th International Workshop on Agent and Data Mining Interaction (ADMI-2014) Associated with International Conference on Autonomous Agents and Multi-agent Systems (AAMAS-2014), Paris (2014)

  10. Gorodetsky, V., Samoylov, V., Serebryakov, S.: Context–Driven Data and Information Fusion. In: Proceedings of International Conference on Information Fusion (Fusion 2012), pp. 1830–1837, Singapore (2012)

  11. Gorodetsky, V., Samoylov, V., Serebryakov, S.: Ontology–based context–dependent personalization technology. In: Proceedings of WI/IAT/ACM International Conference, Associated Workshop “Web Personalization and Recommender Systems”, Toronto (2010)

  12. Gorodetsky, V., Serebryakov, S.: Methods and algorithms of collective recognition. Autom. Remote Control. 69(11), 1821–1851 (2008)

  13. Hall, P., Pittelkow, Y., Ghosh, M.: Theoretical measures of relative performance of classifiers for high dimensional data with small sample sizes. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 70(1), 159–173 (2008)

  14. IBM Big Data Success Stories. http://public.dhe.ibm.com/software/data/sw-library/big-data/ibm-big-data-success.pdf

  15. InfoSphere BigInsights Enterprise Edition. http://www-03.ibm.com/software/products/ru/infobigienteedit/

  16. IBM Business Analytics for Big Data – Overview. http://www-01.ibm.com/software/analytics/solutions/big-data/

  17. InfoSphere Streams Technical Overview – Use Cases Big Data. http://www.slideshare.net/IBMInfoSphereUGFR/infosphere-streams-technical-overview-use-cases-big-data-jerome-chailloux

  18. Kuncheva, L., Whitaker, C.: Measures of diversity in classifier ensembles. Mach. Learn. 51, 181–207 (2003)

  19. Li, J., Le, T.D., Liu, L., Liu, J., Jin, Z., Sun, B.: Mining causal association rules. In: Proceedings of International ICDM-2013 Workshop on Causal Discovery, Dallas, USA (2013)

  20. NineSigma REQUEST #69987. https://www.ninesights.com/docs/DOC-8380

  21. Pearl, J.: Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco (1991)

  22. Pearl, J., Verma, T.S.: A theory of inferred causation. In: Proceedings of Second International Conference on the Principles of Knowledge Representation and Reasoning, pp. 441–452 (1991)

  23. Silverstein, C., Brin, S., Motwani, R.: Scalable techniques for mining causal structures. In: Proceedings of 24th VLDB Conference, New York, USA, pp. 594–605 (1998)

  24. Skormin, V.A., Gorodetski, V.I., Popyack, L.J.: Data mining technology for failure prognostic of avionics. IEEE Trans. Aerosp. Electron. Syst. 38(2), 388–403 (2002)

Acknowledgment

This research is supported by Project No. 1.12 of the Research Program “Information Technologies and Methods for Complex System Analysis”, supervised by the Nano- and Information Technology Branch of the Russian Academy of Sciences.

Corresponding author

Correspondence to Vladimir Gorodetsky.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gorodetsky, V. (2014). Big Data: Opportunities, Challenges and Solutions. In: Ermolayev, V., Mayr, H., Nikitchenko, M., Spivakovsky, A., Zholtkevych, G. (eds) Information and Communication Technologies in Education, Research, and Industrial Applications. ICTERI 2014. Communications in Computer and Information Science, vol 469. Springer, Cham. https://doi.org/10.1007/978-3-319-13206-8_1

  • DOI: https://doi.org/10.1007/978-3-319-13206-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13205-1

  • Online ISBN: 978-3-319-13206-8

  • eBook Packages: Computer Science, Computer Science (R0)
