From Big Data to Big Data Mining: Challenges, Issues, and Opportunities

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7827))

Abstract

Since last year, “big data” has become a prominent buzzword, and “big data mining”, i.e., mining from big data, has followed almost immediately as an emerging, interrelated research area. This paper provides an overview of big data mining and discusses the related challenges and new opportunities. The discussion includes a review of state-of-the-art frameworks and platforms for processing and managing big data, as well as the efforts expected in big data mining. We address broad issues related to big data and big data mining, and point out opportunities and research topics as they emerge. We hope our effort will help reshape today’s data mining technology toward solving tomorrow’s bigger challenges that arise with big data.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Che, D., Safran, M., Peng, Z. (2013). From Big Data to Big Data Mining: Challenges, Issues, and Opportunities. In: Hong, B., Meng, X., Chen, L., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40270-8_1

  • DOI: https://doi.org/10.1007/978-3-642-40270-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40269-2

  • Online ISBN: 978-3-642-40270-8

  • eBook Packages: Computer Science (R0)
