An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers

  • Conference paper
Soft Computing in Data Science (SCDS 2019)

Abstract

Many real-world data sets are not only large in volume but also heterogeneous and rapidly generated. Data of this type, known as big data, typically cannot be analyzed with traditional software tools and techniques. Although the open-source project Apache Hadoop has been successfully developed and used for handling big data, the complexity of its setup and configuration, together with the need to learn additional related tools, has kept non-technical researchers and educators from entering the area of big data analytics. To support the big data community, this paper describes the procedures for, and the experience gained from, building a big data analytics framework and demonstrates its use on a popular case study, Twitter sentiment analysis. The framework comprises a cluster of four commodity computers running Cloudera CDH 6.0.1 and RapidMiner Studio 9.3 with the Text Processing, Hive Connector, and Radoop extensions. According to the study results, setting up a big data analytics framework on a cluster of computers does not require advanced computer knowledge, but it does need meticulous system configuration to satisfy the installation and software integration requirements. Once setup and configuration are done correctly, data analysis can be readily performed using the visual workflow designers provided by RapidMiner. Finally, the framework is further evaluated on a large data set of 185 million records, the “TalkingData AdTracking Fraud Detection” data set. The outcome is highly satisfactory and shows that the framework is easy to use and can be practically deployed for big data analytics.

Author information

Corresponding author

Correspondence to Supphachai Thaicharoen.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kunnakorntammanop, S., Thepwuttisathaphon, N., Thaicharoen, S. (2019). An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers. In: Berry, M., Yap, B., Mohamed, A., Köppen, M. (eds) Soft Computing in Data Science. SCDS 2019. Communications in Computer and Information Science, vol 1100. Springer, Singapore. https://doi.org/10.1007/978-981-15-0399-3_17

  • DOI: https://doi.org/10.1007/978-981-15-0399-3_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0398-6

  • Online ISBN: 978-981-15-0399-3

  • eBook Packages: Computer Science, Computer Science (R0)
