An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers

Kunnakorntammanop, Sittiporn; Thepwuttisathaphon, Netiphong; Thaicharoen, Supphachai

doi:10.1007/978-981-15-0399-3_17

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1100)

Included in the following conference series:

International Conference on Soft Computing in Data Science

Abstract

Much real-world data is not only large in volume but also heterogeneous and rapidly generated. Such data, known as big data, typically cannot be analyzed with traditional software tools and techniques. Although the open-source Apache Hadoop project has been successfully developed and used to handle big data, the complexity of its setup and configuration, together with the need to learn additional related tools, has kept non-technical researchers and educators from entering the area of big data analytics. To support the big data community, this paper describes the procedures followed and the experience gained in building a big data analytics framework, and demonstrates its use on a popular case study, Twitter sentiment analysis. The framework comprises a cluster of four commodity computers running Cloudera CDH 6.0.1, together with RapidMiner Studio 9.3 and its Text Processing, Hive Connector, and Radoop extensions. The results show that setting up a big data analytics framework on a cluster of computers does not require advanced computer knowledge, but it does require meticulous system configuration to satisfy installation and software integration requirements. Once setup and configuration are done correctly, data analysis can readily be performed with the visual workflow designers provided by RapidMiner. Finally, the framework is evaluated on a large data set of 185 million records, the “TalkingData AdTracking Fraud Detection” data set. The outcome is very satisfactory and demonstrates that the framework is easy to use and can be deployed in practice for big data analytics.
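The abstract reports that the main hurdle was meticulous system configuration rather than advanced knowledge. As an illustration only, the sketch below checks one prerequisite that Cloudera's installation guides commonly stress: every node should resolve every cluster hostname to the same non-loopback address. The abstract does not say which settings the authors actually had to adjust; the hostnames, addresses, and the helpers `parse_hosts` and `check_cluster_hosts` are hypothetical, and the hosts-file contents are passed in as strings so the check runs anywhere.

```python
from __future__ import annotations

import ipaddress


def parse_hosts(text: str) -> dict[str, str]:
    """Map every hostname or alias in /etc/hosts-style text to its IP address."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first mapping seen for a name
    return mapping


def check_cluster_hosts(hosts_files: dict[str, str], cluster: list[str]) -> list[str]:
    """Report missing, loopback, malformed, or inconsistent hostname mappings.

    hosts_files maps a node name to the text of that node's hosts file
    (hypothetical input; collect it however suits your cluster).
    """
    problems: list[str] = []
    reference: dict[str, str | None] | None = None
    for node, text in hosts_files.items():
        mapping = parse_hosts(text)
        for host in cluster:
            ip = mapping.get(host)
            if ip is None:
                problems.append(f"{node}: {host} is not listed")
                continue
            try:
                if ipaddress.ip_address(ip).is_loopback:
                    problems.append(f"{node}: {host} maps to loopback {ip}")
            except ValueError:
                problems.append(f"{node}: {host} has malformed address {ip}")
        resolved = {host: mapping.get(host) for host in cluster}
        if reference is None:
            reference = resolved  # the first node's view is the baseline
        elif resolved != reference:
            problems.append(f"{node}: mappings differ from the first node")
    return problems


if __name__ == "__main__":
    # Hypothetical four-node cluster with illustrative private addresses.
    cluster = ["node1", "node2", "node3", "node4"]
    good = "127.0.0.1 localhost\n" + "".join(
        f"192.168.1.{10 + i} node{i}\n" for i in range(1, 5)
    )
    # A common mistake: the node's own hostname appended to the loopback line.
    bad = good.replace("127.0.0.1 localhost", "127.0.0.1 localhost node3")
    files = {"node1": good, "node2": good, "node3": bad, "node4": good}
    for problem in check_cluster_hosts(files, cluster):
        print(problem)
```

Run as a script, this reports two problems for node3: node3 maps to the loopback address, and node3's view of the cluster differs from node1's. Comparing every node against the first one keeps the check simple; a real deployment would also verify forward and reverse DNS, which this sketch does not attempt.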


Similar content being viewed by others

Large-Scale Data Analytics Tools: Apache Hive, Pig, and HBase

Big Data Storage and Management: Challenges and Opportunities

A Study of Cloud-Based Solution for Data Analytics

Author information

Authors and Affiliations

Faculty of Science, Srinakharinwirot University, Bangkok, 10110, Thailand
Sittiporn Kunnakorntammanop, Netiphong Thepwuttisathaphon & Supphachai Thaicharoen

Corresponding author

Correspondence to Supphachai Thaicharoen.

Editor information

Editors and Affiliations

University of Tennessee, Knoxville, TN, USA
Michael W. Berry
Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia
Bee Wah Yap
Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia
Azlinah Mohamed
Kyushu Institute of Technology, Fukuoka, Japan
Mario Köppen

About this paper

Cite this paper

Kunnakorntammanop, S., Thepwuttisathaphon, N., Thaicharoen, S. (2019). An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers. In: Berry, M., Yap, B., Mohamed, A., Köppen, M. (eds) Soft Computing in Data Science. SCDS 2019. Communications in Computer and Information Science, vol 1100. Springer, Singapore. https://doi.org/10.1007/978-981-15-0399-3_17

DOI: https://doi.org/10.1007/978-981-15-0399-3_17
Published: 24 September 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0398-6
Online ISBN: 978-981-15-0399-3
eBook Packages: Computer Science, Computer Science (R0)
