Abbreviations
- IT: Information Technology
- t-SNE: t-Distributed Stochastic Neighbor Embedding
- SOM: Self-organizing Map
- NCD: Normalized Compression Distance
- GHSOM: Growing Hierarchical Self-organizing Map
- GCS: Growing Cell Structure
- IGG: Incremental Grid Growing
Author information
Contributions
C. Bashyal and M.N. Halgamuge conceived the study idea and developed the analysis plan. C. Bashyal analyzed the data and wrote the initial paper. M.N. Halgamuge helped prepare the figures and tables and finalize the manuscript. All authors read the manuscript.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Bashyal, C., Halgamuge, M.N., Mohammad, A. (2018). Review on Analysis of the Application Areas and Algorithms used in Data Wrangling in Big Data. In: Sangaiah, A., Thangavelu, A., Meenakshi Sundaram, V. (eds) Cognitive Computing for Big Data Systems Over IoT. Lecture Notes on Data Engineering and Communications Technologies, vol 14. Springer, Cham. https://doi.org/10.1007/978-3-319-70688-7_14
DOI: https://doi.org/10.1007/978-3-319-70688-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70687-0
Online ISBN: 978-3-319-70688-7
eBook Packages: Engineering (R0)