Abstract
With the exponentially increasing size and complexity of data, data quality has become a major concern in data analytics. The potential of Natural Language Processing (NLP) is well known, and researchers already harness it to develop significant analytical processes. However, few research works focus on applying NLP to data of the complexity now reported in the area of big data. Therefore, the primary contribution of this manuscript is a review of the most recent work on NLP-based approaches to data analysis, where the input data may be textual or non-textual. The secondary contribution is to gauge how effective existing NLP-based research approaches are at improving data quality in data science.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Shruthi, J., Swamy, S. (2019). Effectiveness of Recent Research Approaches in Natural Language Processing on Data Science-An Insight. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Computational and Statistical Methods in Intelligent Systems. CoMeSySo 2018. Advances in Intelligent Systems and Computing, vol 859. Springer, Cham. https://doi.org/10.1007/978-3-030-00211-4_17
DOI: https://doi.org/10.1007/978-3-030-00211-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00210-7
Online ISBN: 978-3-030-00211-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)