Adaptive System for Handling Variety in Big Text

Pathak, Shantanu; Rajeshwar Rao, D.

doi:10.1007/978-981-10-5523-2_28

Conference paper
First Online: 24 October 2017

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 19)

Abstract

Today, every corporate, banking, judicial, or medical ecosystem generates many varieties of text, such as customer reviews, product manuals, white papers, system logs, and usage data. These texts vary in language, size, context, and format. Handling all of them with a single system remains a challenge: traditionally, a separate system handles each specific type of generated text. This work proposes a concrete step toward an integrated solution to that challenge. The proposed system seamlessly handles text of different formats, sizes, languages, and contexts, encompassing the text generated across the ecosystem. An implementation over a heterogeneous text dataset shows promising results. This integrated approach gives analytics an extra edge in learning hidden relational and contextual patterns across the complete system.

Author information

Authors and Affiliations

CSE Department, K L University (K L Education Foundation), Vijayawada, India
Shantanu Pathak & D. Rajeshwar Rao

Corresponding author

Correspondence to Shantanu Pathak.

Editor information

Editors and Affiliations

Department of Computer Science and Information Management, Providence University, Taichung City, Taiwan
Yu-Chen Hu
CSED, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Shailesh Tiwari
Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Allahabad, Uttar Pradesh, India
Krishn K. Mishra
Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, India
Munesh C. Trivedi

About this paper

Cite this paper

Pathak, S., Rajeshwar Rao, D. (2018). Adaptive System for Handling Variety in Big Text. In: Hu, YC., Tiwari, S., Mishra, K., Trivedi, M. (eds) Intelligent Communication and Computational Technologies. Lecture Notes in Networks and Systems, vol 19. Springer, Singapore. https://doi.org/10.1007/978-981-10-5523-2_28

DOI: https://doi.org/10.1007/978-981-10-5523-2_28
Published: 24 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5522-5
Online ISBN: 978-981-10-5523-2
eBook Packages: Engineering, Engineering (R0)
