Abstract
Today’s high-dimensional data, which is mostly unstructured, makes data pattern discovery (a.k.a. data mining) challenging for services engineers. Unstructured data mining departs from previously proposed information extraction methodologies because recently generated and stored data follows no standard schema and is heterogeneous. At the storage level, the NoSQL database has been proposed as a preferred technology for accommodating high-dimensional data, and it has seen significant enterprise adoption. At the technology level, the query style of NoSQL databases differs from that of schema-based stores such as the RDBMS. Currently, there is a lack of tools, technologies, and methodologies to help the community support data pattern discovery in the big data era. An Analytics-as-a-Service (AaaS) framework for terms mining in document-based NoSQL systems was proposed previously. In this chapter, we provide a comprehensive view of the performance of several algorithms that have been employed for the topics and terms mining tasks. The chapter reproduces several previously proposed algorithms so that the software engineering community can see what has been done to improve the accuracy of terms mining from document-based NoSQL systems.
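Because document-based NoSQL stores impose no fixed schema, a terms-mining pass cannot rely on known column names the way an RDBMS query does. As a minimal sketch (assuming documents arrive as nested JSON-like Python dicts, and using a hypothetical regex tokenizer, stopword set, and sample data; this is an illustration, not one of the algorithms evaluated in the chapter), term frequencies can be gathered by walking every string value in each document regardless of its shape:

```python
import re
from collections import Counter
from typing import AbstractSet, Any, Iterable, Iterator

# Hypothetical tokenizer: lowercase alphanumeric runs of two or more characters.
TOKEN_RE = re.compile(r"[a-z0-9]{2,}")


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string value in a nested document, whatever its shape."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_strings(child)
    elif isinstance(value, (list, tuple)):
        for child in value:
            yield from iter_strings(child)
    # Numbers, booleans and None carry no terms and are skipped.


def term_frequencies(docs: Iterable[dict],
                     stopwords: AbstractSet[str] = frozenset()) -> Counter:
    """Count terms across schema-less documents without naming any field."""
    counts: Counter = Counter()
    for doc in docs:
        for text in iter_strings(doc):
            for token in TOKEN_RE.findall(text.lower()):
                if token not in stopwords:
                    counts[token] += 1
    return counts


# Hypothetical sample documents with deliberately different shapes.
SAMPLE_DOCS = [
    {"_id": 1, "title": "NoSQL storage", "tags": ["big data", "NoSQL"]},
    {"_id": 2, "body": {"text": "Terms mining in NoSQL"}, "rating": 5},
]

freq = term_frequencies(SAMPLE_DOCS, stopwords={"in"})
print(freq.most_common(3))
```

Since the walker never names a field, the same pass handles documents whose structure varies from record to record. A real terms-mining pipeline would add further steps such as stemming, fuller stopword handling, and the ranking algorithms whose performance the chapter compares.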
Acknowledgement
• Special thanks to grad students in the MADMUC Lab, University of Saskatchewan.
• Thanks to Prof. Patrick Hung of the IT Security Unit, University of Ontario Institute of Technology.
• Final thanks to the Editors and Reviewers of this chapter for their feedback.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Lomotey, R.K., Deters, R. (2016). Unstructured Data, NoSQL, and Terms Analytics. In: Hung, P. (eds) Big Data Applications and Use Cases. International Series on Computer Entertainment and Media Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-30146-4_6
DOI: https://doi.org/10.1007/978-3-319-30146-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30144-0
Online ISBN: 978-3-319-30146-4
eBook Packages: Computer Science, Computer Science (R0)