TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data

  • Rabia Irfan
  • Sharifullah Khan
  • Kashif Rajpoot
  • Ali Mustafa Qamar

Abstract

Taxonomies are generated to organize and access large volumes of data effectively. A taxonomy is a way of representing the concepts that exist in data, and it needs to evolve continuously to reflect changes in that data. Existing automatic taxonomy generation techniques do not handle the evolution of data; therefore, the generated taxonomies do not truly represent the data. Data evolution can be handled either by regenerating the taxonomy from scratch or by allowing the taxonomy to evolve incrementally whenever the data change. The former approach is not economical in terms of time and resources. The proposed taxonomy incremental evolution (TIE) algorithm is a novel attempt to handle data that evolve over time. It serves as a layer over an existing clustering-based taxonomy generation technique and allows an existing taxonomy to evolve incrementally. The algorithm was evaluated on research articles selected from the computing domain. The taxonomy evolved with the algorithm took considerably less time to produce, and had better quality per unit time, than a taxonomy regenerated from scratch.
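The contrast between regenerating and incrementally evolving can be illustrated with a minimal sketch. This is not the TIE algorithm itself, whose details are not given in the abstract; it only shows the general idea of placing a new document into an existing set of clusters without re-clustering the old ones. The names `Node` and `add_document`, the cosine threshold of 0.5, and the flat single-level structure are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    """Hypothetical taxonomy node: a cluster with a running-mean centroid."""
    centroid: dict
    members: list = field(default_factory=list)


def add_document(nodes, doc_id, vec, threshold=0.5):
    """Place one new document without touching existing assignments.

    Assumption: a document joins the most similar node if the cosine
    similarity reaches `threshold`; otherwise it starts a new node.
    """
    best = max(nodes, key=lambda n: cosine(n.centroid, vec), default=None)
    if best is not None and cosine(best.centroid, vec) >= threshold:
        n = len(best.members)
        terms = set(best.centroid) | set(vec)
        # Update the centroid as the running mean of member vectors.
        best.centroid = {
            t: (best.centroid.get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1)
            for t in terms
        }
        best.members.append(doc_id)
        return best
    node = Node(centroid=dict(vec), members=[doc_id])
    nodes.append(node)
    return node


# Illustrative data only: three toy documents arriving one at a time.
nodes = []
add_document(nodes, "d1", {"cluster": 1.0, "taxonomy": 1.0})
add_document(nodes, "d2", {"cluster": 1.0, "taxonomy": 0.5})
add_document(nodes, "d3", {"network": 1.0, "routing": 1.0})
```

Each arrival costs one similarity comparison per existing node, whereas regenerating from scratch would re-cluster every document seen so far. That difference is the time saving the abstract refers to, although the actual algorithm may organize the work quite differently.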

Key words

Taxonomy; Clustering algorithms; Information science; Knowledge management; Machine learning

CLC number

TP312 

Copyright information

© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan
  2. School of Computer Science, University of Birmingham, Birmingham, UK
  3. Department of Computer Science, College of Computer, Qassim University, Al Mulaida, Buraydah, Saudi Arabia
