Data Mining

  • Ke-Lin Du
  • M. N. S. Swamy
Chapter

Abstract

The wealth of information in huge databases and on the Web has aroused tremendous interest in data mining, also known as knowledge discovery in databases. This chapter introduces data mining. We first introduce the neural network approach to data mining and then address various data mining and information retrieval problems on the Web.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
  2. Xonlink Inc., Hangzhou, China
