Multimedia Tools and Applications, Volume 51, Issue 1, pp 35–76

Multimedia data mining: state of the art and challenges

  • Chidansh Amitkumar Bhatt
  • Mohan S. Kankanhalli

Abstract

Advances in multimedia data acquisition and storage technology have led to the growth of very large multimedia databases. Analyzing this huge amount of multimedia data to discover useful knowledge is a challenging problem, and this challenge has opened the opportunity for research in Multimedia Data Mining (MDM). Multimedia data mining can be defined as the process of finding interesting patterns in media data such as audio, video, image and text that are not ordinarily accessible by basic queries and their results. The motivation for doing MDM is to use the discovered patterns to improve decision making. MDM has therefore attracted significant research effort in developing methods and tools to organize, manage, search and perform domain-specific tasks for data from domains such as surveillance, meetings, broadcast news, sports, archives, movies and medical data, as well as personal and online media collections. This paper presents a survey of the problems and solutions in Multimedia Data Mining, approached from the following angles: feature extraction, transformation and representation techniques; data mining techniques; and current multimedia data mining systems in various application domains. We discuss the main aspects of feature extraction, transformation and representation techniques. These aspects are: level of feature extraction, feature fusion, feature synchronization, feature correlation discovery and accurate representation of multimedia data. A comparison of MDM techniques with state-of-the-art video processing, audio processing and image processing techniques is also provided. Similarly, we compare MDM techniques with state-of-the-art data mining techniques involving clustering, classification, sequence pattern mining, association rule mining and visualization. We review current multimedia data mining systems in detail, grouping them according to problem formulations and approaches.
The review includes supervised and unsupervised discovery of events and actions from one or more continuous sequences. We also analyze in detail what has been achieved and which gaps remain where future research efforts could be focused. We then conclude this survey with a look at open research directions.
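To make the idea of discovering cross-modal patterns a little more concrete, here is a minimal sketch of association rule mining over multimedia labels. It assumes that each video shot has already been reduced to symbolic labels by separate audio and visual feature extractors; the shot data, the `audio:`/`visual:` label prefixes and the function names are all hypothetical. Each shot is treated as one transaction, and rules of the form {audio label} -> {visual label} are kept when they meet minimum support and confidence thresholds. This is a brute-force illustration of the general technique, not the method of any particular system reviewed in the survey.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical shot-level labels: each video shot is one "transaction" whose
# items are symbolic labels emitted by separate audio and visual detectors.
SHOTS: List[FrozenSet[str]] = [
    frozenset({"audio:cheering", "audio:whistle", "visual:goal_area", "visual:close_up"}),
    frozenset({"audio:cheering", "visual:goal_area", "visual:close_up"}),
    frozenset({"audio:speech", "visual:mid_field"}),
    frozenset({"audio:cheering", "audio:whistle", "visual:goal_area"}),
    frozenset({"audio:speech", "visual:close_up"}),
]

Rule = Tuple[str, str, float, float]  # (audio label, visual label, support, confidence)


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def cross_modal_rules(
    transactions: List[FrozenSet[str]],
    min_support: float = 0.4,
    min_confidence: float = 0.8,
) -> List[Rule]:
    """Brute-force search for rules {audio label} -> {visual label}.

    Only single-item antecedents and consequents are considered; larger
    itemsets would call for Apriori or FP-growth instead of this double loop.
    """
    items = sorted(set().union(*transactions))
    audio = [i for i in items if i.startswith("audio:")]
    visual = [i for i in items if i.startswith("visual:")]
    rules: List[Rule] = []
    for a in audio:
        # sup_a > 0 because every label was collected from at least one shot.
        sup_a = support(frozenset({a}), transactions)
        for v in visual:
            sup_av = support(frozenset({a, v}), transactions)
            # confidence = P(visual label | audio label) = support(a, v) / support(a)
            if sup_av >= min_support and sup_av / sup_a >= min_confidence:
                rules.append((a, v, sup_av, sup_av / sup_a))
    return rules


if __name__ == "__main__":
    for a, v, s, c in cross_modal_rules(SHOTS):
        print(f"{a} -> {v} (support={s:.2f}, confidence={c:.2f})")
```

On this toy data the sketch keeps two rules, audio:cheering -> visual:goal_area (support 0.60, confidence 1.00) and audio:whistle -> visual:goal_area (support 0.40, confidence 1.00), while audio:cheering -> visual:close_up is dropped because its confidence is only about 0.67. Restricting antecedents to audio labels and consequents to visual labels is one simple way to force every rule to span two modalities.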

Keywords

Survey, Multimodal data mining, Probabilistic temporal multimedia data mining, Video mining, Audio mining, Image mining, Text mining


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Chidansh Amitkumar Bhatt (1)
  • Mohan S. Kankanhalli (1)
  1. School of Computing, National University of Singapore, Singapore
