Multimedia Tools and Applications

Volume 75, Issue 3, pp 1495–1508

Event-based cross media question answering



User-generated content, available in massive amounts on the Internet, is receiving increased attention due to its many potential applications. One such application is the representation of events using multimedia data. In this paper, we propose an event-based cross media question answering system that retrieves and summarizes events on a given topic. In other words, we present a framework that leverages social media data to extract and illustrate social events automatically for any given query. The system is built in three steps. First, the input query is parsed semantically to identify the topic, location, and time information related to the news of interest. Second, the parsed information is used to mine the latest and hottest related news from social news web services. Third, to identify unique events, we model the news content with latent Dirichlet allocation (LDA) and cluster the news items using the DBSCAN algorithm. Finally, for each event, we retrieve both the textual and visual content of the news items that refer to the same event. The resulting documents are shown in a vivid interface featuring an event description, a tag cloud, and a photo collage.
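The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each news item has already been mapped to a topic distribution (for example by an LDA model), and the toy vectors, the Euclidean distance, and the `eps` and `min_pts` values are hypothetical choices made only for illustration. The function names `dbscan` and `region_query` are likewise our own.

```python
from math import sqrt

NOISE = -1  # label for items that belong to no event


def euclidean(a, b):
    """Euclidean distance between two topic vectors (an assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, p in enumerate(points) if euclidean(points[i], p) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


# Hypothetical topic distributions for seven news items over three topics.
topic_vectors = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.88, 0.06, 0.06],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.06, 0.88, 0.06],
    [0.34, 0.33, 0.33],
]

labels = dbscan(topic_vectors, eps=0.15, min_pts=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1, -1]
```

In this toy run the first three items form one event, the next three form another, and the last item, whose topic mix is close to neither group, is left as noise. A density-based method fits this setting because the number of events behind a query is not known in advance.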


Events, Social media, Illustration, Cross media, Question answering



This work was supported by the 973 Program of China (No. 2013CB329604), and the European Commission under contracts FP7-287911 LinkedTV and FP7-318101 MediaMixer, as well as.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Hefei University of Technology, Hefei, China
  2. EURECOM, Sophia-Antipolis, France
