Abstract
The ability to locate relevant information online has been the great contribution of modern search engines. The same approach has also been applied, to great effect, to files held in local storage. As the volume of information keeps growing, the need for effective search engines will only increase.
Most search engines focus on locating textual information stored in accessible file formats. Search over other modes of information representation, such as image and video, has also gained prominence recently, with varying degrees of success. However, one form of stored information is still not readily accessible through search engines: the audio and video content held in multimedia files.
Examples of information stored in such files include lectures, news broadcasts and voice notes. This is information that people routinely look for. For example, people who rely on voice notes would benefit from a platform that lets them search quickly through their notes and follow hyperlinks to the relevant audio segments. Students searching for specific material across many video lectures would benefit from being able to quickly search, link to and watch the relevant segments. Similarly, researchers and reporters looking for references to specific topics in news broadcasts would benefit from a search engine that quickly identifies the relevant content to watch or listen to.
The usefulness of such a platform varies with the cultural context of the content being searched. Arabic cultures are highly oral, and much of the information sought by researchers and students is contained in speeches, lectures and interviews. A search engine of the kind described above is therefore especially useful for organizing Arabic information.
This chapter describes the design, development and testing of a web-based platform for searching through Arabic multimedia content. The platform searches a corpus of video or audio files for keywords and provides hyperlinks to the audio segments where those keywords are mentioned. To do so, it relies on an Arabic language transcriber. The platform was developed through system integration. Whilst this approach has the advantage of enabling the use of existing resources, it has highlighted the need for more accurate and more easily accessible Arabic language transcribers.
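The keyword-to-segment lookup described above can be illustrated with a minimal sketch. It assumes that the transcriber has already produced timestamped text segments for each file. The `Segment` structure, the `search_segments` function, the example URLs and the sample transcripts are all hypothetical, and the `#t=start,end` suffix (the temporal syntax of the W3C Media Fragments URI convention) is only one plausible way to build the hyperlinks, not necessarily the one used by the platform.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed stretch of a media file (hypothetical structure)."""
    media_url: str  # where the audio or video file is served (assumed)
    start: float    # segment start time in seconds
    end: float      # segment end time in seconds
    text: str       # transcriber output for this segment


def search_segments(segments, keyword):
    """Return a time-offset link for each segment containing the keyword as a whole word."""
    hits = []
    for seg in segments:
        # Whitespace tokenisation only; a real system would likely normalise Arabic text first.
        if keyword in seg.text.split():
            # The "#t=start,end" suffix asks a compatible player to jump to the segment.
            hits.append(f"{seg.media_url}#t={seg.start:g},{seg.end:g}")
    return hits


# Illustrative corpus of already transcribed segments (made-up data).
corpus = [
    Segment("https://example.org/lecture1.mp4", 0.0, 12.5, "مقدمة في علم الحاسوب"),
    Segment("https://example.org/lecture1.mp4", 12.5, 30.0, "تاريخ اللغة العربية"),
    Segment("https://example.org/news.mp3", 4.0, 9.0, "نشرة الأخبار باللغة العربية"),
]

links = search_segments(corpus, "العربية")
for link in links:
    print(link)
```

Exact whole-word matching keeps the sketch short, but it misses inflected or prefixed forms (for instance a word with an attached preposition), so the quality of both the transcription and the text normalisation would directly affect which segments are found.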
Acknowledgement
The work presented in this chapter was kindly supported by a CNRS-L “Grant Research Programme 2016” (Lebanon) fund.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Raad, M., Bayan, M., Dalloul, Y., Ghareeb, M., Haj-Ali, A. (2018). Arabic Multimedia Search Platform. In: Alja’am, J., El Saddik, A., Sadka, A. (eds) Recent Trends in Computer Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-89914-5_14
DOI: https://doi.org/10.1007/978-3-319-89914-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89913-8
Online ISBN: 978-3-319-89914-5
eBook Packages: Computer Science, Computer Science (R0)