Abstract
Generating visual representations of textual information is a challenging yet interesting topic with many potential applications. In this paper, we propose a novel approach to visualizing natural language sentences using ImageNet to enhance language education. Currently, the focus is on assisting English language learners in building their vocabulary of common nouns and developing an in-depth understanding of the various prepositions of location. To achieve this goal, real-world images representing nouns are obtained from ImageNet, and their foreground objects of interest are extracted using image segmentation. The objects are then rearranged on a canvas based on the spatial relationships specified in the sentence. To demonstrate the effectiveness of the proposed approach, we have developed a mobile application that uses a RESTful API to retrieve images from the web service that runs the image generation program. The prototype mobile application can create visual representations of natural language sentences, along with a text description of the spatial relationships between objects, to assist in learning new vocabulary and spatial prepositions during language education.
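The canvas arrangement step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `place_objects`, the set of supported prepositions, the canvas size, and the `gap` parameter are all assumptions made for illustration. The sketch takes the pixel sizes of two already-segmented objects and returns top-left coordinates that reflect a preposition of location.

```python
from typing import Dict, Tuple

Size = Tuple[int, int]   # (width, height) in pixels
Point = Tuple[int, int]  # (x, y) of a top-left corner; y grows downward


def place_objects(subject: Size, reference: Size, preposition: str,
                  canvas: Size = (640, 480), gap: int = 10) -> Dict[str, Point]:
    """Return top-left corners for a subject and a reference object.

    Hypothetical helper: the reference object is centred on the canvas and
    the subject is positioned relative to it according to the preposition.
    """
    cw, ch = canvas
    sw, sh = subject
    rw, rh = reference

    # Centre the reference object on the canvas.
    rx, ry = (cw - rw) // 2, (ch - rh) // 2

    # Subject coordinates that would centre it on the reference object.
    cx = rx + (rw - sw) // 2
    cy = ry + (rh - sh) // 2

    if preposition == "on":
        sx, sy = cx, ry - sh              # subject rests on the top edge
    elif preposition == "under":
        sx, sy = cx, ry + rh + gap        # subject sits below the reference
    elif preposition == "left of":
        sx, sy = rx - sw - gap, cy
    elif preposition == "right of":
        sx, sy = rx + rw + gap, cy
    else:
        raise ValueError(f"unsupported preposition: {preposition!r}")

    return {"subject": (sx, sy), "reference": (rx, ry)}


# Example: "The cup is on the table", with a 100x50 cup and a 200x100 table.
layout = place_objects((100, 50), (200, 100), "on")
print(layout)
```

In a fuller pipeline, the returned coordinates could be passed to an image library to paste each extracted foreground onto the canvas. The sketch does not check that the subject stays within the canvas bounds.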
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Kang, R., Sunil, A., Chen, M. (2019). Mobile App for Text-to-Image Synthesis. In: Yin, Y., Li, Y., Gao, H., Zhang, J. (eds) Mobile Computing, Applications, and Services. MobiCASE 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 290. Springer, Cham. https://doi.org/10.1007/978-3-030-28468-8_3
DOI: https://doi.org/10.1007/978-3-030-28468-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28467-1
Online ISBN: 978-3-030-28468-8
eBook Packages: Computer Science, Computer Science (R0)