2D freehand sketch labeling using CNN and CRF

  • Xianyi Zhu
  • Yi Xiao
  • Yan Zheng


Accurate and fast sketch segmentation and labeling is difficult because sketches carry far fewer features than natural images. This paper proposes a hybrid approach to fast automatic sketch labeling that combines a convolutional neural network (CNN) with a conditional random field (CRF). First, we design a CNN for stroke classification; its larger first-layer filters and larger pooling sizes are suited to extracting descriptive features from strokes. Second, we combine each stroke with its host sketch to construct a more informative input for the CNN. Finally, we use the spatio-temporal relations among the strokes of a sketch to build a connected graph, on which a CRF model refines the CNN's predictions. We evaluate the method on two public benchmark datasets. Experimental results show that it reaches state-of-the-art accuracy and runtime.
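
The refinement stage described above can be illustrated with a small sketch. The abstract does not specify how the graph is built or how the CRF is solved, so everything below is an assumption made for illustration: strokes are linked when they are consecutive in drawing order or when their centers are close, the pairwise term is a simple Potts penalty, and inference uses iterated conditional modes (ICM) as a stand-in. The function names `build_graph` and `refine_labels`, the `radius` and `smoothness` parameters, and the per-stroke probabilities are hypothetical and are not taken from the paper.

```python
import math


def build_graph(centers, radius):
    """Link strokes that are consecutive in drawing order (temporal)
    or whose centers lie within `radius` of each other (spatial).
    Both rules are illustrative assumptions, not the paper's definition."""
    n = len(centers)
    edges = set()
    for i in range(n - 1):
        edges.add((i, i + 1))  # temporal neighbors
    for i in range(n):
        for j in range(i + 1, n):
            dx = centers[i][0] - centers[j][0]
            dy = centers[i][1] - centers[j][1]
            if math.hypot(dx, dy) <= radius:
                edges.add((i, j))  # spatial neighbors
    return sorted(edges)


def refine_labels(probs, edges, smoothness=1.0, max_iters=10):
    """Refine per-stroke class probabilities with ICM on a Potts energy:
    unary cost = -log p(label), pairwise cost = `smoothness` for each
    neighbor that carries a different label."""
    n = len(probs)
    num_labels = len(probs[0])
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Start from the classifier's most probable label for each stroke.
    labels = [max(range(num_labels), key=lambda k: p[k]) for p in probs]
    eps = 1e-9  # avoids log(0)

    for _ in range(max_iters):
        changed = False
        for i in range(n):
            def cost(k):
                unary = -math.log(probs[i][k] + eps)
                disagreeing = sum(1 for j in neighbors[i] if labels[j] != k)
                return unary + smoothness * disagreeing

            best = min(range(num_labels), key=cost)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # reached a local minimum of the energy
    return labels


# Four strokes; the second one is weakly misclassified by the (made-up) scores.
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
probs = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1], [0.2, 0.8]]
edges = build_graph(centers, radius=1.5)
refined = refine_labels(probs, edges, smoothness=1.0)
```

In this toy input the second stroke starts with label 1 but is flipped to label 0 because both of its neighbors confidently carry label 0, while the distant fourth stroke keeps its own confident label. ICM only finds a local minimum; other inference schemes could be substituted without changing the graph construction.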


2D sketch labeling · Stroke classification · Convolutional neural network · Conditional random field · Connected graph creation



This work is supported by the National Key R&D Program of China (2018YFB0203904), the NSFC of the PRC (61872137, 61502158, 61803150), and the Hunan NSF (2017JJ3042, 2018JJ3067).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. College of Computer Science and Electronic Engineering, Hunan University, Changsha, People’s Republic of China
  2. College of Electrical and Information Engineering, Hunan University, Changsha, People’s Republic of China
