
Online separation of handwriting from freehand drawing using extreme learning machines

  • Danilo Avola
  • Marco Bernardi
  • Luigi Cinque
  • Gian Luca Foresti
  • Cristiano Massaroni

Abstract

Online separation of handwriting from freehand drawing is still an active research area in the field of sketch-based interfaces. In recent years, most approaches in this area have focused on statistical separation methods, which have achieved significant results in terms of performance. More recently, Machine Learning (ML) techniques have proven to be even more effective by treating the separation problem as a classification task. Nevertheless, several aspects of these techniques can still be considered open problems, including: 1) the trade-off between separation performance and training time; 2) the separation of handwriting from different types of freehand drawings. To address these drawbacks, this paper proposes a novel separation algorithm based on a set of original features and an Extreme Learning Machine (ELM). Extensive experiments on a wide range of sketched schemes (i.e., text and graphical symbols), more numerous than those usually tested in key works of the current literature, have highlighted the effectiveness of the proposed approach. Finally, measurements of accuracy and computation speed, during both the training and testing stages, have shown that the ELM can be considered the best choice in this research area, even when compared with other popular ML techniques.

Keywords

Extreme learning machines · Handwriting · Freehand drawing · Online separation algorithm · Sketch-based interfaces

1 Introduction

In recent years, sketch-based interfaces have become a very active area of interest due to the increasing needs of modern human-computer interaction (HCI) [19, 30]. The widespread adoption of advanced devices (e.g., smartphones, tablets, interactive tables) has turned freehand drawing and handwriting into a suitable interaction modality for conveying ideas, commands, and concepts in a very intuitive and natural way [4, 7, 18]. In this paper, by freehand drawings we mean a set of sketched 2D graphical symbols used to define logical schemes (an example is shown at the top of Fig. 1a), while by handwritings we mean a collection of handwritten words in the Roman alphabet in lower-case style (an example is shown at the bottom of Fig. 1a). To highlight the potential of sketch-based interfaces, Fig. 1b and c report an example of a Graphical User Interface (GUI) builder driven by sketches. In this kind of application, a user can quickly sketch the elements of a GUI, which are subsequently recognized, coded, and implemented by means of an ad-hoc framework [34]. Currently, sketch-based interfaces are profitably used in a wide range of applications, including educational interfaces [32], information retrieval [1], mobile robot control [13], structure designing [47], and many others.
Fig. 1

Examples of drawn sketches: a twelve freehand drawings (i.e., 2D graphical symbols) and nine handwritings (i.e., five words of length five and four words of length ten), b sketched GUI, and c implemented GUI

In the contexts reported above, the task of automatically distinguishing handwriting from freehand drawing, within the same sketched scheme, is attracting increasing interest. Indeed, this task, also known as the domain separation problem [6, 8, 9], plays a key role in enabling and supporting subsequent processing, such as handwriting recognition, freehand drawing vectorization, graphical analysis, and so on. Considering again the example reported in Fig. 1b and c, the importance of automatic separation becomes apparent: while a set of sketched pixels recognized as freehand drawing must be subsequently processed by a geometrical recognition algorithm to determine the shape (e.g., circle, square) and its related properties (e.g., size, orientation) [6], a set of sketched pixels recognized as handwriting must be subsequently processed by an Optical Character Recognition (OCR) algorithm to determine the exact alphanumeric symbol [6].

The domain separation problem presents numerous difficulties due to highly variable factors such as sketch ambiguities [2, 5], personal stroke styles [4, 22], and the variety of graphical symbols [8, 53]. In the last two decades, many research groups have proposed very interesting solutions, often based on statistical approaches, able to achieve remarkable results [6, 10, 16, 37]. Nevertheless, the continuous advancement of current sketch-based interfaces keeps motivating the search for ever more effective solutions. Among them, the use of ML techniques seems rather promising in terms of performance [9, 12, 41]. A recent work, reported in [9], has shown how the domain separation problem can be treated as a binary classification task. Using a set of discriminative features, tested on both different schemes (i.e., electronic circuits, mind maps, Venn diagrams, use cases, flowcharts, and entity-relationship diagrams) and several handwriting styles, the authors employed a Support Vector Machine (SVM) to achieve a high level of accuracy. Despite the obtained results, the work reported in [9] also presents some drawbacks, including: 1) the need for a high number of training samples to achieve good performance during the online separation stage; 2) a classification error that grows with the number of types of freehand drawing schemes.

To address the issues reported above, and to treat the domain separation problem as a classification one, this paper introduces wide improvements and extensions of the work described in [9]. In particular, a novel algorithm based on a set of original features and an ELM [25] is proposed. The ELM was chosen because several state-of-the-art works had already shown that, in general, it can learn faster than an SVM [14, 26, 35]. Our expectation, later confirmed by the experimental results (see Section 4), was that an ELM could achieve optimal results in the domain separation problem as well, as previously occurred in different fields [36, 46, 48], including speech recognition [33], object enhancement [29], object recognition [28, 49], and many others. For completeness, Section 3.5 provides a brief but effective analysis of why an ELM learns faster than an SVM. In the proposed approach, each acquired stroke is preliminarily processed to reduce the noise caused by small involuntary hand movements or inaccuracies of the acquisition device. During the user's online sketching, spatio-temporal constraints are computed in real time to distinguish the different objects, i.e., sets of strokes, present within the schema. Subsequently, from each object, a set of highly discriminative features is extracted and normalized to be aggregated into a single feature vector. Finally, the ELM is used to classify each generated feature vector into one of two classes: handwriting or freehand drawing. In conclusion, the main contributions of the proposed algorithm, compared to both the work in [9] and the current literature, can be summarized as follows:
  • the extension of the set of features proposed in [9] with two original features suitably studied to reduce the ambiguity problems due to the similarity of some textual and graphical styles;

  • the extension of the overall architecture proposed in [9], with a new module to standardize the range values of the feature vectors;

  • for the first time in the state-of-the-art, the use of an ELM, instead of an SVM, to improve the performance of the domain separation algorithm;

  • the extension of the experiments reported in [9] through the introduction of two new challenging schemes, an increased number of test users, and a larger number of sketched instances, i.e., text or graphical symbols, for each scheme.

The obtained results show that the proposed approach is a concrete contribution to the current literature. In particular, the comparison between it and the SVM-based method reported in [9] proves that the ELM can be considered the best choice in terms of accuracy and computation speed. To provide performance results that are truly comparable with the current state-of-the-art, we built a dataset containing all the schemas used by the other works against which we compare the proposed approach.

The rest of the paper is structured as follows. Section 2 provides a discussion on the main works focused on the domain separation problem. Section 3 shows, in detail, the architecture of the proposed method. Section 4 reports the experimental results. Finally, Section 5 concludes the paper.

2 Related work

Undoubtedly, several aspects of the domain separation problem still require further investigation. Through the years, many authors have focused on more standard issues, such as the recognition of specific handwritten languages [37, 41, 54], but only a few works have addressed the separation of handwriting from freehand drawing, especially with a high number of different graphical schemes [9, 12]. In general, the techniques used to address the domain separation problem can be divided into two main classes: statistical-based [6, 8, 17, 37] and learning-based [9, 12, 24, 41]. In the former, several features are extracted from representative sets of textual and graphical objects and then used to define a statistical model able to distinguish the different sketched elements. In the latter, the same features are used to train an ML algorithm. Certainly, features and separation methods play a key role in the overall performance of a classification system. In our context, the proposed features have been implemented to capture the distinctive attributes of a set of strokes (e.g., curvature, linearity), similarly to how spline trajectories [44] and calligraphy properties [55] produced by a set of contiguous drawn pixels are studied. The ML method, in turn, has been chosen to speed up the binary classification, not only during the training stage but also during the testing one, since the system is required to work in online mode. Table 1 reports an overview of the feature types, separation methods, interactive modalities, and number of contexts treated by several key works of the current literature.
Table 1

Main characteristics of the state-of-the-art approaches

Works                   | Feature types           | Separation methods | Online | Multiple contexts
------------------------|-------------------------|--------------------|--------|------------------
Machii et al. [37]      | Geometric               | Statistical        | No     | No
Ouyang et al. [40]      | Geometric, Mathematical | Statistical        | No     | No
Avola et al. [6]        | Mathematical            | Statistical        | Yes    | No
Bhat et al. [10]        | Mathematical            | Statistical        | No     | No
Hammond et al. [23]     | Geometric, Mathematical | Learning           | No     | No
Blagojevic et al. [12]  | Geometric               | Learning           | No     | Yes
Patil et al. [42]       | Geometric               | Learning           | No     | No
Avola et al. [8]        | Mathematical            | Statistical        | Yes    | No
Tanvir et al. [41]      | Geometric               | Learning           | Yes    | No
Herold et al. [24]      | Geometric               | Learning           | No     | No
Avola et al. [9]        | Mathematical            | Learning           | Yes    | Yes
Dahake et al. [17]      | Mathematical            | Statistical        | Yes    | No

2.1 Statistical-based methods

A first example of a statistical-based method to distinguish text from drawing patterns was proposed in [37], where text elements were formed by Japanese characters. The method used geometric features: each pattern was considered as a set of segments, and the features were computed from the relationships among them, including segment length, number of segments, and bounding-box size (i.e., the rectangular area that contains all the segments). In [6, 8], the evolution of a framework to automatically separate handwriting from freehand drawing based on four mathematical features (i.e., curvature, entropy, interception, and closeness) was reported. In the first release [6], the authors focused on detecting the number and type of the different objects in a same schema. In the second release [8], they added two further modules to perform both text recognition and freehand drawing identification. Another interesting solution was presented in [10], where the authors described how the entropy rate could be used as a discriminative feature for the classification of sketched objects. As discussed in [41], statistical-based approaches can achieve effective results in controlled conditions, but they can show great limitations in practical scenarios, due to their lack of robustness in the management of sketched objects affected by noise.

2.2 Learning-based methods

Thanks to the remarkable results provided by ML techniques in many application fields, especially in classification tasks, recent years have seen several learning-based solutions to the domain separation problem. An engaging overview was proposed in [12], where the authors analysed a wide range of ML algorithms, including Bootstrap Aggregating (BA), LADTree, Logistic Model Tree (LMT), LogitBoost, Multi-Layer Perceptron (MLP), Random Forest (RF), and Sequential Minimal Optimization (SMO), to test the separation of handwriting from freehand drawing on the basis of a set of established features. The feature set, based on [10] and derived from geometric properties following a grounded-theory approach [12], was enriched with a new feature that took into account the pressure exerted by the user's pen during the sketching. Also in this case, the authors made considerable efforts to search for an additional new feature and a suitable ML technique, confirming the success of the learning paradigm for the domain separation problem. Regarding Nearest Neighbour (NN) classifiers, a method to separate Arabic letters from freehand drawing using approximated polygons was presented in [41], where several features computed from the geometric properties of the strokes were extracted. Within the same family of NN classifiers, [42] reported an algorithm based on geometric and mathematical features such as shape roundness, stroke orientation, compactness, and aspect ratios. The use of statistical ML techniques was explored in [24], where the authors adopted a feature vector composed of curvature and other geometrical measures to appropriately segment sketched strokes. More recently, a different solution was proposed in [9], where the features extracted from the sketch were considered as points of a hyperplane and the domain separation problem was treated as a binary classification task. The goodness of the obtained results pointed out the suitability of the adopted approach. Unlike the literature just reported, the solution we propose in this paper adopts an ELM to train and test the separation of sketched elements on the basis of multiple graphical schemes.

3 ELM applied to the domain separation problem

In this section, the proposed ELM-based method to address the domain separation problem is presented. In Fig. 2 the overall architecture is reported.
Fig. 2

The logical architecture of the proposed ELM-based method is divided into four stages: pre-processing, feature extraction, feature normalization, and sketched object classification

3.1 Overall architecture

The developed algorithm is divided into four stages. In the first, each acquired stroke is cleaned of inaccuracies and noise; moreover, each set of strokes that respects an established number of spatio-temporal constraints is joined into a single object. In the second, a set of features is extracted from each object to form a representative vector of the object itself. In the third, a normalization process standardizes the range of values of each vector. Finally, in the last, a trained ELM is used to classify each object as handwriting or freehand drawing. The whole classification stage occurs in online mode.

3.2 Pre-processing stage

Since small involuntary hand movements, personal stroke styles, and inaccuracies of the acquisition device can introduce a certain level of noise into a sketch, some ad-hoc algorithms must be used to provide, as far as possible, a clean acquisition. The pre-processing stage in the proposed approach is composed of three low-level processing algorithms. In the first, inherited from the works reported in [3, 5], isolated pixels and very short strokes are automatically deleted from the sketch layout according to a fixed threshold. The latter represents the minimum admissible stroke size in terms of pixels and mainly depends on aspects such as the dimension of the drawable area and the tool used to draw the sketch. A suitable threshold value can influence the performance of the system, since a value that is too small could keep strokes derived from noise, while one that is too large could remove important strokes. In the proposed work, we fixed the threshold value (i.e., 4 pixels) according to empirical tests performed on the dataset presented in Section 4.1. In the second, unlike the work we extend [9], a Bezier-based normalization method [50] is implemented to reconstruct possible missing pixels due to acquisition gaps or inaccuracy [52]. Finally, as reported in [4, 6, 7], strokes that present spatio-temporal relationships are joined into a single object. These relationships, based mainly on the spatial intersection and time-line of the sketched strokes, are aimed at building a homogeneous candidate object to be analysed.
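The first pre-processing step above, i.e., dropping isolated pixels and very short strokes against the 4-pixel threshold, can be sketched as follows. The stroke representation (a list of (x, y) points) and the function name are illustrative assumptions; only the threshold value comes from the text.

```python
# Minimal sketch of the first pre-processing step: remove isolated pixels
# and very short strokes. The 4-pixel threshold is the value fixed in the
# paper; representing a stroke as a list of (x, y) points is an assumption.
MIN_STROKE_SIZE = 4  # minimum admissible stroke size in pixels

def filter_short_strokes(strokes, min_size=MIN_STROKE_SIZE):
    """Keep only strokes whose pixel count reaches the admissible minimum."""
    return [s for s in strokes if len(s) >= min_size]
```

An isolated pixel is simply a stroke of length one, so it is removed by the same test.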

3.3 Feature extraction stage

The feature extraction stage is a critical step of the domain separation problem that can seriously influence the performance of the method. This work inherits six discriminative features (i.e., entropy, band-ratio, direction, intersection, x-scan, and projection y-t) recently reported in [9] and presents two original features (i.e., curvature and linearity) specifically designed to reduce the ambiguity problems due to the similarity of some textual and graphical styles. The introduction of these last two features was inspired by several state-of-the-art works, in which different authors analysed trajectories and patterns of the strokes to identify multi-scale handwriting [20], multi-language handwriting [45], and basic drawn 2D shapes [31]. The main idea was to exploit this kind of analysis to quantify stroke variations in terms of the amount of direction change (i.e., curvature, see Section 3.3.1) and the amount of horizontal variability (i.e., linearity, see Section 3.3.2), thus supporting the interpretation of ambiguous cases such as the separation of a set of aligned similar shapes (e.g., circles, triangles) from a squared lower-case personal style.

3.3.1 Curvature feature

In general, the stroke curvature can be defined as the sum of the inner angles of the drawn curves according to the direction of the tracing itself [6]. The most common geometric symbols and frequent characters exhibit highly overlapping ranges of curvature values. For this reason, inspired by [51], in the proposed curvature feature these ranges are divided into 14 discrete classes (each identified by a single code) based on the uniform frequency of the curvature values, as shown in Table 2.
Table 2

Classes of curvature values

Code | Curvature angle (degrees)
-----|--------------------------
1    | 0 < 𝜃 ≤ 60
2    | 60 < 𝜃 ≤ 90
3    | 90 < 𝜃 ≤ 108
4    | 108 < 𝜃 ≤ 120
5    | 120 < 𝜃 ≤ 128
6    | 128 < 𝜃 ≤ 135
7    | 135 < 𝜃 ≤ 140
8    | 140 < 𝜃 ≤ 180
9    | 180 < 𝜃 ≤ 210
10   | 210 < 𝜃 ≤ 240
11   | 240 < 𝜃 ≤ 270
12   | 270 < 𝜃 ≤ 300
13   | 300 < 𝜃 ≤ 330
14   | 330 < 𝜃 ≤ 360
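The mapping from a curvature angle to one of the 14 class codes of Table 2 can be sketched as a lookup over the upper bounds. The boundaries are taken verbatim from the table; the function name is a hypothetical helper of ours.

```python
# Hypothetical helper mapping a curvature angle (in degrees) to the 14
# discrete class codes of Table 2. Each class covers (lower, upper]; the
# upper bounds below are taken verbatim from the table.
import bisect

_UPPER_BOUNDS = [60, 90, 108, 120, 128, 135, 140, 180, 210, 240, 270, 300, 330, 360]

def curvature_class(theta):
    """Return the class code (1-14) for a curvature angle 0 < theta <= 360."""
    if not 0 < theta <= 360:
        raise ValueError("curvature angle must lie in (0, 360]")
    # bisect_left returns the index of the first upper bound >= theta,
    # which is exactly the zero-based class index for (lower, upper] bins
    return bisect.bisect_left(_UPPER_BOUNDS, theta) + 1
```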

3.3.2 Linearity feature

The first step of the linearity feature is to calculate the set of bounding boxes containing the stroke, in correspondence of its maximum and minimum values (in the bi-dimensional space defined by the drawable area). Figure 3a shows an example of a stroke and its bounding boxes. The linearity degree of the stroke, depicted in Fig. 3b, is measured by the horizontal alignment of the bounding-box barycenters according to a specific threshold (also in this case, established by the size of the drawable area). Notice that the implemented feature is not evaluated on the whole stroke, but on short segments of it; in this way, fast variations of the linearity values can be accurately measured. Summarizing, for each bounding box i, the barycenter ci is computed, and the linearity feature ls of the stroke s can be quantified as follows:
Fig. 3

Example of linearity feature: a bounding boxes on the original stroke; b maxima and minima computation

$$ l_{s} = \left| \sum\limits_{i = 0}^{N} \frac{ \left[ \frac{x_{max_{i}} - x_{min_{i}}}{2} + x_{max_{i}} \right] - c_{i}} {\left[ \frac{x_{max_{i}} - x_{min_{i}}}{2} + x_{max_{i}} \right]} \right| $$
(1)
where N is the number of bounding boxes, while \(x_{max_{i}}\) and \(x_{min_{i}}\) are the relative, or absolute, maximum and minimum values of the i-th bounding box.
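Equation (1) can be transcribed directly, assuming the bounding boxes and their barycenters have already been extracted from the stroke segments; the tuple-based box representation below is an assumption of ours.

```python
# Direct transcription of Eq. (1). Each box is assumed to be given as a
# (x_min, x_max, c) tuple, where c is the barycenter x-coordinate. The
# per-box reference point [(x_max - x_min)/2 + x_max] follows the equation.
def linearity(boxes):
    """Sum the normalized barycenter offsets over all boxes, Eq. (1)."""
    total = 0.0
    for x_min, x_max, c in boxes:
        ref = (x_max - x_min) / 2.0 + x_max
        total += (ref - c) / ref
    return abs(total)
```

A perfectly aligned stroke, whose barycenters coincide with the reference points, yields a linearity value of zero.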

3.4 Feature normalization stage

Since the range of values of the proposed features can vary widely, the activation functions of some ML methods, including the ELM, may not work properly. A feature normalization step is therefore required to obtain a more robust and reliable classification. In this work, given a feature vector f(s) for a sketched stroke s, a Z-score normalization [38] is applied as follows:

$$ \hat{f_{i}}= \frac{f_{i} - \mu}{\sigma} $$
(2)
where fi is the i-th feature of the vector, while μ and σ are the average and the standard deviation, respectively, of fi computed over the whole training set. The Z-score normalization thus returns a feature vector \(\hat {f}(s)\) on a common scale, with μ = 0 and σ = 1.
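The Z-score normalization of Eq. (2) amounts to the following sketch, where the statistics are fitted on the training set and then applied to every feature vector; the helper names are ours.

```python
# Z-score normalization of Eq. (2): each feature is centred and scaled
# using the mean and standard deviation estimated over the training set.
import numpy as np

def zscore_fit(train_features):
    """train_features: (n_samples, n_features) array. Returns (mu, sigma)."""
    mu = train_features.mean(axis=0)
    sigma = train_features.std(axis=0)
    return mu, sigma

def zscore_apply(f, mu, sigma):
    """Normalize feature vector(s) f to zero mean and unit variance."""
    return (f - mu) / sigma
```

Note that μ and σ must come from the training set alone, and the same statistics are reused at test time.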

3.5 ELM based method and classification stage

Recently, ELMs have gained increasing interest in various research areas. They were introduced by Huang et al. [25] as a generalization of Single-hidden Layer Feed-forward Networks (SLFNs), with the aim of overcoming their weaknesses in terms of learning speed. Usually, SLFNs use a gradient-based learning algorithm for the training step, where all the parameters are iteratively computed. To speed up this aspect, ELMs adopt a minimum-norm Least-Squares (LS) solution [38]. Another key difference between SLFNs and ELMs regards the random selection of both hidden node weights and biases. As reported in Section 1, our previous work used an SVM in the classification stage. During the learning phase, the training data were used to adjust both classification and kernel parameters, thus making the SVM performance strongly dependent on the parameters to be tuned. Due to the time required to manually supervise this kind of optimization, several techniques have been proposed in the current state-of-the-art for automatically tuning these parameters [21, 43]. An example is the Grid algorithm, one of the most popular methods to find the best parameter setting; unfortunately, it has a high computational cost and can provide non-optimal solutions [43]. Unlike the SVM, the ELM needs to tune only two parameters: the number of hidden nodes and the regularization factor [28]. Since both hidden node weights and biases are chosen randomly, the ELM avoids extended parameter tuning phases, thus reaching, compared with other learning methods [15, 35, 38], a quick estimation of the output weights.

The method we propose uses an ELM with multiple output nodes [27]. In particular, in our context, the output layer is composed of two nodes, as shown in Fig. 4, one for each class (i.e., handwriting and freehand drawing). Given a trained ELM and, as input, a set of normalized features \(\hat {f_{i}}\) of a sketched object o (1 ≤ i ≤ m, where m is the number of features), the value of each output node yj can be computed as follows:
$$ y_{j} = \sum\limits_{v = 1}^{n}\beta_{v,j} h_{v}(\hat{f_{i}}) $$
(3)
where, n is the number of nodes in the hidden layer, βv,j is the weight of the connection between the hidden node hv and the output layer yj, and the term \(h_{v}(\hat {f_{i}})\) is the hidden node output computed by:
$$ h_{v}(\hat{f_{i}}) = g(w_{i,v}\hat{f_{i}}+b_{v}) $$
(4)
where, wi,v is the weight vector connecting hv and the i-th input nodes, bv is the bias of v-th hidden node, and g(∙) is the ReLU [39] activation function. Finally, by the following classification function:
$$ label(s) = arg \max_{j \in \{0,1\}} y_{j}(s) $$
(5)
the predicted class label related to the object o is established.
Fig. 4

Logical scheme of the ELM with multiple output nodes
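The training and classification steps of Eqs. (3)-(5) can be sketched as follows. The NumPy implementation is illustrative (layer size, seed, and helper names are our assumptions): hidden weights and biases are drawn at random, hidden outputs use the ReLU activation of Eq. (4), and the output weights β are obtained as the minimum-norm least-squares solution via the pseudoinverse, as described in Section 3.5.

```python
# Minimal ELM sketch for the two-class setting of Eqs. (3)-(5). The hidden
# layer size (30, as selected in Section 4.2) and the seed are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, Y, n_hidden=30):
    """X: (n_samples, m) normalized features; Y: (n_samples, 2) one-hot labels."""
    m = X.shape[1]
    W = rng.standard_normal((m, n_hidden))   # random input weights w_{i,v}
    b = rng.standard_normal(n_hidden)        # random biases b_v
    H = np.maximum(X @ W + b, 0.0)           # ReLU hidden outputs, Eq. (4)
    beta = np.linalg.pinv(H) @ Y             # minimum-norm LS solution for Eq. (3)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Return 0 (handwriting) or 1 (freehand drawing) per sample, Eq. (5)."""
    H = np.maximum(X @ W + b, 0.0)
    return np.argmax(H @ beta, axis=1)
```

Since only β is learned, and in closed form, no iterative gradient descent or kernel-parameter search is needed, which is the source of the speed advantage over the SVM discussed above.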

4 Experimental results

In this section, the experimental results and related discussions are reported. The aim of the experiments is to verify the effectiveness of the proposed method in terms of training time, with respect to the work proposed in [9], and performance, with respect to the selected key works of the current literature, including [9]. Particular attention has been given to the effectiveness of the new set of features, and we have also focused on evaluating the proposed feature vector by means of the ELM classifier. The experiments were performed using an Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz, 32GB DDR3 RAM, an NVIDIA GeForce GTX 980m, and a Wacom CTL-4100K-S graphics tablet.

The rest of the experimental section is organized as follows. Section 4.1 describes the ad-hoc dataset. Section 4.2 provides the analysis of the optimal number of hidden nodes used in the ELM configuration, together with a comparison of the ELM and SVM computation performance. Section 4.3 discusses the effectiveness of the proposed set of features. Finally, Section 4.4 reports the discussion of the results and the comparison of the proposed method with selected key works of the current literature.

4.1 Dataset

In this work, a new challenging dataset was built to prove the robustness of the proposed approach. In addition to the scenarios used in [9], two new challenging case studies were added: chemical diagrams [40] and action diagrams [23]. In particular, the latter can be considered a very difficult test due to the complexity of its non-standard drawn patterns and shapes. Examples of the 8 adopted scenarios are reported in Fig. 5. The new dataset was realized by 42 users, 28 males and 14 females, aged from 20 to 42 years (6 of them left-handed). Each user was asked to draw templates of the whole set of graphical symbols 20 times and a set of 2000 words, of different lengths, 10 times.
Fig. 5

Examples of adopted scenarios: 1) electronic circuits, 2) mind maps, 3) Venn diagrams, 4) chemical diagrams, 5) use cases, 6) flowcharts, 7) entity-relationship diagrams, and 8) action diagrams

4.2 ELM performance analysis

To select the optimal number of hidden units within the ELM, several tests were performed. Figure 6a shows that the accuracy increases with the number of hidden nodes but, once a threshold is reached, the results tend to converge. On the basis of this threshold, we set the number of hidden units to 30. Using this hidden node configuration, a further test was performed to estimate the difference in training time between the ELM and the SVM. The results, summarized in Table 3, highlight that, on the same set of scenarios, the ELM significantly outperforms the SVM. In particular, the ELM improves the time performance in both the training and test phases.
Fig. 6

Performance analysis: a ELM accuracy with respect to the number of hidden units; b accuracy between the proposed approach and that proposed in [9], by using the new dataset

Table 3

Comparison between SVM and ELM training and test time

Algorithms | Training time (s) | Test time (s)
-----------|-------------------|--------------
SVM        | 0.1658            | 0.0100
ELM        | 0.0932            | 0.0009

4.3 ELM feature analysis

In order to evaluate the effectiveness of the proposed set of features, we trained our model across different combinations of features. Table 4 shows the results obtained using combinations of 7, 6, and 5 features, respectively. In these experiments, as evaluation parameter, we calculated the average accuracy across all possible combinations. Comparing the results with those presented in Table 5, we can observe that the method achieves significant and consistent outcomes as the number of features increases. This behaviour can be well understood considering that each feature has been specifically designed to capture a discriminative characteristic of a handwritten stroke. We have also observed that below 5 features the accuracy of the method decreases drastically.
Table 4

Average separation accuracy of the proposed method compared to different combination of features

Number of features | Accuracy
-------------------|---------
7                  | 97.9%
6                  | 96.7%
5                  | 86.3%
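The ablation protocol above, i.e., averaging accuracy over all subsets of a given size, can be sketched as follows; `train_and_score` is a hypothetical stand-in for the full ELM train/test routine on the reduced feature vectors.

```python
# Sketch of the feature-ablation protocol: for each k-sized subset of the
# available features, retrain and score the classifier, then average.
# `train_and_score` is a hypothetical callable returning an accuracy.
from itertools import combinations

def average_accuracy(n_features, k, train_and_score):
    """Mean accuracy over all k-sized subsets of the n available features."""
    subsets = list(combinations(range(n_features), k))
    return sum(train_and_score(s) for s in subsets) / len(subsets)
```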

Table 5

Comparison of the accuracy among state-of-the-art approaches

Method                 | Accuracy
-----------------------|---------
Proposed method        | 98.4%
Avola et al. [9]       | 97.3%
Bishop et al. [11]     | 97%
Bhat et al. [10]       | 92.1%
Hammond et al. [23]    | 90%
Blagojevic et al. [12] | 90.5%
Machii et al. [37]     | 88%
Avola et al. [8]       | 85%

4.4 Comparative analysis

The improvements of the proposed method, compared to both the work cited in [9] and the current literature, have been highlighted by the use of the new dataset. Regarding the work reported in [9], the obtained results, shown in Fig. 6b, point out that the accuracy of the ELM-based method is 98.17%, against 97.2% for the SVM-based method. In particular, remarkable enhancements have been obtained in the Venn and E-R diagram tests, with increases of 1.3% and 2.7%, respectively. Excellent results have also been obtained in the use case scenario, with an accuracy of over 99.0%. Only in the electronic circuits case does the proposed work not exceed the work reported in [9].

The lack of a public dataset does not allow a direct comparison between the work we propose and the current state-of-the-art. However, unlike the other key works in the literature, we built the new dataset precisely to fill this gap. In particular, we implemented the dataset considering all the scenarios used by the other works. In this way, we could compare our results with the others, as reported in Table 5, where, in addition to the proposed method, the accuracy of the following selected key works is reported: [8, 9, 10, 11, 12, 23, 37].

As discussed above, the other solutions that we examined analyse a limited number of scenarios. For example, Machii et al. [37] focused only on a dataset containing 18 graphical patterns. Bishop et al. [11], instead, worked on the separation of text from graphical strokes using only data collected among the employees at Microsoft Research in Cambridge; moreover, the tests were not applied to challenging scenarios, but only to generic drawings. Blagojevic et al. [12] performed extensive evaluations on diagrams from only 6 different domains (4 of which are used in our experiments). Finally, Hammond et al. [23] collected a large amount of samples drawn by researchers and experts of action diagrams, but they did not analyse other scenarios. Unlike these works, we propose a more complete and versatile solution. For completeness, Fig. 7 reports the whole step-by-step sketch processing. Even if it is not a focus of the present paper, with the aim of providing a complete overview of the working of the proposed system, Fig. 7 also shows the vectorization of the sketch, a capability inherited and implemented, within our framework, from the works proposed in [6, 8].
Fig. 7

Step by step processing of a drawn sketch: a user sketch, b object detection, c object merger, d handwriting, e freehand drawing, f handwriting recognition and vectorization, g freehand drawing recognition and vectorization, and h final layout

5 Conclusion

In this paper, a novel learning-based approach to the domain separation problem is reported. For the first time in the literature, this problem is faced with an ELM supported by a set of original and discriminative features. To prove the effectiveness of the proposed method, we built a new challenging dataset. The dataset has been used to compare the proposed method with that presented in [9], highlighting that the former outperforms the latter in terms of accuracy and learning time. Moreover, the dataset has also been used to reproduce all the sketches reported in selected key works of the current state-of-the-art, whose accuracy has been compared with the proposed method, showing that the ELM-based approach can be considered a concrete contribution to this research area. Future work will focus on using this technology to drive complex interfaces, e.g., graphical queries for advanced Geographic Information Systems (GISs) or graphical commands for Military Strategy Simulations (MSSs). Moreover, we are also planning to adopt this technology to interact with mobile devices equipped with RGB cameras for the interpretation of complex schemes.

Acknowledgements

This work was supported in part by the MIUR under grant “Departments of Excellence 2018-2022” of the Department of Computer Science of Sapienza University.

References

  1. Al Kabary I, Schuldt H (2014) Enhancing sketch-based sport video retrieval by suggesting relevant motion paths. In: Proceedings of the international ACM SIGIR conference on research & development in information retrieval (SIGIR), pp 1227–1230
  2. Alvarado C, Davis R (2004) SketchREAD: a multi-domain sketch recognition engine. In: Proceedings of the annual ACM symposium on user interface software and technology (UIST), pp 23–32
  3. Avola D, Caschera MC, Grifoni P (2006) Solving ambiguities for sketch-based interaction in mobile environments. In: Proceedings of the international conference on the move to meaningful internet systems (OTM), pp 904–915
  4. Avola D, Ferri F, Grifoni P (2007a) Formalizing recognition of sketching styles in human centered systems. In: Proceedings of the international conference on knowledge-based intelligent information and engineering systems (KES), pp 369–376
  5. Avola D, Ferri F, Grifoni P, Caschera MC (2007b) Ambiguities in sketch-based interfaces. In: Proceedings of the annual Hawaii international conference on system sciences (HICSS), pp 1–10
  6. Avola D, Del Buono A, Del Nostro P, Wang R (2009a) A novel online textual/graphical domain separation approach for sketch-based interfaces. In: Proceedings of the international conference on new directions in intelligent interactive multimedia systems and services (KES-IIMSS), pp 167–176
  7. Avola D, Del Buono A, Gianforme G, Paolozzi S, Wang R (2009b) SketchML: a representation language for novel sketch recognition approach. In: Proceedings of the international conference on pervasive technologies related to assistive environments (PETRA), pp 1–8
  8. Avola D, Cinque L, Placidi G (2013) SketchSPORE: a sketch based domain separation and recognition system for interactive interfaces. In: Proceedings of the international conference on image analysis and processing (ICIAP), pp 181–190
  9. Avola D, Bernardi M, Cinque L, Foresti GL, Marini MR, Massaroni C (2017) A machine learning approach for the online separation of handwriting from freehand drawing. In: Proceedings of the international conference on image analysis and processing (ICIAP), pp 223–232
  10. Bhat A, Hammond T (2009) Using entropy to distinguish shape versus text in hand-drawn diagrams. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), pp 1395–1400
  11. Bishop CM, Svensen M, Hinton GE (2004) Distinguishing text from graphics in on-line handwritten ink. In: Proceedings of the international workshop on frontiers in handwriting recognition (IWFHR), pp 142–147
  12. Blagojevic R, Plimmer B, Grundy J, Wang Y (2011) Using data mining for digital ink recognition: dividing text and shapes in sketched diagrams. Comput Graph 35(5):976–991
  13. Boniardi F, Valada A, Burgard W, Tipaldi G (2016) Autonomous indoor robot navigation using a sketch interface for drawing maps and routes. In: Proceedings of the international conference on robotics and automation (ICRA), pp 2896–2901
  14. Bucurica M, Dogaru R, Dogaru I (2015) A comparison of extreme learning machine and support vector machine classifiers. In: Proceedings of the international conference on intelligent computer communication and processing (ICCP), pp 471–474
  15. Cao J, Zhang K, Luo M, Yin C, Lai X (2016) Extreme learning machine and adaptive sparse representation for image classification. Neural Netw 81:91–102
  16. Costagliola G, Rosa MD, Fuccella V (2014) Local context-based recognition of sketched diagrams. J Vis Lang Comput 25(6):955–962
  17. Dahake D, Sharma RK, Singh H (2017) On segmentation of words from online handwritten Gurmukhi sentences. In: Proceedings of the international conference on man and machine interfacing (MAMI), pp 1–6
  18. Deufemia V, Risi M, Tortora G (2014) Sketched symbol recognition using latent-dynamic conditional random fields and distance-based clustering. Pattern Recogn 47(3):1159–1171
  19. Ding C, Liu L (2016) A survey of sketch based modeling systems. Front Comp Sci 10(6):985–999
  20. Eglin V, Bres S, Rivero C (2004) Multiscale handwriting characterization for writers' classification. In: Proceedings of the international conference on document analysis systems (DAS), pp 337–341
  21. Eitrich T, Lang B (2005) Parallel tuning of support vector machine learning parameters for large and unbalanced data sets. In: Proceedings of the international conference on computational life sciences (ICCLS), pp 253–264
  22. Hammond T, Logsdon D, Peschel J, Johnston J, Taele P, Wolin A, Paulson B (2010) A sketch recognition interface that recognizes hundreds of shapes in course-of-action diagrams. In: Extended abstracts on human factors in computing systems (CHI-EA), pp 4213–4218
  23. Hammond TA, Logsdon D, Paulson B, Johnston J, Peschel J, Wolin A, Taele P (2010) A sketch recognition system for recognizing free-hand course of action diagrams. In: Proceedings of the innovative applications of artificial intelligence conference (IAAI), pp 1–6
  24. Herold J, Stahovich TF (2014) A machine learning approach to automatic stroke segmentation. Comput Graph 38:357–364
  25. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501
  26. Huang GB, Wang DH, Lan Y (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2(2):107–122
  27. Huang GB, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B Cybern 42(2):513–529
  28. Huang Z, Yu Y, Gu J, Liu H (2017) An efficient method for traffic sign recognition based on extreme learning machine. IEEE Trans Cybern 47(4):920–933
  29. Hussain T, Siniscalchi SM, Lee CC, Wang SS, Tsao Y, Liao WH (2017) Experimental study on extreme learning machine applications for speech enhancement. IEEE Access 5:25542–25554
  30. Jahani-Fariman H, Kavakli M, Boyali A (2018) Matrack: block sparse Bayesian learning for a sketch recognition approach. Multimed Tools Appl 77(2):1997–2012
  31. Keysers D, Deselaers T, Rowley HA, Wang L, Carbune V (2017) Multi-language online handwriting recognition. IEEE Trans Pattern Anal Mach Intell 39(6):1180–1194
  32. Kim HH, Taele P, Valentine S, McTigue E, Hammond T (2013) KimCHI: a sketch-based developmental skill classifier to enhance pen-driven educational interfaces for children. In: Proceedings of the international symposium on sketch-based interfaces and modeling (SBIM), pp 33–42
  33. Lan Y, Hu Z, Soh YC, Huang GB (2013) An extreme learning machine approach for speaker recognition. Neural Comput Appl 22(3-4):417–425
  34. Landay JA, Myers BA (2001) Sketching interfaces: toward more human interface design. Computer 34(3):56–64
  35. Liu X, Gao C, Li P (2012) A comparative analysis of support vector machines and extreme learning machines. Neural Netw 33:58–66
  36. Lu T, Guan Y, Zhang Y, Qu S, Xiong Z (2018) Robust and efficient face recognition via low-rank supported extreme learning machine. Multimed Tools Appl 77(9):11219–11240
  37. Machii K, Fukushima H, Nakagawa M (1993) On-line text/drawings segmentation of handwritten patterns. In: Proceedings of the international conference on document analysis and recognition (ICDAR), pp 710–713
  38. Mahdiyah U, Irawan MI, Imah EM (2015) Integrating data selection and extreme learning machine for imbalanced data. Procedia Comput Sci 59:221–229
  39. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the international conference on machine learning (ICML), pp 807–814
  40. Ouyang TY, Davis R (2007) Recognition of hand drawn chemical diagrams. In: Proceedings of the national conference on artificial intelligence (AAAI), pp 846–851
  41. Parvez MT, Mahmoud SA (2013) Arabic handwriting recognition using structural and syntactic pattern attributes. Pattern Recogn 46(1):141–154
  42. Patil U, Begum M (2012) Word level handwritten and printed text separation based on shape features. Int J Emerging Technol Adv Eng 2(4):590–594
  43. Phan AV, Nguyen ML, Bui LT (2017) Feature weighting and SVM parameters optimization based on genetic algorithms for classification problems. Appl Intell 46(2):455–469
  44. Phang SK, Lai S, Wang F, Lan M, Chen BM (2015) Systems design and implementation with jerk-optimized trajectory generation for UAV calligraphy. Mechatronics 30:65–75
  45. Qin S (2005) Intelligent classification of sketch strokes. In: Proceedings of the international conference on computer as a tool (EUROCON), pp 1374–1377
  46. Shi LC, Lu BL (2013) EEG-based vigilance estimation using extreme learning machines. Neurocomputing 102:135–143
  47. Sun P, Chen Y, Lyu X, Wang B, Qu J, Tang Z (2018) A free-sketch recognition method for chemical structural formula. In: Proceedings of the international workshop on document analysis systems (DAS), pp 157–162
  48. Suresh S, Babu RV, Kim H (2009) No-reference image quality assessment using modified extreme learning machine classifier. Appl Soft Comput 9(2):541–552
  49. Tang J, Deng C, Huang GB, Zhao B (2015) Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine. IEEE Trans Geosci Remote Sens 53(3):1174–1185
  50. Unser M, Aldroubi A, Eden M (1993) B-spline signal processing. I. Theory. IEEE Trans Signal Process 41(2):821–833
  51. Verma K, Sharma RK (2017) Comparison of HMM- and SVM-based stroke classifiers for Gurmukhi script. Neural Comput Appl 28(1):51–63
  52. Wadhwa D, Verma K (2012) Online handwriting recognition of Hindi numerals using SVM. Int J Comput Appl 48(11):590–594
  53. Yanık E, Sezgin TM (2015) Active learning for sketch recognition. Comput Graph 52:93–105
  54. Zhang XY, Bengio Y, Liu CL (2017) Online and offline handwritten Chinese character recognition: a comprehensive study and new benchmark. Pattern Recogn 61:348–360
  55. Zheng X, Miao Q, Shi Z, Fan Y, Shui W (2016) A new artistic information extraction method with multi channels and guided filters for calligraphy works. Multimed Tools Appl 75(14):8719–8744

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, Sapienza University, Rome, Italy
  2. Department of Mathematics, Computer Science and Physics, University of Udine, Udine, Italy
