Abstract
This chapter describes the fundamental algorithms of artificial vision for automatically recognizing the objects in a scene, an essential capability of the vision systems of all living organisms. While a human observer recognizes even complex objects apparently easily and quickly, in a machine vision system the recognition process is difficult, requires considerable computing time, and does not always yield optimal results. Algorithms for selecting and extracting features are therefore fundamental to the process of object recognition. Over the years, researchers in various disciplines (machine learning, image analysis, object recognition, information retrieval, bioinformatics, biomedicine, intelligent data analysis, data mining, etc.) and application areas (robotics, surveillance, medicine, remote sensing, artificial vision, etc.) have proposed different recognition methods and developed different algorithms based on different classification models. Although the proposed algorithms share a common purpose, they differ in the properties attributed to the classes of objects (the clusters) and in the model with which these classes are defined (connectivity, statistical distribution, density estimation, etc.). The topics of this chapter span aspects of machine learning and of recognition based on statistical learning methods. For simplicity, the described algorithms are divided into supervised classification methods (based on deterministic, statistical, neural, and nonmetric models such as syntactic models and decision trees) and unsupervised methods, i.e., methods that use no prior knowledge of the true state of nature to which the patterns belong.
Notes
- 1.
From now on, the two words “characteristic” and “feature” will be used interchangeably.
- 2.
Literally, it means automatic data extraction, normally coming from a large population of data.
- 3.
Without losing generality, this is achieved by expressing both the input variables \( x_i \) and the output \( y_i \) in terms of deviations from the mean.
- 4.
Indeed, an effective way to represent the graph of the multivariate normal density function \(N(0,\mathbf {\Sigma })\) is through its level curves of value c. In this case, the function is positive and the level curves to be examined concern values \( c> 0 \), with a positive definite and invertible covariance matrix. It can be shown that the level set is an ellipsoid \(\mathbf {x}^T\mathbf {\Sigma }^{-1}\mathbf {x}=c\) centered at the origin. In the reference system of the principal components, expressed in the basis of the eigenvectors of the covariance matrix \(\mathbf {\Sigma }\), the equation of the ellipsoid becomes \(\frac{y_1^2}{\lambda _1}+\cdots +\frac{y_M^2}{\lambda _M}=c\), with semi-axes of length \(\sqrt{\lambda _1},\ldots ,\sqrt{\lambda _M}\), where \(\lambda _{i}\) are the eigenvalues of the covariance matrix. For \( M = 2 \), we have elliptic contour lines. If \(\mu \ne 0\), the ellipsoid is centered at \( \mu \).
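As an illustration, the semi-axis lengths of these elliptic level curves can be read directly off the eigen decomposition of the covariance matrix; a minimal sketch for level \(c=1\), with a hypothetical diagonal \(\mathbf{\Sigma}\):

```python
import numpy as np

# Hypothetical 2x2 covariance matrix (illustrative values).
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])

# Eigen decomposition gives the principal axes of the elliptic level curves.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# For the level curve x^T Sigma^{-1} x = c with c = 1, the semi-axes
# have lengths sqrt(lambda_i) along the eigenvector directions.
c = 1.0
semi_axes = np.sqrt(c * eigvals)
print(semi_axes)  # [1. 2.]
```

The columns of `eigvecs` give the directions of the principal axes, so the ellipse is fully determined by the decomposition.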
- 5.
A function of real variables and real values that depends exclusively on the distance from a fixed point, called the centroid \( \mathbf {x} _c \). An RBF function is expressed in the form \(\phi :\mathcal{R}^M\rightarrow \mathcal{R}\) such that \(\phi (\mathbf {x})=\phi (|\mathbf {x}-\mathbf {x}_c|)\).
- 6.
Also called the Voronoi diagram (from the name of Georgij Voronoi), it is a particular type of decomposition of a metric space, determined by the distances with respect to a given finite set of points of the space. For example, in the plane, given a finite set of points S, the Dirichlet tessellation for S is the partition of the plane that associates a region R(p) with each point \( p \in S \), such that all points of R(p) are closer to p than to any other point in S.
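A minimal sketch of this nearest-site assignment (the site coordinates and query points are hypothetical): each query point belongs to the Voronoi region of the closest site.

```python
import numpy as np

# Hypothetical sites S in the plane (illustrative coordinates).
sites = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

def voronoi_region(point, sites):
    """Index of the site whose Voronoi region contains `point`
    (i.e., the nearest site in Euclidean distance)."""
    d = np.linalg.norm(sites - point, axis=1)
    return int(np.argmin(d))

print(voronoi_region(np.array([1.0, 0.5]), sites))  # 0
print(voronoi_region(np.array([3.0, 1.0]), sites))  # 1
```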
- 7.
Given two patterns \( \mathbf {x} \) and \( \mathbf {y} \), a measure of similarity \( S(\mathbf {x},\mathbf {y}) \) can be defined such that \(\lim _{\mathbf {x} \rightarrow \mathbf {y}}S(\mathbf {x},\mathbf {y})=0\ \Rightarrow \ \mathbf {x}=\mathbf {y}\).
- 8.
Suppose at the moment we know the number K of groups to search for.
- 9.
The sum of the distances between all the observations belonging to the same group and the representative prototype of the group.
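This within-group distance can be computed directly once the groupings are fixed; a minimal sketch using centroids as prototypes and squared Euclidean distances (the observations and the choice of K = 2 groups are hypothetical):

```python
import numpy as np

# Hypothetical observations already assigned to K = 2 groups.
groups = {
    0: np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0]]),
    1: np.array([[8.0, 8.0], [9.0, 8.0]]),
}

def within_group_distance(groups):
    """Sum, over all groups, of the squared distances between each
    observation and the group's prototype (here, the centroid)."""
    total = 0.0
    for pts in groups.values():
        prototype = pts.mean(axis=0)       # representative prototype
        total += ((pts - prototype) ** 2).sum()
    return total

print(within_group_distance(groups))
```

Clustering algorithms such as k-means iteratively reassign observations to minimize exactly this quantity.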
- 10.
The final A is added to simplify the pronunciation.
- 11.
The Bayes theorem can be derived from the definition of conditional probability and the total probability theorem. If A and B are two events, the probability of the event A when the event B has already occurred is given by
$$\begin{aligned} p(A|B)=\frac{p(A\cap B)}{p(B)}\ \text {if}\ p(B)>0 \end{aligned}$$and is called conditional probability of A conditioned on B or simply probability of A given B. The denominator p(B) simply normalizes the joint probability p(A, B) of the events that occur together with B. If we consider the space S of the events partitioned into \( B_1, \ldots , B_K \), any event A can be represented as
$$\begin{aligned} A=A\cap S=A\cap (B_1\cup B_2\cup \cdots \cup B_K)=(A\cap B_1)\cup (A\cap B_2)\cup \cdots \cup (A\cap B_K). \end{aligned}$$If \( B_1, \ldots , B_K \) are mutually exclusive, we have that
$$\begin{aligned} p(A)=p(A\cap B_1)+\cdots +p(A\cap B_K) \end{aligned}$$and replacing the conditional probabilities, the total probability of any event A is given by
$$\begin{aligned} p(A)=p(A|B_1)p(B_1)+\cdots +p(A|B_K)p(B_K)=\sum _{k=1}^{K} p(A|B_k)p(B_k). \end{aligned}$$By combining the definitions of conditional probability and the total probability theorem, we obtain the probability of the event \( B_i \), if we suppose that the event A happened, with the following:
$$\begin{aligned} p(B_i|A)=\frac{p(A\cap B_i)}{p(A)}=\frac{p(A|B_i)p(B_i)}{\sum _{k=1}^{K}p(A|B_k)p(B_k)} \end{aligned}$$known as the Bayes Rule or Theorem which represents one of the most important relations in the field of statistics.
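The total probability theorem and the Bayes rule can be checked numerically; a minimal sketch with hypothetical priors \(p(B_k)\) and likelihoods \(p(A|B_k)\) for two mutually exclusive events:

```python
# Hypothetical two-class example: priors p(B_k) and likelihoods p(A|B_k).
priors = [0.6, 0.4]          # p(B_1), p(B_2)
likelihoods = [0.2, 0.7]     # p(A|B_1), p(A|B_2)

# Total probability theorem: p(A) = sum_k p(A|B_k) p(B_k)
p_A = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes rule: p(B_i|A) = p(A|B_i) p(B_i) / p(A)
posteriors = [l * p / p_A for l, p in zip(likelihoods, priors)]
print(p_A)
print(posteriors)
```

Note that the posteriors sum to 1 by construction, since the denominator is exactly the total probability of A.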
- 12.
In Bayesian statistics, MAP (Maximum A Posteriori) denotes an estimate of an unknown quantity that equals the mode of the posterior probability distribution. In essence, the mode is the value that occurs most frequently in a distribution (its peak value).
- 13.
Implies that the patterns all have the same probability distribution and are all statistically independent.
- 14.
If \(\mathbf {\Sigma }=\mathbf {I}\), where \(\mathbf {I}\) is the identity matrix, the (1.128) becomes the Euclidean distance (norm 2). If \( \mathbf {\Sigma } \) is diagonal, the resulting measure becomes the normalized Euclidean distance \(D(\mathbf {x},\varvec{\mu })=\sqrt{\sum _{i=1}^{d}\frac{(x_i-\mu _i)^2}{\sigma _i^2}}\). It should also be pointed out that the Mahalanobis distance can be defined as a dissimilarity measure between two pattern vectors \( \mathbf {x} \) and \( \mathbf {y} \) with the same probability density function and with covariance matrix \(\mathbf {\Sigma }\): \(D(\mathbf {x},\mathbf {y})=\sqrt{(\mathbf {x}-\mathbf {y})^T\mathbf {\Sigma }^{-1}(\mathbf {x}-\mathbf {y})}\).
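A minimal sketch of the Mahalanobis distance, also showing that it reduces to the Euclidean distance when \(\mathbf{\Sigma}=\mathbf{I}\) (the mean and covariance values are hypothetical):

```python
import numpy as np

# Hypothetical mean and covariance (illustrative values).
mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance between x and mu under covariance Sigma."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(Sigma) @ d))

x = np.array([2.0, 1.0])
print(mahalanobis(x, mu, Sigma))      # sqrt(2^2/4 + 1^2/1) = sqrt(2)
print(mahalanobis(x, mu, np.eye(2)))  # Euclidean case: sqrt(2^2 + 1^2) = sqrt(5)
```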
- 15.
The whitening transform is always possible, and the method used is still based on the eigen decomposition of the covariance matrix \(\mathbf {\Sigma }=\mathbf {\Phi }\mathbf {\Lambda }\mathbf {\Phi }^T\) calculated on the input patterns \(\mathbf {x}\). It can be shown that a whitening transformation is given by \(\mathbf {y}=\mathbf {\Phi }\mathbf {\Lambda }^{-1/2}\mathbf {\Phi }^T\mathbf {x}\), which in fact is equivalent to first executing the orthogonal transform \(\mathbf {y}=\mathbf {A}^T\mathbf {x}=\mathbf {\Phi }^T\mathbf {x}\) and then normalizing the result with \(\mathbf {\Lambda }^{-1/2}\). In other words, with the first transformation we obtain the principal components, and with the normalization the data distribution is given unit variance along each component. The direct transformation (whitening) is \(\mathbf {y}=\mathbf {A}_w\mathbf {x}= \mathbf {\Lambda }^{-1/2}\mathbf {\Phi }^T\mathbf {x}\).
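The whitening transform can be verified by checking that the transformed data has (approximately) identity covariance; a minimal sketch on synthetic correlated data (the covariance values and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative correlated data drawn from a hypothetical Gaussian.
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=5000)

# Eigen decomposition of the sample covariance: Sigma = Phi Lambda Phi^T
Sigma = np.cov(X, rowvar=False)
Lam, Phi = np.linalg.eigh(Sigma)

# Whitening transform A_w = Lambda^{-1/2} Phi^T applied to each pattern.
A_w = np.diag(Lam ** -0.5) @ Phi.T
Y = X @ A_w.T

# The whitened data has identity covariance (up to numerical error).
print(np.cov(Y, rowvar=False).round(6))
```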
- 16.
In geometry, a quadric surface is defined as a hypersurface of a d-dimensional space over the real (or complex) numbers represented by a second-order polynomial equation. The hypersurface can take various forms: hyperplane, hyperellipsoid, hyperspheroid, hypersphere (a special case of the hyperspheroid), hyperparaboloid (elliptic or circular), and hyperboloid (of one or two sheets).
- 17.
If we knew them, we would group all \( \mathbf {x} _i \) based on their \(\mathbf {z}_i\) and we would model each grouping with a single Gaussian.
- 18.
The logarithm of the average is greater than or equal to the average of the logarithms (Jensen's inequality for the concave logarithm function).
- 19.
Cross-validation is a statistical technique that can be used in the presence of an acceptable number of observed samples (training set). In essence, it is a statistical method to validate a predictive model. Given a sample of data, it is divided into subsets, some of which are used for the construction of the model (the training sets) and others for comparison with the predictions of the model (the validation sets). By averaging the quality of the predictions across the various validation sets, we obtain a measure of the accuracy of the predictions. In the context of classification, the training set consists of samples whose class is known in advance, ensuring that this set is significant and complete, i.e., with a sufficient number of representative samples of all classes. For the verification of the recognition method, a validation set is used, also consisting of samples whose class is known, to check the generalization of the results. It consists of a set of samples different from those of the training set.
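A minimal sketch of k-fold cross-validation along these lines; the toy "model" (which simply predicts the training mean) and the scoring function are hypothetical placeholders for a real classifier and metric:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Split the sample indices into k roughly equal validation folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, k)

def cross_validate(X, y, fit, score, k=5):
    """Average validation score over k train/validation splits."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])          # build on training folds
        scores.append(score(model, X[val], y[val]))  # validate on held-out fold
    return float(np.mean(scores))

# Toy example: a "model" that predicts the training mean (hypothetical).
X = np.arange(20.0).reshape(-1, 1)
y = np.ones(20)
fit = lambda Xt, yt: yt.mean()
score = lambda m, Xv, yv: -np.mean((yv - m) ** 2)  # negative MSE
print(cross_validate(X, y, fit, score, k=4))
```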
- 20.
In this context, the bias is seen as a constant that makes the perceptron more flexible. It has a function analogous to the constant b of the linear function \( y = ax + b \), which geometrically represents a line and allows the line to be positioned without necessarily passing through the origin (0, 0). In the context of the perceptron, it allows a more flexible displacement of the decision line to better fit the data.
- 21.
By definition, the scalar or inner product of two vectors \( \mathbf {x} \) and \( \mathbf {w} \) belonging to a vector space \(\mathcal {R}^N\) is a symmetric bilinear form that associates these vectors with a scalar in the real field \(\mathcal {R}\), indicated in analytic geometry by:
$$\begin{aligned} \langle \mathbf {w},\mathbf {x}\rangle =\mathbf {w}\cdot \mathbf {x}=(\mathbf {w},\mathbf {x})=\sum _{i=1}^{N}w_ix_i \end{aligned}$$In matrix notation, considering the product between matrices, where \(\mathbf {w}\) and \(\mathbf {x}\) are seen as \(N\times 1\) matrices, the scalar product is written
$$\begin{aligned} \mathbf {w}^T\mathbf {x}=\sum _{i=1}^{N}w_ix_i \end{aligned}$$The (convex) angle \(\theta \) between the two vectors in any Euclidean space is given by
$$\begin{aligned} \theta =\arccos \frac{\mathbf {w}^T\mathbf {x}}{|\mathbf {w}||\mathbf {x}|} \end{aligned}$$from which a useful geometric interpretation can be derived, namely finding the orthogonal projection of one vector onto the other without calculating the angle \( \theta \). For example, \(x_{\mathbf {w}}=|\mathbf {x}|\cos \theta \) is the length of the orthogonal projection of \( \mathbf {x} \) onto \( \mathbf {w} \) (and analogously for \(w_{\mathbf {x}}\)); since \(\mathbf {w}^T\mathbf {x}=|\mathbf {w}|\cdot |\mathbf {x}|\cos \theta =|\mathbf {w}| \cdot x_{\mathbf {w}}\), we have
$$\begin{aligned} x_{\mathbf {w}}=\frac{\mathbf {w}^T\mathbf {x}}{|\mathbf {w}|} \end{aligned}$$ - 22.
It should be noted that the optimization approach based on gradient descent is only guaranteed to find a local minimum of a function. It can also be used to search for a global minimum by randomly choosing a new starting point once a local minimum has been found and repeating the operation many times. In general, if the number of minima of the function is limited and the number of attempts is very high, there is a good chance of converging toward the global minimum.
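A minimal sketch of this restart strategy on a hypothetical one-dimensional function with two local minima (the learning rate, step count, and restart count are illustrative choices):

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.01, steps=2000):
    """Plain gradient descent: from a single start it finds a local minimum."""
    x = float(x0)
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Illustrative function with two local minima (hypothetical example):
# f(x) = x^4 - 3x^2 + x, with gradient f'(x) = 4x^3 - 6x + 1.
f = lambda x: x**4 - 3 * x**2 + x
grad = lambda x: 4 * x**3 - 6 * x + 1

def with_restarts(grad, f, n_restarts=20, seed=0):
    """Restart from random points and keep the best local minimum found."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        x = gradient_descent(grad, rng.uniform(-2.0, 2.0))
        if best is None or f(x) < f(best):
            best = x
    return best

x_star = with_restarts(grad, f)
print(x_star)  # close to the global minimum near x ≈ -1.30
```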
- 23.
In this context, the pattern vectors of the training set \( \mathcal {P} \), besides being augmented (\( x_0 = 1) \), are also normalized, that is, all the patterns belonging to the class \(\omega _2\) are replaced by their negatives:
$$\begin{aligned} \mathbf {x}_j=-\mathbf {x}_j\quad \quad \forall \ \mathbf {x}_{j}\in \omega _2 \end{aligned}$$It follows that a sample is classified incorrectly if:
$$\begin{aligned} \mathbf {w}^T\mathbf {x}_j=\sum _{k=0}^{N}w_{k}{x}_{kj}<0. \end{aligned}$$ - 24.
The sigmoid function (an S-shaped curve) is often used as a transfer function in neural networks owing to its nonlinearity and easy differentiability. In fact, the derivative is given by
$$\begin{aligned} \frac{\text {d} \sigma (x) }{\text {d} x}=\frac{\text {d} }{\text {d} x}\Biggl [\frac{1}{1+\exp (-x)}\Biggr ]=\sigma (x)(1-\sigma (x)) \end{aligned}$$and is easily implementable.
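A minimal sketch of the sigmoid and of its derivative expressed through \(\sigma (x)(1-\sigma (x))\):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Derivative expressed through sigma itself: sigma(x)(1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0))        # 0.5
print(sigmoid_prime(0.0))  # 0.25
```

This closed form is what makes the sigmoid cheap to use in backpropagation: the derivative reuses the forward-pass value.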
- 25.
In the fields of machine learning and inverse problems, the regularization consists of the introduction of additional information or regularity conditions in order to solve an ill-conditioned problem or to prevent overfitting.
- 26.
Normally, logarithm values are given with respect to base 10 and to Napier's number e. With the change of base, we can obtain the logarithm in base 2 of a number x, that is, \(\log _2 (x)=\frac{\log _{10} (x)}{\log _{10} (2)}\). The entropy values calculated above are obtained considering that \((1)\log _2 (1)=0\); \(\log _2 (2)=1\); \(\log _2 (1/2)=-1\); and \((1/2)\log _2 (1/2)=(1/2)(-1)=-1/2\).
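A minimal sketch of computing entropy in bits with base-2 logarithms, reproducing the values above:

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: -sum_k p_k log2(p_k), skipping zero terms."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A certain event carries no information: (1) log2(1) = 0.
print(entropy_bits([1.0]))       # 0.0
# A fair coin: -(1/2)log2(1/2) - (1/2)log2(1/2) = 1 bit.
print(entropy_bits([0.5, 0.5]))  # 1.0
```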
- 27.
In the context of supervised learning, a learning algorithm uses the training set samples to predict, in the test phase, the class to which other samples belong that were not presented during the learning phase. In other words, it is assumed that the learning model is able to generalize. It can happen instead, especially when learning is too closely adapted to the training samples or when the number of training samples is limited, that the model adapts to characteristics that are specific only to the training set but lacks the same predictive capacity (for example, to classify) on the samples of the test phase. We are then in the presence of overfitting, where performance (i.e., the ability to adapt/predict) on the training data increases, while performance on unseen data worsens. In general, the problem of overfitting can be limited with cross-validation in statistics or with early stopping in the learning context. Decision trees that are too large are not easily understood, and overfitting often occurs; this is known in the literature as a violation of Occam's Razor, the philosophical principle that suggests the futility of formulating more hypotheses than are strictly necessary to explain a given phenomenon when the initial ones may be sufficient.
- 28.
Decision trees that predict a categorical variable (i.e., the class to which a pattern vector belongs) are commonly called classification trees, while those that predict continuous-type variables (i.e., real numbers) rather than a class are called regression trees. However, classification trees can also describe the attributes of a pattern in the form of discrete intervals.
- 29.
Introduced by the Italian statistician Corrado Gini in 1912 in Variability and Mutability as a measure of the inequality of a distribution of a random variable. It is expressed as a number from 0 to 1: low index values indicate a homogeneous distribution, with the value 0 corresponding to perfect homogeneity, while high index values indicate a very unequal distribution.
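In classification trees, the related Gini impurity of a set of labels, \(1-\sum _k p_k^2\), measures node heterogeneity in the same spirit (0 for a pure node); a minimal sketch with hypothetical label sets:

```python
def gini(labels):
    """Gini impurity of a label distribution: 1 - sum_k p_k^2.
    0 for a pure (homogeneous) set, larger for a more mixed one."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((m / n) ** 2 for m in counts.values())

print(gini(["a", "a", "a", "a"]))  # 0.0  (pure node)
print(gini(["a", "a", "b", "b"]))  # 0.5  (maximally mixed, two classes)
```

Tree-induction algorithms such as CART choose the split that most reduces this impurity.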
- 30.
In the theory of computational complexity, decision problems are grouped into two principal classes: P and NP. The first includes all those decision problems that can be solved by a deterministic Turing machine in time polynomial in the size of the problem, that is, problems admitting algorithms that are polynomial in the worst case; these are the tractable problems. The class NP (nondeterministic polynomial time) includes the problems solvable in polynomial time by a nondeterministic Turing machine or, equivalently, those whose solutions can be verified in polynomial time. For many problems in NP, every known resolving algorithm requires exponential computation time (in the worst case), or in any case time asymptotically superior to polynomial; such problems are also called intractable in terms of calculation time. The NP-complete problems are instead the most difficult problems of the class NP: if an algorithm were found that solves any NP-complete problem in a reasonable time (i.e., in polynomial time), then it could be used to reasonably solve every NP problem. Complexity theory has not yet established whether the class NP is strictly larger than the class P or whether the two coincide.
- 31.
In graph theory, given a graph with weighted arcs, it is also possible to define the minimum spanning tree (MST), that is, a spanning tree for which the sum of the weights of its arcs is minimum.
- 32.
The characters “*” and “\( \circ \)” have the following meaning. If \( \mathcal{V} \) is an alphabet that defines strings or words as sequences of characters (symbols) of \( \mathcal{V} \), the set of all strings defined on the alphabet \( \mathcal{V} \) (including the empty string) is normally denoted by \( \mathcal{V} ^ * \). The string 110011001 is a string of length 9 defined on the alphabet \(\{0,1\}\) and therefore belongs to \(\{0,1\}^*\). The symbol “\(\circ \)” instead defines the concatenation or product operation \(\circ :\mathcal{V}^*\times \mathcal{V}^*\rightarrow \mathcal{V}^*\), which consists in juxtaposing two words of \(\mathcal{V}^*\). This operation is not commutative but only associative (for example, \(mono \circ block = monoblock\) and \(abb\circ bba=abbbba\ne bba\circ abb=bbaabb\)). It should also be noted that an empty string x consists of 0 symbols, hence has length \(|x|=0\), and is normally denoted by the neutral symbol \(\epsilon \). It follows that \(x\circ \epsilon =\epsilon \circ x=x,\ \forall x\in \mathcal{V}^*\), and moreover \(|\epsilon |=0\). It can be shown that, given an alphabet \(\mathcal{V}\), the triad \(\langle \mathcal{V}^*,\circ ,\epsilon \rangle \) is a monoid, that is, a set closed with respect to the concatenation operator “\(\circ \)” and for which \(\epsilon \) is the neutral element. It is called the syntactic monoid defined on \(\mathcal{V}\) because it is the basis of the syntactic definition of languages. The set of non-empty strings is denoted by \(\mathcal{V}^+\), and it follows that \(\mathcal{V}^*=\mathcal{V}^{+}\cup \{\epsilon \}\).
- 33.
We speak of isomorphism between two complex structures when evaluating the correspondence between the two structures or a level of similarity between their structural elements. In mathematical terms, an isomorphism is a bijective map f between two sets endowed with structures of the same nature, such that both f and its inverse preserve the same structural characteristics.
- 34.
Prefix and suffix string formalism. A string \(\mathbf {y}\in \mathcal{V}^*\) is a substring of \(\mathbf {x}\) if there exist two strings \(\alpha \) and \(\beta \) on the alphabet \(\mathcal{V}\) such that \(\mathbf {x}=\alpha \circ \mathbf {y}\circ \beta \) (concatenated strings), and we say that \(\mathbf {y}\) occurs in \(\mathbf {x}\). It follows that the string \(\alpha \) is a prefix of \(\mathbf {x}\) (denoted \(\alpha \subset \mathbf {x}\)), i.e., it corresponds to the initial characters of \(\mathbf {x}\). Similarly, \(\beta \) is a suffix of \(\mathbf {x}\) (denoted \(\beta \supset \mathbf {x}\)) and coincides with the final characters of \(\mathbf {x}\). In this context, a good suffix is defined as the suffix substring of the pattern \(\mathbf {x}\) that occurs in the text T for a given value of the shift s starting from the character \(j + 1 + s\). It should be noted that the relations \(\subset \) and \(\supset \) enjoy the transitive property.
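A minimal sketch of the prefix, suffix, and substring relations on concrete strings (the example strings are hypothetical):

```python
def is_prefix(alpha, x):
    """alpha is a prefix of x: alpha matches the initial characters of x."""
    return x.startswith(alpha)

def is_suffix(beta, x):
    """beta is a suffix of x: beta matches the final characters of x."""
    return x.endswith(beta)

def occurs_in(y, x):
    """y is a substring of x iff x = alpha ∘ y ∘ beta for some alpha, beta."""
    return y in x

x = "abracadabra"
print(is_prefix("abra", x))  # True
print(is_suffix("abra", x))  # True
print(occurs_in("cad", x))   # True
```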
- 35.
Let \(\alpha \) and \(\beta \) be two strings; we define a similarity relation \(\alpha \sim \beta \) (read: \(\alpha \) is similar to \(\beta \)), with the meaning that \(\alpha \supset \beta \) (where we recall that the symbol \(\supset \) denotes a suffix). It follows that, if two strings are similar, we can align them with their identical characters rightmost, and no pair of aligned characters will be discordant. The similarity relation \(\sim \) is symmetric, that is, \(\alpha \sim \beta \) if and only if \(\beta \sim \alpha \). It can also be shown that the following implication holds:
$$\begin{aligned} \alpha \supset \beta \quad \text {and}\quad \mathbf {y}\supset \beta \Longrightarrow \alpha \sim \mathbf {y}. \end{aligned}$$
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Distante, A., Distante, C. (2020). Object Recognition. In: Handbook of Image Processing and Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-030-42378-0_1
Print ISBN: 978-3-030-42377-3
Online ISBN: 978-3-030-42378-0