Knowledge Discovery in Spatial Data, pp 143–221

# Algorithmic Approach to the Identification of Classification Rules or Separation Surface for Spatial Data

## Abstract

As discussed in Chap. 3, naïve Bayes, LDA, logistic regression, and support vector machines are statistical or statistics-related models developed for the classification of data. Breaking away from the statistical tradition are a number of classifiers that are algorithmic in nature. Instead of assuming a data model, which is essential to the conventional statistical methods, these algorithmic classifiers attempt to work directly on the data without making any assumptions about them. Many, particularly in the pattern recognition and artificial intelligence communities, regard this as a more flexible approach to discovering how data should be classified. Decision trees (or classification trees in the context of classification), neural networks, genetic algorithms, fuzzy sets, and rough sets are typical paradigms. Rather than searching for a separation surface, as the statistical classifiers do, some of these methods attempt to discover classification rules that can appropriately partition the feature space with reference to pre-specified classes.

A decision tree is a segmentation of a training data set (Quinlan 1986; Friedman 1977). It is built by first treating all objects as a single group, which forms the root (top node) of the tree. Training examples are then passed down the tree by splitting each intermediate node with respect to a variable. Construction of the tree stops when a certain stopping criterion is met. Each leaf (terminal node) of the tree contains a decision label, e.g., a class label. The decision tree thus partitions the feature space into sub-spaces corresponding to the leaves. Specifically, a decision tree that handles classification is known as a classification tree, and one that solves regression problems is called a regression tree (Breiman et al. 1984). A decision tree that deals with both classification and regression problems is referred to as a classification and regression tree (Breiman et al. 1984). Decision tree algorithms differ mainly in their splitting and pruning strategies. They usually aim at the optimal partitioning of the feature space by minimizing the generalization error. The advantages of the decision tree approach are that it needs no assumptions about the underlying distribution of the data and that it can handle both discrete and continuous variables. Furthermore, decision trees are easy to construct and interpret if they are of reasonable size and complexity. Their disadvantages are that splitting and pruning rules can be rather subjective and that the theory is not as rigorous as in the statistical tradition. They also suffer from combinatorial explosion if the number of variables and their value labels are not appropriately controlled. Typical decision tree methods are ID3 (Quinlan 1986), C4.5 (Quinlan 1993), CART (Breiman et al. 1984), CHAID (Kass 1980), QUEST and newer versions, and FACT (Loh and Vanichsetakul 1988).
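The top-down construction described above (start from a single root group, split each node on one variable, stop on a criterion, label the leaves) can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: the Gini impurity splitting rule, the depth-based stopping criterion, the function names, and the toy two-feature data set are all assumptions chosen for illustration, and no pruning is performed.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed splitting measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair with the lowest weighted Gini
    impurity, or None if no split improves on the unsplit node."""
    best, best_score = None, gini(y)
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree recursively; a leaf stores the majority class label."""
    # Stopping criterion (assumed): maximum depth reached or no useful split.
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return {"label": Counter(y).most_common(1)[0][0]}
    j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Pass one example down the tree until a leaf is reached."""
    while "label" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label"]


# Hypothetical training data: two numeric features per object, two classes.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [6.0, 1.0], [7.0, 3.0], [8.0, 2.0]]
y = ["water", "water", "water", "urban", "urban", "urban"]

tree = build_tree(X, y)
print(tree)
print(predict(tree, [2.5, 6.0]))
```

On this toy data a single split on the first feature separates the two classes, so the tree has one internal node and two leaves, each leaf corresponding to one sub-space of the feature space. Real methods such as those listed above differ in the splitting measure and add pruning, which this sketch omits.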

## References

- Ahlqvist O (2005) Using uncertain conceptual spaces to translate between land cover categories. Int J Geogr Inform Sci 19:831–857
- Ahlqvist O, Keukelaar J, Oukbir K (2000) Rough classification and accuracy assessment. Int J Geogr Inform Sci 14:475–496
- Aldridge CH (1998) A theory of empirical spatial knowledge supporting rough set based knowledge discovery in geographical databases. Ph.D. thesis, University of Otago, Dunedin, New Zealand
- Amari S (1995) Information geometry of the EM and em algorithms for neural networks. Neural Network 8(9):1379–1409
- Arbib MA (ed) (1995) The handbook of brain theory and neural networks. MIT, Cambridge
- Atkinson PM, Curran PJ (1997) Choosing an appropriate spatial resolution for remote sensing investigations. Photogramm Eng Rem Sens 63(12):1345–1351
- Atkinson PM, Tatnall ARL (1997) Neural networks in remote sensing. Int J Remote Sens 18(4):699–709
- Benediktsson JA, Swain PH, Ersoy OK (1990) Neural network approaches versus statistical methods in classification of multi-source remote sensing data. IEEE Trans Geosci Rem Sens 28(4):540–552
- Bischof H, Schneider W, Pinz AJ (1992) Multi-spectral classification of Landsat images using neural network. IEEE Trans Geosci Rem Sens 30:482–490
- Bishop CM (1995a) Neural networks for pattern recognition. Clarendon Press, Oxford
- Bittner T, Stell JG (2002) Vagueness and rough location. Geoinformatica 6:99–121
- Breiman L (2001) Random forests. Mach Learn 45:5–32
- Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, California
- Bruzzone L, Prieto DF (1999) A technique for the selection of kernel-function parameters in RBF neural networks for classification of remote-sensing images. IEEE Trans Geosci Rem Sens 37(2):551–559
- Bruzzone L, Prieto DF (2000) Automatic analysis of the difference image for unsupervised change detection. IEEE Trans Geosci Rem Sens 38(3):1171–1182
- Cao Z, Kandel A, Li L (1990) A new model of fuzzy reasoning. Fuzzy Set Syst 36:311–325
- Carpenter GA, Grossberg S (1988) The ART of adaptive pattern recognition by a self-organizing neural network. Computer 21:77–88
- Carpenter GA, Grossberg S, Reynolds JH (1991) ARTMAP: supervised real time learning and classification of nonstationary data by a self-organising neural network. Neural Networks 4:565–588
- Chen T, Chen H (1995) Approximation capability to functions of several variables, nonlinear functions, and operators by radial basis function neural networks. IEEE Trans Neural Network 6:904–910
- Chmielewski MR, Grzymala-Busse JW (1996) Global discretization of continuous attributes as preprocessing for machine learning. Int J Approx Reason 15:319–331
- Civco DL (1993) Artificial neural networks for land cover classification and mapping. Int J Geogr Inform Syst 7:173–186
- Coren S, Ward L, Enns J (1994) Sensation and perception. Harcourt Brace College Publishers, Fort Worth, TX
- Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood estimation from incomplete data via the EM algorithm. J Roy Stat Soc B 39:1–38
- Dubois D, Prade H (1980) Fuzzy sets and systems: theory and applications. Academic, Orlando
- Eiumnoh A, Shrestha RP (2000) Application of DEM data to Landsat image classification: evaluation in a tropical wet-dry landscape of Thailand. Photogramm Eng Rem Sens 66(3):297–304
- Feldman DS (1993) Fuzzy network synthesis with genetic algorithms. In: Forrest S (ed) Proceedings of the 5th international conference on genetic algorithms. Morgan Kaufmann, San Mateo, CA, pp 312–317
- Ferro CJS, Warner TA (2002) Scale and texture in digital image classification. Photogramm Eng Rem Sens 68(1):51–63
- Fischer MM, Getis A (eds) (1997) Recent developments in spatial analysis: spatial statistics, behavioural modeling, and computational intelligence. Springer, Berlin
- Fischer MM, Leung Y (1998) A genetic-algorithms based evolutionary computational neural network for modelling spatial interaction data. Ann Reg Sci 32:295–298
- Foody GM (1995a) Land cover classification using an artificial neural network with ancillary information. Int J Geogr Inform Syst 9:527–542
- Friedman JH (1977) A recursive partitioning decision rule for nonparametric classification. IEEE Trans Comput C-26:404–408
- Fu L (1994) Neural networks in computer intelligence. McGraw-Hill, New York
- Fukunaga K, Hayes RR (1989) Estimation of classifier performance. IEEE Trans Pattern Anal Mach Intell 11:1087–1101
- Fung T (2003) Landscape dynamics in the Maipo Ramsar Wetland site. In: Roy PS (ed) Geoinformatics for tropical ecosystems. Asian association of remote sensing. Bishen Singh Mahendra Pal Singh, Dehradun, India, pp 539–553
- Fung T, Leung Y, Xu ZB (2007) A vision-based approach to remote sensing image classification (a research project funded by the Hong Kong Research Grants Council)
- Gao Y, Leung Y, Xu ZB (1996) A new genetic algorithm with no genetic operators (unpublished paper)
- Girosi F (1994) Regularization theory, radial basis functions, and networks. In: Cherkassky V, Friedman JH (eds) From statistics to neural networks – Theory and pattern recognition applications. Springer, Germany, pp 166–187
- Goldberg DE (1989) Genetic algorithms in search optimization and machine learning. Addison-Wesley, New York
- Gomm JB, Yu D (2000) Selecting radial basis function network centers with recursive orthogonal least squares training. IEEE Trans Neural Network 11(2):306–314
- Gong P (1996) Integrated analysis of spatial data from multiple sources: using evidential reasoning and artificial neural network techniques for geological mapping. Photogramm Eng Rem Sens 62(5):513–523
- Gong P, Pu R, Chen J (1996) Mapping ecological land systems and classification uncertainties from digital elevation and forest-cover data using neural networks. Photogramm Eng Rem Sens 62(11):1249–1260
- Gopal S, Fischer MM (1997) Fuzzy ARTMAP – A neural classifier for multi-spectral image classification. In: Fischer MM, Getis A (eds) Recent developments in spatial analysis. Springer, Berlin, pp 306–335
- Grossberg S (1976) Adaptive pattern classification and universal recoding. I: Parallel development and coding neural feature detectors. Biol Cybern 23:121–134
- Hand DJ (1986) Recent advances in error rate estimation. Pattern Recogn Lett 4:335–346
- Heermann PD, Khazenie N (1992) Classification of multi-spectral remote sensing data using a back-propagation neural network. IEEE Trans Geosci Rem Sens 30(1):81–88
- Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
- Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 81:3088–3092
- Hummel R, Moniot R (1989) Reconstructions from zero crossings in scale space. IEEE Trans Acoust Speech Signal Process 37(12):245–295
- Ishibuchi H, Nozaki K, Yamamoto N, Tanaka H (1995) Selecting fuzzy if-then rules for classification problems using genetic algorithms. IEEE Trans Fuzzy Syst 3(3):260–270
- Jenson JR (1996) Introductory digital image processing: a remote sensing perspective. Prentice Hall, Upper Saddle River, NJ
- Jenson JR, Langari R (1999) Fuzzy logic: intelligence, control and information. Prentice Hall, Upper Saddle River, NJ
- Ji M (2003) Using fuzzy sets to improve cluster labeling in unsupervised classification. Int J Rem Sens 24:657–671
- Karr L (1991) Design of an adaptive fuzzy logic controller using a genetic algorithm. In: Belew RK, Booker LB (eds) Proceedings of the 4th international conference on genetic algorithms. Morgan Kaufmann, San Mateo, CA, pp 450–457
- Knoke JD (1986) The robust estimation of classification error rates. Comput Math Appl 12A:253–260
- Koenderink JJ (1984) The structure of images. Biol Cybern 50:363–370
- Kohonen T (1988) Self-organization and associative memory. Springer, Berlin
- Kosko B (1992) Neural networks and fuzzy systems. Prentice-Hall, Englewood Cliffs, NJ
- Kryszkiewicz M (2001) Comparative study of alternative types of knowledge reduction in inconsistent systems. Int J Intell Syst 16:105–120
- Kulkarni AD (1994) Artificial neural networks for image understanding. Van Nostrand Reinhold, New York
- Leung Y (1982) Approximate characterization of some fundamental concepts of spatial analysis. Geogr Anal Int J Theor Geogr 14:19–40
- Leung Y (1997) Intelligent spatial decision support systems. Springer, Berlin
- Leung Y (2001) Neural and evolutionary computation methods for spatial classification and knowledge acquisition. In: Fischer MM, Leung Y (eds) GeoComputational modelling: techniques and applications. Springer, Berlin, pp 71–108
- Leung Y, Leung KS (1993a) An intelligent expert system shell for knowledge-based geographical information systems: 1. the tools. Int J Geogr Inform Syst 7:189–199
- Leung Y, Li DY (2003) Maximal consistent block technique for rule acquisition in incomplete information systems. Inform Sci 153:85–106
- Leung Y, Gao Y, Zhang WX (2001b) A genetic-based method for training fuzzy systems. In: Proceedings of the 10th IEEE international conference on fuzzy systems – meeting the ground challenge: machines that serve people, organized by the institute of electrical and electronics engineers. Melbourne, Australia
- Leung Y, Leung KS, Yuan XJ (2003c) Discovery of promotion strategies for banking services by classification trees (unpublished paper)
- Leung Y, Luo JC, Zhou CH (2002a) A knowledge-integrated radial basis function model for the classification of multispectral remote sensing images (unpublished paper)
- Leung Y, Ma JH, Zhang WX (2001b) A new method for mining regression classes in large data sets. IEEE Trans Pattern Anal Mach Intell 23(1):5–21
- Leung Y, Mei CL, Zhang WX (2000a) Statistical tests for spatial non-stationarity based on geographically weighted regression model. Environ Plann A 32:9–32
- Leung Y, Wu WZ, Zhang WX (2006a) Knowledge acquisition in incomplete information systems: a rough set approach. Eur J Oper Res 168:164–180
- Leung Y, Fischer MM, Wu WZ, Mi JS (2008c) A rough set approach for the discovery of classification rules in interval-valued information systems. Int J Approx Reason 47:233–246
- Leung Y, Fung T, Mi JS, Wu WZ (2007) A rough set approach to the discovery of classification rules in spatial data. Int J Geogr Inform Sci 21:1033–1058
- Lippmann RP (1994) Neural networks, Bayesian a posteriori probabilities, and pattern classification. In: Cherkassky V, Friedman JH (eds) From statistics to neural networks – Theory and pattern recognition applications. Springer, Germany, pp 83–104
- Loh WY, Vanichsetakul N (1988) Tree-structured classification via generalized discriminant analysis (with discussion). J Am Stat Assoc 83:715–728
- Luo JC, Leung Y, Zheng J, Ma JH (2004) An elliptical basis function for the classification of remote sensing images. J Geogr Syst 6:219–236
- Mak M, Kung S (2000) Estimation of elliptical basis function parameters by the EM algorithm with application to speaker verification. IEEE Trans Neural Network 11(4):961–969
- Mannan B, Roy J, Ray AK (1998) Fuzzy ARTMAP supervised classification of multi-spectral remotely-sensed images. Int J Rem Sens 19(4):767–774
- Mather PM (1999) Land cover classification revisited. In: Atkinson PM, Tate NJ (eds) Advances in remote sensing and GIS analysis. Wiley, London, pp 7–16
- McLachlan GJ, Basford KE (1988) Mixture models: inference and applications to clustering. Marcel Dekker, New York
- McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions. Wiley, London
- Medsker LR (1994) Hybrid neural network and expert systems. Kluwer, Dordrecht
- Meng DY, Xu ZB (2006) Visual learning theory (unpublished paper)
- Meng DY, Xu ZB, Leung Y, Fung T (2008) The strong convergence of visual method and its applications in disease diagnosis. Paper presented at the 3rd international conference on pattern recognition in bioinformatics, Melbourne, Australia
- Mi JS, Wu WZ, Zhang WX (2004) Approaches to knowledge reduction based on variable precision rough sets model. Inform Sci 159:255–272
- Mola F, Siciliano R (1997) A fast splitting procedure for classification trees. Stat Comput 7:208–216
- Moody J, Darken CJ (1989) Fast learning in networks of locally-tuned processing units. Neural Comput 1:281–294
- Murai H, Omatu S (1997) Remote sensing image analysis using a neural network and knowledge-based processing. Int J Rem Sens 18(4):811–828
- Pao YH (1989) Adaptive pattern recognition and neural networks. Addison-Wesley, Reading, MA
- Paola JD, Schowengerdt RA (1995) A review and analysis of back-propagation neural networks for classification of remotely-sensed multi-spectral imagery. Int J Rem Sens 16:3033–3058
- Park D, Kandel A, Langholz G (1994) Genetic-based new fuzzy reasoning models with application to fuzzy control. IEEE Trans Syst Man Cybern 24(1):39–47
- Pawlak Z (1982) Rough sets. Int J Inform Comput Sci 11:341–356
- Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer, Boston
- Peddle DR (1995) Knowledge formulation for supervised evidential classification. Photogramm Eng Rem Sens 61(4):409–417
- Pernell C, Themlin J, Renders J, Acheroy M (1995) Optimization of fuzzy expert systems using genetic algorithms and neural networks. IEEE Trans Fuzzy Syst 3(3):300–312
- Polkowski L, Skowron A (eds) (1998) Rough sets in knowledge discovery 1: methodology and applications, 2: Applications. Physica-Verlag, Heidelberg
- Polkowski L, Tsumoto S, Lin TY (2000) Rough set methods and applications. Physica-Verlag, Heidelberg
- Powell MJD (1987) Radial basis functions for multivariable interpolation: a review. In: Mason JC, Cox MG (eds) Algorithms for Approximation of Functions and Data. Oxford University Press, Oxford, pp 143–167
- Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
- Richards JA, Jia XP (1998) Remote sensing digital image analysis: an introduction. Springer, New York
- Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press, Cambridge
- Rosenblatt F (1958) The Perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65:386–408
- Rudolph G (1994) Convergence properties of canonical genetic algorithms. IEEE Trans Neural Network 5(1):96–101
- Scholkopf B, Burges CJC, Smola AJ (1999) Advances in kernel methods: support vector learning. MIT, Cambridge
- Serpico SB, Bruzzone L, Roli F (1996) An experimental comparison of neural and statistical non-parametric algorithms for supervised classification of remote-sensing images. Pattern Recogn Lett 17:1331–1341
- Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton
- Skowron A, Rauszer C (1992) The discernibility matrices and functions in information systems. In: Slowinski R (ed) Intelligent decision support-Handbook of applications and advances of the rough sets theory. Kluwer, London, pp 331–362
- Stell JG, Worboys MF (1998) Stratified map spaces: a formal basis for multiresolution spatial databases. In: Poiker TK, Chisman N (eds) SDH’98 proceedings 8th international symposium on spatial data handling. International Geographical Union, pp 180–189
- Sundararajan N, Saratchandran P, Lu Y (1999) Radial basis function neural networks with sequential learning. World Scientific, Singapore
- Tadjudin S, Landgrebe DA (2000) Robust parameter estimation for mixture model. IEEE Trans Geosci Rem Sens 38(1):439–445
- Wang F (1991) Integrating GIS’s and remote sensing image analysis systems by unifying knowledge representation schemes. IEEE Trans Geosci Rem Sens 29(4):656–663
- Wang SL, Li D, Shi WZ, Wang XZ (2002) Geo-rough space. Geo-Spatial Inform Sci 6:54–61
- Wang SL, Wang XZ, Shi WZ (2001) Development of a data mining method for land control. Geo-Spatial Inform Sci 4:68–76
- Wilkinson GG, Folving S, Kanellopoulos I, McCormick N, Fullerton K, Megier J (1995) Forest mapping from multi-source satellite data using neural network classifiers - an experiment in Portugal. Rem Sens Rev 12:83–106
- Witkin AP (1983) Scale space filtering. In: Proceedings of International Joint Conference on Artificial Intelligence, Karlsruhe, pp 1019–1022
- Worboys MF (1998a) Computation with imprecise geographical data. Comput Environ Urban Syst 22:85–106
- Wu WZ, Zhang M, Li HZ, Mi JS (2005) Knowledge reduction in random information systems via Dempster-Shafer theory of evidence. Inform Sci 174:143–164
- Xu ZB, Leung Y, He XW (1994) Asymmetric bidirectional associative memories. IEEE Trans Syst Man Cybern 24:1558–1564
- Yao X (1999) Evolving artificial neural networks. In: Proceedings of the IEEE 89, IEEE, pp 1423–1447
- Yasdi R (1996) Combining rough sets learning and neural learning: method to deal with uncertain and imprecise information. Neurocomputing 7:61–84
- Zadeh LA (1994) Fuzzy logic and soft computing: issues, contentions and perspectives. In: Proceedings of 3rd international conference on fuzzy logic, neural networks and soft computing. Fuzzy Logic Systems Institute, Japan, pp 1–2
- Zhang WX, Mi JS, Wu WZ (2003b) Approaches to knowledge reductions in inconsistent systems. Int J Intell Syst 18:989–1000
- Zhou W (1999) Verification of the non-parametric characteristics of back-propagation neural networks for image classification. IEEE Trans Geosci Rem Sens 37(2):771–779