Journal of Computer Science and Technology, Volume 33, Issue 4, pp 807–822

Hierarchical Clustering of Complex Symbolic Data and Application for Emitter Identification

  • Xin Xu
  • Jiaheng Lu
  • Wei Wang
Regular Paper


It is well known that symbolic variables may take values of various forms, such as an interval, a set of stochastic measurements of some underlying pattern, or qualitative multi-values. However, most existing work in symbolic data analysis still focuses on interval values. Although some pioneering work has explored stochastic-pattern-based symbolic data and mixtures of symbolic variables, it lacks the flexibility and computational efficiency needed to make full use of the distinctive individual symbolic variables. We therefore propose a novel hierarchical clustering method for complex symbolic data, based on a weighted general Jaccard distance and an effective global pruning strategy, and apply it to emitter identification. Extensive experiments indicate that our method outperforms its peers in both computational efficiency and emitter identification accuracy.
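As a rough illustration of the kind of computation the abstract describes, the sketch below clusters interval-valued objects with a weighted Jaccard-style distance. It is not the authors' method: the abstract does not define the weighted general Jaccard distance or the global pruning strategy, so the per-variable similarity (overlap length divided by union length), the normalized weighted sum, the average-linkage merging, the stopping threshold, and all names and sample values here are assumptions made for illustration only. No pruning is performed.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # closed interval (low, high); an assumed representation


def interval_jaccard(a: Interval, b: Interval) -> float:
    """Jaccard similarity of two intervals: overlap length / union length."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - overlap
    if union == 0.0:  # both intervals are single points
        return 1.0 if a == b else 0.0
    return overlap / union


def weighted_jaccard_distance(
    x: Sequence[Interval], y: Sequence[Interval], weights: Sequence[float]
) -> float:
    """Weighted average of per-variable Jaccard distances (assumed form)."""
    total = sum(weights)
    return sum(
        w * (1.0 - interval_jaccard(a, b)) for a, b, w in zip(x, y, weights)
    ) / total


def agglomerate(
    objects: Sequence[Sequence[Interval]],
    weights: Sequence[float],
    threshold: float,
) -> List[List[int]]:
    """Naive average-linkage merging; stops when the closest pair of
    clusters is farther apart than `threshold`. No pruning is applied."""
    clusters = [[i] for i in range(len(objects))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        pair_sum = sum(
            weighted_jaccard_distance(objects[i], objects[j], weights)
            for i in c1
            for j in c2
        )
        return pair_sum / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Exhaustively scan all cluster pairs for the smallest linkage distance.
        dist, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Hypothetical emitters, each described by two interval-valued variables
# (for example a frequency range and a pulse-width range; values are invented).
emitters = [
    [(9.0, 9.4), (1.0, 2.0)],
    [(9.1, 9.5), (1.0, 2.0)],
    [(5.0, 5.5), (10.0, 12.0)],
]
weights = [2.0, 1.0]  # assumed variable weights
clusters = agglomerate(emitters, weights, threshold=0.5)
print(clusters)
```

With these invented values, the first two emitters overlap strongly on both variables and are merged, while the third shares no overlap with either and stays in its own cluster. The exhaustive pair scan in `agglomerate` is the sort of cost a pruning strategy would aim to reduce, although how the paper does so is not stated in the abstract.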


symbolic data analysis, stochastic pattern, fuzzy interval, hierarchical clustering, emitter identification



Supplementary material

11390_2018_1857_MOESM1_ESM.pdf (591 kb)
ESM 1 (PDF 590 kb)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Laboratory of Science and Technology on Information System Engineering, Nanjing Research Institute of Electronics Engineering, Nanjing, China
  2. Department of Computer Science, University of Helsinki, Helsinki, Finland
  3. State Key Laboratory for Novel Software and Technology, Nanjing University, Nanjing, China
