Comparing Attributes by a Probabilistic and Statistical Association II

  • Chapter

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Abstract

The data are defined by the observation of a set \(\mathcal {A}\) of descriptive attributes on a set \(\mathcal {O}\) of elementary objects. As indicated in the introduction of the preceding chapter (see Sect. 5.1 of Chap. 5), \(\mathcal {A}\) consists of attributes of the same type, belonging to the general type II (see Sect. 3.3 of Chap. 3). To fix ideas in this introduction, we may imagine \(\mathcal {A}\) as composed of nominal categorical attributes. The different comparison cases are listed at the beginning of the following section (see Sect. 6.2). For this comparison, as stated in the introductory Sect. 5.1 of Chap. 5, the LLA (Likelihood Linkage Analysis) approach is emphasized. It leads, in a unified process, to a very rich family of probabilistic association coefficients between descriptive attributes of any type. Moreover, the principle of this method enables several association coefficients to be compared with one another.

Author information

Corresponding author

Correspondence to Israël César Lerman.

Copyright information

© 2016 Springer-Verlag London

About this chapter

Cite this chapter

Lerman, I.C. (2016). Comparing Attributes by a Probabilistic and Statistical Association II. In: Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-6793-8_6

  • DOI: https://doi.org/10.1007/978-1-4471-6793-8_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-6791-4

  • Online ISBN: 978-1-4471-6793-8

  • eBook Packages: Computer Science, Computer Science (R0)
