Comparing Attributes by a Probabilistic and Statistical Association II

Lerman, Israël César

doi:10.1007/978-1-4471-6793-8_6

Chapter
First Online: 25 March 2016

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Abstract

The data are defined by the observation of a set \(\mathcal {A}\) of descriptive attributes on a set \(\mathcal {O}\) of elementary objects. As indicated in the introduction of the preceding chapter (see Sect. 5.1 of Chap. 5), \(\mathcal {A}\) consists of attributes of the same type, belonging to the general type II (see Sect. 3.3 of Chap. 3). To fix ideas in this introduction, \(\mathcal {A}\) may be imagined as composed of nominal categorical attributes. The different comparison cases are listed at the beginning of the following section (see Sect. 6.2). As stated in the introductory Sect. 5.1 of Chap. 5, this comparison emphasizes the LLA approach. In a unified process, this approach leads to a very rich family of probabilistic association coefficients between descriptive attributes of any type. Moreover, the principle of the method makes it possible to compare several association coefficients with one another.
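To make the LLA principle concrete for two nominal categorical attributes, here is a minimal Python sketch under stated assumptions, not the chapter's own procedure. It assumes that the raw association index is the number of object pairs that both attributes place in the same category. It also assumes that the hypothesis of no relation is modelled by randomly permuting the values of one attribute over \(\mathcal {O}\), which keeps both marginal distributions fixed. The mean and standard deviation of the raw index under that hypothesis are estimated by Monte Carlo simulation rather than by exact analytical formulas. The standardized index q is then mapped through the standard normal distribution function to a probabilistic index in [0, 1]. The function names `joined_pairs` and `lla_index` and the parameters `n_perm` and `seed` are hypothetical.

```python
import random
import statistics
from collections import Counter


def joined_pairs(a, b):
    """Raw index (assumed): number of object pairs put in the same category by both attributes."""
    cells = Counter(zip(a, b))  # contingency table cell counts n_ij
    return sum(n * (n - 1) // 2 for n in cells.values())


def lla_index(a, b, n_perm=2000, seed=0):
    """Hypothetical sketch of a likelihood-of-the-link style index for two nominal attributes.

    Assumption: the no-relation hypothesis is simulated by randomly permuting
    the values of `b` over the objects, which keeps both marginals fixed.
    Returns (raw index, standardized index q, probabilistic index Phi(q)).
    """
    if len(a) != len(b):
        raise ValueError("both attributes must be observed on the same objects")
    rng = random.Random(seed)
    observed = joined_pairs(a, b)
    shuffled = list(b)
    simulated = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one random permutation of b's values
        simulated.append(joined_pairs(a, shuffled))
    mean = statistics.fmean(simulated)  # Monte Carlo estimate of the expectation
    sd = statistics.pstdev(simulated)  # Monte Carlo estimate of the standard deviation
    if sd == 0:
        raise ValueError("raw index is constant under permutation; index undefined")
    q = (observed - mean) / sd  # centred and reduced raw index
    return observed, q, statistics.NormalDist().cdf(q)
```

For example, comparing an attribute that splits 30 objects into three classes of 10 with itself gives an index very close to 1. Two crossed binary attributes on 20 objects, where each of the four cells holds 5 objects, give an index below 0.5, because their raw index is the smallest value the permutation hypothesis allows. Exact moment formulas would remove the simulation noise. The Monte Carlo route is used here only to keep the sketch short.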

Author information

Authors and Affiliations

Department of Data Knowledge and Management, University of Rennes 1, IRISA, Rennes, Ille-et-Vilaine, France
Israël César Lerman

Corresponding author

Correspondence to Israël César Lerman.

About this chapter

Cite this chapter

Lerman, I.C. (2016). Comparing Attributes by a Probabilistic and Statistical Association II. In: Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-6793-8_6

DOI: https://doi.org/10.1007/978-1-4471-6793-8_6
Published: 25 March 2016
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6791-4
Online ISBN: 978-1-4471-6793-8
eBook Packages: Computer Science, Computer Science (R0)