Japanese-Automobile Data

Explanation 2 of Matroska Feature-Selection Method (Method 2)


Abstract

Japanese-automobile data consist of 29 regular and 15 small cars with six independent variables: emission rate (X1), price (X2), number of seats (X3), CO2 (X4), fuel (X5), and sales (X6). The following points are important for this book. (1) LSD discrimination: we can easily recognize that these data are linearly separable data (LSD) because X1 and X3 each separate the two classes completely, as two box-whisker plots show. (2) Problem 3: the forward stepwise procedure selects X1, X2, X3, X4, X5, and X6 in this order. Although the MNM of Revised IP-OLDF and the NM of QDF are zero in the one-variable model (X1), QDF misclassifies all regular cars as small cars after X3 enters the model, because X3 equals four for every small car (Problem 3). These data are very suitable for explaining Problem 3 because they are simpler than the examination scores, which use 100 items. (3) Explanation of Method 2 by these data: when we discriminate six microarray datasets by eight LDFs, only Revised IP-OLDF can naturally perform feature selection and reduce the high-dimensional gene space to small gene subspaces that are linearly separable models. We call these subspaces "Matroskas." We establish the Matroska feature-selection method for microarray datasets (Method 2); the data consist of several disjoint small Matroskas with MNM = 0. Because LSD discrimination is not yet widely known and Method 2 involves several unfamiliar ideas, we explain these ideas with these data in addition to the Swiss banknote data from Chap. 6 and the student linearly separable data in Chap. 4. If the data are LSD, the full model is the largest Matroska, which contains many smaller Matroskas. We already know that the smallest Matroskas (the basic gene sets or subspaces, BGSs) describe the Matroska structure completely because MNM decreases monotonically. On the other hand, LASSO attempts feature selection; if it cannot find the BGSs in a dataset, it cannot explain the structure of that dataset. Therefore, LASSO researchers had better examine their method on these two common datasets before examining microarray datasets. If they are not successful on such ordinary data, it is not logical for them to expect successful results in gene analysis. In particular, the Japanese-automobile data are simple data for feature selection because only the two one-variable models (X1 and X3) are linearly separable and are therefore the BGSs.
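
The argument above rests on two checkable facts: a variable subset has MNM = 0 exactly when the two classes are linearly separable in that subspace, and QDF breaks down when a variable such as X3 is constant within one class. The following Python sketch is not the author's Revised IP-OLDF/LINGO implementation; it only illustrates these two facts on hypothetical toy data shaped like the automobile example (the numeric values, the variable names X1 and X3, and the helper linearly_separable are assumptions introduced for illustration). Checking MNM = 0 reduces to an LP feasibility test, and the singular within-class covariance shows why QDF fails once X3 enters the model.

import itertools

import numpy as np
from scipy.optimize import linprog


def linearly_separable(X, y):
    """True if the classes y in {-1, +1} can be separated by a hyperplane in X.

    Feasibility of y_i * (w . x_i + b) >= 1 is tested as a zero-objective LP;
    a feasible point exists if and only if MNM = 0 on this variable subset.
    """
    n, p = X.shape
    # Decision variables: w (p coefficients) followed by the intercept b.
    A_ub = -(y[:, None] * np.hstack([X, np.ones((n, 1))]))
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(p + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (p + 1))
    return res.status == 0  # status 0 means a feasible solution was found


# Hypothetical mini-data: rows are cars, columns are (X1 emission rate, X3 seats).
X = np.array([[2.0, 5], [2.5, 5], [3.0, 7],      # regular cars
              [0.60, 4], [0.66, 4], [0.70, 4]])  # small cars: X3 is constant (4)
y = np.array([1, 1, 1, -1, -1, -1])
names = ["X1", "X3"]

# (a) Enumerate variable subsets and report those with MNM = 0.
#     The smallest ones are the BGSs; larger ones are Matroskas containing them.
for k in range(1, X.shape[1] + 1):
    for cols in itertools.combinations(range(X.shape[1]), k):
        if linearly_separable(X[:, list(cols)], y):
            print("MNM = 0 on", [names[c] for c in cols])

# (b) Problem 3: X3 has zero variance within the small-car class, so the class
#     covariance matrix used by QDF is singular (determinant 0, not invertible).
print("det(cov) of the small-car class:", np.linalg.det(np.cov(X[y == -1].T)))

In the actual method, the MNM of a non-separable subset is obtained by integer programming (Revised IP-OLDF); the sketch only covers the MNM = 0 case, which is all that is needed to list the smallest separable subsets (the BGSs) and the larger Matroskas that contain them.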


Notes

  1. See many studies of DEA at the Japanese researcher's database: http://researchmap.jp/read0049917/. Over 14 papers published after 2013 can be downloaded from the Misc(ellaneous) category.


Author information

Correspondence to Shuichi Shinmura.


Copyright information

© 2016 Springer Science+Business Media Singapore

About this chapter

Cite this chapter

Shinmura, S. (2016). Japanese-Automobile Data. In: New Theory of Discriminant Analysis After R. Fisher. Springer, Singapore. https://doi.org/10.1007/978-981-10-2164-0_7
