Skip to main content

The Complexity of Some Pattern Problems in the Logical Analysis of Large Genomic Data Sets

  • Conference paper
  • First Online:
Bioinformatics and Biomedical Engineering (IWBBIO 2016)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 9656))

Included in the following conference series:

Abstract

Many biomedical experiments produce large data sets in the form of binary matrices, with features labeling the columns and individuals (samples) associated to the rows. An important case is when the rows are also labeled into two groups, namely the positive (or healthy) and the negative (or diseased) samples. The Logical Analysis of Data (LAD) is a procedure aimed at identifying relevant features and building boolean formulas (rules) which can be used to classify new samples as positive or negative. These rules are said to explain the data set. Each rule can be represented by a string over {0,1,-}, called a pattern. A data set can be explained by alternative sets of patterns, and many computational problems arise related to the choice of a particular set of patterns for a given instance. In this paper we study the computational complexity of these pattern problems and show that they are, in general, very hard. We give an integer programming formulation for the problem of determining if two sets of patterns are equivalent. We also prove computational complexity results which imply that there should be no simple ILP model for finding a minimal set of patterns explaining a given data set.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Alexe, G., Alexe, S., Bonates, T.O., Kogan, A.: Logical analysis of data - the vision of Peter L. Hammer. Ann. Math. Artif. Intell. 49, 265–312 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  2. Baker, W., van den Broek, A., Camon, E., Hingamp, P., Sterk, P., Stoesser, G., Tuli, M.A.: The EMBL nucleotide sequence database. Nucleic Acid Res. 28, 19–23 (2000)

    Article  Google Scholar 

  3. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: The protein data bank. Nucleic Acid Res. 28, 235–242 (2000)

    Article  Google Scholar 

  4. Bertolazzi, P., Felici, G., Festa, P., Lancia, G.: Logic classification and feature selection for biomedical data. Comput. Math. Appl. 55, 889–899 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  5. Boros, E., Hammer, P., Ibaraki, T., Kogan, A., Mayoraz, E., Muchnik, I.: An implementation of logical analysis of data. IEEE Trans. Knowl. Data Eng. 12, 292–306 (2000)

    Article  Google Scholar 

  6. Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X.C., Stern, D., Winkler, J., Lockhart, D.J., Morris, M.S., Fodor, S.P.A.: Accessing genetic information with high-density DNA arrays. Science 274, 610–614 (1996)

    Article  Google Scholar 

  7. Chikalov, I., Lozin, V., Lozina, I., Moshkov, M., Son Nguyen, H., Skowron, A., Zielosko, B.:Logical analysis of data: theory, methodology and applications. In: Three Approaches to Data Analysis, pp. 147–192. Springer (2013)

    Google Scholar 

  8. Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1, 131–156 (1997)

    Article  Google Scholar 

  9. Felici, G., de Angelis, V., Mancinelli, G.: Feature selection for data mining. In: Felici, G., Triantaphyllou, E. (eds.) Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, pp. 227–252. Springer (2006)

    Google Scholar 

  10. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, San Francisco (1979)

    MATH  Google Scholar 

  11. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999)

    Article  Google Scholar 

  12. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002)

    Article  MATH  Google Scholar 

  13. Hammer, P., Bonates, T.: Logical analysis of data: from combinatorial optimization to medical applications. RUTCOR Research Report, 10-05, Rutgers University, NJ (2005)

    Google Scholar 

  14. Hebert, P.D.N., Cywinska, A., Ball, S.L., de Waard, J.R.: Biological identifications through DNA barcodes. Proc. R. Soc. Lond. B 270, 313–321 (2003)

    Article  Google Scholar 

  15. Hu, H., Li, J., Plank, A., Wang, H., Daggard, G.: A comparative study of classification methods for microarray data analysis. In: Proceedings of the fifth Australasian Conference on Data Mining and Analystics, vol. 61, pp. 33–37, Sydney, Australia (2006)

    Google Scholar 

  16. Li, T., Zhang, C., Ogihara, M.: A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20, 2429–2437 (2004)

    Article  Google Scholar 

  17. Montgomery, D., Undem, B.L.: CombiMatrix’ customizable DNA microarrays-Tutoial: In situ computer-aided synthesis of custom oligo microarrays. Genetic Eng. News 22, 32–33 (2002). Drug Discovery

    Google Scholar 

  18. Serafini, P.: Classifying negative and positive points by optimal box clustering. Discrete Appl. Math. 165(270–282), 10 (2014)

    MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Giuseppe Lancia .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lancia, G., Serafini, P. (2016). The Complexity of Some Pattern Problems in the Logical Analysis of Large Genomic Data Sets. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2016. Lecture Notes in Computer Science(), vol 9656. Springer, Cham. https://doi.org/10.1007/978-3-319-31744-1_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-31744-1_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31743-4

  • Online ISBN: 978-3-319-31744-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics