Part of the book series: Massive Computing (MACO, volume 6)

Abstract

This chapter reviews a data mining and knowledge discovery approach called OCAT (One Clause At a Time). The OCAT approach is based on concepts from mathematical logic and discrete optimization. As input it uses samples of the performance of the system (or phenomenon) under consideration, and from them it extracts the underlying behavior in the form of a compact and rather accurate set of classification rules. The chapter also provides ways of decomposing large-scale data mining problems and a way to generate the next best example to consider for training. The latter methods can be combined with any Boolean function learning method and are not restricted to the OCAT approach.
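
As a rough illustration of the one-clause-at-a-time idea, the sketch below assumes binary attributes and a target rule in conjunctive normal form (CNF). In the author's earlier papers OCAT is usually described as building each clause so that it accepts every positive example while rejecting as many of the still-unrejected negative examples as possible, and repeating until no negative example is accepted. The greedy literal choice and the "seed negative" rule used here are assumptions made for this sketch only; they are not the branch-and-bound or other algorithms of the chapter, and every name in the code is hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[int, ...]      # one binary attribute vector, e.g. (0, 1, 0, 0)
Literal = Tuple[int, bool]     # (attribute index, True for x_i, False for NOT x_i)
Clause = List[Literal]         # a disjunction of literals


def literal_true(lit: Literal, ex: Example) -> bool:
    """Return True if the literal evaluates to true on the example."""
    idx, positive = lit
    return bool(ex[idx]) == positive


def clause_accepts(clause: Clause, ex: Example) -> bool:
    """A disjunction accepts an example if at least one literal is true."""
    return any(literal_true(lit, ex) for lit in clause)


def cnf_accepts(cnf: List[Clause], ex: Example) -> bool:
    """A CNF rule accepts an example only if every clause accepts it."""
    return all(clause_accepts(clause, ex) for clause in cnf)


def ocat_greedy(positives: Sequence[Example],
                negatives: Sequence[Example]) -> List[Clause]:
    """Build a CNF rule one clause at a time (greedy sketch, not the chapter's method)."""
    if set(positives) & set(negatives):
        raise ValueError("an example cannot be both positive and negative")

    cnf: List[Clause] = []
    remaining = list(dict.fromkeys(negatives))   # negatives not yet rejected
    while remaining:
        # Assumption of this sketch: only use literals that are false on one
        # "seed" negative, so the new clause is guaranteed to reject it.
        seed = remaining[0]
        allowed = [(i, not bool(v)) for i, v in enumerate(seed)]

        clause: Clause = []
        uncovered = list(positives)   # positives the clause does not accept yet
        rejected = list(remaining)    # negatives the clause still rejects
        while uncovered:
            # Greedy choice: accept the most uncovered positives; break ties
            # by accepting the fewest still-rejected negatives.
            best = max(
                (lit for lit in allowed if lit not in clause),
                key=lambda lit: (
                    sum(literal_true(lit, p) for p in uncovered),
                    -sum(literal_true(lit, n) for n in rejected),
                ),
            )
            clause.append(best)
            uncovered = [p for p in uncovered if not literal_true(best, p)]
            rejected = [n for n in rejected if not literal_true(best, n)]

        cnf.append(clause)
        # Negatives that this clause accepts must be handled by later clauses.
        remaining = [n for n in remaining if clause_accepts(clause, n)]
    return cnf


if __name__ == "__main__":
    # Made-up toy data, not taken from the chapter.
    pos = [(0, 1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1)]
    neg = [(1, 0, 1, 0), (0, 0, 0, 0), (1, 1, 1, 1)]
    rule = ocat_greedy(pos, neg)
    print(rule)
    print(all(cnf_accepts(rule, p) for p in pos))
    print(any(cnf_accepts(rule, n) for n in neg))
```

Restricting each clause to literals that are false on the seed negative keeps the loop finite: every positive example differs from the seed in at least one attribute, so the clause can always be extended to accept all positives, and each pass removes at least the seed from the set of unrejected negatives. A greedy choice of this kind gives no guarantee on the number of clauses.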

Triantaphyllou, E. and G. Felici (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Massive Computing Series, Springer, Heidelberg, Germany, pp. 45–87, 2006.

The author gratefully acknowledges the support of the U.S. Navy, Office of Naval Research (ONR), under research grants N00014-95-1-0639 and N00014-97-1-0632.

Copyright information

© 2006 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Triantaphyllou, E. (2006). The One Clause at a Time (OCAT) Approach to Data Mining and Knowledge Discovery. In: Triantaphyllou, E., Felici, G. (eds) Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/0-387-34296-6_2

  • DOI: https://doi.org/10.1007/0-387-34296-6_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-34294-8

  • Online ISBN: 978-0-387-34296-2

  • eBook Packages: Computer Science (R0)
