The One Clause at a Time (OCAT) Approach to Data Mining and Knowledge Discovery

Triantaphyllou, Evangelos

doi:10.1007/0-387-34296-6_2

Evangelos Triantaphyllou

Part of the book series: Massive Computing (MACO, volume 6)

Abstract

This chapter reviews a data mining and knowledge discovery approach called OCAT (One Clause At a Time). The OCAT approach is based on concepts from mathematical logic and discrete optimization. As input, it uses samples of the performance of the system (or phenomenon) under consideration, and it then extracts the underlying behavior of that system in terms of a compact and rather accurate set of classification rules. This chapter also provides ways of decomposing large-scale data mining problems and a way to generate the next best example to consider for training. The latter methods can be combined with any Boolean function learning method and are not restricted to the OCAT approach.

Triantaphyllou, E. and G. Felici (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Massive Computing Series, Springer, Heidelberg, Germany, pp. 45–87, 2006.

The author is very appreciative of the support provided by the U.S. Navy, Office of Naval Research (ONR), under research grants N00014-95-1-0639 and N00014-97-1-0632.

Author information

Authors and Affiliations

Department of Computer Science, Louisiana State University, 298 Coates Hall, Baton Rouge, LA, 70803, USA
Evangelos Triantaphyllou

Editor information

Editors and Affiliations

Louisiana State University, Baton Rouge, Louisiana, USA
Evangelos Triantaphyllou
Consiglio Nazionale delle Ricerche, Rome, Italy
Giovanni Felici

About this chapter

Cite this chapter

Triantaphyllou, E. (2006). The One Clause at a Time (OCAT) Approach to Data Mining and Knowledge Discovery. In: Triantaphyllou, E., Felici, G. (eds) Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/0-387-34296-6_2

DOI: https://doi.org/10.1007/0-387-34296-6_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34294-8
Online ISBN: 978-0-387-34296-2
eBook Packages: Computer Science (R0)
