
Part of the book series: Massive Computing (MACO, volume 6)

Abstract

A typical data mining project uses data collected for various purposes, ranging from routinely gathered data to data collected for process improvement projects and data retained for archival purposes. In some cases the set of considered features may be large (a wide data set) and sufficient for the extraction of knowledge. In other cases the data set may be narrow and insufficient for extracting meaningful knowledge, or the data may not exist at all.

Mining wide data sets has received attention in the literature, and many feature selection models and algorithms have been developed for them.
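
To make the idea concrete, the following minimal Python sketch ranks the columns of a wide data set with a simple filter-style score (a Fisher-score-like ratio of between-class separation to within-class spread) and keeps the k highest-scoring features. This is an illustrative example only; the chapter does not prescribe this particular score, and the data and names are hypothetical.

    # Minimal filter-style feature selection for a wide data set.
    # The scoring rule below is a common, generic choice, not the chapter's method.
    from statistics import mean, pstdev

    def fisher_like_score(values, labels):
        """Score one feature: separation of the two class means
        relative to the pooled within-class spread."""
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        spread = pstdev(pos) + pstdev(neg)
        if spread == 0:
            return 0.0
        return abs(mean(pos) - mean(neg)) / spread

    def select_features(rows, labels, k):
        """Rank the columns of a wide data set and keep the k best."""
        n_features = len(rows[0])
        scored = [
            (fisher_like_score([row[j] for row in rows], labels), j)
            for j in range(n_features)
        ]
        scored.sort(reverse=True)
        return [j for _, j in scored[:k]]

    # Toy data set: 4 observations, 5 features, binary labels.
    rows = [
        [0.9, 10.2, 5.0, 0.1, 3.3],
        [1.1,  9.8, 5.1, 0.2, 3.1],
        [4.0, 10.1, 5.0, 0.9, 3.2],
        [4.2,  9.9, 5.2, 1.0, 3.4],
    ]
    labels = [0, 0, 1, 1]
    print(select_features(rows, labels, k=2))  # -> [0, 3]

Filter methods of this kind scale well to very wide data sets because each feature is scored independently of the others.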

Determining the features for which data should be collected, either in the absence of an existing data set or when a data set is only partially available, has not been sufficiently addressed in the literature. Yet this issue is of paramount importance as interest in data mining grows. The methods and process for defining the most appropriate features for data collection, data transformation, data quality assessment, and data analysis are referred to as data farming. This chapter outlines the elements of a data farming discipline.
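
As a rough illustration of how these elements fit together, the short Python sketch below walks through planning which features to collect, checking the quality of what was actually collected, and transforming the usable features before analysis. The thresholds, field names, and helper functions are assumptions made for the example, not methods taken from the chapter.

    def missing_rate(column):
        """Quality check: fraction of missing (None) entries in a feature."""
        return sum(v is None for v in column) / len(column)

    def min_max_scale(column):
        """Transformation: rescale a numeric feature to the [0, 1] range."""
        observed = [v for v in column if v is not None]
        lo, hi = min(observed), max(observed)
        span = (hi - lo) or 1.0
        return [None if v is None else (v - lo) / span for v in column]

    # Step 1: decide which features are worth collecting at all (the "farming" step).
    planned = ["temperature", "pressure", "operator_shift"]

    # Step 2: what was actually collected ("operator_shift" never materialized).
    collected = {
        "temperature": [20.5, 21.0, None, 22.5],
        "pressure": [1.0, 1.1, 1.2, 1.1],
    }

    # Steps 3-4: assess data quality, then transform the usable features for analysis.
    usable = {}
    for name in planned:
        column = collected.get(name)
        if column is None or missing_rate(column) > 0.5:
            continue  # flag this feature for future collection or repair
        usable[name] = min_max_scale(column)

    print(usable)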

Triantaphyllou, E. and G. Felici (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Massive Computing Series, Springer, Heidelberg, Germany, pp. 279–304, 2006.

Copyright information

© 2006 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Kusiak, A. (2006). Data Farming: Concepts and Methods. In: Triantaphyllou, E., Felici, G. (eds) Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/0-387-34296-6_8

  • DOI: https://doi.org/10.1007/0-387-34296-6_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-34294-8

  • Online ISBN: 978-0-387-34296-2

  • eBook Packages: Computer Science, Computer Science (R0)
