Abstract
A typical data mining project uses data collected for various purposes, ranging from routinely gathered data and data from process improvement projects to data retained for archival purposes. In some cases the set of considered features may be large (a wide data set) and sufficient for knowledge extraction. In other cases the data set may be narrow and insufficient for extracting meaningful knowledge, or the data may not exist at all.
Mining wide data sets has received considerable attention in the literature, and many feature selection models and algorithms have been developed for them.
Determining the features for which data should be collected when no data set exists, or when one is only partially available, has not been sufficiently addressed in the literature. Yet this issue is of paramount importance as interest in data mining grows. The methods and processes for defining the most appropriate features for data collection, data transformation, data quality assessment, and data analysis are collectively referred to as data farming. This chapter outlines the elements of a data farming discipline.
Triantaphyllou, E. and G. Felici (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Massive Computing Series, Springer, Heidelberg, Germany, pp. 279–304, 2006.
© 2006 Springer Science+Business Media, LLC
Cite this chapter
Kusiak, A. (2006). Data Farming: Concepts and Methods. In: Triantaphyllou, E., Felici, G. (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing, Vol. 6. Springer, Boston, MA. https://doi.org/10.1007/0-387-34296-6_8
Print ISBN: 978-0-387-34294-8
Online ISBN: 978-0-387-34296-2