Abstract
Data mining is a generic term for the process of analysing data, usually high volumes of data held in large databases, in order to discover previously unknown patterns and trends. It combines methods from statistics, machine learning, pattern recognition and database management. Typical data mining tasks involve detecting subsets of the data that are similar in some way, that are unusual or anomalous, or that have features that are associated or dependent. Although data mining has not traditionally focused on developing predictive models that generalise known patterns to new (unseen) data, doing so is often an extremely valuable way of verifying the efficacy of the derived models. That is, if these models accurately predict unseen data, then it is more likely that they truly represent the underlying patterns in the data rather than purely by-chance occurrences. In addition, in biomedical applications the purpose of data mining is often to better understand the patterns of disease so that improved diagnoses, prognoses and treatments can be developed in the future.
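The verification idea in the abstract, checking a derived model against unseen data, can be illustrated with a minimal hold-out sketch. Everything here is an assumption made for illustration rather than the chapter's own method: the synthetic one-feature data, the midpoint-threshold classifier, and the helper names `make_data`, `fit_threshold` and `holdout_accuracy` are all hypothetical.

```python
import random


def make_data(n, signal, rng):
    """Synthetic (feature, label) pairs; `signal` shifts the class-1 mean.

    With signal == 0 the feature carries no information about the label,
    so any pattern a model finds in it is a by-chance occurrence.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(signal * label, 1.0)
        data.append((feature, label))
    return data


def fit_threshold(train):
    """Fit a one-feature classifier: threshold at the midpoint of class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    mean0 = sum(xs0) / len(xs0)
    mean1 = sum(xs1) / len(xs1)
    threshold = (mean0 + mean1) / 2.0
    direction = 1 if mean1 >= mean0 else -1
    return threshold, direction


def predict(model, x):
    threshold, direction = model
    return 1 if direction * (x - threshold) > 0 else 0


def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


def holdout_accuracy(signal, n=2000, train_fraction=0.5, seed=0):
    """Fit on one part of the data and score on the unseen remainder."""
    rng = random.Random(seed)
    data = make_data(n, signal, rng)
    split = int(n * train_fraction)
    train, test = data[:split], data[split:]
    model = fit_threshold(train)
    return accuracy(model, train), accuracy(model, test)


if __name__ == "__main__":
    for signal in (3.0, 0.0):
        train_acc, test_acc = holdout_accuracy(signal)
        print(f"signal={signal}: train={train_acc:.3f}, unseen={test_acc:.3f}")
```

Under these assumptions, a model fitted to data with a genuine class difference should keep most of its accuracy on the held-out half, whereas a model fitted to signal-free data should score close to chance on unseen data, which is the sense in which prediction of unseen data supports a pattern being real.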
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Bradley, A.P. (2013). Ethics and Data Mining in Biomedical Engineering. In: Ethics for Biomedical Engineers. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6913-1_5
DOI: https://doi.org/10.1007/978-1-4614-6913-1_5
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6912-4
Online ISBN: 978-1-4614-6913-1
eBook Packages: Biomedical and Life Sciences (R0)