Abstract
Osteogenesis Imperfecta (OI) is a genetic collagenous disease caused by mutations in one or both of the genes COL1A1 and COL1A2. There are at least four known phenotypes of OI, of which type II is the most severe and is often lethal. We applied a noise correction mechanism called polishing to a data set of amino acid sequences and associated information on point mutations of COL1A1. Polishing uses the inter-relationship between attribute and class values in the data set to identify noisy components and selectively correct them. Preliminary results suggest that polishing is a viable mechanism for improving data quality, resulting in a more accurate classification of the lethal OI phenotype.
This work was supported by NASA NCC2-1239 and ONR N00014-03-1-0516.
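As a rough illustration of the idea in the abstract, the sketch below flags an attribute value as suspect when the instances that share its class label and all of its other attribute values overwhelmingly agree on a different value. This is a toy stand-in under stated assumptions, not the procedure used in the paper: the function names `propose_corrections` and `apply_corrections`, the thresholds `min_support` and `min_agreement`, and the example data are all hypothetical, and a simple leave-one-out vote takes the place of whatever predictive model a real implementation would use.

```python
from collections import Counter, defaultdict


def propose_corrections(rows, labels, min_support=3, min_agreement=0.8):
    """Return (row_index, attr_index, observed, suggested) for suspect values.

    Hypothetical helper: an attribute value is suspect when instances with the
    same class label and the same remaining attributes mostly disagree with it.
    """
    proposals = []
    n_attrs = len(rows[0])
    for j in range(n_attrs):
        # Group instances by context: class label plus every attribute except j.
        groups = defaultdict(list)
        for i, (row, label) in enumerate(zip(rows, labels)):
            context = (label,) + tuple(v for k, v in enumerate(row) if k != j)
            groups[context].append(i)
        for members in groups.values():
            for i in members:
                # Leave-one-out vote: instance i does not vote on its own value.
                votes = Counter(rows[k][j] for k in members if k != i)
                support = sum(votes.values())
                if support < min_support:
                    continue  # too little evidence to suggest a change
                value, count = votes.most_common(1)[0]
                if value != rows[i][j] and count / support >= min_agreement:
                    proposals.append((i, j, rows[i][j], value))
    return proposals


def apply_corrections(rows, proposals):
    """Return a copy of rows with the proposed values substituted."""
    polished = [list(r) for r in rows]
    for i, j, _observed, suggested in proposals:
        polished[i][j] = suggested
    return [tuple(r) for r in polished]


# Made-up categorical data (not from the OI data set): row 4 carries a value
# in attribute 0 that conflicts with the otherwise identical "pos" instances.
rows = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "x"), ("b", "x"),
        ("b", "y"), ("b", "y"), ("b", "y")]
labels = ["pos"] * 5 + ["neg"] * 3

proposals = propose_corrections(rows, labels)
polished = apply_corrections(rows, proposals)
```

Only the attribute value is changed here and the instance is kept, which mirrors the abstract's emphasis on selectively correcting noisy components rather than discarding data. The support and agreement thresholds trade missed noise against spurious corrections and would need tuning on real data.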
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Teng, C.M. (2003). Noise Correction in Genomic Data. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_8
DOI: https://doi.org/10.1007/978-3-540-45080-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive