Abstract
It is well recognized that medical datasets are often noisy and incomplete because of the difficulties of data collection and integration. Noise and incompleteness in medical data pose substantial challenges for accurate classification. A differential latent semantic indexing (DLSI) approach, an improvement of the standard LSI method, has been proposed for information retrieval and has demonstrated improved performance over the standard LSI approach. The key idea is that DLSI adapts to the unique characteristics of individual records or documents. Through experiments on real datasets, we show that DLSI outperforms the standard LSI method on noisy and incomplete medical datasets. The results strongly indicate that the DLSI approach is also applicable to the analysis of numerical medical data.
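For readers less familiar with the baseline, the standard LSI method can be sketched as follows. This is a minimal illustration, not the chapter's implementation, and it does not implement DLSI. It assumes a complete numeric matrix with one column per record, with the medical attributes playing the role of terms. The helper names `lsi_fit` and `lsi_predict`, the classification rule (cosine similarity to per-class centroids in the rank-k latent space), and the toy matrix are hypothetical choices made for illustration only.

```python
import numpy as np


def lsi_fit(A, labels, k):
    """Fit a rank-k LSI model (hypothetical helper, for illustration).

    A: (m attributes x n records) float matrix, one column per record.
       Assumed complete: missing-value handling is not covered here.
    labels: length-n sequence of class labels, one per column of A.
    k: number of latent dimensions to keep (should not exceed rank(A)).
    """
    # Truncated SVD: A is approximated by U_k diag(s_k) V_k^T.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Project every record into the latent space: Sigma_k^{-1} U_k^T d.
    Z = (U_k.T @ A) / s_k[:, None]
    labels = np.asarray(labels)
    # Assumed rule: represent each class by the mean of its projected records.
    centroids = {c: Z[:, labels == c].mean(axis=1) for c in np.unique(labels)}
    return U_k, s_k, centroids


def lsi_predict(model, q):
    """Assign a new record q (length-m vector) to the most similar class centroid."""
    U_k, s_k, centroids = model
    z = (U_k.T @ q) / s_k  # fold the query into the same latent space

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    return max(centroids, key=lambda c: cosine(z, centroids[c]))


if __name__ == "__main__":
    # Made-up toy data: 4 attributes (rows) x 4 records (columns).
    A = np.array([[1, 2, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 2]], dtype=float)
    labels = ["benign", "benign", "malignant", "malignant"]
    model = lsi_fit(A, labels, k=2)
    print(lsi_predict(model, np.array([3.0, 2.0, 0.0, 0.0])))  # benign
    print(lsi_predict(model, np.array([0.0, 0.0, 2.0, 3.0])))  # malignant
```

Scaling by the inverse singular values follows the usual way of folding a query into an LSI space. Because every record is reduced to a single global rank-k basis, such a baseline has no mechanism for adapting to the characteristics of individual records, which is the property the abstract attributes to DLSI.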
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Chen, L., Zeng, J., Pei, J. (2007). Classifying Noisy and Incomplete Medical Data by a Differential Latent Semantic Indexing Approach. In: Pardalos, P.M., Boginski, V.L., Vazacopoulos, A. (eds) Data Mining in Biomedicine. Springer Optimization and Its Applications, vol 7. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69319-4_10
DOI: https://doi.org/10.1007/978-0-387-69319-4_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-69318-7
Online ISBN: 978-0-387-69319-4
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)