Feature Dimensionality Reduction for Mammographic Report Classification

  • Luca Agnello
  • Albert Comelli
  • Salvatore Vitabile
Part of the Computer Communications and Networks book series (CCN)


The amount and variety of medical data coming from multiple heterogeneous sources can inhibit analysis, manual interpretation, and the use of simple data management applications. In this paper, an in-depth overview of the principal algorithms for dimensionality reduction is carried out; moreover, the most effective techniques are applied to a dataset composed of 4461 mammographic reports. The most useful medical terms are converted and represented using a TF-IDF matrix, in order to enable data mining and retrieval tasks. A series of queries has been performed on the raw matrix and on the same matrix after dimensionality reduction with the most useful techniques, such as LSI, PCA, and SVD. The query results obtained on the reduced matrices are comparable to those achieved using the raw, unprocessed matrix, although the processed matrix contains less than 13% of the raw TF-IDF data with the PCA-LSI techniques and less than 6% with the SVD technique.
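The pipeline summarized above (TF-IDF representation, rank reduction, cosine-similarity queries) can be illustrated with a minimal sketch. It is not the authors' implementation: the four toy reports, the helper names (`tfidf_matrix`, `vectorize`, `reduce_svd`, `cosine_scores`), the particular IDF formula, and the rank `k = 2` are all assumptions chosen for illustration. Only NumPy's `np.linalg.svd` is used for the decomposition.

```python
import numpy as np

# Toy corpus standing in for mammographic reports (hypothetical text, not the paper's data).
reports = [
    "mass with irregular margins in left breast",
    "no mass, benign calcifications in right breast",
    "irregular mass with suspicious calcifications",
    "normal breast tissue, no suspicious findings",
]

def tokenize(text):
    return text.replace(",", " ").lower().split()

def tfidf_matrix(docs):
    """Build a documents-by-terms TF-IDF matrix; returns (matrix, vocabulary, idf)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(tokenized):
        for t in doc:
            tf[i, index[t]] += 1
    df = np.count_nonzero(tf, axis=0)
    idf = np.log(len(docs) / df) + 1.0  # one common IDF variant (an assumption)
    return tf * idf, vocab, idf

def vectorize(text, vocab, idf):
    """Map a query string to the same TF-IDF space; unknown terms are ignored."""
    index = {t: j for j, t in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for t in tokenize(text):
        if t in index:
            vec[index[t]] += 1
    return vec * idf

def reduce_svd(A, k):
    """Project the rows of A onto the top-k right singular vectors (LSI-style)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T              # shape: (terms, k)
    return A @ Vk, Vk          # reduced documents, projection matrix

def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between one query vector and every document row."""
    num = doc_vecs @ query_vec
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    return num / np.where(denom == 0, 1.0, denom)

A, vocab, idf = tfidf_matrix(reports)
docs_k, Vk = reduce_svd(A, k=2)

q = vectorize("irregular mass", vocab, idf)
raw_scores = cosine_scores(q, A)                 # query against the raw matrix
reduced_scores = cosine_scores(q @ Vk, docs_k)   # same query in the reduced space

print("raw ranking:    ", np.argsort(-raw_scores))
print("reduced ranking:", np.argsort(-reduced_scores))
print("stored fraction:", docs_k.size / A.size)
```

Comparing the two rankings mirrors, on a very small scale, the comparison between raw and reduced matrices described in the abstract. A PCA variant would differ mainly in mean-centering the columns of the matrix before the decomposition.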


Principal Component Analysis, Dimensionality Reduction, Singular Value Decomposition, Cosine Similarity, Latent Semantic Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Luca Agnello (1)
  • Albert Comelli (1)
  • Salvatore Vitabile (1, corresponding author)
  1. Department of Biopathology and Medical Biotechnologies, University of Palermo, Palermo, Italy
