A Single Program Multiple Data Algorithm for Feature Selection

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 940)

Abstract

Feature selection is a critical component of data science and has been a topic of research for many years. Advances in hardware and the availability of better multiprocessing platforms have enabled parallel computing to reach very high levels of performance. Minimum Redundancy Maximum Relevance (mRMR) is a powerful feature selection technique used in many applications. In this paper, we present a novel, optimized Single Program Multiple Data (SPMD) approach to implementing the mRMR algorithm, with synchronous computation, optimum load balancing, and greater speedup than task-parallel approaches. Experimental results on multiple synthesized datasets demonstrate the efficiency and scalability of the proposed technique compared with the original mRMR.
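The abstract does not spell out the algorithm, so the following is only a minimal sketch of how greedy mRMR selection might be split SPMD-style, not the authors' optimized implementation. It assumes discrete features and the common mutual-information difference criterion (relevance minus mean redundancy). The names `mutual_information`, `local_best`, and `mrmr_select` are hypothetical, and the "ranks" are simulated with a serial loop rather than real processes or threads.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def local_best(rank, size, features, target, selected):
    """Work of one simulated rank: score only the candidates in its own slice."""
    best = None
    # Cyclic distribution (an assumption): rank r handles r, r + size, r + 2*size, ...
    for j in range(rank, len(features), size):
        if j in selected:
            continue
        relevance = mutual_information(features[j], target)
        redundancy = 0.0
        if selected:
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            ) / len(selected)
        score = relevance - redundancy
        if best is None or score > best[0]:
            best = (score, j)
    return best  # None if this rank has no remaining candidates


def mrmr_select(features, target, k, size=2):
    """Greedily pick k feature indices; every rank runs the same local_best code."""
    selected = []
    while len(selected) < min(k, len(features)):
        # Serial stand-in for `size` ranks computing in parallel, then a reduction.
        partials = [local_best(r, size, features, target, selected) for r in range(size)]
        partials = [p for p in partials if p is not None]
        # Highest score wins; ties go to the lowest feature index.
        _, winner = max(partials, key=lambda p: (p[0], -p[1]))
        selected.append(winner)
    return selected


# Toy data: the class label is determined by two independent binary features.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
target = [2 * ai + bi for ai, bi in zip(a, b)]
features = [a, list(a), b]  # feature 1 duplicates feature 0
print(mrmr_select(features, target, k=2))
```

In this toy case the duplicate of `a` is passed over in favour of `b`, because its redundancy with the already selected feature cancels its relevance. Each selection round ends with all ranks' partial results being combined, which is where a real implementation would synchronize.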

Author information

Corresponding author

Correspondence to Bhabesh Chanduka.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Chanduka, B., Gangavarapu, T., Jaidhar, C.D. (2020). A Single Program Multiple Data Algorithm for Feature Selection. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018. Advances in Intelligent Systems and Computing, vol 940. Springer, Cham. https://doi.org/10.1007/978-3-030-16657-1_62
