Enhanced Prediction for Piezophilic Protein by Incorporating Reduced Set of Amino Acids Using Fuzzy-Rough Feature Selection Technique Followed by SMOTE

  • Conference paper
Mathematics and Computing (ICMC 2018)

Part of the book series: Springer Proceedings in Mathematics & Statistics ((PROMS,volume 253))

Abstract

In this paper, we investigate the learning performance of different machine learning algorithms by applying the fuzzy-rough feature selection (FRFS) technique to optimally balanced training and testing sets consisting of piezophilic and nonpiezophilic proteins. Applying FRFS followed by the Synthetic Minority Over-sampling Technique (SMOTE) at optimal balancing ratios, we obtain the best results with the random forest algorithm: a sensitivity of 79.60%, a specificity of 74.50%, an average accuracy of 77.10%, an AUC of 0.841, and an MCC of 0.542. The input features are ranked by their ability to discriminate between piezophilic and nonpiezophilic proteins using a fuzzy-rough attribute evaluator. The results show that the performance of classification algorithms can be improved by using reduced, optimally balanced training and testing sets. Such sets can be obtained by selecting the relevant and non-redundant features from the training sets using the FRFS approach and then suitably modifying the class distribution.
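
As a rough illustration of two ingredients named in the abstract, the sketch below shows the interpolation step at the core of SMOTE and how sensitivity, specificity, and MCC follow from confusion-matrix counts. It is not the authors' implementation: the function names `smote_sketch` and `binary_metrics`, the neighbour count `k`, and the toy data are assumptions made for illustration, and treating "average accuracy" as the mean of sensitivity and specificity is likewise an assumption about the paper's definition.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the remaining minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, their mean, and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumption: "average accuracy" taken as the mean of the two rates.
        "average_accuracy": (sensitivity + specificity) / 2,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy two-feature minority class; real inputs would be the selected
    # amino-acid features of the minority-class proteins.
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote_sketch(toy_minority, n_new=3, k=2))
    print(binary_metrics(tp=8, fn=2, tn=7, fp=3))
```

Because every synthetic point is an interpolation between two existing minority samples, oversampling changes the class distribution without leaving the region already occupied by the minority class. The ratio of synthetic to original samples is the balancing ratio that the abstract describes as being tuned.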

Corresponding author

Correspondence to Shivam Shreevastava.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tiwari, A.K., Shreevastava, S., Subbiah, K., Som, T. (2018). Enhanced Prediction for Piezophilic Protein by Incorporating Reduced Set of Amino Acids Using Fuzzy-Rough Feature Selection Technique Followed by SMOTE. In: Ghosh, D., Giri, D., Mohapatra, R., Sakurai, K., Savas, E., Som, T. (eds) Mathematics and Computing. ICMC 2018. Springer Proceedings in Mathematics & Statistics, vol 253. Springer, Singapore. https://doi.org/10.1007/978-981-13-2095-8_15
