Medical data mining in sentiment analysis based on optimized swarm search feature selection

  • Special Issue Article
Australasian Physical & Engineering Sciences in Medicine

Abstract

In this paper, we propose a novel technique termed optimized swarm search-based feature selection (OS-FS), a swarm-type search function that selects an ideal subset of features for enhanced classification accuracy. Sentiment prediction is becoming an increasingly crucial machine learning technique for gaining insights from unstructured medical texts. Owing to its robustness and accuracy, it has recently gained popularity in the medical industry. Medical text mining is well known as a fundamental data analytic for sentiment prediction. A popular preprocessing step in text mining transforms medical text strings into word vectors, which together form a high-dimensional sparse matrix. However, such a sparse matrix poses problems for the induction of an accurate sentiment prediction model. The swarm search in our proposed OS-FS can be optimized by a new feature evaluation technique called clustering-by-coefficient-of-variation. Feature selection of this kind, which finds a subset of features among all the original features of the sparse matrix, is a commonly used dimensionality reduction technique and can improve the accuracy of the prediction model. We apply this method to a case scenario of 279 medical articles related to ‘meaningful use functionalities on health care quality, safety, and efficiency’, drawn from a systematic review of previous medical IT literature. For this medical text mining task, multi-class sentiments (positive, mixed-positive, neutral and negative) are recognized from the document contents. Our experimental results demonstrate the superiority of OS-FS over traditional feature selection methods in the literature.
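
The preprocessing and feature evaluation ideas mentioned in the abstract can be illustrated with a minimal sketch. It assumes a whitespace tokenizer, raw word counts, and a hypothetical three-document toy corpus; the helper names `to_word_vectors` and `coefficient_of_variation` are illustrative and are not taken from the paper. The sketch only computes the coefficient of variation (standard deviation divided by mean) of each word column. It does not reproduce the clustering step or the swarm search of OS-FS.

```python
from collections import Counter
from statistics import mean, pstdev


def to_word_vectors(docs):
    """Turn raw text strings into rows of word counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words absent from this document.
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def coefficient_of_variation(column):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    mu = mean(column)
    return pstdev(column) / mu if mu else 0.0


# Hypothetical toy corpus, NOT the 279 articles used in the paper.
docs = [
    "ehr alerts improved safety",
    "ehr alerts reduced efficiency",
    "ehr adoption improved quality",
]

vocab, rows = to_word_vectors(docs)

# Fraction of zero entries: even this tiny matrix is half empty.
cells = [value for row in rows for value in row]
sparsity = cells.count(0) / len(cells)

# One coefficient of variation per word (i.e. per column of the matrix).
cv_by_word = {
    word: coefficient_of_variation([row[j] for row in rows])
    for j, word in enumerate(vocab)
}
```

In this toy corpus, a word that occurs equally often in every document (`ehr`) has a coefficient of variation of 0, while a word confined to a single document scores higher. How OS-FS clusters such scores and couples them to the swarm search is not specified in the abstract, so it is left out here.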

Acknowledgements

This paper is supported by the research grant “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF),” Grant No. MYRG2015-00128-FST, which is offered by the University of Macau, FST, and RDAO.

Author information

Corresponding author

Correspondence to Simon Fong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Zeng, D., Peng, J., Fong, S. et al. Medical data mining in sentiment analysis based on optimized swarm search feature selection. Australas Phys Eng Sci Med 41, 1087–1100 (2018). https://doi.org/10.1007/s13246-018-0674-3

  • DOI: https://doi.org/10.1007/s13246-018-0674-3
