Medical data mining in sentiment analysis based on optimized swarm search feature selection

Zeng, Daohui; Peng, Jidong; Fong, Simon; Qiu, Yining; Wong, Raymond

doi:10.1007/s13246-018-0674-3

Special Issue Article
Published: 11 September 2018

Australasian Physical & Engineering Sciences in Medicine, Volume 41, pages 1087–1100 (2018)

Abstract

In this paper, we propose a novel technique termed optimized swarm search-based feature selection (OS-FS), a swarm-type search function that selects an ideal subset of features for enhanced classification accuracy. Sentiment prediction is becoming an increasingly crucial machine learning technique for gaining insights from unstructured medical texts; owing to its robustness and accuracy, it has recently gained popularity in the medical industry. Medical text mining is a fundamental form of data analytics underlying sentiment prediction. A popular preprocessing step in text mining transforms medical text strings into word vectors, forming a high-dimensional sparse matrix. However, such a sparse matrix poses problems for inducing an accurate sentiment prediction model. Feature selection, which finds a subset of the original features in the sparse matrix, is a commonly used dimensionality reduction technique and can improve the accuracy of the prediction model. The swarm search in the proposed OS-FS is optimized by a new feature evaluation technique called clustering-by-coefficient-of-variation. We apply this method to a case scenario of 279 medical articles related to ‘meaningful use functionalities on health care quality, safety, and efficiency’, drawn from a systematic review of previous medical IT literature. For this medical text mining task, multi-class sentiments (positive, mixed-positive, neutral and negative) are recognized from the document contents. Our experimental results demonstrate the superiority of OS-FS over traditional feature selection methods in the literature.
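
The preprocessing and feature-evaluation steps described in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, raw term counts stored sparsely as one dictionary per document, and a per-feature coefficient of variation (CV, the population standard deviation divided by the mean of a term's column). The abstract does not specify how clustering-by-coefficient-of-variation groups features, so the hypothetical helper `split_by_cv` substitutes a simple one-dimensional two-means split and keeps the high-CV group; it should not be read as the authors' procedure.

```python
from collections import Counter
from statistics import mean, pstdev


def doc_term_matrix(docs):
    """Build a sparse document-term matrix: one {term: count} Counter per document."""
    rows = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*rows))
    return rows, vocab


def coefficient_of_variation(rows, vocab):
    """CV = population std / mean of each term's column; absent terms count as 0."""
    cv = {}
    for term in vocab:
        column = [row.get(term, 0) for row in rows]
        m = mean(column)
        cv[term] = pstdev(column) / m if m > 0 else 0.0
    return cv


def split_by_cv(cv, iterations=20):
    """Hypothetical stand-in for CV clustering: 1-D two-means, return the high-CV terms."""
    values = sorted(cv.values())
    lo, hi = values[0], values[-1]  # initial centroids at the extremes
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in values if abs(v - lo) > abs(v - hi)]
        if low_group:
            lo = mean(low_group)
        if high_group:
            hi = mean(high_group)
    return sorted(t for t, v in cv.items() if abs(v - lo) > abs(v - hi))
```

For example, on the three toy documents "good outcome good safety", "poor safety outcome" and "neutral report outcome", the term "outcome" occurs once in every document, so its CV is 0 and it falls in the low-CV group, whereas "good" (CV of √2) is kept as a discriminative feature.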

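The swarm search is described only at a high level in the abstract, so the following is a hedged illustration of a swarm-type wrapper search over feature subsets, not OS-FS itself. It uses a standard sigmoid-transfer binary particle swarm optimisation (a swapped-in generic technique; the parameter values `w`, `c1`, `c2` and `v_max` are illustrative assumptions). Each particle is a 0/1 mask over features, and `fitness` is a caller-supplied score, in practice something like the cross-validated accuracy of a classifier trained on the selected columns.

```python
import math
import random


def binary_pso(n_features, fitness, n_particles=10, iterations=30,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Generic sigmoid-transfer binary PSO that maximises fitness(mask) over 0/1 masks."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best masks
    pbest_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]  # global best mask

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid transfer: probability that feature d is selected.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            score = fitness(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score
```

A toy call such as `binary_pso(6, lambda m: m[0] + m[2] - 0.1 * sum(m))` rewards keeping features 0 and 2 while penalising subset size, mirroring the usual feature selection trade-off between accuracy and dimensionality. In OS-FS the swarm is additionally guided by clustering-by-coefficient-of-variation; that coupling is not reproduced here.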
Acknowledgements

This paper is supported by the research grant “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF),” Grant No. MYRG2015-00128-FST, which is offered by the University of Macau, FST, and RDAO.

Author information

Daohui Zeng and Jidong Peng contributed equally to this work and are co-first authors.

Authors and Affiliations

First Affiliated Hospital of Guangzhou University of TCM, Guangzhou, People’s Republic of China
Daohui Zeng
Ganzhou People’s Hospital, Jiangxi, People’s Republic of China
Jidong Peng
Department of Computer and Information Science, University of Macau, Taipa, Macau SAR, People’s Republic of China
Simon Fong
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Yining Qiu & Raymond Wong

Corresponding author

Correspondence to Simon Fong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Zeng, D., Peng, J., Fong, S. et al. Medical data mining in sentiment analysis based on optimized swarm search feature selection. Australas Phys Eng Sci Med 41, 1087–1100 (2018). https://doi.org/10.1007/s13246-018-0674-3

Received: 28 June 2018
Accepted: 09 August 2018
Published: 11 September 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s13246-018-0674-3
