Mining the Risk Types of Human Papillomavirus (HPV) by AdaCost
Human Papillomavirus (HPV) infection is known to be the main factor in cervical cancer, which is a leading cause of cancer deaths in women worldwide. Because there are more than 100 HPV types, it is critical to distinguish the HPVs related to cervical cancer from those that are not. In this paper, we classify the risk type of HPVs using their textual explanations. The key issue in this problem is that false negatives and false positives must be treated differently: we must find the high-risk HPVs, even if some low-risk HPVs are misclassified as a result. For this purpose, we adopt AdaCost, a cost-sensitive learner, to assign different costs to the training examples. The experimental results on the HPV sequence database show that considering costs yields higher performance. The F-score is higher than the accuracy, which implies that most high-risk HPVs are found.
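The abstract names AdaCost but does not spell out the algorithm, so the following is a minimal Python sketch of cost-sensitive boosting in the AdaCost style (Fan et al., 1999), not the authors' implementation. It assumes that each HPV's textual explanation has already been reduced to a binary word-presence vector, that labels are +1 (high-risk) and -1 (low-risk), and that the weak learner is a single-word decision stump; the abstract does not say which weak learner was used. The helper names (`beta`, `train_stump`, `adacost`, `accuracy_and_f1`), the toy vocabulary, and the cost values 1.0 and 0.5 are hypothetical illustrations.

```python
import math
from typing import Callable, List, Sequence, Tuple

Features = Sequence[int]              # assumed: binary word-presence vector
Hypothesis = Callable[[Features], int]


def beta(correct: bool, cost: float) -> float:
    """AdaCost-style cost adjustment for costs in [0, 1].

    Correctly classified high-cost examples lose weight slowly;
    misclassified high-cost examples gain weight quickly.
    """
    return 0.5 - 0.5 * cost if correct else 0.5 + 0.5 * cost


def train_stump(X: Sequence[Features], y: Sequence[int],
                D: Sequence[float]) -> Hypothesis:
    """Hypothetical weak learner: the one-word stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for polarity in (1, -1):
            err = sum(d for x, label, d in zip(X, y, D)
                      if (polarity if x[j] else -polarity) != label)
            if best is None or err < best[0]:
                best = (err, j, polarity)
    _, j, polarity = best
    return lambda x, j=j, p=polarity: p if x[j] else -p


def adacost(X, y, costs, rounds=10) -> Hypothesis:
    """Cost-sensitive boosting; labels are +1 (high-risk) / -1 (low-risk)."""
    total = sum(costs)
    D = [c / total for c in costs]    # initial weights proportional to cost
    ensemble: List[Tuple[float, Hypothesis]] = []
    for _ in range(rounds):
        h = train_stump(X, y, D)
        margins = [label * h(x) for x, label in zip(X, y)]  # +1 right, -1 wrong
        betas = [beta(m > 0, c) for m, c in zip(margins, costs)]
        r = sum(d * m * b for d, m, b in zip(D, margins, betas))
        r = max(min(r, 1 - 1e-10), -1 + 1e-10)             # keep alpha finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        ensemble.append((alpha, h))
        # Cost-sensitive reweighting, then renormalisation so D sums to 1.
        D = [d * math.exp(-alpha * m * b) for d, m, b in zip(D, margins, betas)]
        z = sum(D)
        D = [d / z for d in D]
    return lambda x: 1 if sum(a * g(x) for a, g in ensemble) >= 0 else -1


def accuracy_and_f1(y_true, y_pred, positive=1) -> Tuple[float, float]:
    """Overall accuracy and F1 score for the positive (high-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


if __name__ == "__main__":
    # Invented toy data: columns might stand for words such as
    # "carcinoma", "cervical", "wart", "skin" in each HPV description.
    X = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1],
         [0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 0, 1]]
    y = [1, 1, 1, -1, -1, -1]
    costs = [1.0 if label == 1 else 0.5 for label in y]  # assumed costs
    model = adacost(X, y, costs, rounds=5)
    print(accuracy_and_f1(y, [model(x) for x in X]))      # prints (1.0, 1.0)
```

The `beta` terms make the weight update asymmetric: a misclassified high-risk example with cost 1.0 is up-weighted with the full exponent, while a correctly classified one keeps its weight before renormalisation, which steers later rounds toward finding the high-risk HPVs. Reporting overall accuracy next to the F1 score for the high-risk class mirrors the comparison made in the abstract.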
Keywords: Cervical Cancer; Human Papilloma Virus; Weak Learner; Human Papilloma Virus Type; Risk Type