Abstract
We propose three transductive versions of the set covering machine with data-dependent rays for classification in the molecular high-throughput setting. By utilizing both labeled and unlabeled samples, these transductive classifiers can learn from both sample types rather than from labeled samples alone. The transductive set covering machines are based on modified selection criteria for their ensemble members: via counting arguments, we incorporate the unlabeled information into the base classifier selection. One of the three methods uniformly increased the classification accuracy, whereas the other two showed mixed behaviour across the data sets. We show that observing only the order of the unlabeled samples, not their distances, is sufficient to increase classification accuracies, which makes these approaches useful even when very little information is available.
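To make the idea of an order-based, counting selection criterion concrete, the following is a minimal sketch and not one of the three criteria from the chapter, which the abstract does not specify. It assumes a ray is a single-feature threshold classifier whose threshold sits at a labeled sample's feature value. The tie-breaking rule (prefer the ray with the fewest unlabeled samples between its threshold and the neighbouring labeled value) and the names `ray_predict`, `unlabeled_in_gap` and `select_ray` are hypothetical. Only comparisons between feature values are used, never distances.

```python
from typing import List, Sequence, Tuple

# A ray is (feature index, threshold, direction); direction +1 predicts
# class 1 for values >= threshold, direction -1 for values <= threshold.
Ray = Tuple[int, float, int]
Sample = Sequence[float]


def ray_predict(x: Sample, ray: Ray) -> int:
    """Classify one sample with a single-feature threshold classifier."""
    feature, threshold, direction = ray
    if direction == 1:
        return 1 if x[feature] >= threshold else 0
    return 1 if x[feature] <= threshold else 0


def unlabeled_in_gap(ray: Ray, X_lab: List[Sample], X_unl: List[Sample]) -> int:
    """Count unlabeled samples strictly between the threshold and the
    neighbouring labeled value on the side predicted as class 0.
    Hypothetical criterion: it uses order comparisons only."""
    feature, threshold, direction = ray
    if direction == 1:
        lower = max((x[feature] for x in X_lab if x[feature] < threshold),
                    default=float("-inf"))
        return sum(lower < u[feature] < threshold for u in X_unl)
    upper = min((x[feature] for x in X_lab if x[feature] > threshold),
                default=float("inf"))
    return sum(threshold < u[feature] < upper for u in X_unl)


def select_ray(X_lab: List[Sample], y_lab: List[int], X_unl: List[Sample]) -> Ray:
    """Pick the ray with the fewest labeled errors; break ties by the
    unlabeled count (assumed rule, for illustration only)."""
    candidates = [(f, x[f], d)
                  for f in range(len(X_lab[0]))
                  for x in X_lab
                  for d in (1, -1)]

    def criterion(ray: Ray) -> Tuple[int, int]:
        errors = sum(ray_predict(x, ray) != y for x, y in zip(X_lab, y_lab))
        return (errors, unlabeled_in_gap(ray, X_lab, X_unl))

    return min(candidates, key=criterion)


# Toy data: both features separate the labeled samples perfectly,
# but only feature 1 has no unlabeled samples near its threshold.
X_lab = [(1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (6.0, 6.0)]
y_lab = [0, 0, 1, 1]
X_unl = [(2.5, 1.5), (3.0, 5.5)]
best = select_ray(X_lab, y_lab, X_unl)
```

In this toy example the labeled data alone cannot distinguish the two features, so the unlabeled counts decide the choice. A full set covering machine would repeat such a selection greedily to build an ensemble of rays; that loop is omitted here.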
Notes
1. We only utilize the ordering of feature values, not the information given by an ordinal structure of the class labels (e.g. Herbrich et al. 1999).
References
Armstrong, S. A., Staunton, J. E., Silverman, L. B., Pieters, R., den Boer, M. L., Minden, M. D., et al. (2002). MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30(1), 41–47.
Bishop, C. M. (2006). Pattern recognition and machine learning. Secaucus: Springer.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont: Wadsworth.
Buchholz, M., Kestler, H. A., Bauer, A., Böck, W., Rau, B., Leder, G., et al. (2005). Specialized DNA arrays for the differentiation of pancreatic tumors. Clinical Cancer Research, 11(22), 8048–8054.
Herbrich, R., Graepel, T., & Obermayer, K. (1999). Regression models for ordinal data: A machine learning approach. Technical report, TU Berlin.
Jolliffe, I. T. (2002). Principal component analysis. New York: Springer.
Kestler, H. A., Lausser, L., Lindner, W., & Palm, G. (2011). On the fusion of threshold classifiers for categorization and dimensionality reduction. Computational Statistics, 26, 321–340.
Lausser, L., Schmid, F., & Kestler, H. A. (2011). On the utility of partially labeled data for classification of microarray data. In F. Schwenker & E. Trentin (Eds.), Partially supervised learning (pp. 96–109). Berlin: Springer.
Marchand, M., & Shawe-Taylor, J. (2003). The set covering machine. Journal of Machine Learning Research, 3, 723–746.
Su, A. I., Welsh, J. B., Sapinoso, L. M., Kern, S. G., Dimitrov, P., Lapp, H., et al. (2001). Molecular classification of human carcinomas by use of gene expression signatures. Cancer Research, 61(20), 7388–7393.
Valk, P. J., Verhaak, R. G., Beijen, M. A., Erpelinck, C. A., Barjesteh van Waalwijk van Doorn-Khosrovani, S., Boer, J. M., et al. (2004). Prognostically useful gene-expression profiles in acute myeloid leukemia. New England Journal of Medicine, 350(16), 1617–1628.
Vapnik, V. N. (1998). Statistical learning theory. New York: Wiley.
Weston, J., Pérez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A., Schölkopf, B., et al. (2003). Feature selection and transduction for prediction of molecular bioactivity for drug design. Bioinformatics, 19(6), 764–771.
Acknowledgements
This work was funded in part by a Karl-Steinbuch grant to Florian Schmid, by the German Federal Ministry of Education and Research (BMBF) within the framework of the program of medical genome research (PaCa-Net; Project ID PKB-01GS08) and the framework GERONTOSYS 2 (Forschungskern SyStaR, Project ID 0315894A), and by the German Science Foundation (SFB 1074, Project Z1) to Hans A. Kestler. The responsibility for the content lies exclusively with the authors.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Schmid, F., Lausser, L., Kestler, H.A. (2014). Three Transductive Set Covering Machines. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_33
DOI: https://doi.org/10.1007/978-3-319-01595-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01594-1
Online ISBN: 978-3-319-01595-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)