Comparing Different Data Fusion Strategies for Cancer Classification

  • Katarzyna Pojda
  • Michał Jakubczak
  • Sebastian Student
  • Andrzej Świerniak
  • Krzysztof Fujarewicz
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 721)


Automatic cancer diagnosis can be based on different types of data, such as microarray data, clinical trial data and cytopathological data. Class prediction is usually performed by a chosen classification method, for example a machine learning algorithm, and uses only one of the available data types. This work examines the additional predictive value of fusing these three types of data. To carry out this research, the authors extended and used their recently developed Spicy system. Different data fusion strategies were tested on a thyroid cancer data set. The resulting workflow and the new data fusion module implemented in the Spicy system make it possible to identify the fusion of microarray data, clinical trial data and the Bethesda system class as a valuable method for predicting thyroid nodule malignancy.
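The abstract does not name the fusion strategies that were compared. As an illustration only, the sketch below contrasts two commonly used families: feature-level (early) fusion, where all sources are concatenated before training one classifier, and decision-level (late) fusion, where one classifier per source votes on the final class. Everything in it is an assumption made for the example: the synthetic data, the source names `expr`, `clin` and `cyto`, the nearest-centroid classifier and the majority vote. It is not the Spicy system's implementation and does not reproduce the paper's results.

```python
import random
from statistics import mean

SOURCES = ("expr", "clin", "cyto")  # hypothetical names for the three data types


def make_synthetic(n=60, seed=0):
    """Synthetic stand-ins for three data sources; NOT real thyroid data."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        y = i % 2  # 0 = benign, 1 = malignant (hypothetical labels)
        sample = {
            "expr": [rng.gauss(1.0 * y, 1.0) for _ in range(5)],  # microarray-like
            "clin": [rng.gauss(0.8 * y, 1.0) for _ in range(2)],  # clinical-like
            "cyto": [rng.gauss(1.5 * y, 1.0)],                    # cytology-like score
        }
        samples.append((sample, y))
    return samples


class NearestCentroid:
    """Tiny classifier: assign the label of the closest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def concat(sample):
    """Join the feature vectors of all sources into one vector."""
    return [v for k in SOURCES for v in sample[k]]


def early_fusion_predict(train, test):
    """Feature-level fusion: one classifier on the concatenated features."""
    labels = [y for _, y in train]
    clf = NearestCentroid().fit([concat(s) for s, _ in train], labels)
    return [clf.predict_one(concat(s)) for s, _ in test]


def late_fusion_predict(train, test):
    """Decision-level fusion: one classifier per source, then majority vote."""
    labels = [y for _, y in train]
    clfs = {k: NearestCentroid().fit([s[k] for s, _ in train], labels)
            for k in SOURCES}
    preds = []
    for s, _ in test:
        votes = [clfs[k].predict_one(s[k]) for k in SOURCES]
        preds.append(1 if sum(votes) >= 2 else 0)  # at least 2 of 3 vote malignant
    return preds


def accuracy(preds, data):
    return sum(p == y for p, (_, y) in zip(preds, data)) / len(data)


data = make_synthetic()
train, test = data[:40], data[40:]  # simple hold-out split
early_acc = accuracy(early_fusion_predict(train, test), test)
late_acc = accuracy(late_fusion_predict(train, test), test)
print(f"early fusion accuracy: {early_acc:.2f}")
print(f"late fusion accuracy:  {late_acc:.2f}")
```

The two functions share the same classifier and the same split, so any difference in accuracy comes from where the sources are combined. A real comparison would also need feature selection and validation performed strictly inside the training data.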


Cancer classification, Feature selection, Data fusion, Machine learning



This work was supported by NCBR (Polish National Centre for Research and Development) under grant Strategmed2/267398/4/NCBR/2015 and by the Silesian University of Technology. Calculations were performed using the infrastructure supported by the computer cluster Ziemowit, funded by the Silesian BIO-FARMA project No. POIG.02.01.00-00-166/08 and expanded under project No. POIG.02.03.01-00-040/13 in the Computational Biology and Bioinformatics Laboratory of the Biotechnology Centre at the Silesian University of Technology.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Katarzyna Pojda (1)
  • Michał Jakubczak (1)
  • Sebastian Student (1)
  • Andrzej Świerniak (1)
  • Krzysztof Fujarewicz (1)

  1. Silesian University of Technology, Gliwice, Poland
