Integrative Visual Data Mining of Biomedical Data: Investigating Cases in Chronic Fatigue Syndrome and Acute Lymphoblastic Leukaemia

  • Paul Kennedy
  • Simeon J. Simoff
  • Daniel R. Catchpoole
  • David B. Skillicorn
  • Franco Ubaudi
  • Ahmad Al-Oqaily
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4404)


This chapter presents an integrative visual data mining approach to biomedical data. The approach and its supporting methodology are presented at a high level. They combine, in a consistent manner, a set of visualisation and data mining techniques that operate over an integrated data set of several diverse components, including medical (clinical) data, patient outcome and interview data, corresponding gene expression and SNP data, domain ontologies and health management data. The practical application of the methodology, and the specific data mining techniques it engages, are demonstrated on two case studies focused on the biological mechanisms of two different diseases: Chronic Fatigue Syndrome and Acute Lymphoblastic Leukaemia. What the two cases have in common is the structure of their data sets.


Keywords (machine-generated): Gene Ontology; Gene Expression Data; Chronic Fatigue Syndrome; Acute Lymphoblastic Leukaemia; Domain Ontology


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Paul Kennedy (2)
  • Simeon J. Simoff (1, 2)
  • Daniel R. Catchpoole (2, 3)
  • David B. Skillicorn (4)
  • Franco Ubaudi (2)
  • Ahmad Al-Oqaily (2)

  1. School of Computing and Mathematics, University of Western Sydney, Australia
  2. Faculty of Information Technology, University of Technology, Sydney, Australia
  3. The Oncology Research Unit, The Children's Hospital at Westmead, Westmead, Australia
  4. School of Computing, Queen's University, Kingston, Canada
