Mining Diversified Shared Decision Tree Sets for Discovering Cross Domain Similarities

Dong, Guozhu; Han, Qian

doi:10.1007/978-3-319-06605-9_44

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

This paper studies the problem of mining diversified sets of shared decision trees (SDTs). Given two datasets representing two application domains, an SDT is a decision tree that can classify both datasets and that captures class-based population-structure similarity between them. Previous studies considered mining just one SDT. The present paper considers mining a small diversified set of SDTs with two properties: (1) each SDT in the set has high quality with regard to “shared” accuracy and population-structure similarity, and (2) the SDTs in the set are very different from each other. A diversified set of SDTs can serve as a concise representative of the huge space of possible cross-domain similarities, offering an effective way for users to examine and select informative SDTs from that space. The diversity of an SDT set is measured by the difference in attribute usage among the SDTs. The paper provides algorithms for mining diversified sets of SDTs, and experimental results show that they are effective and find diversified sets of high-quality SDTs.
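The abstract states only that diversity is measured through differences in attribute usage; it does not give the formula or the mining algorithm. The sketch below is therefore an illustration under stated assumptions, not the paper's method: each candidate tree is reduced to the set of attributes it tests, attribute-usage difference is taken to be Jaccard distance, and a simple greedy max-min selection stands in for the mining step. The names `attribute_distance`, `set_diversity`, `greedy_diverse_set`, the `quality` field, and the example attribute labels are all hypothetical.

```python
from itertools import combinations


def attribute_distance(a, b):
    """Jaccard distance between two attribute-usage sets (an assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def set_diversity(trees):
    """Mean pairwise attribute distance over a collection of trees."""
    pairs = list(combinations(trees, 2))
    if not pairs:
        return 0.0
    total = sum(attribute_distance(a["attrs"], b["attrs"]) for a, b in pairs)
    return total / len(pairs)


def greedy_diverse_set(candidates, k, min_quality=0.0):
    """Greedy max-min selection (an assumption, not the paper's algorithm).

    Keeps only trees whose quality reaches min_quality, seeds the result with
    the highest-quality tree, then repeatedly adds the tree whose smallest
    distance to the already selected trees is largest (ties go to quality).
    """
    pool = [t for t in candidates if t["quality"] >= min_quality]
    if not pool or k <= 0:
        return []
    pool.sort(key=lambda t: t["quality"], reverse=True)
    selected = [pool.pop(0)]
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda t: (
                min(attribute_distance(t["attrs"], s["attrs"]) for s in selected),
                t["quality"],
            ),
        )
        pool.remove(best)
        selected.append(best)
    return selected


# Hypothetical candidate trees: attribute sets and quality scores are made up.
candidates = [
    {"name": "T1", "attrs": {"g1", "g2", "g3"}, "quality": 0.92},
    {"name": "T2", "attrs": {"g1", "g2", "g4"}, "quality": 0.90},
    {"name": "T3", "attrs": {"g5", "g6"}, "quality": 0.85},
    {"name": "T4", "attrs": {"g3", "g7"}, "quality": 0.60},
]

chosen = greedy_diverse_set(candidates, k=2, min_quality=0.8)
print([t["name"] for t in chosen], set_diversity(chosen))
```

In this toy input, T2 is nearly as good as T1 but reuses most of its attributes, so the selection prefers T3, which shares none. The quality threshold reflects property (1) of the abstract and the distance criterion reflects property (2).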

Author information

Authors and Affiliations

Knoesis Center, and Department of Computer Science and Engineering, Wright State University, Dayton, Ohio, USA
Guozhu Dong & Qian Han

Editor information

Editors and Affiliations

National Cheng Kung University, Tainan, Taiwan, R.O.C.
Vincent S. Tseng & Hung-Yu Kao
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Tu Bao Ho
Nanjing University, China
Zhi-Hua Zhou
National Chengchi University, Taipei, Taiwan, R.O.C.
Arbee L. P. Chen

About this paper

Cite this paper

Dong, G., Han, Q. (2014). Mining Diversified Shared Decision Tree Sets for Discovering Cross Domain Similarities. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_44

DOI: https://doi.org/10.1007/978-3-319-06605-9_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science, Computer Science (R0)
