Domain Driven Tree Mining of Semi-structured Mental Health Information

The World Health Organization predicted that depression would be the world's leading cause of disability by 2020, which calls for urgent intervention. Because most mental illnesses are caused by a combination of genetic and environmental factors, and many different types of mental illness exist, identifying the precise combination of genetic and environmental causes for each illness type is crucial to prevention and effective treatment. Sophisticated data analysis tools, such as data mining, can contribute greatly to identifying precise patterns of genetic and environmental factors and can thereby support prevention and intervention strategies. One factor that complicates data mining in this area is that much of the information is not in strictly structured form. In this paper, we demonstrate the application of tree mining algorithms to semi-structured mental health information. The extracted patterns can provide useful information to help prevent mental illness and to assist in the delivery of effective and efficient mental health services.


Keywords: Mental Illness; Tree Mining; Support Threshold; Illness Type; Minimum Support Threshold





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Digital Ecosystems and Business Intelligence Institute (DEBII), Curtin University of Technology, Australia
