Fuzzy Clustering with Prototype Extraction for Census Data Analysis

  • Chapter
Soft Computing: State of the Art Theory and Novel Applications

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 291)

Abstract

Primary census data have only recently become publicly available. This opens qualitatively new perspectives not only for researchers in demography and sociology, but also for anyone who deals with processes occurring in society.

In this paper, the authors propose using data mining methods to search for hidden interconnections in census data. A novel clustering-based technique is also described. It makes it possible to determine the factors that influence people's behavior, in particular their decision-making (for example, the decision whether or not to have a baby). The proposed technique belongs to contrast mining, as it is based on dividing the whole set of respondents into two contrasting groups. The first group consists of those who possess a certain feature (for instance, having a baby); the members of the second group do not. We propose to define clustering-based subgroups within the first group and their prototypes within the second. By analyzing the characteristics of the subgroups and of their prototypes, it is possible to identify which factors influence the decision-making process. The authors also provide an experimental example of the described approach, which additionally shows that fuzzy clustering gives more accurate results than hard clustering techniques.
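
The two-step idea in the abstract (cluster the first group, then look for prototypes in the second) can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes that respondents are encoded as numeric feature vectors, that fuzzy c-means serves as the fuzzy clustering algorithm (the abstract does not name one), and that a "prototype" is a member of the second group lying within a chosen radius of a cluster center. The function names `fuzzy_c_means` and `extract_prototypes` and the `radius` parameter are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means on a list of numeric tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Each center is the mean of all points weighted by membership^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )
        # New memberships are proportional to distance^(-2 / (m - 1)).
        new_u = []
        for p in points:
            inv = [max(math.dist(p, ctr), 1e-12) ** (-2.0 / (m - 1.0))
                   for ctr in centers]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        shift = max(abs(a - b)
                    for old, new in zip(u, new_u)
                    for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u


def extract_prototypes(centers, others, radius):
    """Assumed prototype rule: a member of the second group is a prototype
    of its nearest cluster if it lies within `radius` of that center.

    Returns a dict mapping cluster index to a list of indices into `others`.
    """
    prototypes = {k: [] for k in range(len(centers))}
    for j, q in enumerate(others):
        dists = [math.dist(q, ctr) for ctr in centers]
        k = min(range(len(centers)), key=dists.__getitem__)
        if dists[k] <= radius:
            prototypes[k].append(j)
    return prototypes
```

With the first group passed to `fuzzy_c_means` and the second group passed to `extract_prototypes`, each subgroup can be compared with its prototypes attribute by attribute. Members of the second group that are far from every center are left unassigned, which is a design choice of this sketch rather than something the abstract specifies.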

Author information

Corresponding author

Correspondence to Oleg Chertov.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chertov, O., Aleksandrova, M. (2013). Fuzzy Clustering with Prototype Extraction for Census Data Analysis. In: Yager, R., Abbasov, A., Reformat, M., Shahbazova, S. (eds) Soft Computing: State of the Art Theory and Novel Applications. Studies in Fuzziness and Soft Computing, vol 291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34922-5_20

  • DOI: https://doi.org/10.1007/978-3-642-34922-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34921-8

  • Online ISBN: 978-3-642-34922-5

  • eBook Packages: Engineering, Engineering (R0)
