Abstract
Primary census data have recently become publicly available. This has opened qualitatively new perspectives not only for researchers in demography and sociology, but also for anyone who deals with the processes occurring in society.
In this paper, the authors propose using Data Mining methods to search for hidden interconnections in census data. A novel clustering-based technique is also described. It makes it possible to determine the factors that influence people's behavior, in particular their decision-making (for example, the decision whether or not to have a baby). The proposed technique belongs to contrast mining, as it is based on dividing the whole set of respondents into two contrasting groups. The first group consists of those who possess a certain feature (for instance, having a baby), which members of the second group lack. We propose defining clustering-based subgroups within the first group and their prototypes within the second one. By analyzing the characteristics of the subgroups and their prototypes, it is possible to identify which factors influence the decision-making process. The authors also provide an experimental example of the approach, which additionally shows that fuzzy clustering provides more accurate results than hard clustering techniques.
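To make the workflow above more concrete, here is a minimal Python/NumPy sketch under stated assumptions; it is not the authors' implementation. It splits respondents by a binary feature, clusters the first group with standard fuzzy c-means (FCM), picks prototypes for each subgroup from the second group, and compares subgroups and prototypes attribute by attribute. The prototype rule (the k nearest records of the contrasting group to each cluster center), the function names `fuzzy_c_means`, `extract_prototypes` and `contrast_profile`, and the random data standing in for encoded, normalized census attributes are all hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means: returns (centers, U) where U[i, j] is the
    membership of record i in cluster j and every row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                 # normalize rows to 1
    for _ in range(max_iter):
        W = U ** m                                    # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                         # avoid division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]      # centers for the final U
    return centers, U


def extract_prototypes(centers, X_other, k=1):
    """Hypothetical prototype rule: for each cluster center of the first
    group, the indices of the k nearest records of the contrasting group."""
    d = np.linalg.norm(X_other[:, None, :] - centers[None, :, :], axis=2)
    return np.argsort(d, axis=0)[:k].T                # shape (c, k)


def contrast_profile(centers, X_other, proto_idx):
    """Attribute-wise difference between each subgroup center and the mean
    of its prototypes; large absolute values flag candidate factors."""
    proto_means = np.stack([X_other[idx].mean(axis=0) for idx in proto_idx])
    return centers - proto_means                      # shape (c, n_attributes)


# Synthetic stand-in for encoded, normalized census attributes (hypothetical).
rng = np.random.default_rng(42)
X = rng.random((200, 3))
has_feature = rng.random(200) < 0.3                   # e.g. "has a baby"
X_first, X_second = X[has_feature], X[~has_feature]

centers, U = fuzzy_c_means(X_first, c=3)
protos = extract_prototypes(centers, X_second, k=5)
diff = contrast_profile(centers, X_second, protos)
print(np.round(diff, 3))
```

Fuzzy memberships let one respondent contribute partially to several subgroups, which is one plausible reason soft clustering can handle overlapping respondent profiles better than hard clustering; the sketch does not reproduce the chapter's experimental comparison.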
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chertov, O., Aleksandrova, M. (2013). Fuzzy Clustering with Prototype Extraction for Census Data Analysis. In: Yager, R., Abbasov, A., Reformat, M., Shahbazova, S. (eds) Soft Computing: State of the Art Theory and Novel Applications. Studies in Fuzziness and Soft Computing, vol 291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34922-5_20
DOI: https://doi.org/10.1007/978-3-642-34922-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34921-8
Online ISBN: 978-3-642-34922-5
eBook Packages: Engineering, Engineering (R0)