
Comparision of Classifiers Accuracies from FAVF and NOFI for Categorical Data

  • Conference paper
Information Systems Design and Intelligent Applications

Abstract

Outlier analysis is an important task in data science, and finding outliers in categorical data is particularly difficult. To build an accurate classifier, the right number of outliers must be eliminated from the data. If too few outliers are found, the remaining ones stay in the data as obstacles, and an accurate classifier cannot be built on it. Similarly, if too many records are flagged as outliers and eliminated, some genuine records are lost, and again an accurate classifier cannot be built. Because the data is categorical, the most infrequent records are treated as outliers in classification modeling: highly frequent records are the most useful for modeling a classifier, whereas infrequent records disturb the model. The open problem is how many outliers should be found. There are many effective approaches for detecting outliers in numerical data, but few methods exist for categorical datasets. This paper presents a new approach, normally distributed Outlier Factor by Infrequency (NOFI), to improve classifier accuracy. Experiments with the new method were conducted on the bank dataset taken from the UCI ML Repository. The approach does not require k, the number of outliers, as input. NOFI finds the number of outliers automatically, using the infrequency of all possible combinations formed from the attribute values of each record.
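The abstract does not give the NOFI formula, so the following is only a rough sketch of the general idea it describes: score each record by how frequent its attribute-value combinations are, then let a normal-distribution cutoff decide how many records count as outliers instead of asking for k. The scoring rule (mean support over all combinations), the cutoff `z`, and the function names `combination_counts`, `frequency_scores`, and `flag_outliers` are assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def _combos(record, max_size=None):
    """Yield every combination of (attribute index, value) pairs in a record."""
    items = list(enumerate(record))
    top = len(items) if max_size is None else min(max_size, len(items))
    for size in range(1, top + 1):
        yield from combinations(items, size)


def combination_counts(records, max_size=None):
    """Count how many records contain each attribute-value combination."""
    counts = Counter()
    for record in records:
        counts.update(_combos(record, max_size))
    return counts


def frequency_scores(records, max_size=None):
    """Assumed score: mean relative support of a record's combinations.

    Low scores mean the record is built from infrequent combinations.
    """
    counts = combination_counts(records, max_size)
    n = len(records)
    return [
        mean(counts[combo] / n for combo in _combos(record, max_size))
        for record in records
    ]


def flag_outliers(records, z=2.0, max_size=None):
    """Return indices of records scoring below mean - z * std.

    The cutoff (an assumption, not taken from the paper) fixes the number
    of outliers from the score distribution, so no k is supplied.
    """
    scores = frequency_scores(records, max_size)
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:  # all records equally frequent: nothing stands out
        return []
    cutoff = mu - z * sigma
    return [i for i, score in enumerate(scores) if score < cutoff]


if __name__ == "__main__":
    # Hypothetical toy data: 20 identical records and one rare record.
    data = [("a", "x", "p")] * 20 + [("b", "y", "q")]
    print(flag_outliers(data))
```

Enumerating all combinations grows exponentially with the number of attributes, so the hypothetical `max_size` parameter caps the combination length for wider datasets.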



Author information

Corresponding author

Correspondence to D. Lakshmi Sreenivasa Reddy.

Copyright information

© 2015 Springer India

About this paper

Cite this paper

Lakshmi Sreenivasa Reddy, D., Raveendra Babu, B., Govardhan, A., Kalpana, A., Krishna Murthy, M. (2015). Comparision of Classifiers Accuracies from FAVF and NOFI for Categorical Data. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 339. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2250-7_12

  • DOI: https://doi.org/10.1007/978-81-322-2250-7_12

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2249-1

  • Online ISBN: 978-81-322-2250-7

  • eBook Packages: Engineering, Engineering (R0)
