Abstract
Data veracity problems, such as incomplete, inconsistent, vague, or noisy data, pose a major challenge to data mining and data analysis. Rough set theory provides a tool for handling incomplete and imprecise data in information systems. In this paper, rough set based matrix-represented approximations are presented for computing lower and upper approximations. The induced approximations serve as inputs to the data analysis system LERS (Learning from Examples based on Rough Sets), used with the LEM2 (Learning from Examples Module, Version 2) rule induction algorithm. Analyses are performed on datasets whose missing values are interpreted as “do not care” conditions and on datasets whose missing values are interpreted as lost values. In addition, experiments on datasets with different percentages of missing values under different thresholds are provided. The experimental results show that the system performs better when missing data are characterized as “do not care” conditions than when they are represented as lost values.
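To make the idea of matrix-represented approximations concrete, the following is a minimal sketch and not the chapter's own implementation. It assumes the characteristic-relation reading of missing values used in the rough set literature: a "do not care" value (written `*` here) matches any value, while a lost value (written `?` here) places no constraint on its own case but matches nothing in other cases. The toy table, the symbols, and the function names `relation_matrix` and `approximations` are hypothetical, and the approximations computed are the simple singleton variants.

```python
LOST, DONT_CARE = "?", "*"  # assumed symbols for the two kinds of missing value


def relation_matrix(rows):
    """Build a 0/1 matrix M where M[i][j] = 1 if case j is related to case i.

    For every attribute, case j is compatible with case i when the value
    of i is missing (no constraint), the two values are equal, or the
    value of j is a "do not care" condition.
    """
    def related(x, y):
        return all(
            xv in (LOST, DONT_CARE) or yv == xv or yv == DONT_CARE
            for xv, yv in zip(x, y)
        )
    return [[int(related(x, y)) for y in rows] for x in rows]


def approximations(matrix, concept):
    """Return (lower, upper) singleton approximations of a set of case indices."""
    g = [int(j in concept) for j in range(len(matrix))]  # characteristic vector
    lower, upper = set(), set()
    for i, row in enumerate(matrix):
        overlap = sum(m * gj for m, gj in zip(row, g))  # i-th entry of M * g
        if overlap == sum(row):   # every related case lies in the concept
            lower.add(i)
        if overlap > 0:           # at least one related case lies in the concept
            upper.add(i)
    return lower, upper


# Hypothetical toy table: (temperature, headache); the concept is cases {0, 1}.
table_dont_care = [("high", "yes"), (DONT_CARE, "yes"), ("low", "yes"), ("low", "no")]
table_lost = [("high", "yes"), (LOST, "yes"), ("low", "yes"), ("low", "no")]
concept = {0, 1}

result_dont_care = approximations(relation_matrix(table_dont_care), concept)
result_lost = approximations(relation_matrix(table_lost), concept)
```

On this toy table both interpretations give the same lower approximation, but the "do not care" reading gives a larger upper approximation, because the missing value in case 1 also makes that case compatible with case 2. Which interpretation yields better rules on real data is an empirical question that the chapter's experiments address.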
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Soe, T.T., Min, M.M. (2020). Analysis of Missing Data Using Matrix-Characterized Approximations. In: Lee, R. (eds) Software Engineering Research, Management and Applications. SERA 2019. Studies in Computational Intelligence, vol 845. Springer, Cham. https://doi.org/10.1007/978-3-030-24344-9_7
DOI: https://doi.org/10.1007/978-3-030-24344-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24343-2
Online ISBN: 978-3-030-24344-9
eBook Packages: Intelligent Technologies and Robotics (R0)