NCBI: A Novel Correlation Based Imputing Technique Using Biclustering

Chowdhury, Hussain A.; Ahmed, Hasin A.; Bhattacharyya, Dhruba Kumar; Kalita, Jugal K.

doi:10.1007/978-981-13-9042-5_43

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 999)

Abstract

Missing values (MVs) are commonplace in gene expression data, and they significantly degrade the performance of statistical analysis and machine learning algorithms. Discarding objects or attributes that contain missing values, or estimating MVs inappropriately, leads to high information loss and misleading results, so an accurate technique for missing value imputation is necessary. In this paper, we present a novel correlation-based missing value imputation technique for gene expression datasets, which we refer to as NCBI. We compare the estimation accuracy of our technique with that of two widely used methods, KNNI and KMI, on four benchmark datasets in which data values are randomly knocked out as missing. Our technique estimates missing values approximately 20–25% more accurately than KNNI and KMI on all four datasets.
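The knock-out evaluation described in the abstract can be illustrated with a small sketch. It is not the NCBI method, whose details are not given in this preview. It only shows the general protocol under stated assumptions: entries of a complete matrix are hidden at random, a plain k-nearest-neighbour imputer (a simplified stand-in for KNNI) fills them in, and the estimates are scored against the hidden truth. The error measure (RMSE normalised by the standard deviation of the true values), the toy matrix, and all function names are hypothetical choices for illustration.

```python
import math
import random


def knock_out(matrix, fraction, seed=0):
    """Copy `matrix`, hide about `fraction` of its cells (as None), return copy and hidden positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    masked = rng.sample(cells, int(round(fraction * len(cells))))
    holed = [row[:] for row in matrix]
    for i, j in masked:
        holed[i][j] = None
    return holed, masked


def row_distance(a, b):
    """Root mean squared difference over columns observed in both rows (inf if there are none)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(holed, k=3):
    """Fill each missing cell with the mean of that column over the k nearest rows that observe it."""
    filled = [row[:] for row in holed]
    for i, row in enumerate(holed):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (row_distance(row, other), other[j])
                for r, other in enumerate(holed)
                if r != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d < math.inf]
            if not nearest:
                # Fallback (assumption): column mean of observed values, or 0.0 if the column is empty.
                nearest = [other[j] for other in holed if other[j] is not None] or [0.0]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled


def nrmse(truth, filled, masked):
    """RMSE over the hidden cells divided by the standard deviation of their true values."""
    true_vals = [truth[i][j] for i, j in masked]
    est_vals = [filled[i][j] for i, j in masked]
    mse = sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) / len(masked)
    mean_t = sum(true_vals) / len(true_vals)
    var_t = sum((t - mean_t) ** 2 for t in true_vals) / len(true_vals)
    return math.sqrt(mse / var_t)


# Toy "expression matrix": 12 rows (genes) by 8 columns (conditions), synthetic values only.
data = [[math.sin(0.5 * j + 0.1 * i) + 0.1 * i for j in range(8)] for i in range(12)]
holed, masked = knock_out(data, fraction=0.10, seed=1)
filled = knn_impute(holed, k=3)
score = nrmse(data, filled, masked)
print(f"hidden cells: {len(masked)}, NRMSE of the k-NN baseline: {score:.3f}")
```

A lower score means the estimates are closer to the hidden values. Comparing methods under this protocol amounts to running each imputer on the same `holed` matrix and comparing the resulting scores.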

Author information

Authors and Affiliations

Computer Science and Engineering, Tezpur University, Sonitpur, 784028, Assam, India
Hussain A. Chowdhury & Dhruba Kumar Bhattacharyya
Department of Information and Computer Science, Assam Women’s University, Jorhat, 785004, Assam, India
Hasin A. Ahmed
Computer Science, University of Colorado, Colorado Springs, CO, 80933-7150, USA
Jugal K. Kalita

Corresponding author

Correspondence to Dhruba Kumar Bhattacharyya.

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Howrah, West Bengal, India
Asit Kumar Das
Department of Computer Science and Engineering, Sri Sivani College of Engineering, Srikakulam, Andhra Pradesh, India
Janmenjoy Nayak
Department of Computer Application, Veer Surendra Sai University of Technology, Burla, Sambalpur, Odisha, India
Bighnaraj Naik
Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal, India
Soumen Kumar Pati
Faculty of Communication Sciences, University of Teramo, Teramo, Italy
Danilo Pelusi

Cite this paper

Chowdhury, H.A., Ahmed, H.A., Bhattacharyya, D.K., Kalita, J.K. (2020). NCBI: A Novel Correlation Based Imputing Technique Using Biclustering. In: Das, A., Nayak, J., Naik, B., Pati, S., Pelusi, D. (eds) Computational Intelligence in Pattern Recognition. Advances in Intelligent Systems and Computing, vol 999. Springer, Singapore. https://doi.org/10.1007/978-981-13-9042-5_43

DOI: https://doi.org/10.1007/978-981-13-9042-5_43
Published: 18 August 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9041-8
Online ISBN: 978-981-13-9042-5
eBook Packages: Intelligent Technologies and Robotics (R0)
