The Regression of MNIST Dataset Based on Convolutional Neural Network

  • Ziheng Wang
  • Su Wu
  • Chang Liu
  • Shaozhi Wu
  • Kai Xiao
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 921)


The MNIST dataset of handwritten digits is widely used to validate the effectiveness and efficiency of machine learning methods. Although the dataset has primarily served classification, on which very high accuracies (99.3%+) have been reported, it is not directly usable for the important task of regression, which limits its usefulness and hampers the development of regression methods for this type of data. In this paper, to make MNIST usable for regression, we first perturb its class labels with a normal distribution, thereby converting the original discrete class numbers into continuous floating-point values. A modified Convolutional Neural Network (CNN) is then applied to build a regression model. Multiple experiments were conducted to select optimal parameters and layer settings for this task. The results suggest that the best mean absolute error (MAE) is obtained when the ReLU function activates the first layer and the softplus function activates the remaining layers. In the proposed approach, two indicators, MAE and the Log-Cosh loss, are used to optimize the parameters and score the predictions. Experiments with 10-fold cross-validation show that low MAE and Log-Cosh errors of 0.202 and 0.079, respectively, can be achieved. Furthermore, multiple values of the standard deviation of the normal distribution were tested to verify applicability when the label values follow different distributions. The results indicate a positive correlation between the adopted standard deviation and the loss value; that is, a higher concentration of the label data yields a lower MAE.
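The label conversion and the two evaluation metrics described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the default sigma are illustrative, and only the label perturbation and loss computations are shown, not the CNN itself.

```python
import math
import random

def to_float_labels(labels, sigma=0.5, seed=0):
    """Convert discrete class labels into continuous regression targets
    by sampling from a normal distribution centred on each label.
    (sigma controls the spread; the paper varies this value.)"""
    rng = random.Random(seed)
    return [rng.gauss(y, sigma) for y in labels]

def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def log_cosh(y_true, y_pred):
    """Log-Cosh loss: mean of log(cosh(prediction - target))."""
    return sum(math.log(math.cosh(p - t))
               for t, p in zip(y_true, y_pred)) / len(y_true)

labels = [3, 7, 1, 9]                        # original discrete classes
targets = to_float_labels(labels, sigma=0.3)  # continuous targets
print(mae(labels, targets), log_cosh(labels, targets))
```

Per the abstract, a smaller sigma concentrates the continuous targets around the original labels, which is consistent with the reported positive correlation between the standard deviation and the resulting loss.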


Keywords: MNIST dataset · Convolutional neural network · Regression



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Ziheng Wang (1)
  • Su Wu (2)
  • Chang Liu (2)
  • Shaozhi Wu (3)
  • Kai Xiao (4, corresponding author)
  1. School of Aerospace Engineering and Applied Mechanics, Tongji University, Shanghai, China
  2. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
  3. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
  4. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
