Abstract
Classification in the hidden layer of a single-layer neural network with random weights has shown high accuracy in recent experimental studies. We further explore its classification and clustering performance in a compressed hidden space on a large cohort of datasets from the UCI machine learning archive. We compress the hidden layer with a simple bit-encoding that yields error comparable to the original hidden layer, reducing memory requirements and allowing us to study up to a million random nodes. We find the classification error of a linear support vector machine in the compressed hidden space to be statistically indistinguishable from that in the uncompressed hidden space. The test error of the linear support vector machine in the compressed hidden layer improves only marginally beyond 10,000 nodes and even rises when we reach one million nodes. We show that k-means clustering attains a higher adjusted Rand index and purity in the compressed hidden space than in the original input space, although only the purity improvement is statistically significant. Semi-supervised k-nearest neighbor classification also improves by a statistically significant margin when only 10% of labels are available. Finally, we show that several different classifiers achieve statistically significantly lower error in the compressed hidden layer than in the original space, with the linear support vector machine reaching the lowest error. Overall, our experiments show that while classification in our compressed hidden layer can achieve low error competitive with the original space, there is a saturation point beyond which the error does not improve, and that clustering and semi-supervised classification are better in the compressed hidden layer by a small yet statistically significant margin.
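The pipeline the abstract describes is straightforward to prototype. Below is a minimal sketch under stated assumptions, not the authors' implementation: we assume a sign-threshold bit-encoding of the hidden activations, substitute scikit-learn's digits dataset for the UCI cohort, and pick a hidden width of 1,000 for speed; the paper's own encoding and parameter choices may differ.

```python
# A minimal sketch (not the authors' code) of the pipeline the abstract
# describes: a random-weight hidden layer, a sign-threshold bit-encoding,
# and a linear SVM / k-means on the binarized hidden representation.
# The dataset, hidden width, and exact encoding are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_hidden = 1000  # the paper scales this up to one million random nodes
W = rng.standard_normal((X.shape[1], n_hidden))
b = rng.standard_normal(n_hidden)

def hidden_bits(X):
    """One bit per random node: 1 if the pre-activation is positive, else 0.
    Kept as 0/1 floats here so scikit-learn can consume the matrix directly."""
    return (X @ W + b > 0).astype(np.float32)

H_train, H_test = hidden_bits(X_train), hidden_bits(X_test)

# Linear SVM (LIBLINEAR) trained on the compressed hidden representation.
clf = LinearSVC(dual=False).fit(H_train, y_train)
print("linear SVM test accuracy:", clf.score(H_test, y_test))

# k-means in the compressed hidden space, scored with the adjusted Rand index.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(H_train)
print("adjusted Rand index:", adjusted_rand_score(y_train, labels))
```

In a memory-conscious implementation the 0/1 matrix would be packed eight bits per byte (for example with np.packbits) and unpacked batch by batch; a packing of this kind is one way the memory savings the abstract reports, which make hidden layers of up to a million nodes feasible, could be realized.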
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Xie, M., Roshan, U. (2019). Exploring Classification, Clustering, and Its Limits in a Compressed Hidden Space of a Single Layer Neural Network with Random Weights. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol. 11506. Springer, Cham. https://doi.org/10.1007/978-3-030-20521-8_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20520-1
Online ISBN: 978-3-030-20521-8
eBook Packages: Computer Science, Computer Science (R0)