Cluster Computing, Volume 19, Issue 3, pp 1309–1321

An examination of on-line machine learning approaches for pseudo-random generated data

  • Jia Zhu
  • Chuanhua Xu
  • Zhixu Li
  • Gabriel Fung
  • Xueqin Lin
  • Jin Huang
  • Changqin Huang


A pseudo-random generator is an algorithm that generates a sequence of objects which is determined by a truly random seed but is not itself truly random. Such generators are widely used in applications such as cryptography and simulation. In this article, we examine currently popular machine learning algorithms combined with various on-line algorithms on pseudo-random generated data, in order to find out which machine learning approach is best suited to on-line prediction for this kind of data. To further improve prediction performance, we propose a novel sample weighted algorithm that takes the generalization error in each iteration into account. We perform an intensive evaluation on real Baccarat data generated by casino machines and on random numbers generated by a popular Java program, two typical examples of pseudo-random generated data. The experimental results show that support vector machines and k-nearest neighbors perform better than the other approaches on the evaluation data set, both with and without the sample weighted algorithm.
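As a rough illustration of the setting described above, the following sketch generates seeded pseudo-random bits and scores a learner in an on-line (predict first, then update) fashion. Several things here are assumptions rather than details from the article: the generator is a 48-bit linear congruential generator using the constants documented for `java.util.Random`, which may or may not be the "popular Java program" the authors used; the names `lcg_bits` and `online_accuracy` are hypothetical; and the context-count predictor is a simple stand-in, not one of the algorithms evaluated in the article and not the proposed sample weighted algorithm.

```python
from collections import defaultdict

# Constants documented for java.util.Random (assumed here for illustration only).
MASK = (1 << 48) - 1
MULT = 0x5DEECE66D
INC = 0xB


def lcg_bits(seed, n):
    """Yield n pseudo-random bits from a 48-bit linear congruential generator."""
    state = (seed ^ MULT) & MASK  # initial scrambling of the seed
    for _ in range(n):
        state = (state * MULT + INC) & MASK
        yield state >> 47  # top bit of the 48-bit state, always 0 or 1


def online_accuracy(bits, k=3):
    """Predict each bit from the previous k bits, then update the counts.

    Every sample is used for testing before it is used for training,
    which is the usual on-line evaluation protocol.
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [number of 0s, number of 1s]
    correct = total = 0
    for i in range(k, len(bits)):
        context = tuple(bits[i - k:i])
        zeros, ones = counts[context]
        prediction = 1 if ones > zeros else 0
        correct += int(prediction == bits[i])
        total += 1
        counts[context][bits[i]] += 1  # learn from the revealed outcome
    return correct / total if total else 0.0


seq_a = list(lcg_bits(seed=42, n=2000))
seq_b = list(lcg_bits(seed=42, n=2000))
print("same seed, same sequence:", seq_a == seq_b)
print("on-line accuracy of the stand-in predictor:", online_accuracy(seq_a))
```

The first printed line shows the defining property from the abstract: the sequence is fully determined by the seed. The second reports how often the stand-in predictor guesses the next bit correctly; any other on-line learner could be dropped into the same predict-then-update loop.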


Keywords: On-line learning, Machine learning, Sample weighted algorithm



This work was supported by the Youth Teacher Startup Fund of South China Normal University (No. 14KJ18), the Natural Science Foundation of Guangdong Province, China (No. 2015A030310509), the National Natural Science Foundation of China (Nos. 61370229, 61272067), the National Key Technology R&D Program of China (No. 2014BAH28F02) and the S&T Projects of Guangdong Province (Nos. 2014B010103004, 2014B010117007, 2015A030401087, 2015B010110002, 2016B030305004, 2016A030303055 and 2016B010109008).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Jia Zhu (1)
  • Chuanhua Xu (1)
  • Zhixu Li (2)
  • Gabriel Fung (3)
  • Xueqin Lin (1)
  • Jin Huang (1)
  • Changqin Huang (1)

  1. School of Computer Science, South China Normal University, Guangzhou, China
  2. School of Computer Science and Technology, Soochow University, Soochow, China
  3. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
