Abstract
Generating training datasets for machine learning projects is a topical problem. The cost of dataset formation can be considerable, yet there is no guarantee that the prepared data will be of acceptable quality. An important issue in dataset generation is labeling noise. The main causes of this phenomenon are expert errors, insufficient information, subjective factors, and so on. Labeling noise affects the training stage of a neural network and thus increases the number of errors during its operation. In this paper, a technique to decrease the level of labeling noise is proposed, based on the principles of distributed ledger technology. Besides reducing the number of labeling errors, integrating services on the basis of a distributed ledger improves the efficiency of dataset formation.
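The core idea of reducing labeling noise through agreement among independent parties can be illustrated with a minimal sketch. The snippet below is not the authors' protocol; it simulates several hypothetical noisy annotators and aggregates their votes by majority consensus (the simplest analogue of a ledger-style consensus on a label), assuming an illustrative per-annotator error rate:

```python
import random
from collections import Counter

def consensus_label(votes):
    """Return the majority label among annotator votes (simple consensus)."""
    return Counter(votes).most_common(1)[0][0]

def label_dataset(true_labels, n_annotators=5, error_rate=0.2, seed=0):
    """Simulate noisy annotators and aggregate their votes by consensus."""
    rng = random.Random(seed)
    label_set = sorted(set(true_labels))
    consensus = []
    for y in true_labels:
        votes = []
        for _ in range(n_annotators):
            if rng.random() < error_rate:
                # annotator makes an error: picks some other label
                votes.append(rng.choice([l for l in label_set if l != y]))
            else:
                votes.append(y)
        consensus.append(consensus_label(votes))
    return consensus

truth = [0, 1] * 50
agreed = label_dataset(truth)
residual = sum(a != t for a, t in zip(agreed, truth)) / len(truth)
print(f"residual noise after consensus: {residual:.2%}")
```

With five annotators each erring 20% of the time, a majority of votes is wrong far less often than any single annotator, which is the statistical effect the proposed technique exploits.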
Acknowledgements
The paper has been prepared within the RFBR projects 18-29-22086 and 18-29-22046.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Melnik, E.V., Klimenko, A.B., Ivanov, D.Y. (2019). The Distributed Ledger-Based Technique of the Neuronet Training Set Forming. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Computational Statistics and Mathematical Modeling Methods in Intelligent Systems. CoMeSySo 2019. Advances in Intelligent Systems and Computing, vol 1047. Springer, Cham. https://doi.org/10.1007/978-3-030-31362-3_2
DOI: https://doi.org/10.1007/978-3-030-31362-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31361-6
Online ISBN: 978-3-030-31362-3
eBook Packages: Intelligent Technologies and Robotics (R0)