Preliminary Cleaning and Transformation of Data in Data Mining Using PHP Pthreads Library

  • Yulia Shichkina
  • Alexander Koblov
  • Kirill Lysov
  • Oleg Iakushkin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10408)


The article deals with a special case of preparing vehicle movement data, which arrives from the source in large volumes, so that data mining methods can be applied to it quickly. Data preparation goes through several stages, from selecting the necessary fields and records to saving them with modified values in a new data structure. The source data, which consists of 18 fields, contains a share of incorrect information and numerical values in formats that are not suitable for further processing. Because the source data is large in volume, processing it in its original form takes a very long time. The article shows how to use the pthreads library to organize multi-threaded processing of this data. To confirm the applicability of this library, the article presents the results of numerical experiments.
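
The stages named in the abstract (select fields, drop incorrect records, convert numeric formats, save to a new structure, process in several threads) can be illustrated with a minimal sketch. It is not the authors' implementation. The field names `vehicle_id`, `timestamp` and `speed`, the decimal-comma format, the validity rule and the chunk size are all hypothetical, since the abstract does not list the 18 fields. The sketch assumes a thread-safe (ZTS) PHP CLI build with the pthreads extension loaded, and it passes data between threads as serialized strings to sidestep the way pthreads handles arrays.

```php
<?php
// Sketch only: requires a ZTS build of PHP (CLI) with the pthreads extension.
// All field names and cleaning rules below are illustrative assumptions.

class CleanChunk extends Thread
{
    private $payload; // serialized input rows
    public $result;   // serialized cleaned rows, filled in by run()

    public function __construct(array $rows)
    {
        $this->payload = serialize($rows);
        $this->result = serialize([]);
    }

    // Executed in the new thread after start() is called.
    public function run()
    {
        $cleaned = [];
        foreach (unserialize($this->payload) as $row) {
            $row = self::cleanRow($row);
            if ($row !== null) {
                $cleaned[] = $row;
            }
        }
        $this->result = serialize($cleaned);
    }

    // Returns the selected, converted fields, or null if the record is rejected.
    public static function cleanRow(array $row)
    {
        foreach (['vehicle_id', 'timestamp', 'speed'] as $field) {
            if (!isset($row[$field]) || $row[$field] === '') {
                return null; // a required field is missing
            }
        }
        // Assumed format problem: decimal comma instead of decimal point.
        $speed = str_replace(',', '.', $row['speed']);
        if (!is_numeric($speed) || $speed < 0) {
            return null; // incorrect numeric value
        }
        return [
            'vehicle_id' => $row['vehicle_id'],
            'timestamp'  => $row['timestamp'],
            'speed'      => (float) $speed,
        ];
    }
}

// Hypothetical input records; real data would be read from the source in batches.
$rows = [
    ['vehicle_id' => 'A1', 'timestamp' => '2017-03-01 10:00:00', 'speed' => '59,9', 'note' => 'unused'],
    ['vehicle_id' => 'A2', 'timestamp' => '2017-03-01 10:00:05', 'speed' => 'n/a'],
    ['vehicle_id' => 'A3', 'timestamp' => '2017-03-01 10:00:10', 'speed' => '42.5'],
    ['vehicle_id' => '',   'timestamp' => '2017-03-01 10:00:15', 'speed' => '30'],
];

// One thread per chunk of records.
$threads = [];
foreach (array_chunk($rows, 2) as $chunk) {
    $thread = new CleanChunk($chunk);
    $thread->start();
    $threads[] = $thread;
}

// Wait for every thread and merge the cleaned records into the new structure.
$cleaned = [];
foreach ($threads as $thread) {
    $thread->join();
    $cleaned = array_merge($cleaned, unserialize($thread->result));
}
```

In this sketch each thread works on an independent chunk, so no locking is needed. For large inputs the number of threads would normally be capped rather than grown with the number of chunks.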


Data mining, Data cleaning, Data transformation, PHP, Pthreads



The paper was prepared within the scope of the state project “Initiative scientific project” of the main part of the state plan of the Ministry of Education and Science of the Russian Federation (task № 2.6553.2017/BCH Basic Part) and was supported by a grant from the Russian Fund for Basic Research (16-07-00886).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science and Engineering, Saint Petersburg Electrotechnical University “LETI”, St. Petersburg, Russia
  2. Saint Petersburg State University, St. Petersburg, Russia
