Using Random Forests for Data Mining and Drowsy Driver Classification Using FOT Data

  • Cristofer Englund
  • Jordanka Kovaceva
  • Magdalena Lindman
  • John-Fredrik Grönvall
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7566)


Data mining techniques based on Random forests are explored to gain knowledge about data in a Field Operational Test (FOT) database. We compare the performance of a Random forest, a Support Vector Machine and a Neural network in separating drowsy from alert drivers. Twenty-five variables from the FOT data were used to train the models. It is shown experimentally that the Random forest outperforms the other methods at separating drowsy from alert drivers. It is also shown how the Random forest can be used for variable selection to find a subset of the variables that improves the classification accuracy. Furthermore, it is shown that the data proximity matrix estimated from the Random forest trained on these variables can be used to improve classification accuracy, outlier detection and data visualization.
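The proximity matrix mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual Random forest definition, in which the proximity of two samples is the fraction of trees where both land in the same leaf, and a raw outlier measure equal to the sample count divided by the sum of squared proximities to the other samples of the same class. The leaf assignments, labels and function names below are hypothetical and are not taken from the paper; the sketch also omits the per-class normalisation that is normally applied to the raw outlier score.

```python
from itertools import combinations


def proximity_matrix(leaf_ids):
    """Return an n x n proximity matrix.

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    Proximity of (i, j) = fraction of trees in which i and j share a leaf.
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        prox[i][i] = 1.0  # a sample always shares a leaf with itself
    for i, j in combinations(range(n), 2):
        shared = sum(a == b for a, b in zip(leaf_ids[i], leaf_ids[j]))
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def raw_outlier_scores(prox, labels):
    """Raw outlier score per sample: n / sum of squared proximities
    to the other samples carrying the same class label."""
    n = len(labels)
    scores = []
    for i in range(n):
        total = sum(
            prox[i][j] ** 2
            for j in range(n)
            if j != i and labels[j] == labels[i]
        )
        scores.append(n / total if total > 0 else float("inf"))
    return scores


# Hypothetical leaf assignments: 4 samples passed through 4 trees.
leaf_ids = [
    [1, 2, 1, 3],
    [1, 2, 1, 4],
    [1, 5, 2, 4],
    [7, 5, 9, 6],  # rarely shares a leaf with the other samples
]
labels = ["alert", "alert", "alert", "alert"]  # assumed class labels

prox = proximity_matrix(leaf_ids)
scores = raw_outlier_scores(prox, labels)
most_unusual = max(range(len(scores)), key=scores.__getitem__)
```

Here the last sample receives the largest raw score because it seldom shares a leaf with the rest of its class. The quantity 1 minus the proximity can also serve as a dissimilarity for low-dimensional visualisation, for example with multidimensional scaling.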


Data mining, Random Forest, Drowsy Driver Detection, Proximity, Outlier detection, Variable selection, Field operational test





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Cristofer Englund (1)
  • Jordanka Kovaceva (2)
  • Magdalena Lindman (2)
  • John-Fredrik Grönvall (2)

  1. Viktoria Institute, Gothenburg, Sweden
  2. Volvo Car Corporation, Gothenburg, Sweden
