Frontiers of Earth Science, Volume 13, Issue 1, pp 180–190

Unsupervised learning on scientific ocean drilling datasets from the South China Sea

  • Kevin C. Tse
  • Hon-Chim Chiu
  • Man-Yin Tsang
  • Yiliang Li
  • Edmund Y. Lam
Research Article


Unsupervised learning methods were applied to explore data patterns in multivariate geophysical datasets collected from ocean floor sediment core samples obtained by scientific ocean drilling in the South China Sea. Previous studies on similar datasets used supervised learning methods, which are designed to make predictions from labeled training data. Unsupervised learning methods, by contrast, require no a priori information and work only from the input data. In this study, popular unsupervised learning methods, including K-means, self-organizing maps, hierarchical clustering and random forest, were coupled with different distance metrics to form exploratory data clusters. The resulting clusters were externally validated against the lithologic units and geologic time scales assigned to the datasets by conventional methods. Compact and connected data clusters displayed varying degrees of correspondence with the existing classifications by lithologic unit and geologic time scale. K-means and self-organizing maps corresponded better with lithologic units, while random forest corresponded best with geologic time scales. This study sets a pioneering example of how unsupervised machine learning methods can be used as an automatic processing tool for the increasingly high volume of scientific ocean drilling data.
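As a rough illustration of the cluster-then-validate workflow described in the abstract, the sketch below clusters samples with K-means and then compares the clusters against externally assigned labels. It is not the authors' implementation. The data are synthetic, the two feature columns and the "Unit I"/"Unit II" labels are hypothetical, and the adjusted Rand index is used here only as one common external validation measure; the abstract does not state which index the study used.

```python
import random
from collections import Counter
from math import comb


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct samples as starting centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same samples."""
    n = len(labels_a)
    sum_cells = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are one cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Synthetic, made-up measurements: two well-separated groups of 10 samples each.
# The two columns stand in for any pair of physical properties (hypothetical).
samples = [(1.0 + 0.01 * i, 10.0 + 0.1 * i) for i in range(10)] + \
          [(2.0 + 0.01 * i, 50.0 + 0.1 * i) for i in range(10)]
units = ["Unit I"] * 10 + ["Unit II"] * 10  # hypothetical lithologic unit labels

clusters = kmeans(samples, k=2)
ari = adjusted_rand_index(clusters, units)
print(f"ARI between K-means clusters and unit labels: {ari:.2f}")
```

The external labels are never seen by the clustering step; they are used only afterwards to score the result. With real measurements on different scales, the features would normally be standardized before computing Euclidean distances.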


machine learning, unsupervised learning, ODP, IODP, clustering





Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Kevin C. Tse (1, corresponding author)
  • Hon-Chim Chiu (2)
  • Man-Yin Tsang (3)
  • Yiliang Li (1)
  • Edmund Y. Lam (4)

  1. Department of Earth Sciences, The University of Hong Kong, Pokfulam, Hong Kong, China
  2. Department of Geography and Centre for Geo-computation Studies, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
  3. Department of Earth Sciences, University of Toronto, Toronto, Canada
  4. Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong, China
