
Detection of Outlier in 3D Flow Velocity Collection in an Open-Channel Bend Using Various Data Mining Techniques

Research Paper

Abstract

Data collected on flow patterns are often contaminated by outliers arising from various sources. Detecting these outliers in flow pattern experiments is important because it leads to a better and more accurate understanding of the flow pattern. In this study, eight data mining methods were used to identify outliers in flow pattern experiments: box plot, histogram, linear regression, k-nearest neighbors, local outlier factor, k-medoids clustering, multilayer perceptron, and self-organizing map. The main aim of the study is to detect outliers in the data collected during flow pattern experiments using these data mining methods. The methods were applied to a case study, and their performance was evaluated and compared. The data under investigation came from flow pattern experiments around a spur dike located in a 90° bend, measured with a Vectrino acoustic Doppler velocimeter (ADV). The device measures velocities in the range ±0.01 to ±4 m/s with an accuracy of 1 mm/s, and the sampling frequency was set to 50 Hz. A comparison of the results of the different outlier detection methods showed that the box plot and local outlier factor methods performed best.
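The abstract names the box plot and the local outlier factor (LOF) as the best-performing methods. As a rough illustration only, and not the authors' implementation, both can be sketched in plain Python: the box-plot rule flags values of a single velocity component that fall outside Tukey's fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR), while LOF (Breunig et al. 2000) compares the local density of each 3D velocity sample (u, v, w) with the density around its k nearest neighbours. The whisker factor of 1.5, the neighbourhood size k = 5, the LOF threshold of 1.5, the helper names `box_plot_outliers` and `lof_scores`, and the synthetic velocity samples are all hypothetical choices made for this sketch.

```python
import math
import statistics


def box_plot_outliers(values, whisker=1.5):
    """Indices of values outside the box-plot fences Q1 - w*IQR and Q3 + w*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def lof_scores(points, k=5):
    """Local outlier factor of each point; scores near 1 indicate inliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k-distance of each point and its k-neighbourhood (ties at the k-distance kept).
    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neigh.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(1.0 / max(statistics.fmean(reach), 1e-12))  # guard against duplicate points

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [statistics.fmean(lrd[j] for j in neigh[i]) / lrd[i] for i in range(n)]


# Hypothetical (u, v, w) samples in m/s, NOT data from the study: a tight cluster plus one spike.
samples = [(0.30 + 0.01 * i, 0.05 + 0.005 * (i % 3), 0.01 * (i % 2)) for i in range(20)]
samples.append((1.20, -0.40, 0.35))

u = [s[0] for s in samples]
scores = lof_scores(samples, k=5)
print("box-plot outliers in u:", box_plot_outliers(u))
print("LOF outliers (score > 1.5):", [i for i, s in enumerate(scores) if s > 1.5])
```

With these synthetic samples, both rules flag only the injected spike (index 20). The box-plot rule is univariate, so in this sketch it would be run on each velocity component separately, whereas LOF uses the full 3D sample; the thresholds would need tuning to a real ADV record.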

Keywords

Flow pattern · Outlier detection · Data mining · ADV


Copyright information

© Shiraz University 2018

Authors and Affiliations

  • Mohammad Vaghefi (1)
  • Kumars Mahmoodi (2)
  • Maryam Akbari (3)

  1. Department of Civil Engineering, Persian Gulf University, Bushehr, Iran
  2. Department of Marine Engineering, Amirkabir University of Technology, Tehran, Iran
  3. Department of Civil Engineering, Persian Gulf University, Bushehr, Iran
