Investigating Engineering Data by Probabilistic Measures
A critical issue for data-based engineering is the lack of descriptive labels for measured data. For many engineering systems, these labels are costly or impractical to obtain, so conventional supervised learning is not feasible. This article outlines a probabilistic framework for investigating and labelling engineering datasets. Two alternative probabilistic measures are suggested for identifying the most informative observations to investigate and annotate, with the aim of maximising the classification performance of a statistical model.
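To illustrate how a probabilistic measure can rank unlabelled observations for annotation, the sketch below scores each observation by the Shannon entropy of its predicted class posterior and queries the most uncertain ones. This is a minimal sketch under stated assumptions: entropy is a common choice in active learning, but it is not necessarily either of the two measures proposed in the article. The function names and the example posterior values are hypothetical.

```python
import numpy as np


def posterior_entropy(probs):
    """Shannon entropy (in nats) of each row of an (n_samples, n_classes) array.

    Assumption: each row is a class posterior p(y | x) from some
    probabilistic classifier and sums to one.
    """
    # Clip to avoid log(0); the clipped terms contribute a negligible amount.
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)


def select_queries(probs, n_queries):
    """Indices of the n_queries observations with the highest posterior entropy."""
    h = posterior_entropy(probs)
    # Sort by descending entropy; a stable sort keeps ties in original order.
    return np.argsort(-h, kind="stable")[:n_queries]


# Hypothetical posteriors for four unlabelled observations and three classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # confidently classified
    [0.40, 0.35, 0.25],  # fairly uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # almost uniform, the most uncertain
])

queries = select_queries(probs, n_queries=2)
print(queries)
```

In an active-learning loop, the selected observations would be passed to an engineer for inspection and labelling, the model would be retrained with the new labels, and the ranking would be repeated on the remaining unlabelled data.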
Keywords: Active learning; Guided sampling; Semi-supervised learning; Online structural health monitoring
The authors gratefully acknowledge the support of the UK Engineering and Physical Sciences Research Council (EPSRC) through Grant reference number EP/R003645/1. Further thanks are extended to Karen Holford and Rhys Pullin at Cardiff University for providing the AE data.