Abstract
In multi-class classification tasks, such as human activity recognition, classes are often assumed to be separable. In real applications, this assumption is strong and generates inconsistencies. Moreover, the most commonly used approach is to learn each class one-by-one against the others. This computational simplification introduces strong inductive biases into the learned theories: the natural connections among some classes, and not others, deserve to be taken into account. In this paper, we show that organizing overlapping classes (multiple inheritances) into hierarchies considerably improves classification performance. This is particularly true for the activity recognition tasks featured in the SHL dataset. After theoretically showing the exponential number of possible class hierarchies, we propose an approach based on transfer affinity among the classes to determine an optimal hierarchy for the learning process. Extensive experiments show improved performance and a reduction in the number of examples needed for learning.
Notes
- 1.
In our case, we select several body-motion modalities to be included in our experiments, among the 16 input modalities of the original dataset: accelerometer, gyroscope, etc. Segmentation and processing steps are detailed in the experimental section.
- 2.
Software package and code to reproduce empirical results are publicly available at https://github.com/sensor-rich/hierarchicalSHL.
References
Cai, L., Hofmann, T.: Hierarchical document categorization with support vector machines. In: CIKM, pp. 78–87 (2004)
Carpineti, C., et al.: Custom dual transportation mode detection by smartphone devices exploiting sensor diversity. In: PerCom Workshops, pp. 367–372. IEEE (2018)
Cesa-Bianchi, N., Gentile, C., Zaniboni, L.: Incremental algorithms for hierarchical classification. JMLR 7, 31–54 (2006)
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Measur. 20(1), 37–46 (1960)
Costa, E., Lorena, A., Carvalho, A., Freitas, A.: A review of performance evaluation measures for hierarchical classifiers. In: Evaluation Methods for machine Learning II: papers from the AAAI-2007 Workshop, pp. 1–6 (2007)
Essaidi, M., Osmani, A., Rouveirol, C.: Learning dependent-concepts in ILP: application to model-driven data warehouses. In: ILP, pp. 151–172 (2015)
Gjoreski, H., et al.: The University of Sussex-Huawei locomotion and transportation dataset for multimodal analytics with mobile devices. IEEE Access 6, 42592–42604 (2018)
Hamidi, M., Osmani, A.: Data generation process modeling for activity recognition. In: ECML-PKDD. Springer (2020)
Hamidi, M., Osmani, A., Alizadeh, P.: A multi-view architecture for the SHL challenge. In: UbiComp/ISWC Adjunct, pp. 317–322 (2020)
Kosmopoulos, A., Partalas, I., Gaussier, E., Paliouras, G., Androutsopoulos, I.: Evaluation measures for hierarchical classification: a unified view and novel approaches. Data Min. Knowl. Disc. 29(3), 820–865 (2014). https://doi.org/10.1007/s10618-014-0382-x
Lance, G.N., Williams, W.T.: A general theory of classificatory sorting strategies: 1. Hierarchical systems. Comput. J. 9(4), 373–380 (1967)
Nakamura, Y., et al.: Multi-stage activity inference for locomotion and transportation analytics of mobile users. In: UbiComp/ISWC, pp. 1579–1588 (2018)
Nguyen-Dinh, L.V., Calatroni, A., Tröster, G.: Robust online gesture recognition with crowdsourced annotations. JMLR 15(1), 3187–3220 (2014)
Peters, M.E., Ruder, S., Smith, N.A.: To tune or not to tune? Adapting pretrained representations to diverse tasks. arXiv preprint arXiv:1903.05987 (2019)
Samie, F., Bauer, L., Henkel, J.: Hierarchical classification for constrained IoT devices: a case study on human activity recognition. IEEE IoT J. 7(9), 8287–8295 (2020)
Scheurer, S., et al.: Using domain knowledge for interpretable and competitive multi-class human activity recognition. Sensors 20(4), 1208 (2020)
Silla, C.N., Freitas, A.A.: A survey of hierarchical classification across different application domains. Data Min. Knowl. Disc. 22(1–2), 31–72 (2011)
Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: NIPS, pp. 2951–2959 (2012)
Stikic, M., Schiele, B.: Activity recognition from sparsely labeled data using multi-instance learning. In: Choudhury, T., Quigley, A., Strang, T., Suginuma, K. (eds.) LoCA 2009. LNCS, vol. 5561, pp. 156–173. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01721-6_10
Taran, V., Gordienko, Y., Rokovyi, A., Alienin, O., Stirenko, S.: Impact of ground truth annotation quality on performance of semantic image segmentation of traffic conditions. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds.) ICCSEEA 2019. AISC, vol. 938, pp. 183–193. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-16621-2_17
Vapnik, V.: Principles of risk minimization for learning theory. In: NIPS (1992)
Vincent, P., et al.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. JMLR 11(12) (2010)
Wang, L., et al.: Summary of the sussex-huawei locomotion-transportation recognition challenge. In: UbiComp/ISWC, pp. 1521–1530 (2018)
Wehrmann, J., Cerri, R., Barros, R.: Hierarchical multi-label classification networks. In: ICML, pp. 5075–5084 (2018)
Yao, H., Wei, Y., Huang, J., Li, Z.: Hierarchically structured meta-learning. In: ICML, pp. 7045–7054 (2019)
Zamir, A.R., Sax, A., Shen, W., Guibas, L.J., Malik, J., Savarese, S.: Taskonomy: Disentangling task transfer learning. In: CVPR, pp. 3712–3722 (2018)
Zhou, D., Xiao, L., Wu, M.: Hierarchical classification via orthogonal transfer. In: ICML, pp. 801–808 (2011)
Appendices
Appendix A Proof of Theorem 1
Proof
Consider \(K+1\) concepts consisting of \(K\) existing concepts \(c_1, \cdots , c_K\) and a newly added concept \(\gamma \). We can enumerate the possible first-level tree combinations as below, where each atomic element o stands for one of the concepts \(c_1, \cdots , c_K\). To compute the total number of tree combinations, we count the number of trees obtained by assigning the K concepts to each of the following cases:
-
\((\gamma (\overbrace{o\cdots o}^{K \text { concepts}}))\): taking the concept labels into account, the number of tree combinations is \(\binom{K}{0} L(1) \times 2 \times L(K)\). The factor 2 arises because, while the left side contains the atomic concept \(\gamma \), there are two choices for the right side of the tree at the first level: either we count the trees over the K concepts starting from the first level, or we keep the first level as the atomics \(\overbrace{o\cdots o}^{K \text { concepts}}\), keeping all K concepts together, and continue counting their tree combinations from the second level of the tree.
-
\(((\gamma o)(\overbrace{o\cdots o}^{K-1 \text { concepts}}))\): similarly to the previous case, there are \(\binom{K}{1} L(2) \times 2 \times L(K-1)\) tree combinations when taking the concept labels into account. \(\binom{K}{1}\) is the number of ways to choose, among the K concepts, the one that is placed together with the new concept, while L(2) is the number of tree combinations for the side of the tree containing the new concept \(\gamma \).
-
\(((\gamma oo)(\overbrace{o\cdots o}^{K-2 \text { concepts}}))\), \(\cdots \)
-
\(((\gamma \overbrace{o\cdots o}^{K-1 \text { concepts}})o)\): \(\binom{K}{K-1} L(K) L(1)\). In this special case, we follow the same formula, except that the single concept on the right side has only one possible combination at the first level, namely L(1), so no factor of 2 appears.
Summing these terms yields the total number of tree hierarchies for \(K+1\) concepts.
The total numbers of tree combinations for \(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, \cdots \) concepts are: 1, 1, 4, 26, 236, 2752, 39208, 660032, 12818912, \(282137824, \cdots \). In the case of the SHL dataset used in the empirical evaluation, we have 8 different concepts; the number of different hierarchies in this case is therefore \(L(8) = 660,032\).
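The enumeration above can be turned into a short recursive computation. The following is a minimal sketch; the function name `L` and the memoization are ours, but the recurrence mirrors the cases listed in the proof (all intermediate terms carry the factor 2, the last term does not):

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def L(n: int) -> int:
    """Total number of labelled tree hierarchies over n concepts."""
    if n <= 1:
        return 1
    K = n - 1  # add the new concept gamma to K existing concepts
    # Cases (gamma(o...o)), ((gamma o)(o...o)), ..., each with the factor 2.
    total = sum(comb(K, i) * L(i + 1) * 2 * L(K - i) for i in range(K - 1))
    # Last case ((gamma o...o) o): the single right-hand concept contributes L(1).
    total += comb(K, K - 1) * L(K) * L(1)
    return total


print([L(n) for n in range(1, 9)])
# Reproduces the sequence above, ending with L(8) = 660032.
```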
Appendix B Training Details
We use TensorFlow for building the encoders/decoders. We construct encoders by stacking Conv1D/ReLU/MaxPool blocks. These blocks are followed by fully connected/ReLU layers. Encoder performance estimation is based on the validation loss, and the task is framed as a sequence classification problem. As a preprocessing step, annotated input streams from the large-scale SHL dataset are segmented into sequences of 6000 samples, which correspond to a duration of 1 min at a sampling rate of 100 Hz. For weight optimization, we use stochastic gradient descent with a Nesterov momentum of 0.9 and a learning rate of 0.1 for a minimum of 12 epochs (we stop training when there is no further improvement). Weight decay is set to 0.0001. Furthermore, to make the neural networks more stable, we apply batch normalization on top of each convolutional layer. We use SVMs as our ERMs in the derived hierarchies.
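A minimal Keras sketch of such an encoder follows. Only the stated settings (Conv1D/ReLU/MaxPool blocks, batch normalization on top of each convolution, a fully connected/ReLU head, SGD with Nesterov momentum 0.9, learning rate 0.1, weight decay 1e-4, 6000-sample inputs) come from the text; the filter counts, kernel sizes, and number of blocks are illustrative assumptions:

```python
import tensorflow as tf


def build_encoder(num_classes: int, channels: int) -> tf.keras.Model:
    l2 = tf.keras.regularizers.l2(1e-4)  # weight decay of 0.0001
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(6000, channels)),  # 1 min at 100 Hz
        # Stacked Conv1D/ReLU/MaxPool blocks; batch norm on top of each conv.
        tf.keras.layers.Conv1D(32, 7, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling1D(4),
        tf.keras.layers.Conv1D(64, 5, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling1D(4),
        tf.keras.layers.Conv1D(128, 3, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling1D(4),
        tf.keras.layers.GlobalMaxPooling1D(),
        # Fully connected / ReLU head for sequence classification.
        tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(
            learning_rate=0.1, momentum=0.9, nesterov=True),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```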
Appendix C Evaluation Metrics
In hierarchical classification settings, the hierarchical structure is important and should be taken into account during model evaluation [17]. Various measures that account for the hierarchical structure of the learning process have been studied in the literature. They can be categorized into distance-based, depth-dependent, semantics-based, and hierarchy-based measures, each displaying advantages and disadvantages depending on the characteristics of the considered structure [5]. In our experiments, we use the H-loss, a hierarchy-based measure defined in [3]. This measure captures the intuition that “whenever a classification mistake is made on a node of the taxonomy, then no loss should be charged for any additional mistake occurring in the sub-tree of that node”: \(\ell _H(\hat{y}, y) = \sum _{i=1}^{N} \mathbb {1}\{ \hat{y}_i \ne y_i \wedge \hat{y}_j = y_j, \forall j \in Anc(i) \}\), where \(\hat{y} = (\hat{y}_1, \cdots , \hat{y}_N)\) are the predicted labels, \(y = (y_1, \cdots , y_N)\) are the true labels, and Anc(i) is the set of ancestors of node i.
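This definition can be sketched in a few lines. In the sketch below, the data layout is ours, not the authors': labels are dictionaries mapping node identifiers to binary labels, and the taxonomy is a child-to-parent map (the root has no entry):

```python
def ancestors(node, parent):
    """Set of ancestors of `node`, given a child -> parent map."""
    anc = set()
    while node in parent:
        node = parent[node]
        anc.add(node)
    return anc


def h_loss(y_pred, y_true, parent):
    """H-loss: count mispredicted nodes whose ancestors are all predicted correctly."""
    return sum(
        1
        for i in y_true
        if y_pred[i] != y_true[i]
        and all(y_pred[j] == y_true[j] for j in ancestors(i, parent))
    )
```

For a taxonomy with root 0 and children 1 and 2, a mistake at the root masks any further mistakes in its subtree, so only one unit of loss is charged regardless of the predictions made below it.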
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Osmani, A., Hamidi, M., Alizadeh, P. (2021). Hierarchical Learning of Dependent Concepts for Human Activity Recognition. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science(), vol 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_7
DOI: https://doi.org/10.1007/978-3-030-75765-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75764-9
Online ISBN: 978-3-030-75765-6