1 Introduction

Ambient computing environments are populated with devices that are present everywhere, miniaturized, and endowed with considerable computational capabilities. One of the main applications of ambient computing is ambient assisted living and smart homes [1], where the user expects services that assist him in his daily living with respect to health status, activities, and the control of home appliances [2].

In this work we present the architecture of a smart home that adapts dynamically, through a continuous learning process, to changes in its environment. The home may be occupied by an elderly or handicapped person who can benefit from such an assistive platform. The system automates actions by predicting them from continuous monitoring of the inhabitant's behavior. To satisfy the user's intents calmly, we combine high-level learning and reasoning with low-level adaptation to context changes. We monitor the user's actions and context with a planning system and machine learning in order to generate plans online; the plans are modeled as a Markov Decision Process (MDP). The main application runs over a context-aware middleware (WCOMP) so as to adapt to environmental changes. In this contribution we focus on the interaction between the resident and the system: unlike existing systems in which plans are predefined and static, our planning system generates dynamic plans of the user's activities and handles feedback online. Section 2 describes the challenges of ambient computing and the categories of user interaction with a smart environment. Section 3 details our approach, and Section 4 presents the overall architecture and its modules. Section 5 presents the plan model and its parameters, Section 6 describes the PQLA algorithm, Section 7 presents the software adaptation middleware, and Section 8 concludes.

2 Problem Statement and Related Work

In ambient environments, two main challenges must be taken into account for services to be provided at an abstract level. First, an ambient assisted living system should provide the user with abstract services that adapt to changes in context and devices, since the environment is heterogeneous, dynamic, and open [3]. Second, the user should be able to interact with the smart space so that it adapts to his needs. The ways of interaction can be categorized into three types: user configuration, predefined rules, and system learning. In the user-configuration category, inhabitants configure their own space through programming languages or through natural language [4, 5]. In the predefined-rules category, designers may monitor users for a specific period of time [6], ask them about their preferences [7], or rely on expert knowledge. In this case the user may be unable to express his needs exactly, or he may forget details; moreover, humans change their behavior over time, and they may feel constrained by the system's decisions. These limitations motivate the use of learning tools. For the environment to be more intelligent, explicit interaction with the system should be reduced [8]. Machine learning techniques can be offline, where the created patterns are fixed as in [9, 10], or online, where patterns are dynamic and may change over time due to the dynamicity of the environment and changes in the inhabitant's behavior. Few works have targeted online learning; in [11], changes in the inhabitants' behavior are detected either by the user himself or by a smart detection method. Our work differs by its online and continuous learning: the system learns changes in the user's behavior by allowing him to give feedback directly, by acting manually on the devices.

3 Our Approach

Many works have addressed either low-level self-adaptation to context changes, as in [12], or high-level learning and reasoning to satisfy the user's intents calmly. In this work we merge these two levels by using:

A planning, reasoning, and machine learning system to learn and predict user actions: planning and decision-making have been well studied in the AI community [13]. We present a planning system based on an MDP model coupled with Q-learning. The system observes the user's actions in various situations and contexts, as well as the environmental changes, and constructs online plans of those actions. The learning system learns when to execute the user's needs automatically in the future. The system accepts the user's rejection (feedback) of a taken decision, which leads to a reconfiguration of the existing plan; the new configuration is taken into account in the next prediction (a minimal sketch of this cycle is given after the second point below).

A context-adaptive middleware to reconfigure the running application: the application runs on a context-adaptive middleware (WCOMP) [3] that meets the challenges mentioned above. The output of the plan execution is the input that triggers the adaptation of the running software application.
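As a rough illustration of the learn/predict/feedback cycle of the first point, the Python sketch below counts the user's actions per observed context, predicts the most frequent one, and discounts a rejected prediction so that the next prediction can change. The class and method names are assumptions made for illustration; the actual plan model and policy are defined in Sects. 5 and 6.

```python
from collections import defaultdict

class OnlineActionLearner:
    """Minimal sketch (hypothetical names) of the learn/predict/feedback cycle."""

    def __init__(self):
        # context -> action -> weight, built online while monitoring the user
        self.counts = defaultdict(lambda: defaultdict(float))

    def observe(self, context, action):
        self.counts[context][action] += 1.0          # online plan construction

    def predict(self, context):
        actions = self.counts.get(context)
        if not actions:
            return None                              # no plan yet for this context
        return max(actions, key=actions.get)         # most likely user action

    def feedback(self, context, action, accepted):
        # reward 1 keeps the plan as is; reward 0 (rejection) reconfigures it
        if not accepted:
            self.counts[context][action] *= 0.5
```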

4 Architecture of the System

Building on our previous architecture [14], we present the detailed architecture of the intelligent system (Fig. 1). The environment in which the user acts is equipped with a set of devices, sensors, and actuators. Three types of interaction with the environment can change the overall state: interaction from the user, from the environment, or from the system. An ontology describes the different devices of the smart space, and the system retrieves device information from it. The monitoring and classification module senses incoming events and retrieves information about each event and its device; it also infers the time zone of the incoming event. The inference module searches the plan library for a plan corresponding to the incoming context.

Fig. 1. The detailed architecture of the intelligent system

We consider the time and location of an event as the contextual information, since these are the main labels that characterize user actions. The orchestration and context management module sends the incoming event and the received context information to the appropriate module. In the plan construction module the system creates new plans: each new incoming event is recorded in a new plan and saved in the plan library. In [15, 16] the plan library is defined as a set of plans or predefined steps; in our proposition the plan library is constructed online by monitoring the user. The execute plan module executes the inferred plan following the policy of a new algorithm, called PQLA, based on the Q-learning algorithm [17]; it calculates the Q values from the returned reward. The update plan module receives the information about the feedback: it updates the existing plan, adds the changes due to the feedback, reconfigures the probabilities, and saves the changes in the plan library.
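The sketch below illustrates how such an orchestration step could route an incoming event to the plan construction, execution, or update modules. The names (Event, orchestrate, construct_plan, execute_plan, update_plan) and the dispatching rule are assumptions made for illustration, not the system's actual code.

```python
from dataclasses import dataclass

@dataclass
class Event:
    device: str      # e.g. "living_room_lamp"
    action: str      # e.g. "switch_on"
    time_zone: str   # inferred by the monitoring and classification module
    location: str    # second contextual label

def construct_plan(event: Event) -> dict:
    # Plan construction module: start a new plan from the observed action.
    return {"observed": [(event.device, event.action)]}

def execute_plan(plan: dict, event: Event) -> None:
    # Execute plan module: would follow the PQLA policy (Sect. 6).
    pass

def update_plan(plan: dict, event: Event) -> None:
    # Update plan module: record the feedback action; probabilities would be
    # reconfigured here before saving the plan back to the library.
    plan["observed"].append((event.device, event.action))

def orchestrate(event: Event, plan_library: dict, is_feedback: bool) -> None:
    """Route the event according to whether a plan exists for its context."""
    context = (event.time_zone, event.location)
    plan = plan_library.get(context)
    if plan is None:
        plan_library[context] = construct_plan(event)   # new plan saved online
    elif is_feedback:
        update_plan(plan, event)                        # user corrected a decision
    else:
        execute_plan(plan, event)                       # automate the predicted action
```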

5 Plan Model: Markov Decision Process

Plans are constructed according to the Markov Decision Process (MDP) model. Below we define each of the MDP parameters in our plans (Fig. 2):

Fig. 2. A graphical representation of a plan

States represent the states of the devices in the environment; any change leads to a transition to a new state. An action leads to a transition from one state to the next. We define two types of probabilities: the transition probability P(s'|s, a) and the action probability P(a|s). The reward is specified through online interaction: the software calculates the reward from the user's feedback, i.e. his satisfaction. If the user accepts the software's decision, a reward of 1 is given and the new learning values are calculated; otherwise a reward of 0 is given and the new learning values are calculated.
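A minimal data-structure sketch of a plan with these parameters is given below; the names (Plan, transition_prob, action_prob) are assumptions for illustration, and the binary reward follows the acceptance rule just described.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

State = str
Action = str

@dataclass
class Plan:
    # P(s'|s, a): probability of reaching state s' when action a is taken in state s
    transition_prob: Dict[Tuple[State, Action, State], float] = field(default_factory=dict)
    # P(a|s): probability that the user performs action a in state s
    action_prob: Dict[Tuple[State, Action], float] = field(default_factory=dict)

def reward(accepted: bool) -> int:
    """Reward is not predefined: 1 if the user accepts the decision, 0 otherwise."""
    return 1 if accepted else 0
```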

6 Planning Q-Learning Algorithm “PQLA”

Q-learning is a computational approach to the problem of learning through interaction [18]: it allows the machine to learn its behavior from feedback received from the environment. We define a new Q-learning algorithm, called PQLA, based on the knowledge extracted from the generated plans and used to decide which future actions to predict. In PQLA the agent does not learn only from the immediate rewards; it also relies on the knowledge learnt from the generated plans. This knowledge is used to initialize two matrices of the algorithm: the probability table and the policy table. The probability table contains the probability of each state-action pair. The policy table contains, for each state-action pair, the sum of the Q-value table and the probability table. The algorithm chooses actions not randomly but according to the values in the policy table.

PQLA algorithm

1. Set the alpha and gamma parameters;
2. Rewards are not set in advance; they are based on the user's satisfaction;
3. Initialize the Q matrix (based on previous experience);
4. Initialize the probability matrix (based on the probabilities extracted from the plans);
5. Calculate the policy matrix (the sum of the Q matrix and the probability matrix);
6. For each inferred plan, choose actions according to the policy table, obtain the reward from the user's feedback, and update the Q values.
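The sketch below is one possible reading of these steps in Python (using NumPy): the policy table is the element-wise sum of the Q table and the probability table, actions are chosen greedily from it, and Q is updated with the standard Q-learning rule using the binary reward of Sect. 5. The matrix shapes, the greedy choice, and the exact update rule are assumptions where the steps above leave the details open.

```python
import numpy as np

class PQLA:
    """Sketch of the PQLA policy under the assumptions stated above."""

    def __init__(self, n_states, n_actions, plan_probabilities, alpha=0.5, gamma=0.8):
        self.alpha, self.gamma = alpha, gamma                  # step 1: learning parameters
        self.Q = np.zeros((n_states, n_actions))               # step 3: zeros when no previous experience
        self.P = np.asarray(plan_probabilities, dtype=float)   # step 4: P(a|s) extracted from the plans
        assert self.P.shape == (n_states, n_actions)

    def policy(self):
        return self.Q + self.P                                 # step 5: policy = Q + probability table

    def choose_action(self, state):
        # Actions are not chosen randomly but guided by the plan knowledge.
        return int(np.argmax(self.policy()[state]))

    def update(self, state, action, next_state, accepted):
        r = 1 if accepted else 0                               # step 2: reward from user satisfaction
        target = r + self.gamma * np.max(self.Q[next_state])
        self.Q[state, action] += self.alpha * (target - self.Q[state, action])
```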

7 Software Adaptation Middleware

The main software application runs on the WCOMP middleware [3], where adaptation to environmental changes is based on the Aspect of Assembly concept. Each executed action in a plan produces an output in condition-action form that leads to the selection of a set of Aspects of Assembly (AAs). Those AAs (Fig. 3) are deployed in WCOMP and the adaptation to the changes takes place.
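To make the condition-action mechanism concrete, the sketch below maps the condition produced by an executed plan action to a set of AA names that would then be deployed. The rule format and the catalogue are hypothetical illustrations; they do not represent the WCOMP API.

```python
# Hypothetical condition -> Aspects of Assembly catalogue (illustrative names only).
AA_CATALOG = {
    "evening_and_user_in_living_room": ["TurnOnLampAA", "CloseShuttersAA"],
    "user_left_home": ["SwitchOffAppliancesAA"],
}

def select_aspects(condition: str) -> list[str]:
    """Return the AAs selected by the condition part of a condition-action rule."""
    return AA_CATALOG.get(condition, [])

# Each executed plan action yields a condition; the matching AAs would then be
# deployed in the middleware so that the running application adapts.
print(select_aspects("evening_and_user_in_living_room"))
```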

Fig. 3. AA example

8 Conclusion and Perspectives

We have presented an architecture for a smart environment that can be occupied by an elderly person. It reconciles a machine learning approach with a context-aware adaptation approach. At the high level, a planning, reasoning, and learning system monitors the user and predicts future actions. The user is always in the loop, since he can give the system feedback that reflects his satisfaction. At the low level, the application runs on the context-aware middleware WCOMP and is reconfigured according to changes in the environment.