1 Introduction

Ambient computing environments are populated with devices that are present everywhere, miniaturized, and endowed with considerable computational capabilities. One of the main applications of ambient computing is ambient assisted living and smart homes [1], where the user expects services that assist him in his daily living with respect to health status, activities, and the control of home appliances [2].

In this work we present the architecture of a smart home that adapts dynamically, through a continuous learning process, to changes in its environment. The home may be occupied by an elderly or handicapped person who can benefit from such an assistive platform. The system automates actions by predicting them from continuous monitoring of the inhabitant's behavior. To satisfy the user's intents calmly, we combine high-level learning and reasoning with low-level adaptation to context changes. We monitor the user's actions and context with a planning system and machine learning in order to generate plans online; the plans are modeled as a Markov Decision Process (MDP). The main application runs over a context-aware middleware (WCOMP) so as to adapt to environmental changes. In this contribution we focus on the interaction between the resident and the system: unlike existing systems in which plans are predefined and static, our planning system generates dynamic plans of the user's activities and handles feedback online. Section 2 describes the challenges of ambient computing and the categories of user interaction with a smart environment. Section 3 details our approach, and Section 4 presents the overall architecture and its modules. Section 5 presents the plan model and its parameters, Section 6 describes the PQLA algorithm, Section 7 presents the software adaptation middleware, and Section 8 concludes.

2 Problem Statement and Related Work

In ambient environments, two main challenges must be taken into account for services to be provided at an abstract level. First, an ambient assisted living system should provide the user with abstract services that adapt to changes in context and devices, since the environment is heterogeneous, dynamic, and open [3]. Second, the user should be able to interact with the smart space so that it adapts to his needs. The ways of interaction can be categorized into three types: user configuration, predefined rules, and system learning. In the user-configuration category, inhabitants configure their own space through programming languages or through natural language [4, 5]. In the predefined-rules category, designers may monitor users for a specific period of time [6], ask them about their preferences [7], or rely on expert knowledge. In this case the user may be unable to express his needs exactly, or he may forget details; moreover, humans change their behavior over time, and they may feel constrained by the system's decisions. These limitations motivate the use of learning tools. For the environment to be more intelligent, explicit interaction with the system should be reduced [8]. Machine learning techniques can be offline, where the created patterns are fixed as in [9, 10], or online, where patterns are dynamic and may change over time due to the dynamicity of the environment and changes in the inhabitant's behavior. Few works have targeted online learning; in [11], changes in the inhabitants' behavior are detected either by the user himself or by a smart detection method. Our work differs by its online and continuous learning: the system learns changes in the user's behavior by allowing him to give feedback directly, by acting manually on the devices.

3 Our Approach

Many works have addressed either low-level self-adaptation to context changes, as in [12], or high-level learning and reasoning to satisfy the user's intents calmly. In this work we merge these two levels by using:

A planning, reasoning, and machine learning system to learn and predict user actions: planning and decision-making have been well studied in the AI community [13]. We present a planning system based on an MDP model coupled with Q-learning. The system observes the user's actions in various situations and contexts, as well as the environmental changes, and constructs online plans of those actions. The learning system learns when to execute the user's needs automatically in the future. The system accepts the user's rejection (feedback) of a taken decision, which leads to a reconfiguration of the existing plan; the new configuration is taken into account in the next prediction (a minimal sketch of this cycle is given after the second point below).

A context-adaptive middleware to reconfigure the running application: the application runs on a context-adaptive middleware (WCOMP) [3] that meets the challenges mentioned above. The output of the plan execution is the input that triggers the adaptation of the running software application.
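As a rough illustration of the learn/predict/feedback cycle of the first point, the Python sketch below counts the user's actions per observed context, predicts the most frequent one, and discounts a rejected prediction so that the next prediction can change. The class and method names are assumptions made for illustration; the actual plan model and policy are defined in Sects. 5 and 6.

```python
from collections import defaultdict

class OnlineActionLearner:
    """Minimal sketch (hypothetical names) of the learn/predict/feedback cycle."""

    def __init__(self):
        # context -> action -> weight, built online while monitoring the user
        self.counts = defaultdict(lambda: defaultdict(float))

    def observe(self, context, action):
        self.counts[context][action] += 1.0          # online plan construction

    def predict(self, context):
        actions = self.counts.get(context)
        if not actions:
            return None                              # no plan yet for this context
        return max(actions, key=actions.get)         # most likely user action

    def feedback(self, context, action, accepted):
        # reward 1 keeps the plan as is; reward 0 (rejection) reconfigures it
        if not accepted:
            self.counts[context][action] *= 0.5
```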

4 Architecture of the System

Building on our previous architecture [14], we present the detailed architecture of the intelligent system (Fig. 1). The environment in which the user acts is equipped with a set of devices, sensors, and actuators. Three types of interaction with the environment can change the overall state: interaction from the user, from the environment, or from the system. An ontology describes the different devices of the smart space, and the system retrieves device information from it. The monitoring and classification module senses incoming events and retrieves information about each event and its device; it also infers the time zone of the incoming event. The inference module searches the plan library for a plan corresponding to the incoming context.

Fig. 1. The detailed architecture of the intelligent system

We consider the time and location of an event as the contextual information, since these are the main labels that characterize user actions. The orchestration and context management module sends the incoming event and the received context information to the appropriate module. In the plan construction module the system creates new plans: each new incoming event is recorded in a new plan and saved in the plan library. In [15, 16] the plan library is defined as a set of plans or predefined steps; in our proposition the plan library is constructed online by monitoring the user. The execute plan module executes the inferred plan following the policy of a new algorithm, called PQLA, based on the Q-learning algorithm [17]; it calculates the Q values from the returned reward. The update plan module receives the information about the feedback: it updates the existing plan, adds the changes due to the feedback, reconfigures the probabilities, and saves the changes in the plan library.
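The sketch below illustrates how such an orchestration step could route an incoming event to the plan construction, execution, or update modules. The names (Event, orchestrate, construct_plan, execute_plan, update_plan) and the dispatching rule are assumptions made for illustration, not the system's actual code.

```python
from dataclasses import dataclass

@dataclass
class Event:
    device: str      # e.g. "living_room_lamp"
    action: str      # e.g. "switch_on"
    time_zone: str   # inferred by the monitoring and classification module
    location: str    # second contextual label

def construct_plan(event: Event) -> dict:
    # Plan construction module: start a new plan from the observed action.
    return {"observed": [(event.device, event.action)]}

def execute_plan(plan: dict, event: Event) -> None:
    # Execute plan module: would follow the PQLA policy (Sect. 6).
    pass

def update_plan(plan: dict, event: Event) -> None:
    # Update plan module: record the feedback action; probabilities would be
    # reconfigured here before saving the plan back to the library.
    plan["observed"].append((event.device, event.action))

def orchestrate(event: Event, plan_library: dict, is_feedback: bool) -> None:
    """Route the event according to whether a plan exists for its context."""
    context = (event.time_zone, event.location)
    plan = plan_library.get(context)
    if plan is None:
        plan_library[context] = construct_plan(event)   # new plan saved online
    elif is_feedback:
        update_plan(plan, event)                        # user corrected a decision
    else:
        execute_plan(plan, event)                       # automate the predicted action
```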

5 Plan Model: Markov Decision Process

Plans are constructed according to the Markov Decision Process (MDP) model. Below we define each of the MDP parameters in our plans (Fig. 2):

Fig. 2. A graphical representation of a plan

States represent the states of the devices in the environment; any change leads to a transition to a new state. An action leads to a transition from one state to the next. We define two types of probabilities: the transition probability P(s'|s, a) and the action probability P(a|s). The reward is specified through online interaction: the software calculates the reward from the user's feedback, i.e. his satisfaction. If the user accepts the software's decision, a reward of 1 is given and the new learning values are calculated; otherwise a reward of 0 is given and the new learning values are calculated.
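A minimal data-structure sketch of a plan with these parameters is given below; the names (Plan, transition_prob, action_prob) are assumptions for illustration, and the binary reward follows the acceptance rule just described.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

State = str
Action = str

@dataclass
class Plan:
    # P(s'|s, a): probability of reaching state s' when action a is taken in state s
    transition_prob: Dict[Tuple[State, Action, State], float] = field(default_factory=dict)
    # P(a|s): probability that the user performs action a in state s
    action_prob: Dict[Tuple[State, Action], float] = field(default_factory=dict)

def reward(accepted: bool) -> int:
    """Reward is not predefined: 1 if the user accepts the decision, 0 otherwise."""
    return 1 if accepted else 0
```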

6 Planning Q-Learning Algorithm “PQLA”

Q-learning is a computational approach to the problem of learning through interaction [18]: it allows the machine to learn its behavior from feedback received from the environment. We define a new Q-learning algorithm, called PQLA, based on the knowledge extracted from the generated plans and used to decide which future actions to predict. In PQLA the agent does not learn only from the immediate rewards; it also relies on the knowledge learnt from the generated plans. This knowledge is used to initialize two matrices of the algorithm: the probability table and the policy table. The probability table contains the probability of each state-action pair. The policy table contains, for each state-action pair, the sum of the Q-value table and the probability table. The algorithm chooses actions not randomly but according to the values in the policy table.

PQLA algorithm

1. Set the alpha and gamma parameters;
2. Rewards are not set in advance; they are based on the user's satisfaction;
3. Initialize the Q matrix (based on previous experience);
4. Initialize the probability matrix (based on the probabilities extracted from the plans);
5. Calculate the policy matrix (the sum of the Q matrix and the probability matrix);
6. For each inferred plan, choose actions according to the policy table, obtain the reward from the user's feedback, and update the Q values.
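The sketch below is one possible reading of these steps in Python (using NumPy): the policy table is the element-wise sum of the Q table and the probability table, actions are chosen greedily from it, and Q is updated with the standard Q-learning rule using the binary reward of Sect. 5. The matrix shapes, the greedy choice, and the exact update rule are assumptions where the steps above leave the details open.

```python
import numpy as np

class PQLA:
    """Sketch of the PQLA policy under the assumptions stated above."""

    def __init__(self, n_states, n_actions, plan_probabilities, alpha=0.5, gamma=0.8):
        self.alpha, self.gamma = alpha, gamma                  # step 1: learning parameters
        self.Q = np.zeros((n_states, n_actions))               # step 3: zeros when no previous experience
        self.P = np.asarray(plan_probabilities, dtype=float)   # step 4: P(a|s) extracted from the plans
        assert self.P.shape == (n_states, n_actions)

    def policy(self):
        return self.Q + self.P                                 # step 5: policy = Q + probability table

    def choose_action(self, state):
        # Actions are not chosen randomly but guided by the plan knowledge.
        return int(np.argmax(self.policy()[state]))

    def update(self, state, action, next_state, accepted):
        r = 1 if accepted else 0                               # step 2: reward from user satisfaction
        target = r + self.gamma * np.max(self.Q[next_state])
        self.Q[state, action] += self.alpha * (target - self.Q[state, action])
```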

7 Software Adaptation Middleware

The main software application runs on the WCOMP middleware [3], where adaptation to environmental changes is based on the Aspect of Assembly concept. Each executed action in a plan produces an output in condition-action form that leads to the selection of a set of Aspects of Assembly (AAs). Those AAs (Fig. 3) are deployed in WCOMP and the adaptation to the changes takes place.
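To make the condition-action mechanism concrete, the sketch below maps the condition produced by an executed plan action to a set of AA names that would then be deployed. The rule format and the catalogue are hypothetical illustrations; they do not represent the WCOMP API.

```python
# Hypothetical condition -> Aspects of Assembly catalogue (illustrative names only).
AA_CATALOG = {
    "evening_and_user_in_living_room": ["TurnOnLampAA", "CloseShuttersAA"],
    "user_left_home": ["SwitchOffAppliancesAA"],
}

def select_aspects(condition: str) -> list[str]:
    """Return the AAs selected by the condition part of a condition-action rule."""
    return AA_CATALOG.get(condition, [])

# Each executed plan action yields a condition; the matching AAs would then be
# deployed in the middleware so that the running application adapts.
print(select_aspects("evening_and_user_in_living_room"))
```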

Fig. 3. AA example

8 Conclusion and Perspectives

We have presented an architecture for a smart environment that can be occupied by an elderly person. It reconciles a machine learning approach with a context-aware adaptation approach. At the high level, a planning, reasoning, and learning system monitors the user and predicts future actions. The user is always in the loop, since he can give the system feedback that reflects his satisfaction. At the low level, the application runs on the context-aware middleware WCOMP and is reconfigured according to changes in the environment.