Echo State Network for Classification of Human Eye Movements During Decision Making

Koprinkova-Hristova, Petia; Stefanova, Miroslava; Genova, Bilyana; Bocheva, Nadejda

doi:10.1007/978-3-319-92007-8_29

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 519)

Included in the following conference series:

IFIP International Conference on Artificial Intelligence Applications and Innovations


Abstract

This paper further develops an approach, recently proposed by the authors, for classifying dynamic data series with a class of Recurrent Neural Networks (RNN) called Echo State Networks (ESN). It exploits Intrinsic Plasticity (IP) tuning of the ESN reservoir to fit the dynamics of its neurons to the data fed into the reservoir input. A novel approach is proposed here for ranking a database of dynamic data series into groups, using the length of the multidimensional reservoir state vector reached after feeding each time series into the ESN. It is tested on eye tracker recordings of human eye movements during visual stimulation and decision making. The preliminary results demonstrate the ability of the proposed technique to discriminate dynamic data series.


1 Introduction

In previous work by the first author [1], a novel approach for unsupervised clustering of static multidimensional data sets using a class of RNNs called Echo State Networks (ESN) [2, 3] was proposed. It was subsequently tested successfully on numerous practical examples, and the results were summarized in [4]. In [5] the approach was extended and applied to the classification of dynamic data series as well.

The core of the approach, first proposed in [1], was to use the ESN to extract more informative features from multidimensional data sets. To this end, the equilibrium states of the ESN reservoir neurons corresponding to each multidimensional data item presented to the ESN input were used. As shown in [6], the ESN reservoir dynamics can be fitted to reflect the structure of the input data by a reservoir tuning approach called Intrinsic Plasticity (IP) [7, 8], which aims at achieving a desired distribution of the ESN reservoir outputs.

Since the number of newly extracted features depends on the size of the ESN, the question of how to choose the most suitable among them is still under investigation. Initially [1, 4], it was proposed to choose only two of all the neurons' steady states, based on their distribution in a one- or two-dimensional space [4]. Next, in [9], we tried to increase the number of representative neurons as long as the clustering accuracy kept increasing. However, this increased the computational burden too much. That is why another approach is proposed here: to use the geometric size (Euclidean length) of the vector of all reservoir states and to rank the time series based on that single feature.

The approach was tested on time series of eye movements collected during psycho-physiological experiments in which humans observed specific visual stimuli and made decisions. The preliminary results demonstrated the ability of the proposed approach to rank the time series of human eye movements according to their characteristics.

The paper is organized as follows: Sect. 2 briefly describes the ESN structure, its IP tuning and the newly proposed feature extraction approach; Sect. 3 describes the experimental set-up and the collected time series data; Sect. 4 presents and discusses the classification results obtained by the proposed algorithm; Sect. 5 concludes the paper with remarks and directions for future work.

2 Clustering Algorithm

2.1 Echo State Network and IP Tuning

The ESN, shown in Fig. 1, is a type of recurrent neural network belonging to the new and rapidly developing family of reservoir computing approaches [2, 3]. The ESN output at the current time instant k is the vector out(k) of size n_out. It is a linear function f^out (usually the identity) of the current input vector in(k) (of size n_in) and the current reservoir state vector X(k) (of size n_X):

$$ out\left( k \right) = f^{out} \left( W^{out} \left[ in\left( k \right),\; X\left( k \right) \right] \right) $$

(1)

Here W^out is a trainable $ n_{out} \times (n_{in} + n_{X}) $ matrix. The reservoir neurons have a simple sigmoid activation function f^res (usually the hyperbolic tangent), and their state depends on both the ESN input in(k) and the previous reservoir state X(k − 1):

$$ X\left( k \right) = f^{res} \left( {W^{in} in\left( k \right) + W^{res} X\left( {k - 1} \right)} \right) $$

(2)

Here W^in and W^res are $ n_{X} \times n_{in} $ and $ n_{X} \times n_{X} $ randomly generated, non-trainable weight matrices.
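Equations (1) and (2) translate almost directly into NumPy. The sketch below is illustrative only: the helper names, the uniform initialization in [−1, 1] and the rescaling of W^res to a spectral radius of 0.9 are common ESN practice assumed here, not settings reported in this paper; f^res = tanh and f^out = identity follow the text.

```python
import numpy as np

def make_esn(n_in, n_x, n_out, spectral_radius=0.9, seed=0):
    """Create random ESN weights (hypothetical helper; scaling choices are assumptions)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, size=(n_x, n_in))    # n_X x n_in, not trained
    W_res = rng.uniform(-1.0, 1.0, size=(n_x, n_x))    # n_X x n_X, not trained
    # Rescale so that the largest absolute eigenvalue equals spectral_radius.
    W_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W_res)))
    W_out = np.zeros((n_out, n_in + n_x))               # trainable readout, Eq. (1)
    return W_in, W_res, W_out

def reservoir_step(W_in, W_res, u, x_prev):
    """Eq. (2) with f^res = tanh: new state from input u and previous state."""
    return np.tanh(W_in @ u + W_res @ x_prev)

def readout(W_out, u, x):
    """Eq. (1) with f^out = identity, applied to the concatenation [in(k), X(k)]."""
    return W_out @ np.concatenate([u, x])
```

Only W_out would be trained (for example by linear regression on collected states); W_in and W_res stay as generated.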

The main motivation for this type of RNN is to simplify training, since only W^out is trained. However, it turned out that although the non-trainable weights can be random, they still need to be tuned initially. Various approaches have been proposed for this purpose [2, 3]. In [7, 8] an algorithm called intrinsic plasticity (IP) was proposed. Its aim is to increase the entropy of the reservoir neuron outputs by minimizing the Kullback-Leibler divergence:

$$ D_{KL} \left( p\left( X \right) \,\|\, p_{d} \left( X \right) \right) = \int p\left( X \right) \log \left( \frac{p\left( X \right)}{p_{d} \left( X \right)} \right) dX $$

(2)

which measures the difference between the actual probability distribution p(X) and the desired probability distribution p_d(X) of the reservoir neuron states X.

It has been proven that, for the commonly used hyperbolic tangent activation of the reservoir neurons, the appropriate target distribution is the Gaussian one. To this end, two additional reservoir parameters, the gain a and the bias b (both vectors of size n_X), were introduced in [8] as follows:

$$ X\left( k \right) = f^{res} \left( {diag\left( a \right)W^{in} in\left( k \right) + diag\left( a \right)W^{res} X\left( {k - 1} \right) + b} \right) $$

(3)

IP training is a gradient descent algorithm [8] that minimizes the Kullback-Leibler divergence by adjusting the vectors a and b.
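A minimal sketch of IP tuning for the model of Eq. (3) follows. It assumes the online rule for tanh neurons with a Gaussian target N(μ, σ²) in the form commonly quoted from [8], namely Δb = −η(−μ/σ² + (y/σ²)(2σ² + 1 − y² + μy)) and Δa = η/a + Δb·z, where z is the net input before the gain. The learning rate η, μ, σ, the number of epochs and the zero initial state at the start of every series are illustrative assumptions, not values reported in this paper.

```python
import numpy as np

def ip_tune(W_in, W_res, series_list, eta=1e-4, mu=0.0, sigma=0.2, epochs=1):
    """Adapt gain a and bias b of Eq. (3) toward a Gaussian N(mu, sigma^2) output.

    eta, mu, sigma, epochs and the zero initial state are illustrative assumptions.
    """
    n_x = W_res.shape[0]
    a = np.ones(n_x)    # gain, initialised to 1
    b = np.zeros(n_x)   # bias, initialised to 0
    s2 = sigma ** 2
    for _ in range(epochs):
        for series in series_list:            # each series: array of shape (N, n_in)
            x = np.zeros(n_x)
            for u in series:
                z = W_in @ u + W_res @ x      # net input before the gain
                x = np.tanh(a * z + b)        # Eq. (3): diag(a) applied element-wise
                db = -eta * (-mu / s2 + (x / s2) * (2 * s2 + 1 - x ** 2 + mu * x))
                a = a + eta / a + db * z      # gain update uses the old a
                b = b + db
    return a, b
```

Because IP only rescales and shifts each neuron's net input, the random matrices W^in and W^res themselves are left unchanged.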

2.2 Classification Approach for Dynamic Data Series

In [6] it was demonstrated that, besides its original aim, IP tuning also fits the reservoir connection matrix to the structure of the input data presented to the ESN. Moreover, the equilibrium states of the reservoir neurons corresponding to each input data item used during IP tuning reflect the overall data structure [1, 4]. Features collected in this way can therefore be used for subsequent classification or clustering.
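For static data, an equilibrium state for an input item u satisfies X = f^res(diag(a)W^in u + diag(a)W^res X + b). One straightforward way to approximate it, assumed here rather than taken from [1, 4], is fixed-point iteration with the input held constant. The tolerance, iteration cap and zero starting state are hypothetical choices, and convergence relies on the reservoir being sufficiently contractive.

```python
import numpy as np

def equilibrium_state(W_in, W_res, a, b, u, tol=1e-8, max_iter=1000):
    """Iterate Eq. (3) with a constant input u until the state stops changing."""
    x = np.zeros(W_res.shape[0])
    drive = a * (W_in @ u) + b            # constant part: diag(a) W_in u + b
    for _ in range(max_iter):
        x_new = np.tanh(drive + a * (W_res @ x))
        if np.max(np.abs(x_new - x)) < tol:
            return x_new                  # converged to (approximately) a fixed point
        x = x_new
    return x                              # best estimate if tol was not reached
```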

In [5] it was demonstrated that the reservoir state X(N), reached after feeding a non-constant (time-varying) input sequence in(1), …, in(N) into the IP-tuned ESN starting from an initial state X(0),

$$ \begin{array}{l} X\left( N \right) = f^{res} \left( diag\left( a \right)W^{in} in\left( N \right) + diag\left( a \right)W^{res} X\left( N - 1 \right) + b \right) \\ X\left( N - 1 \right) = f^{res} \left( diag\left( a \right)W^{in} in\left( N - 1 \right) + diag\left( a \right)W^{res} X\left( N - 2 \right) + b \right) \\ \cdots \\ X\left( 1 \right) = f^{res} \left( diag\left( a \right)W^{in} in\left( 1 \right) + diag\left( a \right)W^{res} X\left( 0 \right) + b \right) \end{array} $$

(4)

depends on the dynamic characteristics of the time series in and can therefore be exploited as a set of classification features.
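The recursion of Eq. (4) amounts to running a series through the IP-tuned reservoir one sample at a time and keeping only the last state. In this sketch the initial state is zero unless given, the gain a and bias b are assumed to come from an IP routine, and the function name is hypothetical.

```python
import numpy as np

def final_state(W_in, W_res, a, b, series, x0=None):
    """Feed every row of series (shape (N, n_in)) through Eq. (3) and return the last state."""
    x = np.zeros(W_res.shape[0]) if x0 is None else np.asarray(x0, dtype=float)
    for u in series:
        x = np.tanh(a * (W_in @ u) + a * (W_res @ x) + b)
    return x
```

Because each state depends on the previous one, the result reflects the order of the samples and not only their values, which is what makes it usable as a dynamic feature.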

Since choosing the neurons whose states form the best feature set for a particular data set to be classified or clustered is non-trivial [4, 9], here we propose another approach: to compute the size (Euclidean length) of the vector containing the states of all reservoir neurons:

$$ \begin{array}{*{20}c} {R = \sqrt {\sum\limits_{i = 1}^{{n_{X} }} {x_{i} \left( N \right)^{2} } } ,} & {X\left( N \right) = \left[ {\begin{array}{*{20}c} {x_{1} \left( N \right)} & {x_{2} \left( N \right)} & \cdots & {x_{{n_{X} }} \left( N \right)} \\ \end{array} } \right]} \\ \end{array} $$

(5)

and to use it as a single discriminating feature.
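Eq. (5) is the Euclidean norm of X(N). Since every x_i(N) is a tanh output in (−1, 1), R lies between 0 and √n_X, so R values obtained from reservoirs of different sizes live on different scales. The helper names below are illustrative.

```python
import numpy as np

def r_feature(x_final):
    """Eq. (5): R = sqrt(sum_i x_i(N)^2), the Euclidean length of X(N)."""
    x_final = np.asarray(x_final, dtype=float)
    return float(np.sqrt(np.sum(x_final ** 2)))

def r_upper_bound(n_x):
    """Largest possible R for n_x tanh neurons (every |x_i| close to 1)."""
    return float(np.sqrt(n_x))
```

For example, a two-neuron final state [0.3, −0.4] gives R ≈ 0.5, the same value as `np.linalg.norm`.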

3 Experimental Set-Up

The time series used to test the approach described above were collected with an eye tracking device that recorded the eye movements of volunteer subjects observing a series of visual stimuli during a behavioral experiment.

Each stimulus consists of a sequence of consecutive frames. A frame contains 50 dots presented in a circular aperture with a radius of 7.5 cm in the middle of the computer screen. The dots were grouped into 25 pairs, the two dots of each pair being 2 cm apart. Each pair of dots had a limited lifetime of 3 frames, and on every frame one-third of the pairs changed position. Each frame lasted 33 ms. The virtual lines connecting the dots of 18 pairs intersected at a common point, considered the center of the frame, while the remaining 7 pairs had random orientations. The mean position of the centers of all frames in a stimulus sequence determines its “imaginary” center. We generated 14 different types of stimuli, with centers at 7 positions shifted to the left and 7 positions shifted to the right of the screen midpoint. All shifts were horizontal and ranged from 0.67 cm to 4.67 cm in steps of 0.67 cm. Ten different patterns were generated for each center position.

The stimuli were presented on a gray screen with a mean luminance of 50 cd/m², using a 20.1″ NEC MultiSync LCD monitor with an NVIDIA Quadro 900XGL graphics board at a refresh rate of 60 Hz and a resolution of 1280 × 1024 pixels. The experiments were controlled by a custom program developed in Visual C++ and OpenGL.

The subject sat 57 cm from the monitor screen. Each stimulus presentation was preceded by a warning sound signal. A red fixation point 0.8 cm in size appeared in the center of the screen for 500 ms, and the stimulus was presented immediately after the fixation point disappeared. The subject's task was to keep looking at the position of the fixation point until he/she decided where the center of the pattern was, and then to indicate this position by a saccade (fast eye movement). The subjects also had to press the left or the right mouse button depending on whether the perceived center was to the left or to the right of the middle of the screen. If the subject could not make a decision during the stimulus presentation (3.3 s for 100 consecutive frames), the stimulus disappeared and the screen remained gray until the subject responded.
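The stimulus parameters given above can be cross-checked with a few lines of arithmetic. The sketch only restates numbers from the text. Treating the "0.67 cm" step as exactly 2/3 cm is our assumption, made because seven steps of exactly 0.67 cm would end at 4.69 cm rather than 4.67 cm; reading the 33 ms frame as two refresh cycles of the 60 Hz monitor is likewise an inference, not a statement of the text.

```python
# Arithmetic check of the stimulus design; numbers are taken from the text.
step_cm = 2 / 3                                      # assumed exact value of the "0.67 cm" step
shifts_cm = [round(k * step_cm, 2) for k in range(1, 8)]    # 7 shifts per side
centers_cm = [-s for s in reversed(shifts_cm)] + shifts_cm  # negative = left of midpoint
n_types = len(centers_cm)                            # number of stimulus types
n_sequences = n_types * 10                           # ten patterns per center position

n_pairs = 18 + 7                                     # oriented pairs + randomly oriented pairs
n_dots = 2 * n_pairs

frame_ms = 33
refreshes_per_frame = round(frame_ms * 60 / 1000)    # one 60 Hz refresh lasts about 16.7 ms
presentation_s = 100 * frame_ms / 1000               # duration of 100 frames

print(shifts_cm, n_types, n_sequences, n_dots, refreshes_per_frame, presentation_s)
```

The check reproduces the 14 stimulus types, the 50 dots and the 3.3 s presentation limit stated in the text, and gives 140 stimulus sequences in total.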

The eye movements of the participants were recorded with specialized hardware, the Jazz novo eye tracking system (Ober Consulting Sp. z o.o.). All signals from all sensors of the device were recorded at 1 kHz, one session per person, and stored in files. These include: the calibration information; the horizontal and vertical eye positions in degrees of visual angle, eye_x and eye_y; a screen sensor signal indicating the presence or absence of a stimulus on the monitor; a microphone signal recording sounds during the experiment; information about the tested subject (code); and the type of experimental trial for each particular record.
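Because the eye tracker reports eye_x and eye_y in degrees of visual angle while the stimulus geometry is given in centimetres, the two can be related as sketched below. At the 57 cm viewing distance, 1 cm on the screen subtends roughly 1°, so the center shifts of 0.67 to 4.67 cm correspond to roughly 0.67° to 4.7°. The flat-screen geometry with the eye facing the screen center and the function names are assumptions for illustration.

```python
import math

def visual_angle_deg(size_cm, distance_cm=57.0):
    """Angle subtended by an object of size_cm centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def eccentricity_deg(offset_cm, distance_cm=57.0):
    """Angle between the screen centre and a point offset_cm away from it."""
    return math.degrees(math.atan(offset_cm / distance_cm))
```

For instance, the 0.8 cm fixation point subtends about 0.8°, and the largest center shift of 4.67 cm lies about 4.68° from the screen center.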

The raw data were processed to extract only the records made while a stimulus was present on the screen. The data between stimuli were excluded as irrelevant to the eye movements during task performance.
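Extracting the stimulus-on segments can be sketched with the screen sensor signal. We assume, hypothetically, that the sensor trace is sampled in sync with eye_x and eye_y and that a fixed threshold separates "stimulus present" from "absent"; the actual file format of the recordings is not described here, and the function name and threshold are illustrative.

```python
import numpy as np

def stimulus_segments(eye_x, eye_y, screen_sensor, threshold=0.5):
    """Split a session into (N, 2) arrays of [eye_x, eye_y] while a stimulus is shown."""
    on = np.asarray(screen_sensor) > threshold
    # Pad with False so every run of True has both a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], on, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)     # first sample of each stimulus
    stops = np.flatnonzero(edges == -1)     # one past the last sample
    xy = np.column_stack([eye_x, eye_y])
    return [xy[s:e] for s, e in zip(starts, stops)]
```

Each returned array is one candidate ESN input sequence with rows in(k) = [eye_x(k), eye_y(k)].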

Three age groups took part in the experiment: young (from 20 to 35 years), elderly (from 57 to 84 years) and middle-aged (from 25 to 55 years). Across all collected data we observed a large variety of eye movement behaviors, varying not only between the three age groups but also within each group. It was therefore very hard to classify the test subjects on the basis of this information alone, so we decided to test whether the dynamic data discrimination approach proposed above could yield reasonable results.

The ESN input was the two-dimensional vector of visual angles recorded while a stimulus was on the screen, i.e. $ in\left( k \right) = \left[ eye_{x} \left( k \right) \;\; eye_{y} \left( k \right) \right] $. We tuned three ESNs with reservoirs of 10, 50 and 100 neurons using the IP algorithm described above. The feature extracted in this way from each time series was R, calculated according to Eq. (5). The tested subjects were then ordered by the value of R obtained from their recorded eye movement data.
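The whole pipeline, from one [eye_x, eye_y] series per subject to a rank per subject, can be sketched for the three reservoir sizes as follows. This is a hypothetical illustration: the weight initialization is assumed, IP tuning is left out (pass the a and b it produces), and ranking in ascending order of R is our assumption, since the text does not state the direction.

```python
import numpy as np

def make_reservoir(n_x, n_in=2, spectral_radius=0.9, seed=0):
    """Random, untrained W_in and W_res (initialisation choices are assumptions)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, size=(n_x, n_in))
    W_res = rng.uniform(-1.0, 1.0, size=(n_x, n_x))
    W_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W_res)))
    return W_in, W_res

def r_values(series_list, W_in, W_res, a, b):
    """R of Eq. (5) for each (N, 2) series of [eye_x, eye_y] samples."""
    rs = []
    for series in series_list:
        x = np.zeros(W_res.shape[0])
        for u in series:
            x = np.tanh(a * (W_in @ u) + a * (W_res @ x) + b)   # Eq. (3)
        rs.append(np.linalg.norm(x))                          # Eq. (5)
    return np.array(rs)

def rank_subjects(r):
    """Rank 1 = smallest R; ties keep the input order."""
    order = np.argsort(r, kind="stable")
    ranks = np.empty(len(r), dtype=int)
    ranks[order] = np.arange(1, len(r) + 1)
    return ranks
```

A typical call loops over `n_x in (10, 50, 100)`, builds a reservoir, tunes a and b, and passes the resulting R values to `rank_subjects`.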

4 Classification Results and Discussion

First we selected a representative group of four experienced test subjects from different age groups. These subjects took part in preparing the experimental set-up, so they were able to perform the behavioral tasks strictly, and their eye movement recordings were free of outliers caused by improper behavior, such as looking at the mouse before clicking once a decision was taken, or keeping fixation. Such noisy behavior was observed in other volunteer subjects, especially during their first trials.

Figure 2 shows the eye movement data series collected from these four “experienced” test subjects, who were the first to perform the experiment described above. Subject 1 is the youngest, subject 3 is middle-aged and the other two subjects (2 and 4) belong to the elderly group.

The middle-aged subject is easily distinguished, while the similarities between subjects 2 and 4 are not so obvious. Figure 3, showing the variances of the data series, suggests that age differentiation by this characteristic is also hard, even for such a small group.

We then applied the classification approach described above to these data series. To verify the expected effect of IP tuning of the ESN reservoir, we compare the feature values obtained before (Fig. 4) and after (Fig. 5) applying it to three randomly generated ESN reservoirs containing 10, 50 and 100 neurons.

Figure 5 indicates that IP tuning clearly helps to separate our four subjects by age, regardless of the reservoir size.

As observed, the middle-aged subject performed best during the experiments, and his eye movements in Fig. 2 differed markedly from those of the other persons in the group. This was confirmed by our algorithm, which clearly differentiates the middle-aged subject 3 from the other three experienced subjects of different ages. Moreover, the two elderly subjects 2 and 4 were classified as close to each other, while the youngest subject 1 was also clearly differentiated.

We then proceeded with data recorded from 18 volunteers from the three age groups, shown in Figs. 6, 7 and 8, respectively. Figure 9 shows the variances of all data series from Figs. 6, 7 and 8.

Although we observe some similarities and differences between the three groups, there are also marked dissimilarities between subjects from the same group. For example, young subject 6 behaved very differently from the other members of Group 1; in Group 2, subjects 10 and 11 stand out from the others, while in Group 3 subject 15 seems to show a different eye movement behavior.

Since the initial ESN connections are generated randomly, we IP-tuned two ESN reservoirs of each size (denoted here as ESN₁ and ESN₂) and compared the obtained results.

Figure 10 shows the values of R obtained from ESN₁ and ESN₂, as well as their mean. Both initial ESN reservoirs yield similar results after IP tuning, especially for the middle group.

Figure 11 shows the corresponding rank of each subject according to the value of R from Fig. 10. The resulting order of subjects is similar for both initial ESN reservoirs: for most subjects the two ESN generations yielded the same rank, and the ranking by the mean value of R gives the same result. Larger differences are observed only for the first and second subjects.
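The rank comparison behind Fig. 11 can be sketched as follows: rank the subjects by R from each reservoir and by the mean R, then count how many subjects keep the same rank and how far the largest disagreement goes. The R values in the usage below are hypothetical and do not reproduce the paper's data; ranking in ascending order of R and the two agreement measures are our assumptions.

```python
import numpy as np

def ranks(r):
    """Rank 1 = smallest R (ascending order is an assumption)."""
    order = np.argsort(r, kind="stable")
    out = np.empty(len(r), dtype=int)
    out[order] = np.arange(1, len(r) + 1)
    return out

def compare_rankings(r_esn1, r_esn2):
    """Ranks from ESN1, ESN2 and the mean R, plus simple agreement measures."""
    r1 = np.asarray(r_esn1, dtype=float)
    r2 = np.asarray(r_esn2, dtype=float)
    k1, k2 = ranks(r1), ranks(r2)
    k_mean = ranks((r1 + r2) / 2.0)
    same_rank = int(np.sum(k1 == k2))           # subjects ranked identically
    max_shift = int(np.max(np.abs(k1 - k2)))    # largest rank disagreement
    return k1, k2, k_mean, same_rank, max_shift
```

With hypothetical values `compare_rankings([0.5, 0.9, 0.7, 0.2], [0.8, 0.6, 0.75, 0.1])`, two of the four subjects keep their rank and the first two swap places, a pattern of the kind described in the text.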

However, the results show that differentiating the tested subjects by age from the recorded eye movements alone is not possible, since subjects from different age groups received close ranks. Nevertheless, the approach seems promising for classifying types of eye movements during decision making, which could be related to other psycho-physiological peculiarities of the tested subjects.

5 Conclusions

In conclusion, the proposed approach to time series classification using the geometric size of the ESN reservoir state vector appears promising, since it significantly decreases the computational burden of the algorithm.

Concerning the classification of human subjects by their eye movements, the obtained ranking is clearly not sufficient to separate people into age-related groups. Since the ranking of subjects remains stable across different initializations and sizes of the ESN reservoir, we can search for other similarities between the tested subjects that this classification might reveal.

Another explanation of the reported results might be the presence of outliers in the collected data, since raw data were used in these preliminary investigations. Further refinement of the experimental database could therefore help to reveal age-related similarities in the recorded eye movements.

Another direction for future work is the inclusion of additional information collected during the experiments, such as the amplitude, velocity and acceleration of the saccades performed during decision making, as well as the accuracy of the subjects' responses and the corresponding reaction time for each individual stimulus. All these characteristics can serve as features supporting the classification of test subjects in our future work.

References

1. Koprinkova-Hristova, P., Tontchev, N.: Echo state networks for multi-dimensional data clustering. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds.) ICANN 2012. LNCS, vol. 7552, pp. 571–578. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33269-2_72
2. Jaeger, H.: Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the “echo state network” approach. GMD Report 159, German National Research Center for Information Technology (2002)
3. Lukosevicius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3, 127–149 (2009)
4. Koprinkova-Hristova, P.: Multi-dimensional data clustering and visualization via echo state networks. In: Kountchev, R., Nakamatsu, K. (eds.) New Approaches in Intelligent Image Analysis. ISRL, vol. 108, pp. 93–122. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32192-9_3
5. Koprinkova-Hristova, P., Alexiev, K.: Echo state networks in dynamic data clustering. In: Mladenov, V., Koprinkova-Hristova, P., Palm, G., Villa, A.E.P., Appollini, B., Kasabov, N. (eds.) ICANN 2013. LNCS, vol. 8131, pp. 343–350. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40728-4_43
6. Koprinkova-Hristova, P.: On effects of IP improvement of ESN reservoirs for reflecting of data structure. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN) 2015. IEEE, Killarney (2015). https://doi.org/10.1109/ijcnn.2015.7280703
7. Steil, J.J.: Online reservoir adaptation by intrinsic plasticity for backpropagation-decorrelation and echo state learning. Neural Netw. 20, 353–364 (2007)
8. Schrauwen, B., Wardermann, M., Verstraeten, D., Steil, J.J., Stroobandt, D.: Improving reservoirs using intrinsic plasticity. Neurocomputing 71, 1159–1171 (2008)
9. Bozhkov, L., Koprinkova-Hristova, P., Georgieva, P.: Reservoir computing for emotion valence discrimination from EEG signals. Neurocomputing 231, 28–40 (2017)


Acknowledgment

The reported work is a part of and was supported by the project № DN02/3/2016 “Modelling of voluntary saccadic eye movements during decision making” funded by the Bulgarian Science Fund.

Author information

Authors and Affiliations

Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
Petia Koprinkova-Hristova
Institute of Neurobiology, Bulgarian Academy of Sciences, Sofia, Bulgaria
Miroslava Stefanova, Bilyana Genova & Nadejda Bocheva


Corresponding author

Correspondence to Petia Koprinkova-Hristova.

Editor information

Editors and Affiliations

School of Engineering, Democritus University of Thrace, Xanthi, Greece
Lazaros Iliadis
University of Piraeus, Piraeus, Greece
Ilias Maglogiannis
University of Thessaly, Lamia, Greece
Vassilis Plagianakos


About this paper

Cite this paper

Koprinkova-Hristova, P., Stefanova, M., Genova, B., Bocheva, N. (2018). Echo State Network for Classification of Human Eye Movements During Decision Making. In: Iliadis, L., Maglogiannis, I., Plagianakos, V. (eds) Artificial Intelligence Applications and Innovations. AIAI 2018. IFIP Advances in Information and Communication Technology, vol 519. Springer, Cham. https://doi.org/10.1007/978-3-319-92007-8_29


DOI: https://doi.org/10.1007/978-3-319-92007-8_29
Published: 22 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92006-1
Online ISBN: 978-3-319-92007-8
eBook Packages: Computer Science, Computer Science (R0)
