# Neural Evidence of the Cerebellum as a State Predictor

## Abstract

We provide neural evidence that the cerebellar circuit can predict future inputs from present outputs, a hallmark of an internal forward model. Recent computational studies hypothesize that the cerebellum performs state prediction, a computation known as a forward model. To test the forward-model hypothesis, we analyzed activities of 94 mossy fibers (inputs to the cerebellar cortex), 83 Purkinje cells (output from the cerebellar cortex to the dentate nucleus), and 73 dentate nucleus cells (cerebellar output) in the cerebro-cerebellum, all recorded from a monkey performing step-tracking movements of the right wrist. We found that the firing rates of one population could be reconstructed as a weighted linear sum of those of preceding populations. We then investigated whether the current outputs of the cerebellum (dentate cells) could predict the future inputs of the cerebellum (mossy fibers). The firing rates of mossy fibers at time *t* + *t*_{1} could be well reconstructed as a weighted sum of firing rates of dentate cells at time *t*, indicating that the dentate activities contained predictive information about the future inputs. The average goodness-of-fit (*R*^{2}) decreased moderately from 0.89 to 0.86 when *t*_{1} was increased from 20 to 100 ms, indicating that the prediction is able to compensate for the latency of sensory feedback. The linear equations derived from the firing rates resembled those of a predictor known as a Kalman filter, which is composed of prediction and filtering steps. In summary, our analysis of cerebellar activities supports the forward-model hypothesis of the cerebellum.

## Keywords

Motor control, Internal forward model, Mossy fiber, Purkinje cell, Dentate cell, Kalman filter

## Introduction

The cerebellum plays a critical role in the control and coordination of body movements, adaptation to novel environments, and acquisition of new motor skills. Clinical observations, pioneered by the seminal work of Holmes, and psychophysical experiments indicate that impairments of the cerebellum lead to incoordination and dysmetria across multiple degrees of freedom in multijoint movements, collectively known as cerebellar ataxia [1]. The cerebellum also plays an essential role both in adapting to external perturbations such as visual rotation or external force fields and in acquiring a new skill [2, 3, 4, 5, 6]. The cerebellum is one of the structures of the central nervous system whose anatomy, cytoarchitecture, and electrophysiological properties have been thoroughly studied. Despite the plethora of clinical and psychophysical evidence, the precise mechanisms by which the cerebellum coordinates body movements are not yet understood.

Recent computational studies suggest that the cerebellum predicts current and future states of the body by solving the body dynamics given an efference copy of motor commands, a computation known as an internal forward model [7, 8, 9, 10]. Sensory feedback signals from the periphery reach the central nervous system with delays on the order of a few tens to 100 ms. Therefore, the brain always receives the “past” state of the body. It is known in engineering and mechanics that feedback control based on a time-delayed state can become oscillatory and often unstable if the delay is on the order of the time constants of the system dynamics. Fortunately, the physical laws that govern body movements allow the brain to predict a current state from a previous state and an efference copy of the motor command, essentially by solving the equations of Newtonian mechanics. This predictive mechanism allows stable and dexterous control of body movements. Although psychophysical, neuroimaging, and stimulation studies have suggested the cerebellum as a locus of forward-model computation [11, 12, 13], the neural mechanism by which the cerebellum performs predictive computation is not yet understood [10].

We therefore set out to determine the computation performed by the cerebellar circuit of a monkey during wrist step-tracking movements and to provide neural evidence that current outputs of the cerebellar circuit contain predictive information about future inputs, a hallmark of an internal forward model. We analyzed firing rates of mossy fibers (MFs) (inputs to the cerebellar cortex), Purkinje cells (PCs) (outputs from the cerebellar cortex to the dentate nucleus (DN)), and dentate cells (DCs) (output of the cerebellum). Unlike the cerebral cortex, in which multiple inputs and multiple outputs are related in a highly complex way, the cerebellar MF input and DN output are well defined and organized in a characteristic feedforward manner. By exploiting the feedforward structure, firing rates of one population were reconstructed from those of the other populations that innervate the target population. We also investigated whether firing rates of the cerebellar output (i.e., DCs) at time *t* contained information that was predictive of future inputs to the cerebellar cortex (i.e., MFs) at time *t* + *t*_{1}. Our analyses were designed to test the forward-model hypothesis of the cerebellum.

## Methods

### Behavioral Task and Electrophysiological Recording

We used firing rates of the cerebellar cells reported in our previous publications. Here, we give a brief description of the behavioral task and the electrophysiological recording; for a full account, please refer to the previous publications [14, 15]. All surgical and experimental protocols were approved by the Animal Care and Use Committee of Tokyo Metropolitan Institute of Medical Science. A male macaque monkey performed a step-tracking movement task with the right wrist. The monkey gripped a manipulandum that measured the wrist movements and used it to move a cursor on a computer screen toward one of eight targets that were uniformly located on a circle of radius of 8°, corresponding to a wrist movement of 20°. The initial posture of the wrist was either pronated or supinated, so there were 16 experimental conditions (eight movement directions × two initial postures).

During the step-tracking movement task, neural activities of MFs, PCs, and DCs were recorded. Their anatomical locations and physiological characteristics allowed us to identify these cells with confidence. Spike data were aligned on the timing of movement onset and binned into firing rates with a time window of 20 ms, averaged over 10 to 20 trials for each condition. The dataset included 94 MFs, 83 PCs, and 73 DCs from monkey 1. PCs were identified by the coexistence of simple and complex spikes, and MFs were identified by the occurrence of a short positive–negative potential followed by a longer negative afterwave [14, 15]. DCs were identified, in addition to anatomical separation from the cerebellar cortex, by their characteristic large negative–positive spike waveforms [14]. Note that only simple spikes were analyzed for PCs because the monkey had been trained for the task for years and no apparent improvement in task performance occurred during the recordings. These cells were recorded in different experimental sessions or on different days, so firing rates rather than spike trains were analyzed in this study.
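The binning step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `trial_averaged_rate` and the data layout (one list of spike times per trial, in seconds relative to movement onset) are assumptions made for the example.

```python
def trial_averaged_rate(trials, t_start=-1.0, t_stop=1.0, bin_width=0.02):
    """Return trial-averaged firing rates (spikes/s) in consecutive time bins.

    `trials` is a list of spike-time lists, one per trial, with times in
    seconds relative to movement onset (an assumed data layout).
    """
    n_bins = round((t_stop - t_start) / bin_width)
    counts = [0] * n_bins
    for spikes in trials:
        for s in spikes:
            k = int((s - t_start) // bin_width)  # index of the 20-ms bin
            if 0 <= k < n_bins:
                counts[k] += 1
    # Convert summed counts to a rate: divide by number of trials and bin width.
    return [c / (len(trials) * bin_width) for c in counts]
```

With the default arguments, the window [− 1 s, + 1 s] yields 100 bins of 20 ms for each cell and condition.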

### Characterization of Firing Rates: Spatiotemporal Separability Index and Distributions

Let \( {\mathrm{MF}}_i^{p,d}(t) \) denote the firing rate in the *t*-th time bin in [− 1 s, + 1 s] recorded from the *i*-th MF for direction *d* and posture *p*. Similarly, \( {\mathrm{PC}}_i^{p,d}(t) \) and \( {\mathrm{DC}}_i^{p,d}(t) \) are defined for PCs and DCs, respectively. We first characterize the spatiotemporal patterns of activities of MFs, PCs, and DCs and introduce an index that quantifies how spatiotemporally separable the activities are. If a cell has stationary directional tuning, the firing rates are spatiotemporally separable as a product of a function of movement direction and a function of time (Fig. 1a). If the directional tuning of a cell is not stationary but rather exhibits a time-varying preferred direction, the firing rates are spatiotemporally nonseparable. For a given posture, the firing rates of the *i*-th neuron are summarized in matrix form as **R** ∈ *ℝ*^{D × T}, where *D* is the number of movement directions, *T* is the number of time windows of recording, and *D* ≪ *T*. To level off the difference in firing rates, each row of the matrix **R** is normalized to zero mean and unit variance. **R** may be decomposed into a factorial form as

\[ \mathbf{R}=\sum_{d=1}^{D}{\sigma}_d{\mathbf{u}}_d{\mathbf{v}}_d^{\top }, \tag{1} \]

where *σ*_{1} ≥ ⋯ ≥ *σ*_{D} are the singular values. {**u**_{d}} and {**v**_{d}} are *D*-dimensional and *T*-dimensional orthonormal vectors characterizing the directional tuning and the temporal profile of firing rates, respectively. If the directional tuning is invariant during the experimental duration, then **R** is expressed as a rank-one matrix (i.e., \( \mathbf{R}={\sigma}_1{\mathbf{u}}_1{\mathbf{v}}_1^{\top } \)). On the other hand, if the directional tuning varies during the movement, then **R** cannot be a rank-one matrix and contains multiple rank-one matrices. Therefore, how well **R** is reconstructed by a rank-one matrix is a candidate measure for characterizing properties of firing rates. Once the firing-rate matrix is decomposed into a sum of rank-one matrices, the degree of spatiotemporal separability is quantified by an index (spatiotemporal separability index, STSI; Eq. (2)).

STSI takes a value ranging from 0 to 1; STSI is 1 if **R** is spatiotemporally separable, whereas STSI takes a smaller value if **R** contains multiple rank-one matrices. This index was computed for each cell.
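A separability index of this kind can be sketched in a few lines. The explicit form used below, the fraction of the summed squared singular values carried by the leading component, is an assumption made for illustration because the defining equation is not reproduced here; the function names are hypothetical. The sketch obtains the largest squared singular value by power iteration on the *D* × *D* Gram matrix, which avoids a full decomposition when *D* ≪ *T*.

```python
import math

def normalize_rows(R):
    """Scale each row (one movement direction) to zero mean and unit variance."""
    out = []
    for row in R:
        m = sum(row) / len(row)
        sd = math.sqrt(sum((x - m) ** 2 for x in row) / len(row))
        out.append([(x - m) / sd for x in row])  # assumes sd > 0 for every row
    return out

def stsi(R, n_iter=500):
    """Assumed index: sigma_1^2 / sum_d sigma_d^2 of the row-normalized matrix."""
    Z = normalize_rows(R)
    D = len(Z)
    # Gram matrix G = Z Z^T; its eigenvalues are the squared singular values of Z.
    G = [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(D)] for i in range(D)]
    v = [1.0 + 0.1 * i for i in range(D)]  # arbitrary nonzero start vector
    for _ in range(n_iter):
        w = [sum(G[i][j] * v[j] for j in range(D)) for i in range(D)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the top eigenvalue.
    top = sum(v[i] * G[i][j] * v[j] for i in range(D) for j in range(D))
    return top / sum(G[i][i] for i in range(D))
```

Under this assumed definition, a matrix built as an outer product of a tuning vector and a temporal profile gives an index of 1, whereas two rows with orthogonal time courses give 0.5.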

To further characterize the firing rates, we also examined distributions of firing rates of the three populations. For each population, a histogram was constructed by counting the frequency of firing rates in all time bins of cells in that population both for pronated and supinated postures. These histograms were fitted with Gaussian, Gamma, Rayleigh, and inverse Gaussian distributions using a maximum likelihood method, and the values of Akaike information criterion (AIC) were computed for these distributions.
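The distribution comparison can be illustrated with a short sketch. It assumes strictly positive firing-rate samples and uses the standard maximum-likelihood estimators of each family; the function names and the bisection solver for the Gamma shape parameter are choices made for this example, not details taken from the source.

```python
import math

def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik

def digamma(k, h=1e-5):
    """Numerical derivative of log-gamma (central difference)."""
    return (math.lgamma(k * (1 + h)) - math.lgamma(k * (1 - h))) / (2 * k * h)

def fit_aics(x):
    """AIC of maximum-likelihood fits of four distributions to positive data."""
    n = len(x)
    mean = sum(x) / n
    mean_log = sum(math.log(v) for v in x) / n
    out = {}
    # Gaussian: closed-form MLE of mean and variance.
    var = sum((v - mean) ** 2 for v in x) / n
    out["gaussian"] = aic(-0.5 * n * (math.log(2 * math.pi * var) + 1), 2)
    # Rayleigh: closed-form MLE of the squared scale.
    s2 = sum(v * v for v in x) / (2 * n)
    out["rayleigh"] = aic(n * (mean_log - math.log(s2) - 1), 1)
    # Inverse Gaussian: closed-form MLE of mean and shape.
    lam = 1.0 / (sum(1.0 / v for v in x) / n - 1.0 / mean)
    out["inverse_gaussian"] = aic(
        0.5 * n * (math.log(lam) - math.log(2 * math.pi) - 1) - 1.5 * n * mean_log, 2)
    # Gamma: solve log(k) - digamma(k) = s for the shape k by bisection.
    s = math.log(mean) - mean_log
    lo, hi = 1e-3, 1e3
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    k = math.sqrt(lo * hi)
    theta = mean / k
    out["gamma"] = aic(
        n * ((k - 1) * mean_log - k - k * math.log(theta) - math.lgamma(k)), 2)
    return out
```

The distribution with the smallest AIC is preferred; for example, `min(aics, key=aics.get)` returns its name from the dictionary produced by `fit_aics`.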

### Linear Reconstruction of Firing Rates of PCs

We then attempted to reconstruct the firing rates of one population as a weighted sum of those of the preceding populations by exploiting the connectivity of the cerebellum (Fig. 1 inset). Namely, PCs receive inputs from MFs through granule cells and inhibitory interneurons (stellate cells and basket cells), and DCs receive inhibitory inputs from PCs and excitatory inputs from collaterals of MFs.

The firing rate of the *i*-th PC at time *t* was reconstructed as a weighted sum of those of MFs at the same time as

\[ {\widehat{\mathrm{PC}}}_i(t)=\sum_{j=1}^{N_{\mathrm{MF}}}{w}_{ij}^{\mathrm{MF}\to \mathrm{PC}}{\mathrm{MF}}_j(t), \tag{3} \]

where \( \left\{{w}_{ij}^{\mathrm{MF}\to \mathrm{PC}}\right\} \) is an *N*_{PC} × *N*_{MF} weight matrix, that is, for each PC, there are *N*_{MF} adjustable weights. Note that these weights cannot be interpreted directly as synaptic strengths between MFs and PCs because they are indirectly connected via granule cells as well as inhibitory interneurons. The weights should be interpreted as functional rather than anatomical connections between MFs and PCs. The weights were allowed to be either positive or negative, reflecting the anatomical fact that, between MFs and PCs, there are excitatory (granule cells) and inhibitory interactions (Fig. 1 inset). The weight matrix was optimized so as to minimize the squared error between the actual firing rate PC_{i}(*t*) and the reconstructed firing rate \( {\widehat{\mathrm{PC}}}_i(t) \), averaged over the experimental duration and movement directions, as

\[ {E}_i=\frac{1}{DT}\sum_{d=1}^{D}\sum_{t=1}^{T}{\left({\mathrm{PC}}_i^{p,d}(t)-{\widehat{\mathrm{PC}}}_i^{p,d}(t)\right)}^2. \tag{4} \]

The optimization was performed for the two postures separately so that we investigated how the weights trained for one posture generalized to the other posture. Note that the reconstructed firing rates of a PC in one condition (i.e., movement direction and posture) used those of MFs in the same condition. For optimizing the weights, the entire experimental duration [− 1 s, + 1 s] was used. The fitting to the data was evaluated by computing the coefficient of determination (*R*^{2}) in a time window of [− 0.2 s, + 0.5 s] because task-related modulation of firing rates occurred mainly in this time window.
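The unconstrained least-squares fit and its goodness-of-fit can be sketched as follows. The sketch solves the normal equations directly, which is adequate for a small toy problem; the function names and the data layout (one time series per regressor, with all movement directions concatenated) are assumptions, and the source does not state which solver was used.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_linear(X, y):
    """Weights w minimizing sum_t (y[t] - sum_j w[j] * X[j][t])^2.

    X[j] is the firing-rate series of the j-th MF; y is the series of one PC.
    """
    n = len(X)
    A = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    b = [sum(a * v for a, v in zip(X[i], y)) for i in range(n)]
    return solve(A, b)

def predict(X, w):
    return [sum(w[j] * X[j][t] for j in range(len(w))) for t in range(len(X[0]))]

def r_squared(y, y_hat):
    """Coefficient of determination between data y and reconstruction y_hat."""
    m = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - m) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

To mirror the evaluation window described above, `r_squared` would be applied only to the samples that fall within [− 0.2 s, + 0.5 s], while `fit_linear` uses the full duration.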

### Linear Reconstruction of Firing Rates of DCs

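The firing rate of the *i*-th DC at time *t* was reconstructed as a weighted sum of the firing rates of MFs and PCs at the same time as

\[ {\widehat{\mathrm{DC}}}_i(t)=\sum_{j=1}^{N_{\mathrm{MF}}}{w}_{ij}^{\mathrm{MF}\to \mathrm{DC}}{\mathrm{MF}}_j(t)+\sum_{k=1}^{N_{\mathrm{PC}}}{w}_{ik}^{\mathrm{PC}\to \mathrm{DC}}{\mathrm{PC}}_k(t). \tag{5} \]

Collaterals of MFs send excitatory inputs and PCs send inhibitory inputs to DCs, so \( \left\{{w}_{ij}^{\mathrm{MF}\to \mathrm{DC}}\right\} \) and \( \left\{{w}_{ik}^{\mathrm{PC}\to \mathrm{DC}}\right\} \) were assumed to be nonnegative and nonpositive, respectively. For each DC, the linear model contained *N*_{MF} + *N*_{PC} weights, and the squared error between DC_{i}(*t*) and \( {\widehat{\mathrm{DC}}}_i(t) \) was minimized under the nonnegative and nonpositive constraints. These weights were optimized using a standard quadratic programming algorithm.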

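For comparison, the DC firing rates were also reconstructed from MF firings only,

\[ {\widehat{\mathrm{DC}}}_i(t)=\sum_{j=1}^{N_{\mathrm{MF}}}{w}_{ij}^{\mathrm{MF}\to \mathrm{DC}}{\mathrm{MF}}_j(t)+{w}_i^{\mathrm{MF}\to \mathrm{DC}}, \tag{6} \]

or from PC firings only,

\[ {\widehat{\mathrm{DC}}}_i(t)=\sum_{k=1}^{N_{\mathrm{PC}}}{w}_{ik}^{\mathrm{PC}\to \mathrm{DC}}{\mathrm{PC}}_k(t)+{w}_i^{\mathrm{PC}\to \mathrm{DC}}. \tag{7} \]

To explain spontaneous activities of DCs, the constant terms (\( {w}_i^{\mathrm{MF}\to \mathrm{DC}} \) in Eq. (6) and \( {w}_i^{\mathrm{PC}\to \mathrm{DC}} \) in Eq. (7)) were included. Values of the goodness-of-fit using both MF and PC firings (Eq. (5)), MF firings only (Eq. (6)), or PC firings only (Eq. (7)) were compared.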
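A sign-constrained least-squares fit can be sketched with projected gradient descent. This is a swapped-in technique chosen for brevity, not the quadratic programming routine mentioned above, although both target the same constrained minimum of the squared error. The function name and the `signs` argument are hypothetical.

```python
def fit_sign_constrained(X, y, signs, n_iter=2000):
    """Least squares with per-weight sign constraints (projected gradient descent).

    signs[j] = +1 forces w[j] >= 0 (e.g., MF -> DC weights) and
    signs[j] = -1 forces w[j] <= 0 (e.g., PC -> DC weights).
    X[j] is the j-th regressor time series and y is the target DC series.
    """
    n, T = len(X), len(y)
    # The summed squares of X bound the largest eigenvalue of the Gram matrix,
    # so this step size cannot overshoot.
    step = 1.0 / sum(v * v for row in X for v in row)
    w = [0.0] * n
    for _ in range(n_iter):
        resid = [sum(w[j] * X[j][t] for j in range(n)) - y[t] for t in range(T)]
        grad = [sum(r * x for r, x in zip(resid, X[j])) for j in range(n)]
        for j in range(n):
            w_j = w[j] - step * grad[j]
            # Projection: clip each weight back onto its allowed half-line.
            w[j] = max(0.0, w_j) if signs[j] > 0 else min(0.0, w_j)
    return w
```

A weight whose unconstrained optimum has the wrong sign is driven to exactly zero, which is how the constraint can silence a regressor.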

### Statistical Tests of Linear Reconstruction

The linear reconstruction was tested statistically in two ways. First, the linear model was compared with nonlinear and FIR models *within* a posture. Two standard nonlinear models (a threshold model and a quadratic model) were fit to the data. One was a linear-threshold model in which the MF firing rates are thresholded by setting activity equal to or smaller than a threshold to zero, where *θ*_{j} denotes the activity threshold of the *j*-th MF. The linear-threshold model is often used for modeling nonlinear amplification, multiplicative gain modulation, and winner-takes-all selection [16]. In our context, the thresholding operation may be regarded as nonlinear processing by granule cells between MFs and PCs. The other was a quadratic model, which is analogous to an energy model of complex cells in the visual cortex [17]. In these models, the firing rate of a PC at time *t* is reconstructed from those of MFs at the same time *t*. In addition, we also considered a finite-impulse-response (FIR) model of order 1. The second term of the FIR model may be interpreted as a reverberation in the recurrent circuit composed of MFs, granule cells, and Golgi cells, so this FIR model is the simplest example of the adaptive filter model.

The linear, linear-threshold, and quadratic models contain *N*_{MF}, 2*N*_{MF}, and 3*N*_{MF} adjustable parameters, respectively. For each cell, the three models were compared according to the AIC. In the same way, the linear reconstruction model of DCs was tested against a linear-threshold model and a quadratic model.

The linear, linear-threshold, and quadratic models contain *N*_{MF} + *N*_{PC}, 2*N*_{MF} + 2*N*_{PC}, and 3*N*_{MF} + 3*N*_{PC} parameters, respectively. As in Eq. (9), \( \left\{{w}_{ij}^{\mathrm{MF}\to \mathrm{DC}}\right\} \) and \( \left\{{w}_{ik}^{\mathrm{PC}\to \mathrm{DC}}\right\} \) were constrained to be nonnegative and nonpositive, respectively, and other parameters were unconstrained. These models were also compared according to the AICs.
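The trade-off that the AIC makes between fit quality and parameter count can be made concrete with a small sketch. The source does not state how the likelihood was computed, so the formula below assumes independent Gaussian residuals, a common convention for least-squares fits; the function name is hypothetical.

```python
import math

def aic_least_squares(residuals, n_params):
    """AIC of a least-squares fit, assuming i.i.d. Gaussian residuals.

    Up to an additive constant shared by all models fit to the same data,
    AIC = n * ln(RSS / n) + 2 * n_params.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params
```

With *N*_{MF} = 94, the linear, linear-threshold, and quadratic models of PC firing rates carry 94, 188, and 282 parameters, so under this formula a richer model is preferred only if it reduces the residual sum of squares enough to offset a penalty that grows by 188 per step.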

Second, we examined how the reconstructions of the linear and other models learned at one posture generalized to the other posture. Here, we expected that a reconstruction model that appropriately describes the firing rates should reconstruct them not only at a trained posture but also at an untrained posture. Hence, this was a model comparison *across* the two postures. The degree of generalization was evaluated for the four models of PC firing-rate fitting and for the three models of DC firing-rate fitting. Specifically, for one target cell, the weights optimized at one posture were used for a reconstruction at the other posture, and the goodness-of-fit between the reconstruction and the data was computed at the untrained posture. The goodness-of-fit of multiple models was statistically compared by one-way ANOVA.

### Statistical Comparison of MF–PC and MF–DC Projections

MFs project both to PCs, via granule cells and parallel fibers, and to DCs. Correspondingly, the linear equations we consider include the two terms \( \left\{{w}_{ij}^{\mathrm{MF}\to \mathrm{PC}}\right\} \) in Eq. (3) and \( \left\{{w}_{ij}^{\mathrm{MF}\to \mathrm{DC}}\right\} \) in Eq. (5), which represent functional projections from MFs to PCs and from MFs to DCs, respectively. Each projection was characterized by a column of \( \left\{{w}_{ij}^{\mathrm{MF}\to \mathrm{PC}}\right\} \) or \( \left\{{w}_{ij}^{\mathrm{MF}\to \mathrm{DC}}\right\} \), which we refer to as a projection vector. Similarity of projections was quantified by computing a correlation coefficient between two projection vectors. Specifically, we asked whether the two functional projections statistically differed. Correlation coefficients of all possible pairs of MF–PC projection vectors and MF–DC projection vectors were computed, and their average served as an index of similarity between the two projections. Note that the MF–PC projection vectors took either positive or negative values while the MF–DC projection vectors were nonnegative. Absolute values of the MF–PC projection vectors were used for the computation of correlation coefficients because only the magnitudes of projections were of interest. To assess the statistical significance of the average correlation coefficient, a resampling test was performed under the null hypothesis that there was no statistical difference between MF–PC and MF–DC projections. Specifically, a bootstrap distribution of correlation coefficients was computed by randomly permuting the labels of projection vectors 100,000 times. The probability of observing the experimental average of correlation coefficients was assessed against this bootstrap distribution.
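The label-permutation procedure can be sketched as below. The sketch assumes that every projection vector has the same length (one entry per MF) so that correlations between MF–PC and MF–DC vectors are defined; the function names, the default number of permutations, and the one-sided *p*-value convention are choices made for this example.

```python
import math
import random

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def mean_cross_correlation(group_a, group_b):
    """Average correlation over all pairs with one vector from each group."""
    vals = [pearson(u, v) for u in group_a for v in group_b]
    return sum(vals) / len(vals)

def permutation_test(pc_vecs, dc_vecs, n_perm=1000, seed=0):
    """Return the observed mean correlation and a one-sided (lower-tail) p-value."""
    rng = random.Random(seed)
    pc_vecs = [[abs(v) for v in vec] for vec in pc_vecs]  # magnitudes only
    observed = mean_cross_correlation(pc_vecs, dc_vecs)
    pooled = pc_vecs + dc_vecs
    n_pc = len(pc_vecs)
    n_low = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the PC/DC labels at random
        if mean_cross_correlation(pooled[:n_pc], pooled[n_pc:]) <= observed:
            n_low += 1
    return observed, (n_low + 1) / (n_perm + 1)
```

If the two sets of projection vectors are drawn from the same population, relabeling should leave the average correlation unchanged; an observed value in the lower tail of the permuted distribution indicates that the two projections differ.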

### Linear Predictions of Future MF Activities from Current DC Activities

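The firing rates of MFs at time *t* + *t*_{1} were reconstructed as a weighted sum of the firing rates of DCs at time *t*, as

\[ {\widehat{\mathrm{MF}}}_i\left(t+{t}_1\right)=\sum_{j=1}^{N_{\mathrm{DC}}}{w}_{ij}^{\mathrm{DC}\to \mathrm{MF}}{\mathrm{DC}}_j(t). \tag{13} \]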

This linear prediction model contained *N*_{DC} weights for one MF. As in the linear reconstruction cases, the squared error between the predicted and the actual firing rates was minimized to compute the optimal values of the weights. The weights \( \left\{{w}_{ij}^{\mathrm{DC}\to \mathrm{MF}}\right\} \) cannot be interpreted as functional connectivity from DCs to MFs, as there are no anatomically direct connections between DCs and MFs. Rather, the linear prediction model was introduced to test if the current DC activity contained predictive information about the future MF activity. *t*_{1} is a parameter of time advance and was varied from 0 to 200 ms in steps of 20 ms (the window of time bins). The goodness-of-fit of the linear model was evaluated in a time window of [− 200 ms, 500 ms].
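The time alignment behind this prediction can be illustrated with a deliberately simplified sketch: a single DC regressor with one weight, rather than the *N*_{DC} weights of the full model. The function name and arguments are hypothetical; the only elements taken from the text are the 20-ms bin width and the idea of pairing the output at time *t* with the input at time *t* + *t*_{1}.

```python
def lagged_r2(dc, mf, t1_ms, bin_ms=20):
    """R^2 of predicting mf[t + lag] from dc[t] with a single weight.

    Simplification for illustration: one DC regressor instead of N_DC.
    """
    lag = t1_ms // bin_ms          # time advance expressed in bins
    x = dc[:len(dc) - lag]         # current output, time t
    y = mf[lag:]                   # future input, time t + t1
    w = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    m = sum(y) / len(y)
    ss_res = sum((b - w * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - m) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot
```

Evaluating `lagged_r2` over a range of `t1_ms` values gives a curve of goodness-of-fit against time advance, which is the quantity examined in the analysis.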

One may suspect that the linear prediction in Eq. (13) is possible simply because the terms on the right-hand side span a wide variety of waveforms, as in the case of a Fourier expansion, whereby any time series can be fit with a set of sinusoidal waveforms. To assess the fit statistically and to exclude that possibility, a bootstrap test was performed by shuffling the movement directions of DCs on the right-hand side of Eq. (13), while the movement directions of MFs on the left-hand side were fixed to those used experimentally. Randomizing the movement directions on the right-hand side maintained the diversity of waveforms but eliminated the directional relationship between the two sides of Eq. (13). The bootstrap test was based on the null hypothesis that any time series of the same degree of complexity could predict the future inputs to the cerebellum. With one sequence of shuffled targets, the goodness-of-fit was computed for all MFs and then averaged. This was repeated 10,000 times to produce a bootstrap distribution of goodness-of-fit, and then the probability of the experimental goodness-of-fit was computed. This bootstrap test was performed with *t*_{1} ranging from 20 to 100 ms in steps of 20 ms for the two postures, separately.
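The direction-shuffling control can be sketched in the same simplified setting of a single DC regressor pooled over directions. The function names, the default number of shuffles, and the *p*-value convention are assumptions for illustration; the full analysis uses all DCs and averages the goodness-of-fit over MFs.

```python
import random

def pooled_r2(dc, mf, lag):
    """R^2 of mf[d][t + lag] ~ w * dc[d][t], pooled over directions d."""
    x = [v for row in dc for v in row[:len(row) - lag]]
    y = [v for row in mf for v in row[lag:]]
    w = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    m = sum(y) / len(y)
    ss_res = sum((b - w * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - m) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

def shuffle_test(dc, mf, lag, n_shuffles=500, seed=0):
    """Compare the observed fit with fits obtained after shuffling DC directions."""
    rng = random.Random(seed)
    observed = pooled_r2(dc, mf, lag)
    order = list(range(len(dc)))
    n_better = 0
    for _ in range(n_shuffles):
        rng.shuffle(order)                     # break the direction pairing
        shuffled = [dc[d] for d in order]      # waveforms themselves are kept
        if pooled_r2(shuffled, mf, lag) >= observed:
            n_better += 1
    return observed, (n_better + 1) / (n_shuffles + 1)
```

Because shuffling keeps every waveform and only reassigns it to another direction, a small *p*-value indicates that the fit depends on direction-specific information rather than on waveform diversity alone.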

### Sparse Linear Analyses

Here *T* denotes the length of data. Note that Eqs. (14) and (16) are the cost functions used in a standard sparse linear analysis known as LASSO [18]. Equation (15) slightly differs from LASSO because of the nonnegative constraint of MF → DC weights and the nonpositive constraint of PC → DC weights. The time advance parameter *t*_{1} in Eq. (16) was fixed to 40 ms in this sparse analysis. The parameter *λ* determines the tradeoff between the squared error and the sparseness and must be optimized for individual target cells. For each target cell, we varied *λ* in a range from 0.05 to 5 and chose the value that exhibited the smallest generalization error for test data in tenfold cross validation. The weight coefficients computed with the optimized *λ* were assessed in two ways: the proportion of nonzero weights and the proportion of significantly contributing weights. Here, the significantly contributing weights were defined by counting the number of weights whose cumulative sum exceeded 90% of the sum of total weights.
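A sparse fit of this type can be sketched with coordinate descent. The cost function assumed here is the common LASSO form, squared error divided by 2*T* plus *λ* times the sum of absolute weights; the exact normalization in the source equations is not reproduced above, so this scaling is an assumption, as are the function names. The helper `n_contributing` sorts weights by magnitude before accumulating, which is also an assumption about how the 90% criterion was applied.

```python
def soft_threshold(rho, lam):
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(X, y, lam, n_sweeps=200):
    """Minimize (1/(2T)) * sum_t (y[t] - sum_j w[j] X[j][t])^2 + lam * sum_j |w[j]|."""
    n, T = len(X), len(y)
    w = [0.0] * n
    z = [sum(v * v for v in X[j]) / T for j in range(n)]
    resid = list(y)  # residual y - X w, starting from w = 0
    for _ in range(n_sweeps):
        for j in range(n):
            # Correlation of regressor j with the residual excluding its own term.
            rho = sum(x * (r + w[j] * x) for x, r in zip(X[j], resid)) / T
            w_new = soft_threshold(rho, lam) / z[j]
            resid = [r - (w_new - w[j]) * x for x, r in zip(X[j], resid)]
            w[j] = w_new
    return w

def n_contributing(w, fraction=0.9):
    """Number of largest-magnitude weights whose cumulative sum exceeds
    `fraction` of the total absolute weight."""
    mags = sorted((abs(v) for v in w), reverse=True)
    total, running = sum(mags), 0.0
    for k, m in enumerate(mags, start=1):
        running += m
        if running > fraction * total:
            return k
    return 0
```

In a cross-validated analysis, `lasso` would be run for each candidate *λ* on the training folds and the value with the smallest held-out error retained; that outer loop is omitted here for brevity.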

## Results

First, statistical characteristics of firing rates of the three populations were computed and compared (see “Characterization of Firing Rates: Spatiotemporal Separability Index and Distributions” in the “Methods” section). Then, the firing rates of PCs and DCs were reconstructed with linear weighted models of MF and PC firing rates (see “Linear Reconstruction of Firing Rates of PCs” and “Linear Reconstruction of Firing Rates of DCs” in the “Methods” section). The reconstructions of the linear models were statistically compared with those of nonlinear and FIR models by computing AICs within a single posture and the degree of generalization across the two postures (see “Statistical Tests of Linear Reconstruction” in the “Methods” section). Finally, the forward-model hypothesis was tested by applying a linear weighted model that predicted the MF activity at time *t* + *t*_{1} from the DC activity at time *t* (see “Linear Predictions of Future MF Activities from Current DC Activities” in the “Methods” section).

### Characterization of Firing Rates

The STSI, defined in Eq. (2), was introduced to assess the complexity of spatiotemporal patterns of firing rates of a given neuron (see “Characterization of Firing Rates: Spatiotemporal Separability Index and Distributions” in the “Methods” section). STSI exhibited clearly separable values for MFs and PCs (Fig. 1a). Our previous papers reported that task-related activities recorded from MFs showed a unimodal and directionally tuned phasic modulation around movement onset that was analogous to those observed in activities of the primary motor cortex (M1) [19, 20, 21], and that simple-spike activities of PCs showed dynamic and time-varying directional tuning before and after movement onset [14, 15]. Consistent with these observations, STSIs of MFs were significantly larger than STSIs of PCs (unpaired *t* test, *p* < 0.01). The population of MFs had the largest value (median 0.63 ± 0.14 SD), followed by the population of DCs (median 0.48 ± 0.15 SD), and the population of PCs had the smallest values (median 0.38 ± 0.15 SD). There was a statistically significant difference between STSIs of the three populations as determined by one-way ANOVA (*F*(2, 497) = 95.3, *p* = 9.18 × 10^{−36}). A Tukey post hoc test revealed that the STSIs of MFs were statistically significantly larger than those of PCs (*p* = 9.6 × 10^{−10}) and those of DCs (*p* = 9.6 × 10^{−10}) and that the STSIs of DCs were statistically larger than those of PCs (*p* = 2.7 × 10^{−5}). Therefore, the firing rates of MFs exhibited spatiotemporally separable and simple characteristics, the firing rates of PCs exhibited spatiotemporally nonseparable and complex characteristics, and the firing rates of DCs exhibited characteristics intermediate between MFs and PCs.

The distributions of firing rates were computed for each of the three populations (Fig. 1b). They were unimodal (MF: mean 21.1 ± 17.6 SD; PC: mean 48.2 ± 22.7; DC: mean 38.6 ± 22.7). These distributions were fitted with Gaussian, Gamma, Rayleigh, and inverse Gaussian distributions, which were compared according to the values of AICs. We found that the Gamma distribution provided the smallest values of AICs for the MF (1.26 × 10^{6} (Gaussian), 1.18 × 10^{6} (Gamma), 1.25 × 10^{6} (Rayleigh), 5.5 × 10^{6} (inverse Gaussian)), PC (1.201 × 10^{6} (Gaussian), 1.191 × 10^{6} (Gamma), 1.194 × 10^{6} (Rayleigh), 1.216 × 10^{6} (inverse Gaussian)), and DC firing rates (1.045 × 10^{6} (Gaussian), 1.023 × 10^{6} (Gamma), 1.025 × 10^{6} (Rayleigh), 1.052 × 10^{6} (inverse Gaussian)). A Gamma distribution has the property that the sum of two independent Gamma-distributed random variables with a common scale parameter is again Gamma distributed. Actual neural activities are correlated with each other, so the fact that the firing rates were all Gamma distributed does not necessarily support simple summation from one population to another, but rather implies some simple transformation among the populations.

The two analyses above found that the three populations had distinct degrees of spatiotemporal complexity as quantified in terms of STSIs, while the firing rates were all Gamma distributed. These results, taken together, led us to a hypothesis that the transformation from one population to another might be linear, so we went on to test the hypothesis by linearly fitting the firing rates of one population with those of preceding populations.

### Linear Reconstruction of Firing Rates of PCs and DCs

The squared error between the actual firing rates PC_{i}(*t*) and the reconstructed firing rates \( {\widehat{\mathrm{PC}}}_i(t) \) (Eq. (4)) was minimized separately for the pronated and supinated postures. The results of the two postures were similar; we present the results of the pronated posture below. Figure 2 illustrates time series and contour plots of the firing rates of two representative PCs that exhibited the highest *R*^{2} values between the original and reconstructed firing rates. In one PC (Fig. 2a), firing rates underwent a suppression just before the movement onset and an increase after the movement onset around movement directions 1 and 8, therefore exhibiting a reversal of its preferred direction. The reconstructed firing rates captured the reversal of preferred direction. In another PC (Fig. 2b), there was a uniform suppression of firing rates over the movement directions before the movement onset, followed by the emergence of directional tuning around directions 4 and 5 after the movement onset. As a population, the linear reconstruction model explained the original firing rates of PCs, as evidenced by *R*^{2} values for the pronated posture (mean 0.95 ± 0.023 SD) and the supinated posture (mean 0.96 ± 0.023 SD).

Representative DCs were likewise selected as those exhibiting the highest *R*^{2} values between the original and reconstructed firing rates. As in the PC case, the firing rates of DCs were well reconstructed as linear sums of those of MFs and PCs. As a population, the median values of *R*^{2} of the linear model were 0.94 ± 0.048 SD for the pronated posture and 0.93 ± 0.042 SD for the supinated posture. For comparison, the DC firing rates were reconstructed using only the MF or PC firings by minimizing Eq. (6) or (7), respectively. The median values of *R*^{2} using only MFs were 0.92 ± 0.069 (pronated) and 0.91 ± 0.067 (supinated), and the median values of *R*^{2} using only PCs were 0.91 ± 0.060 (pronated) and 0.91 ± 0.050 (supinated). There was a statistically significant difference between the *R*^{2} values of the linear reconstructions using MF + PC, MF only, or PC only as determined by one-way ANOVA (*F*(2, 216) = 5.02, *p* = 7.4 × 10^{−3} (pronated); *F*(2, 216) = 6.82, *p* = 1.3 × 10^{−3} (supinated)). A Tukey post hoc test revealed that the *R*^{2} values of MF + PC reconstructions were significantly better than those of MF reconstructions or those of PC reconstructions (*p* = 0.047 for MF only and *p* = 0.008 for PC only (pronated), *p* = 0.0098 for MF only and *p* = 0.0018 for PC only (supinated)).

### Comparison of AIC for Multiple Models of PCs and DCs

Summary of AIC statistics for PC firing-rate fitting

| | Linear model | Thresholding model | Quadratic model | FIR model |
|---|---|---|---|---|
| Median (STD) for pro posture | 6.05 × 10 | 6.22 × 10 | 6.27 × 10 | 6.11 × 10 |
| Median (STD) for sup posture | 6.10 × 10 | 6.26 × 10 | 6.27 × 10 | 6.14 × 10 |

Summary of AIC statistics for DC firing-rate fitting

| | Linear model | Thresholding model | Quadratic model |
|---|---|---|---|
| Median (STD) for pro posture | 6.18 × 10 | 6.67 × 10 | 6.97 × 10 |
| Median (STD) for sup posture | 6.18 × 10 | 6.65 × 10 | 6.97 × 10 |

### Generalization Across Postures of Fitted Models of PCs and DCs

The AIC comparison above evaluated the models *within* a posture. We then asked which model best generalized firing-rate fitting *across* the two postures (see “Statistical Tests of Linear Reconstruction” in the “Methods” section). Specifically, the weights trained in one posture were used to reconstruct the firing rates in the other posture, which were not used for training. The degrees of generalization are summarized in the box plots of Fig. 6a (left panel: from supinated to pronated posture; right panel: from pronated to supinated posture) and in Table 3. The goodness-of-fit was almost equal for the linear, threshold, and FIR models. In contrast, the goodness-of-fit was significantly worse for the quadratic model than for the other models, indicating that the quadratic model overfit the trained posture and did not generalize properly to the untrained posture. There was a statistically significant difference between the four models as determined by one-way ANOVA (*F*(3, 328) = 19.8, *p* = 7.57 × 10^{−12} (trained in pro and tested in sup); *F*(3, 328) = 20.9, *p* = 2.1 × 10^{−12} (trained in sup and tested in pro)). A Tukey post hoc test revealed that the goodness-of-fit of the quadratic model was significantly worse than that of the other three models (*p* = 3.6 × 10^{−9} for the linear model, *p* = 5.1 × 10^{−9} for the thresholding model, and *p* = 1.3 × 10^{−7} for the FIR model (trained in pro and tested in sup); *p* = 3.8 × 10^{−9} for the linear model, *p* = 3.8 × 10^{−9} for the thresholding model, and *p* = 1.4 × 10^{−7} for the FIR model (trained in sup and tested in pro)). There was no significant difference between the linear, thresholding, and FIR models. In summary, the linear, thresholding, and FIR models generalized from one posture to another almost equally, whereas the generalization of the quadratic model was significantly worse than those of the other models.

Summary of degrees of generalization across two postures for PC firing-rate fitting

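For the DC firing-rate fitting, there was no statistically significant difference between the three models as determined by one-way ANOVA (*F*(2, 246) = 0.10, *p* = 0.90 (trained in pro and tested in sup); *F*(2, 246) = 2.5 × 10^{−3}, *p* = 0.9975 (trained in sup and tested in pro)). Therefore, the degree of generalization from one posture to another was not different between the three models.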

Summary of degrees of generalization across two postures for DC firing-rate fitting

### Statistical Comparison of MF–PC and MF–DC Projections

In the linear equations derived from the experimental firing rates, there are two distinct projections from the MFs: from MFs to PCs (*w*^{MF → PC} in Eq. (3)) and from MFs to DCs (*w*^{MF → DC} in Eq. (5)) (see “Statistical Comparison of MF–PC and MF–DC Projections” in the “Methods” section). We asked whether a common population of MFs projected both to PCs and DCs or separate populations of MFs projected to PCs and DCs. The average of correlation coefficients between MF–PC and MF–DC projection vectors was 0.060. We assessed the statistical significance of this value by a resampling test with the null hypothesis that there was no statistical difference between the two types of projections. The 99% confidence interval under the null hypothesis was [0.1027, 0.1089], and the experimental average of correlation coefficients was significantly smaller than this interval (*p* < 10^{−5}). Therefore, the null hypothesis was rejected, indicating that PCs and DCs did not receive projections from the same population of MFs.

### Linear Predictions of Future Inputs from Current Outputs

We examined linear prediction of future inputs to the cerebellum (MFs) at time *t* + *t*_{1} from current outputs from the cerebellum (DCs) at time *t* (see “Linear Prediction of Future MF Activities from Current DC Activities”). The time advance *t*_{1} was varied from 20 to 200 ms in steps of 20 ms, and the reconstructions with *t*_{1} = 40 ms (Fig. 7) and *t*_{1} = 80 ms (Fig. 8) are presented. When *t*_{1} = 40 ms, spatiotemporal patterns of firing rates of MFs were captured by linear predictions without any noticeable delay. When *t*_{1} = 80 ms, although fitting to peaks and troughs became less accurate, the overall patterns of firing rates were preserved. In fact, the goodness-of-fit decreased moderately when *t*_{1} was increased from 20 to 200 ms (Fig. 9). One may suspect that any time series of similar complexity could predict the future input reasonably well, so we proceeded to test whether the performance of the linear prediction could be attributed to statistical chance. Statistical significance of the goodness-of-fit was assessed by a bootstrap test based on the null hypothesis that any time series of similar complexity could predict the future MF activities. A bootstrap distribution of goodness-of-fit was constructed by shuffling the movement directions of DCs on the right-hand side of Eq. (13) 10,000 times. Intuitively, DC activities with shuffled movement directions were of the same complexity as those with the experimental directions but did not retain movement-specific information. We found that the goodness-of-fit of the original data was significantly better than that of the shuffled data for *t*_{1} ranging from 20 to 100 ms for both the pronated (*p* < 5 × 10^{−3} for all *t*_{1} ranging from 20 to 100 ms, Bonferroni corrected) and supinated (*p* < 5 × 10^{−3} for all *t*_{1} ranging from 20 to 100 ms, Bonferroni corrected) postures. Therefore, the current output from the cerebellum contained predictive information about the future input to the cerebellum.
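The direction-shuffling test described above can be illustrated with a minimal sketch. Several things here are assumptions rather than the procedure of Eq. (13): the array layout (cells × directions × time samples), the ordinary least-squares fit, the independent shuffling of directions for each DC, and the synthetic data generator `make_synthetic_rates`. The number of shuffles is also reduced for speed.

```python
import numpy as np

def stack_design(dc, mf, lag):
    """Pair DC rates at time t with MF rates at time t + lag (lag in samples).

    dc: (n_dc, n_dir, n_time), mf: (n_mf, n_dir, n_time).
    """
    x = dc[:, :, :-lag].reshape(dc.shape[0], -1).T   # (samples, n_dc)
    y = mf[:, :, lag:].reshape(mf.shape[0], -1).T    # (samples, n_mf)
    x = np.hstack([x, np.ones((x.shape[0], 1))])     # intercept column
    return x, y

def mean_r2(dc, mf, lag):
    """Mean R^2 across MFs of the least-squares prediction MF(t + lag) ~ DC(t)."""
    x, y = stack_design(dc, mf, lag)
    w, *_ = np.linalg.lstsq(x, y, rcond=None)
    ss_res = ((y - x @ w) ** 2).sum(axis=0)
    ss_tot = ((y - y.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

def direction_shuffle_test(dc, mf, lag, n_shuffles=200, seed=0):
    """Null distribution of R^2 obtained by shuffling each DC's movement directions."""
    rng = np.random.default_rng(seed)
    observed = mean_r2(dc, mf, lag)
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        shuffled = np.stack([cell[rng.permutation(dc.shape[1])] for cell in dc])
        null[i] = mean_r2(shuffled, mf, lag)
    p_value = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
    return observed, null, p_value

def make_synthetic_rates(n_dc=10, n_mf=6, n_dir=8, n_time=60, lag=4, seed=1):
    """Hypothetical data: direction-tuned latent bumps drive DCs now and MFs lag samples later."""
    rng = np.random.default_rng(seed)
    theta = 2 * np.pi * np.arange(n_dir) / n_dir
    t = np.arange(n_time)
    phases = rng.uniform(0, 2 * np.pi, size=3)
    centers = np.array([15.0, 30.0, 45.0])
    latent = (np.cos(theta[None, :, None] - phases[:, None, None])
              * np.exp(-(t[None, None, :] - centers[:, None, None]) ** 2 / (2 * 6.0 ** 2)))
    a = rng.normal(size=(n_dc, 3))
    b = rng.normal(size=(n_mf, 3))
    dc = np.einsum("ik,kdt->idt", a, latent) + 0.05 * rng.normal(size=(n_dc, n_dir, n_time))
    mf = 0.05 * rng.normal(size=(n_mf, n_dir, n_time))
    mf[:, :, lag:] += np.einsum("jk,kdt->jdt", b, latent[:, :, :-lag])
    return dc, mf

dc, mf = make_synthetic_rates()
observed, null, p_value = direction_shuffle_test(dc, mf, lag=4)
print(f"R^2 (original) = {observed:.3f}, max R^2 (shuffled) = {null.max():.3f}, p = {p_value:.4f}")
```

Shuffling preserves each cell's temporal profile, so the shuffled regressors are as complex as the originals but no longer carry direction-specific information. In practice the resulting p-values would still need a multiple-comparison correction across the tested values of *t*_{1}, as in the Bonferroni correction reported above.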

### Weight Distributions of Linear Reconstruction and Prediction Models

Summary of weight characteristics obtained in sparse linear analyses

Population (posture) | Proportion of nonzero weights | Proportion of significantly contributing weights |
---|---|---|
PC (pronated) | 0.60 (0.14) | 0.32 (0.082) |
PC (supinated) | 0.68 (0.13) | 0.36 (0.096) |
DC (pronated) | 0.82 (0.34) | 0.38 (0.26) |
DC (supinated) | 0.73 (0.354) | 0.37 (0.14) |
MF (pronated) | 0.77 (0.10) | 0.35 (0.063) |
MF (supinated) | 0.81 (0.12) | 0.38 (0.067) |

## Discussion

There are three main findings in this study. First, the firing rates of the three populations all followed gamma distributions, and they exhibited various degrees of spatiotemporal complexity. This indicated that the activities of the three populations represented functionally distinct roles in computation within the cerebellar circuit. Second, the firing rates of PCs were reconstructed linearly as a weighted sum of the firing rates of MFs, and the firing rates of DCs were reconstructed linearly as a weighted sum of the firing rates of MFs and PCs. Finally, the firing rates of DCs at time *t* linearly predicted the firing rates of MFs at time *t* + *t*_{1}, so the current output from the cerebellum contains predictive information about the future input to the cerebellum. These findings reveal the linear computation from one population to another and support the forward-model hypothesis of the cerebellum. It is worth mentioning that neither the nonlinearity expected from the perceptron model nor the dependence on previous inputs expected from the adaptive filter model was necessary to explain our data. Our results provide a strikingly simple picture of linear transformations for the cerebellar computation. In the following, we discuss our findings in light of previous studies and speculate on their implications for the computation of internal models in the cerebellum.

We should remark that all the results in this study were obtained from one animal and could have reflected an idiosyncrasy of that animal, so the conclusion in this study must be confirmed with another animal in a future study.

### Previous Electrophysiological Studies

There are two kinds of internal models hypothesized for motor control: a forward model that performs a state prediction from a current estimate and an efference copy, and an inverse model that transforms a desired goal of movement into the necessary motor commands [22]. There has long been a controversy over whether the cerebellum functions as a forward model or an inverse model. Previous single-unit recording studies of PC activities during hand movements provided conflicting results for the internal-model hypothesis of the cerebellum [23, 24]. These studies examined the correlation between activities of PCs and movement kinematics and/or dynamics. The underlying assumption was that kinematic and dynamic representations of PCs relate to forward and inverse models, respectively. As kinematic variables (e.g., hand trajectory) and dynamic variables (e.g., muscle activities) are highly correlated under unperturbed conditions, one approach is to make a monkey perform the same movement trajectory with different loads on the hand to dissociate dynamics from kinematics [25, 26]. The monkey compensates for the varying load in order to keep the same hand path, thereby dissociating the dynamics from the kinematics. For instance, Pasalar et al. [23] recorded simple-spike activities of task-related PCs while monkeys performed a circular manual tracking task under varying viscous and elastic loads. The simple-spike firing rates and spatial tuning did not change significantly under various load conditions, which supported a kinematic representation of arm movements in the cerebellar cortex. Their results appeared to be compatible with the forward-model hypothesis of the cerebellum, which predicts movement kinematics.

Similarly, Yamamoto et al. [24] recorded simple-spike activities of PCs while monkeys performed elbow extension or flexion movements under assistive or resistive forces. In contrast with the findings of Pasalar et al., the simple-spike activities did change according to the load condition and correlated with the change in muscle activities, and were thereby seemingly consistent with the inverse-model hypothesis. Although the two studies examined PC firing rates in similar experiments, their conclusions were opposite to each other. Other studies described simple-spike activities correlated with eye-movement dynamics [27] or cursor-movement kinematics [28]. Therefore, to date, these single-unit recording studies seem inconclusive about whether the cerebellum plays the role of an internal forward model or an inverse model.

These studies rely on the assumption that kinematic and dynamic representations of PCs relate to forward and inverse models, respectively. This assumption, however, does not hold because an internal forward model should include dynamical variables such as efference copies of motor control signals. Also, disentangling predicted state signals, sensory feedback signals from the periphery, and motor commands is rather difficult because these signals resemble each other [8]. Therefore, a mere correlative comparison of PC activities with one or another behavioral parameter would not lead to conclusive evidence for either forward or inverse models.

### State Prediction as a Prerequisite for a Forward Model

To resolve the limitation of the single-unit studies that correlated firing rates of one cell population and behavioral measures, we believe it essential to analyze network-level computation across multiple cell populations, as suggested by Wolpert and Miall [8]. The current study was designed to circumvent the abovementioned difficulty of disentangling multiple representations and targeted the transformation through the cerebellum from MF (input to the cerebellum) to DC (output from the cerebellum) via PC (output from the cerebellar cortex to the cerebellar nuclei), revealing the linear computation from one cell population to another. Furthermore, we found that the current output of the cerebellum predicts the future inputs to the cerebellum, a distinctive feature of an internal forward model. These findings could not have been achieved with the analysis of correlation between activities of one cell population and behavioral measures.

A critical test of the forward-model hypothesis of the cerebellum is whether the prediction performed in the cerebellum can offset delayed sensory feedback. Delays in sensory feedback can differ from one sensory modality to another; proprioceptive feedback takes on the order of 50 ms from muscle spindles to the somatosensory and motor cortices, and visual feedback takes about 50 ms from the retina to the primary visual cortex and 100 ms to the higher visual cortices [29]. These delays can deteriorate the performance of rapid movements lasting on the order of a few hundred milliseconds, such as those employed in this study. Our analysis revealed that the current DC activity contained predictive information about the future MF activity for a range of time advances. Therefore, our result suggests that the cerebellum is capable of compensating for sensory delays on the order of 100 ms, supporting the forward-model hypothesis. A previous electrophysiological study reported that the activity of postcentral neurons changed on average about 60 ms before the onset of activity in agonist elbow muscles in voluntary elbow movements [30]. This early onset of activity of postcentral neurons is within the timescale of prediction in the cerebellum found in this study.

Morphological and physiological evidence accumulated over decades suggests that a region of the cerebro-cerebellum that forms a closed-loop circuit with M1 satisfies the basic requirements for a forward model that generates a prediction of the outcome of a motor command [31]. First, this region of the cerebro-cerebellum receives a putative efference copy as well as a strong somatosensory input [14, 15], and these inputs are presumed to be integrated in the cerebellar cortex. Second, the activities of PCs in this region lag behind those of M1 neurons, while they precede the movement onset [15]. The timing of activity is compatible with the idea that it works as a forward model that predicts an outcome of the motor command. As a result, the output of this region of the cerebro-cerebellum may help M1 to generate a suitable motor command for the next moment depending on the predicted consequence before a feedback signal is available for the current motor command. We note that there are in general two input pathways to the MFs: one from the cerebral cortex through the pons and another from the peripheral sensory organs. Our single-unit recording did not allow us to identify the origin of MFs and thus to discuss what information the MF activities encoded.

Our analyses assume the feedforward anatomical structure of the cerebellar circuit, but it is known that the cerebellar circuit contains recurrent anatomical connections that form a closed-loop circuit within the cerebellum, such as those composed of Golgi and granule cells and those composed of Purkinje and basket cells. Among these recurrent connections, the most relevant to this study is the nucleocortical projections from the cerebellar nuclei to the granular layer as MFs [32]. A recent study reported that excitatory output cells in the interposed nucleus provide efference copy signals via MFs to the cerebellar cortical zones and that an eye-blink conditioning training increased the local density of nucleocortical MF terminals [33]. One may suspect, therefore, that the linear prediction from DCs to MFs reported in this study could be attributed to the nucleocortical recurrent connections. We note, however, that this nucleocortical projection per se does not explain the longer time scale of linear prediction up to 100 ms reported in this study. Also, the nucleocortical pathway comprises only approximately 5% of the total of cerebellar MF inputs [34]. We hence expect that recurrent connections in the nucleocortical projections have minor contributions to our results of linear prediction from DCs to MFs.

### Linear Computation in the Cerebellar Circuit

We have shown that linear transformations from MFs to PCs and from MFs and PCs to DCs explained the observed firing rates recorded during the wrist movement task. In addition, the future MF activities were linearly predicted from the current DC activities. The success of linear modeling of PC and DC activities was unexpected and intriguing for the following two reasons. First, a PC receives on the order of 100,000 parallel fiber inputs, while our dataset from monkey 1 contained only 94 MFs. Second, these firing rates were recorded in different sessions or even on different recording days separated by years, so the firing rates of the three populations had no direct causal relation. Nonetheless, the computation in the cerebellar circuit turned out to be linear.
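The two-stage linear reconstruction discussed here (PCs from MFs, then DCs from MFs and PCs) can be sketched with ordinary least squares. This is a sketch under assumptions, not the fitting procedure of the Methods: the population sizes are reduced, the firing rates are synthetic, and the helper names (`fit_linear`, `r_squared`, `make_synthetic_chain`) are hypothetical.

```python
import numpy as np

def add_intercept(x):
    """Append a constant column so the fit includes a baseline firing rate."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_linear(x, y):
    """Least-squares weights mapping x (samples, n_in) to y (samples, n_out)."""
    w, *_ = np.linalg.lstsq(add_intercept(x), y, rcond=None)
    return w

def r_squared(x, y, w):
    """Goodness-of-fit R^2 for each output cell."""
    resid = y - add_intercept(x) @ w
    ss_tot = ((y - y.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - (resid ** 2).sum(axis=0) / ss_tot

def make_synthetic_chain(n_samples=300, n_mf=20, n_pc=15, n_dc=10, seed=0):
    """Hypothetical rates generated by a linear chain MF -> PC and (MF, PC) -> DC."""
    rng = np.random.default_rng(seed)
    mf = rng.normal(size=(n_samples, n_mf))
    pc = mf @ rng.normal(size=(n_mf, n_pc)) / np.sqrt(n_mf)
    pc += 0.1 * rng.normal(size=pc.shape)
    both = np.hstack([mf, pc])
    dc = both @ rng.normal(size=(n_mf + n_pc, n_dc)) / np.sqrt(n_mf + n_pc)
    dc += 0.1 * rng.normal(size=dc.shape)
    return mf, pc, dc

mf, pc, dc = make_synthetic_chain()
w_mf_pc = fit_linear(mf, pc)                    # first equation: PC from MF
mf_and_pc = np.hstack([mf, pc])
w_to_dc = fit_linear(mf_and_pc, dc)             # second equation: DC from MF and PC
print("mean R^2, PC from MF:      ", r_squared(mf, pc, w_mf_pc).mean())
print("mean R^2, DC from MF and PC:", r_squared(mf_and_pc, dc, w_to_dc).mean())
```

Each row of the input arrays is one time sample and each column is one cell, so a single weight matrix maps a whole population onto the next one.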

A missing piece in our study is the granule cell activity. Our results demonstrated that the transformation from MFs to PCs is linear, implying another linear computation in the granular layer. Two possibilities of computation in the granular layer are suggested. The first possibility is that each granule cell performs linear computation by linearly summing up the inputs from MFs. A previous study reported linear computation from MFs to medium ganglion cells in the cerebellum-like structure of electric fish [35]. They reported that linear weighted sums of sparsely and randomly mixed MF inputs reconstructed the membrane potentials of granule cells. The reconstructed granule cell activities exhibited a rich repertoire of temporal bases, which in turn constitute a negative image of sensory inputs. Their study suggests linear computation from MFs to granule cells, in line with our results. It is interesting to note that, in their study, sparsity of linear weights was explicitly incorporated into an error function. In contrast, our results demonstrated that the sparse distributions of weights emerged spontaneously without a sparseness term in the cost function. The second possibility is that each granule cell performs nonlinear computation of MF inputs and the population of granule cells as a whole encodes inputs linearly. Recent studies demonstrated that individual granule cells were more narrowly tuned to the whisker angle of a rat than their EPSCs, thereby exhibiting nonlinear computation of granule cells sharpening their inputs [36], while individual PCs encoded whisker position linearly [37]. Interestingly, the population of narrowly tuned granule cells provides a linear excitatory drive across a range of whisker positions to PCs [36]. Because our dataset does not contain granule cell activities, we cannot determine which of the two possibilities is the case.

Despite the fact that there are interneurons with recurrent connections in the cerebellar circuit, our finding indicates that the computation in the cerebellum is unexpectedly linear. We here speculate on two possible explanations for the success of our linear modeling. One reason is that the performance of the monkeys was stable because they had been trained over years for this wrist movement task. Therefore, we expect that the response properties of cerebellar cells remained stable across experimental sessions once the monkeys had achieved stable task performance. Another reason is that the cells in the dataset were selected if they showed task-related modulations of firing rates. Among numerous parallel-fiber inputs to a PC, it is conceivable that only a fraction of task-related inputs determines the response properties of that PC, as revealed by the sparse linear analyses. The stability of task performance and the selective sampling of task-related cells could explain the success of our linear modeling.

In line with our findings of linear transformations and predictions, an increasing number of recent studies have reported that firing rates of cerebellar cells encode movement-related parameters linearly [38]. In the case of MFs [39, 40], Laurens et al., for example, reported a linear monotonic relationship between the firing rate of MFs and eye position [39]. Similarly, for PCs [27, 37, 41, 42, 43, 44, 45, 46, 47], Hong et al. found that regularly firing spikes perform linear encoding of eye movement velocity by firing rate [43]. Finally, for neurons in the cerebellar nuclei [48, 49, 50, 51], the timing and kinematics of motor output were modulated by linearly graded disinhibition of neurons in the deep cerebellar nuclei [49]. Along with these reports of linear encoding of behavioral parameters in the cerebellar cells, our findings reinforce the perspective of linear computation within the cerebellar circuits. Whereas there is evidence for linear encoding in the cerebellum as referenced above, we note that there is also evidence for nonlinear coding of saccade onset timings by spikes of PCs that are related to the period of occasional pauses [43], suggesting a possibility of multiplexed encoding by cerebellar cells.

### Linear Equations and Interpretation as Kalman Filter

We realized that the chain of linear equations of neuron activities resembles that of an estimator known as the Kalman filter. If we assume that MFs represent a current estimate of state and sensory feedback, PCs represent a prediction of state, and DCs represent a filtered state, then the linear equations can be interpreted as a Kalman filter as follows. The first equation is a prediction step ((A) in Fig. 11); a current estimate (MFs) is projected to a predicted state (PCs). The second equation is a filtering step ((B) in Fig. 11); a predicted state (PCs) and a sensory feedback (MFs) are integrated into a filtered state (DCs). Finally, the filtered state (DCs) contains predictive information about the future inputs (MFs) ((C) in Fig. 11).

In the above analogy with the Kalman filter, we have made an important assumption: PCs and DCs receive information about the current estimate of state and sensory afferent signals through MFs, respectively. Our statistical test suggests that MF projections to PCs differ from MF collateral projections to DCs in terms of their sources. It is known for the cerebro-cerebellum that MFs originate from the pons (which receives direct cortical projections) and from the brainstem/spinal cord, and that each of the MF inputs reflects either cortical activities or sensory feedback signals. Anatomical studies using an anterograde tracer (HRP-WGA) in cats reported that MFs of cortical origin via the pons constitute the main input to the cerebellar cortex, whereas collaterals of those MFs project only sparsely to the dentate nuclei [52, 53]. Therefore, “the bulk of information of cortical origin reaches the cerebellar nuclei only after processing in the cerebellar cortex” (p. 22 of Brodal et al. [52]). This is consistent with our assumption that DCs receive a predicted state not from MFs but from PCs.

**z**_{t} as in

Here **K** is the Kalman gain and **C** is the observation matrix. By comparing the second equation of (17) and Eq. (18), we speculate that the weights from PC to DC (*w*^{PC → DC}) and the weights from MF to DC (*w*^{MF → DC}) correspond to the matrices **I** − **KC** and **K** in Eq. (18), respectively. While these weights were assumed to be stable and constant in our analysis because the task performance of the monkey was unchanged for years, the analogy predicts an opposing plasticity of *w*^{PC → DC} and *w*^{MF → DC}, namely, *w*^{PC → DC} and *w*^{MF → DC} should change their strengths in opposite directions when learning occurs. Although the analogy between the cerebellum and Kalman filter presented here is a speculation, we believe that this analogy could serve as a computational proposal that drives future studies of the cerebellum.
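The correspondence suggested above can be made concrete with the textbook Kalman filtering step, in which the filtered state is a weighted sum of the predicted state, with weight **I** − **KC**, and the observation, with weight **K**. The sketch below is a generic illustration, not the equations (17) and (18) of this paper: the function names, the covariance matrices `p_pred` and `r`, and the numerical values are all assumptions chosen for the example.

```python
import numpy as np

def kalman_gain(p_pred, c, r):
    """Standard Kalman gain K = P C^T (C P C^T + R)^{-1}."""
    s = c @ p_pred @ c.T + r                    # innovation covariance
    return p_pred @ c.T @ np.linalg.inv(s)

def filtering_weights(k, c):
    """Weights on the predicted state (I - KC) and on the observation (K)."""
    w_pred = np.eye(k.shape[0]) - k @ c         # analogue of w^{PC -> DC} in the text
    w_obs = k                                   # analogue of w^{MF -> DC} in the text
    return w_pred, w_obs

def filter_step(x_pred, z, k, c):
    """Filtered state as a weighted sum of prediction and observation."""
    w_pred, w_obs = filtering_weights(k, c)
    return w_pred @ x_pred + w_obs @ z

c = np.eye(2)                                   # observation matrix (assumed identity here)
r = np.eye(2)                                   # observation noise covariance (assumed)
for scale in (0.1, 1.0, 10.0):                  # uncertainty of the predicted state
    k = kalman_gain(scale * np.eye(2), c, r)
    w_pred, w_obs = filtering_weights(k, c)
    print(f"P = {scale:>4}: weight on prediction = {w_pred[0, 0]:.3f}, "
          f"weight on observation = {w_obs[0, 0]:.3f}")
```

In this example, as the prediction becomes less reliable the weight on the observation grows while the weight on the prediction shrinks, which is the sense in which the analogy predicts that *w*^{PC → DC} and *w*^{MF → DC} change in opposite directions.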

### Previous Computational Models of the Cerebellum

There are several lines of computational models of the cerebellum in the literature. The pioneering and most dominant model of the cerebellum is the perceptron model of the cerebellar cortex by Marr [54] and independently by Albus [55]. The perceptron model was first inspired by the analogy of feedforward network structures between the cerebellar cortex and the perceptron. The core hypothesis was that two independent inputs to a PC (MFs and a climbing fiber) represent input pattern signals and supervised error signals, respectively. Later, the climbing fiber inputs were found to induce long-term depression in synapses between parallel fibers and a PC in rabbits’ cerebellar slices [56, 57]. The perceptron model contains a nonlinear term to threshold a weighted sum of inputs. In contrast, our results have shown that the linear model sufficed to explain the firing rates of PCs in terms of MFs.

The perceptron model is essentially a static pattern classifier and is not designed to handle time-varying, dynamic inputs. Subsequently, the perceptron model was extended to the adaptive filter model, which generates a dynamic response by summing various temporal basis patterns [58]. The adaptive filter model considers a recurrent circuit among MFs, Golgi cells, and granule cells which generates resonant temporal patterns with various phase leads and lags. Therefore, the adaptive filter model assumes that the activities of PCs result from the interaction among MFs, Golgi cells, and granule cells. There is supportive evidence for the adaptive filter model [59, 60]. In contrast, our finding of linear transformation from MFs to PCs suggests that the recurrent circuit plays a negligible role in generating temporal bases and rather that MFs already contain a rich temporal repertoire that in turn drives the activities of PCs. The present study cannot exclude the possibility that different parts of the cerebellum may adopt different neural mechanisms for generating temporal patterns; the adaptive filter model has been tested in the flocculus, whereas our data were recorded from the cerebellar hemisphere.
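The contrast between an instantaneous linear model and a model that depends on previous inputs can be illustrated by comparing an instantaneous fit with a finite-impulse-response (FIR) fit on lagged inputs. The sketch below is illustrative only: it assumes that an FIR model means a regression on the current and the preceding input samples, and the function names and synthetic data are hypothetical rather than the formulation used in the Methods.

```python
import numpy as np

def lagged_design(x, n_taps):
    """Stack current and past inputs: row t holds x[t], x[t-1], ..., x[t-n_taps+1].

    x: (n_time, n_inputs). Returns (n_time - n_taps + 1, n_taps * n_inputs).
    """
    n_time = x.shape[0]
    blocks = [x[n_taps - 1 - k : n_time - k] for k in range(n_taps)]
    return np.hstack(blocks)

def fit_r2(design, y):
    """Pooled R^2 of a least-squares fit with an intercept."""
    design = np.hstack([design, np.ones((design.shape[0], 1))])
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ w
    return float(1.0 - (resid ** 2).sum() / ((y - y.mean(axis=0)) ** 2).sum())

# Synthetic inputs and outputs generated by an instantaneous linear map (assumption).
rng = np.random.default_rng(0)
n_taps = 3
mf = rng.normal(size=(500, 5))
pc = mf @ rng.normal(size=(5, 4)) + 0.1 * rng.normal(size=(500, 4))

r2_instantaneous = fit_r2(mf[n_taps - 1:], pc[n_taps - 1:])
r2_fir = fit_r2(lagged_design(mf, n_taps), pc[n_taps - 1:])
print(f"R^2 instantaneous = {r2_instantaneous:.4f}, R^2 FIR ({n_taps} taps) = {r2_fir:.4f}")
```

When the underlying mapping is instantaneous, the additional taps barely improve the fit, which is the pattern one would expect if the inputs already carry the temporal structure needed to drive the outputs.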

## Notes

### Acknowledgments

The authors thank Drs. Yoshikazu Shinoda, Hiroyuki Kambara, and Satoshi Hirose for helpful discussions and encouragement, and the members of the Swartz Center for Computational Neuroscience at the University of California San Diego, where a part of this work was conducted, for their valuable comments. The authors also thank two anonymous reviewers for helpful comments in improving a previous version of the manuscript.

### Grants

This work is supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (No. 25430007, No. 26120005, and No. 16K12476 to H.T.; No. 24650224 to T.I.; No. 14580784, No. 15016008, No. 16015212, No. 20033029, No. 21500319, No. 26120003 to S.K.), the Japan Science and Technology Agency (PRESTO: Intelligent Cooperation and Control) and the Ministry of Education, Culture, Sports, Science and Technology (MEXT), NBRP “Japanese Monkeys” through the National BioResource Project of the MEXT Japan, the JSPS Programs (Program for Advancing Strategic International Networks to Accelerate the Circulation of Talented Researchers, and Embodied-Brain Systems Science), and the Hitachi-Kurata and the Tateishi Science Foundations.

### Author Contribution

HT and SK designed the study. TI and SK provided the data. HT analyzed the data. HT, TI, and SK discussed the results. HT drafted the manuscript. HT, TI, and SK revised the manuscript.

### Compliance with Ethical Standards

### Conflict of Interest

The authors declare that they have no conflict of interest.

## References

- 1. Holmes G. The symptoms of acute cerebellar injuries due to gunshot injuries. Brain. 1917;40(4):461–535.
- 2. Lang CE, Bastian AJ. Cerebellar subjects show impaired adaptation of anticipatory EMG during catching. J Neurophysiol. 1999;82(5):2108–19.
- 3. Martin TA, Keating JG, Goodkin HP, Bastian AJ, Thach WT. Throwing while looking through prisms. I. Focal olivocerebellar lesions impair adaptation. Brain. 1996;119(Pt 4):1183–98.
- 4. Maschke M, Gomez CM, Ebner TJ, Konczak J. Hereditary cerebellar ataxia progressively impairs force adaptation during goal-directed arm movements. J Neurophysiol. 2004;91(1):230–8.
- 5. Morton SM, Bastian AJ. Cerebellar contributions to locomotor adaptations during splitbelt treadmill walking. J Neurosci. 2006;26(36):9107–16.
- 6. Tseng YW, Diedrichsen J, Krakauer JW, Shadmehr R, Bastian AJ. Sensory prediction errors drive cerebellum-dependent adaptation of reaching. J Neurophysiol. 2007;98(1):54–62.
- 7. Wolpert DM, Miall RC, Kawato M. Internal models in the cerebellum. Trends Cogn Sci. 1998;2(9):338–47.
- 8. Wolpert DM, Miall RC. Forward models for physiological motor control. Neural Netw. 1996;9(8):1265–79.
- 9. Bastian AJ. Learning to predict the future: the cerebellum adapts feedforward movement control. Curr Opin Neurobiol. 2006;16(6):645–9.
- 10. Ishikawa T, Tomatsu S, Izawa J, Kakei S. The cerebro-cerebellum: could it be loci of forward models? Neurosci Res. 2016;104:72–9.
- 11. Synofzik M, Lindner A, Thier P. The cerebellum updates predictions about the visual consequences of one's behavior. Curr Biol. 2008;18(11):814–8.
- 12. Kawato M, Kuroda T, Imamizu H, Nakano E, Miyauchi S, Yoshioka T. Internal forward models in the cerebellum: fMRI study on grip force and load force coupling. Prog Brain Res. 2003;142:171–88.
- 13. Miall RC, Christensen LO, Cain O, Stanley J. Disruption of state estimation in the human lateral cerebellum. PLoS Biol. 2007;5(11):e316.
- 14. Ishikawa T, Tomatsu S, Tsunoda Y, Lee J, Hoffman DS, Kakei S. Releasing dentate nucleus cells from Purkinje cell inhibition generates output from the cerebrocerebellum. PLoS One. 2014;9(10):e108774.
- 15. Tomatsu S, Ishikawa T, Tsunoda Y, Lee J, Hoffman DS, Kakei S. Information processing in the hemisphere of the cerebellar cortex for control of wrist movement. J Neurophysiol. 2016;115(1):255–70.
- 16. Dayan P, Abbott LF. Theoretical neuroscience. Cambridge: MIT Press; 2001.
- 17. Ohzawa I, DeAngelis GC, Freeman RD. Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science. 1990;249(4972):1037–41.
- 18. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol. 1996:267–88.
- 19. Kakei S, Hoffman DS, Strick PL. Muscle and movement representations in the primary motor cortex. Science. 1999;285(5436):2136–9.
- 20. Kakei S, Hoffman DS, Strick PL. Direction of action is represented in the ventral premotor cortex. Nat Neurosci. 2001;4(10):1020–5.
- 21. Kakei S, Hoffman DS, Strick PL. Sensorimotor transformations in cortical motor areas. Neurosci Res. 2003;46(1):1–10.
- 22. Jordan MI, Rumelhart DE. Forward models: supervised learning with a distal teacher. Cogn Sci. 1992;16(3):307–54.
- 23. Pasalar S, Roitman AV, Durfee WK, Ebner TJ. Force field effects on cerebellar Purkinje cell discharge with implications for internal models. Nat Neurosci. 2006;9(11):1404–11.
- 24. Yamamoto K, Kawato M, Kotosaka S, Kitazawa S. Encoding of movement dynamics by Purkinje cell simple spike activity during fast arm movements under resistive and assistive force fields. J Neurophysiol. 2007;97(2):1588–99.
- 25. Evarts EV. Relation of pyramidal tract activity to force exerted during voluntary movement. J Neurophysiol. 1968;31(1):14–27.
- 26. Kalaska JF, Cohen DA, Hyde ML, Prud'homme M. A comparison of movement direction-related versus load direction-related activity in primate motor cortex, using a two-dimensional reaching task. J Neurosci. 1989;9(6):2080–102.
- 27. Shidara M, Kawano K, Gomi H, Kawato M. Inverse-dynamics model eye movement control by Purkinje cells in the cerebellum. Nature. 1993;365(6441):50–2.
- 28. Liu X, Robertson E, Miall RC. Neuronal activity related to the visual representation of arm movements in the lateral cerebellar cortex. J Neurophysiol. 2003;89(3):1223–37.
- 29. Schmolesky MT, Wang Y, Hanes DP, Thompson KG, Leutgeb S, Schall JD, et al. Signal timing across the macaque visual system. J Neurophysiol. 1998;79(6):3272–8.
- 30. Soso MJ, Fetz EE. Responses of identified cells in postcentral cortex of awake monkeys during comparable active and passive joint movements. J Neurophysiol. 1980;43(4):1090–110.
- 31. Ramnani N. The primate cortico-cerebellar system: anatomy and function. Nat Rev Neurosci. 2006;7(7):511–22.
- 32. Houck BD, Person AL. Cerebellar loops: a review of the nucleocortical pathway. Cerebellum. 2014;13(3):378–85.
- 33. Gao Z, Proietti-Onori M, Lin Z, Ten Brinke MM, Boele HJ, Potters JW, et al. Excitatory cerebellar nucleocortical circuit provides internal amplification during associative conditioning. Neuron. 2016;89(3):645–57.
- 34. Hamori J, Takacs J. Two types of GABA-containing axon terminals in cerebellar glomeruli of cat: an immunogold-EM study. Exp Brain Res. 1989;74(3):471–9.
- 35. Kennedy A, Wayne G, Kaifosh P, Alvina K, Abbott LF, Sawtell NB. A temporal basis for predicting the sensory consequences of motor commands in an electric fish. Nat Neurosci. 2014;17(3):416–22.
- 36. Chen S, Augustine GJ, Chadderton P. Serial processing of kinematic signals by cerebellar circuitry during voluntary whisking. Nat Commun. 2017;8(1):232.
- 37. Chen S, Augustine GJ, Chadderton P. The cerebellum linearly encodes whisker position during voluntary movement. eLife. 2016;5:e10509.
- 38. Raymond JL, Medina JF. Computational principles of supervised learning in the cerebellum. Annu Rev Neurosci. 2018;41:233–53.
- 39. Laurens J, Heiney SA, Kim G, Blazquez PM. Cerebellar cortex granular layer interneurons in the macaque monkey are functionally driven by mossy fiber pathways through net excitation or inhibition. PLoS One. 2013;8(12):e82239.
- 40. Lisberger SG, Fuchs AF. Role of primate flocculus during rapid behavioral modification of vestibuloocular reflex. II. Mossy fiber firing patterns during horizontal head rotation and eye movement. J Neurophysiol. 1978;41(3):764–77.
- 41. Dugue GP, Tihy M, Gourevitch B, Lena C. Cerebellar re-encoding of self-generated head movements. eLife. 2017;6.
- 42. Herzfeld DJ, Kojima Y, Soetedjo R, Shadmehr R. Encoding of action by the Purkinje cells of the cerebellum. Nature. 2015;526(7573):439–42.
- 43. Hong S, Negrello M, Junker M, Smilgin A, Thier P, De Schutter E. Multiplexed coding by cerebellar Purkinje neurons. eLife. 2016;5.
- 44. Medina JF. The multiple roles of Purkinje cells in sensori-motor calibration: to predict, teach and command. Curr Opin Neurobiol. 2011;21(4):616–22.
- 45. Medina JF, Lisberger SG. Variation, signal, and noise in cerebellar sensory-motor processing for smooth-pursuit eye movements. J Neurosci. 2007;27(25):6832–42.
- 46. Medina JF, Lisberger SG. Encoding and decoding of learned smooth-pursuit eye movements in the floccular complex of the monkey cerebellum. J Neurophysiol. 2009;102(4):2039–54.
- 47. Sun Z, Smilgin A, Junker M, Dicke PW, Thier P. The same oculomotor vermal Purkinje cells encode the different kinematics of saccades and of smooth pursuit eye movements. Sci Rep. 2017;7:40613.
- 48. Ebner TJ, Hewitt AL, Popa LS. What features of limb movements are encoded in the discharge of cerebellar neurons? Cerebellum. 2011;10(4):683–93.
- 49. Heiney SA, Kim J, Augustine GJ, Medina JF. Precise control of movement kinematics by optogenetic inhibition of Purkinje cell activity. J Neurosci. 2014;34(6):2321–30.
- 50. Kleine JF, Guan Y, Buttner U. Saccade-related neurons in the primate fastigial nucleus: what do they encode? J Neurophysiol. 2003;90(5):3137–54.
- 51. Ten Brinke MM, Heiney SA, Wang X, Proietti-Onori M, Boele HJ, Bakermans J, et al. Dynamic modulation of activity in cerebellar nuclei neurons during Pavlovian eyeblink conditioning in mice. eLife. 2017;6.
- 52. Brodal P, Dietrichs E, Walberg F. Do pontocerebellar mossy fibres give off collaterals to the cerebellar nuclei? An experimental study in the cat with implantation of crystalline HRP-WGA. Neurosci Res. 1986;4(1):12–24.
- 53. Shinoda Y, Sugiuchi Y, Futami T, Izawa R. Axon collaterals of mossy fibers from the pontine nucleus in the cerebellar dentate nucleus. J Neurophysiol. 1992;67(3):547–60.
- 54. Marr D. A theory of cerebellar cortex. J Physiol. 1969;202(2):437–70.
- 55. Albus JS. A theory of cerebellar function. Math Biosci. 1971;10(1–2):25–61.
- 56. Ito M, Kano M. Long-lasting depression of parallel fiber-Purkinje cell transmission induced by conjunctive stimulation of parallel fibers and climbing fibers in the cerebellar cortex. Neurosci Lett. 1982;33(3):253–8.
- 57. Ito M, Sakurai M, Tongroach P. Climbing fibre induced depression of both mossy fibre responsiveness and glutamate sensitivity of cerebellar Purkinje cells. J Physiol. 1982;324:113–34.
- 58. Fujita M. Adaptive filter model of the cerebellum. Biol Cybern. 1982;45(3):195–206.
- 59. Dean P, Porrill J. Decorrelation learning in the cerebellum: computational analysis and experimental questions. Prog Brain Res. 2014;210:157–92.
- 60. Dean P, Porrill J, Ekerot CF, Jorntell H. The cerebellar microcircuit as an adaptive filter: experimental and computational evidence. Nat Rev Neurosci. 2010;11(1):30–43.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.