
1 Introduction

With the rapid development of the Internet and Internet of Things (IoT) technology, agriculture has entered the era of “big data”. Agricultural data come from a variety of sources, are updated quickly and have complicated structures, so research on agricultural big data is necessary to support continued technological progress in agriculture [1, 2]. Faced with a huge amount of agricultural information resources, users encounter problems such as “resource obscurity” and “information overload”. IoT technology has made access to agricultural context data more convenient and accurate. By mining user preference information from this context data, it becomes possible to offer personalized information recommendation services to farmers.

Traditional recommendation systems mainly study the “user-item” 2D model: by constructing a user interest model and combining it with an appropriate recommendation algorithm, they proactively push items that users are interested in [3]. In agriculture, however, the complexity of crop varieties and the time-sensitivity of information needs mean that incorporating contextual information into recommendation should serve farmers better. Current recommendation systems are mostly based on user browsing behavior and make recommendations by mining logs, but they rarely take users' context and characteristics into account [4]. The proposed algorithm incorporates the farmer's production context, such as the crop varieties planted, season, climate, soil, water quality and location. It constructs a context-aware user interest model, combines it with the Slope one algorithm to predict preferences for items, and applies this to agricultural field information recommendation. Finally, the recommendation results of the improved algorithm are tested and evaluated.

2 Related Work

From the perspective of recommendation models, recommendation systems fall mainly into the following categories: Content-Based (CB) [5,6,7], Collaborative Filtering (CF) [8,9,10], Hybrid recommendation [11, 12], etc. These algorithms use data mining and optimization techniques to find the nearest-neighbor users or items and then predict users' preferences to make recommendations.

2.1 User-Based Collaborative Filtering Algorithm

The user-based collaborative filtering recommendation algorithm consists of three steps: (1) searching for users similar to the target user; (2) calculating the target user's predicted ratings for items he or she has not rated; (3) recommending to the target user the unrated items with the highest predicted ratings [13]. The measures mainly used to calculate the similarity between users are cosine similarity, modified cosine similarity and Pearson similarity. Because different users have different evaluation criteria and rating habits for the same content, the similarity between users is calculated with the modified cosine similarity, as in Eq. 1 [14].

$$ \text{sim}(a,b) = \frac{\sum_{c \in U_{ab}} (U_{a,c} - \overline{U_{a}})(U_{b,c} - \overline{U_{b}})}{\sqrt{\sum_{c \in U_{ab}} (U_{a,c} - \overline{U_{a}})^{2}} \sqrt{\sum_{c \in U_{ab}} (U_{b,c} - \overline{U_{b}})^{2}}} $$
(1)

Here \( U_{ab} \) is the set of items rated by both user a and user b; \( U_{a,c} \) and \( U_{b,c} \) are the ratings of users a and b for item c; and \( \overline{U_{a}} \) and \( \overline{U_{b}} \) are the average ratings of users a and b over all the items they have rated.
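As an illustration, Eq. 1 can be computed as follows. This is a minimal Python sketch, not the authors' implementation; it assumes ratings are held in a dictionary mapping each user to an {item: rating} dictionary, and the name `modified_cosine` is hypothetical. Returning 0 when the users share no items or have zero variance is also an assumption.

```python
from math import sqrt

def modified_cosine(ratings: dict, a: str, b: str) -> float:
    """Eq. 1: similarity of users a and b over their co-rated items U_ab.

    ratings maps each user to {item: rating}. Each user's mean is taken over
    all items that user has rated, as defined in the text.
    """
    ra, rb = ratings[a], ratings[b]
    common = ra.keys() & rb.keys()  # U_ab: items rated by both users
    if not common:
        return 0.0  # assumption: no co-rated items -> similarity 0
    mean_a = sum(ra.values()) / len(ra)
    mean_b = sum(rb.values()) / len(rb)
    num = sum((ra[c] - mean_a) * (rb[c] - mean_b) for c in common)
    den = (sqrt(sum((ra[c] - mean_a) ** 2 for c in common))
           * sqrt(sum((rb[c] - mean_b) ** 2 for c in common)))
    return num / den if den else 0.0  # assumption: zero variance -> 0
```

Because each rating is centered on its user's mean, two users who rank items the same way but on shifted scales (e.g. 5/3/1 versus 4/2/0) obtain a similarity of 1.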

After the user similarities are calculated, the next step is to predict ratings for the remaining items. Let \( NN_{u} \) be the nearest-neighbor set of the target user u; the predicted rating of user u for a candidate item i is obtained from the neighbors' ratings of that item, as in Eq. 2.

$$ {\text{S}}_{u,i} = \overline{{{\text{U}}_{u} }} + \frac{{\sum\limits_{{m \in NN_{u} }} {\text{sim} (u,m)(U_{m,i} - \overline{{U_{m} }} )} }}{{\sum\limits_{{m \in NN_{u} }} {\left| {\text{sim} (u,m)} \right|} }} $$
(2)

Here sim(u, m) is the similarity between the target user u and a user m in its nearest-neighbor set \( NN_{u} \).
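A minimal Python sketch of Eq. 2 follows, assuming the same user-to-{item: rating} dictionary layout. The function name `predict_user_based` is hypothetical, and skipping neighbors who have not rated item i (with a fallback to the user's mean) is an assumption the text does not spell out.

```python
def predict_user_based(ratings: dict, u: str, item: str, neighbors, sim) -> float:
    """Eq. 2: predicted rating S_{u,i} of user u for item i.

    neighbors is the nearest-neighbor set NN_u; sim(u, m) is any user
    similarity, e.g. the modified cosine of Eq. 1.
    """
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    num = den = 0.0
    for m in neighbors:
        if item not in ratings[m]:
            continue  # assumption: neighbors who have not rated i are skipped
        mean_m = sum(ratings[m].values()) / len(ratings[m])
        s = sim(u, m)
        num += s * (ratings[m][item] - mean_m)  # neighbor's mean-centered rating
        den += abs(s)
    # assumption: fall back to the user's mean when no neighbor rated the item
    return mean_u + num / den if den else mean_u
```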

2.2 Item-Based Collaborative Filtering Algorithm

The item-based collaborative filtering algorithm mainly consists of: (1) using existing user-item rating records to calculate the similarity between items; (2) finding the nearest-neighbor set of the target item according to similarity; (3) predicting the target item's rating from the target user's ratings of the items in that neighbor set, and recommending items with high predicted ratings to the user [15,16,17]. To alleviate the data sparsity and scalability problems of user-based collaborative filtering, the Slope one algorithm is chosen [18]. It first calculates the average rating deviation between items (which Slope one uses in place of a similarity measure) as in Eq. 3, and then calculates the predicted ratings of items as in Eq. 4. Slope one predicts ratings with a predictor of the form \( \text{f}(x) = x + b \), which achieves high accuracy and efficiency [19].

$$ {\text{dev}}_{x,y} = \sum\limits_{u \in U(x) \cap U(y)} {\frac{{s_{u,x} - s_{u,y} }}{{\left| {U(x) \cap U(y)} \right|}}} $$
(3)
$$ {\text{Q}}_{u,x} = \overline{{s_{u} }} + \sum\limits_{{y \in S_{u} }} {\frac{{dev_{x,y} }}{{\left| {S_{u} } \right|}}} $$
(4)

In Eq. 3, \( \left| U(x) \cap U(y) \right| \) is the number of users who have rated both item x and item y. In Eq. 4, \( S_{u} \) is the set of all items rated by user u other than item x, and \( \overline{s_{u}} \) is the average rating of user u.
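Eqs. 3 and 4 can be sketched in Python as below. The dictionary layout and the function names are illustrative assumptions. Restricting the sum in Eq. 4 to items that share at least one rater with x follows the usual Slope one formulation, since \( dev_{x,y} \) is undefined otherwise.

```python
from collections import defaultdict

def slope_one_devs(ratings: dict) -> dict:
    """Eq. 3: dev[x][y] = mean of (s_{u,x} - s_{u,y}) over users who rated both x and y."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for x, sx in user_ratings.items():
            for y, sy in user_ratings.items():
                if x != y:
                    sums[x][y] += sx - sy
                    counts[x][y] += 1
    # Pairs with no co-rating users are simply absent from the result.
    return {x: {y: sums[x][y] / counts[x][y] for y in sums[x]} for x in sums}

def slope_one_predict(ratings: dict, devs: dict, u: str, x: str) -> float:
    """Eq. 4: Q_{u,x} = mean rating of u + average of dev[x][y] over y in S_u."""
    s_u = ratings[u]
    mean_u = sum(s_u.values()) / len(s_u)
    # S_u: items u has rated other than x; restricted (assumption) to items
    # that share at least one rater with x, since dev[x][y] is otherwise undefined.
    related = [y for y in s_u if y != x and y in devs.get(x, {})]
    if not related:
        return mean_u
    return mean_u + sum(devs[x][y] for y in related) / len(related)
```

Computing all deviations takes time proportional to the sum over users of the square of the number of items each has rated, which is why the deviations are usually precomputed once and reused for every prediction.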

2.3 Context-Aware Recommendation

A context-aware recommendation system is a recommendation system that incorporates context information. Contextual information is introduced into the traditional “user-item” 2D model, extending it to a multidimensional utility model R that includes contextual information, User × Item × Context → Rating [20, 21], in which the user, item and context are structured as in Eq. 5.

$$ \begin{array}{*{20}l} {{\text{User}}\, \subseteq \,{\text{UId}} \times {\text{UPreferences}} \times UAddress \times UJob;} \hfill \\ {Item\, \subseteq \,IName \times IType \times IContent;} \hfill \\ {Context\, \subseteq \,Location \times Time \times Environment \times Activity.} \hfill \\ \end{array} $$
(5)
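The structures of Eq. 5 can be mirrored by plain data classes, shown here as a hedged Python sketch. Field names follow Eq. 5, while the encodings (a day-of-year time, environment factors as (name, value) pairs) and the `Ratings` alias are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:               # User subset of UId x UPreferences x UAddress x UJob
    uid: str
    preferences: tuple = ()
    address: str = ""
    job: str = ""

@dataclass(frozen=True)
class Item:               # Item subset of IName x IType x IContent
    name: str
    itype: str = ""
    content: str = ""

@dataclass(frozen=True)
class Context:            # Context subset of Location x Time x Environment x Activity
    location: str            # e.g. a planting area
    time: int                # hypothetical encoding: day of the year
    environment: tuple = ()  # (factor, value) pairs, e.g. (("temperature", 18.5),)
    activity: str = ""

# R: User x Item x Context -> Rating, stored sparsely and keyed by
# (user id, item name, context); frozen dataclasses are hashable.
Ratings = dict[tuple[str, str, Context], float]
```

Freezing the classes makes equal contexts hash equally, so a context can serve directly as part of a rating key.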

Depending on the stage of the recommendation process at which context information is used, context-aware recommendation is divided into pre-filtering, contextual modelling and post-filtering.

Pre-filtering uses contextual information to preprocess the recommendation data set: user ratings not associated with the current user context are filtered out, traditional recommendation techniques are then used to predict user preferences, and recommendations are produced. This paradigm transforms the multidimensional recommendation problem into a traditional two-dimensional recommendation problem. Baltrunas presented an improved pre-filtering method based on contextual user clustering and validated it on real data from e-commerce, online travel, mobile advertising and other fields [22]. Van Setten presented a contextual pre-filtering recommendation system based on knowledge rules [23]. Liu proposed a service recommendation architecture combining context awareness and ontology: it first obtained the service data sets related to the user's current context, then calculated user similarity fused with contextual information, and finally generated personalized recommendation services for users based on the current context [24].
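In its simplest form, reduction-style pre-filtering keeps only the ratings whose context exactly matches the current one and then drops the context dimension. The Python sketch below assumes ratings stored as a dictionary keyed by (user, item, context) tuples; `prefilter` is a hypothetical name, and exact matching is a simplification of the similarity-based filtering used by the cited methods.

```python
def prefilter(ratings: dict, current_context) -> dict:
    """Reduction-style contextual pre-filtering.

    ratings maps (user, item, context) -> rating. Only ratings whose context
    equals the current context are kept, and the context dimension is dropped,
    giving a 2D user -> {item: rating} matrix for any traditional recommender.
    """
    matrix = {}
    for (user, item, ctx), rating in ratings.items():
        if ctx == current_context:  # exact match: at most one rating per (user, item)
            matrix.setdefault(user, {})[item] = rating
    return matrix
```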

3 Context-Aware and Collaborative Filtering Algorithm

By acquiring information about climate, soil, light, location, time and so on, this algorithm constructs a “user-item-context” 3D model. Through context awareness, it mines users' behavior, preferences, habits, knowledge level, interest characteristics and so on, fuses this information into the user interest model, and then applies a recommendation algorithm based on the traditional two-dimensional model. In this way, it establishes an initial user interest model based on context information and addresses the cold-start problem of the recommendation algorithm.

3.1 Context-Aware User Interest Model

User interest preferences vary with contextual information (such as location, time and environment), so contextual information should be considered during the recommendation process.

Location.

According to the distribution of the national planting industry, China can be divided into regions such as East China, South China, Central China, North China, Northwest and Northeast; it can also be divided by the provinces and counties of China's administrative divisions. The user's current location is obtained from the IP address or the mobile phone. The location-based interest weight \( IR_{location} \) is calculated by Eq. 6.

$$ IR_{location} = \text{l}(u_{i}, m_{i}) = \sqrt{\frac{\exp\left(-\frac{x_{i,0}^{2} + y_{i,0}^{2}}{2}\right) - \exp\left(-\frac{x_{j,1}^{2} + y_{j,1}^{2}}{2}\right)}{\exp\left(-\frac{x_{i,0}^{2} + y_{i,0}^{2}}{2}\right) + \exp\left(-\frac{x_{j,1}^{2} + y_{j,1}^{2}}{2}\right)}} $$
(6)

Agriculture in every region of China has its own characteristics, such as the main agricultural products, the agricultural production cycle and the disasters to which the region is vulnerable. A user in a given region will pay attention to the main local crops he is interested in, and will carry out activities such as sowing and harvesting according to the local agricultural production cycle.

Time.

Agricultural production has obvious seasonal and cyclical characteristics, and agricultural users' diverse demands change over time. Time can be divided by month, by season or by crop growth period; here, changes in user interest are modeled according to the differences between crop growth periods. For example, in April, corn growers in southern China tend to seek information on spring maize planting techniques. The time information is processed with the adaptive exponential decay function of Eq. 7.

$$ \begin{array}{l} IR_{\text{time}} = \text{t}(u_{i}, t, l_{i}, c, p) = \exp\left(-\ln 2 \cdot time(u,p) / hl_{u}\right) \\ time(u,p) \ge 0,\; time(u,p) \in \mathbb{N} \end{array} $$
(7)

Here \( \text{t}(u_{i}, t, l_{i}, c, p) \) is the time weight used to measure how much the user's interest has decayed: \( u_{i} \) is the user, t is the current system time, \( l_{i} \) is the user's location, c is the crop type in the user's label, and p is the page browsed by the user. \( time(u,p) \) counts the user's visits to page p backwards from the most recent: \( time(u,p) = 0 \) denotes the most recent visit to page p, \( time(u,p) = 1 \) the second most recent, and so on. \( hl_{u} \) is the half-life of user u, whose value depends on the growth cycle of the crop. At the beginning of the growth cycle the user's interest is high, and it decreases over time.
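Eq. 7 is a half-life decay. The Python sketch below is illustrative only; the name `time_weight` and the input checks are assumptions. Since \( \exp(-\ln 2 \cdot k / hl_{u}) = 2^{-k/hl_{u}} \), the weight is 1 for the most recent visit and halves every \( hl_{u} \) steps of \( time(u,p) \).

```python
import math

def time_weight(time_index: int, half_life: float) -> float:
    """Eq. 7: exp(-ln 2 * time(u,p) / hl_u).

    time_index: time(u,p) >= 0, where 0 is the most recent visit to page p.
    half_life: hl_u > 0, chosen (per the text) according to the crop growth cycle.
    """
    if time_index < 0 or half_life <= 0:
        raise ValueError("time(u,p) must be >= 0 and hl_u must be > 0")
    return math.exp(-math.log(2) * time_index / half_life)
```

For example, with a half-life of 5, the visit with \( time(u,p) = 10 \) receives a weight of about 0.25.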

Environment.

The context of the agricultural field environment includes temperature, humidity, illumination, precipitation, wind speed, wind direction, air pressure, soil temperature, soil moisture, NPK, water level, dissolved oxygen, pH, etc., all of which can be obtained from IoT sensors. This contextual information can guide agricultural production activities such as cultivation, fertilization, irrigation and improvement; it can also help farmers detect problems in time, improve agricultural production, and avoid or reduce the impact of meteorological disasters. The environment context can be represented by Eq. 8.

$$ EnvironContext = \left\{ {temperature,\;humidity,\;illumination,\;rainfall} \right\} $$
(8)

In a specific application, the environment context model can be built from a few selected key factors, so that its dimensionality is kept appropriate.

3.2 Context-Aware Collaborative Filtering Algorithm

Algorithm Thought.

The algorithm adopts the pre-filtering paradigm. First, it obtains an approximate context set by computing the similarity between stored contexts and the target user's current context; the “user-item-context” 3D model is then reduced to a “user-item” 2D model; finally, the Slope one algorithm computes item similarities, predicts item ratings and generates recommendations. The algorithm framework is shown in Fig. 1.

Fig. 1. Context-aware algorithm framework

Context Similarity Calculation.

Context similarity comprises location similarity, time similarity and environment similarity.

Location Similarity.

Given the user's location, the system checks against the database whether two users belong to the same planting area; users in the same area are regarded as similar. Letting E denote a planting area, the location similarity between the target user a and another user b is calculated by Eq. 9.

$$ {\text{simL(}}a,b )= \left\{ \begin{aligned} 1, \;\;\,a,b \in E \hfill \\ 0, \;\;\,a,b \notin E \hfill \\ \end{aligned} \right. $$
(9)

Time Similarity.

Time similarity is based on the difference between the time at which the target user is currently using the system and the time at which another user last used it: the smaller the difference, the greater the similarity. Because the 24 solar terms play an important role in guiding farming in China, one solar term (about 15 days) is taken as one period. Let t be this time difference, in days. The time similarity between the target user a and another user b is calculated by Eq. 10.

$$ \text{simT} (a,b )= \left\{ {\begin{array}{*{20}l} {1,} \hfill & {t \in \left[ {0,15} \right]} \hfill \\ {0.8,} \hfill & {t \in \left( {15,30} \right]} \hfill \\ {0.6,} \hfill & {t \in \left( {30,45} \right]} \hfill \\ {0.4,} \hfill & {t \in \left( {45,60} \right]} \hfill \\ {0.2,} \hfill & {t \in \left( {60,75} \right]} \hfill \\ {0,} \hfill & {t \in \left( {75,\infty } \right)} \hfill \\ \end{array} } \right. $$
(10)
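Eq. 10 is a step function with one step per solar term. A minimal Python sketch follows; the name `sim_time` is hypothetical, and t is assumed to be a non-negative number of days.

```python
def sim_time(t: float) -> float:
    """Eq. 10: time similarity for a difference of t days, one step per solar term."""
    if t < 0:
        raise ValueError("t must be a non-negative number of days")
    # (upper bound of the interval in days, similarity); intervals are right-closed
    for upper, weight in ((15, 1.0), (30, 0.8), (45, 0.6), (60, 0.4), (75, 0.2)):
        if t <= upper:
            return weight
    return 0.0  # t in (75, infinity)
```

With `datetime.date` objects, t can be obtained as `abs((current_date - last_date).days)`.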

Environment Similarity.

Environmental context involves many complicated factors, and environmental conditions such as climate, topography and soil have distinct regional characteristics; each area should therefore plant crops suited to its own precipitation, temperature, humidity and soil conditions. Consequently, environment similarity can be computed from location similarity, as in Eq. 11.

$$ {\text{simE(}}a,b )= \left\{ \begin{aligned} 1, \;\;\,a,b \in E \hfill \\ 0, \;\;\,a,b \notin E \hfill \\ \end{aligned} \right. $$
(11)

Comprehensive Context Similarity.

Having obtained the three similarities above, the comprehensive context similarity between users is calculated by Eq. 12.

$$ \text{sim}(a,b) = \alpha \text{simL}(a,b) + \beta \text{simT}(a,b) + (1 - \alpha - \beta )\text{simE}(a,b) $$
(12)

The weighting factors satisfy \( 0 < \alpha, \beta < 1 \) and \( \alpha + \beta < 1 \): \( \alpha \) is the weight of the location similarity, \( \beta \) is the weight of the time similarity, and \( 1 - \alpha - \beta \) is the weight of the environment similarity. By adjusting these weights, the three similarities can be fused to improve recommendation quality.
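Eqs. 9, 11 and 12 combine as in the hedged Python sketch below. It assumes each user's planting area is available as a string label (a hypothetical encoding of E) and that the time similarity has already been computed with Eq. 10; the function names and the weight validation are illustrative.

```python
def sim_location(area_a: str, area_b: str) -> float:
    """Eq. 9: 1 if both users are in the same planting area E, else 0."""
    return 1.0 if area_a == area_b else 0.0

def sim_context(area_a: str, area_b: str, sim_t: float, alpha: float, beta: float) -> float:
    """Eq. 12: alpha*simL + beta*simT + (1 - alpha - beta)*simE.

    sim_t is the time similarity from Eq. 10; simE equals simL per Eq. 11.
    """
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need 0 < alpha, beta < 1 and alpha + beta < 1")
    s_l = sim_location(area_a, area_b)
    s_e = s_l  # Eq. 11: environment similarity follows the planting area
    return alpha * s_l + beta * sim_t + (1 - alpha - beta) * s_e
```

Because Eq. 11 ties simE to the planting area, two users in different areas can only score through the time term, i.e. at most \( \beta \).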

Algorithm Process Description.

The algorithm process is described below.

Input: the “user-item-context” 3D model Cn; the target user u; the user rating data S; the current context information C; the collection I(R) of candidate items for recommendation.

Output: the Top-N item set I(N) most preferred by user u in the current context C.

Algorithm process:

figure a

The algorithm proceeds as follows. First, the recommendation list is initialized to empty (line 1). Next, all rating data of the target user u are retrieved and an “item-context” 2D rating matrix is constructed, as shown in Table 1 (line 2). Based on the current context information C and the context model Cn, the comprehensive context similarity is calculated using Eqs. 9–12, and the approximate context set F(C) of the current context is constructed (lines 3–5). The 3D model is then reduced to a “user-item” 2D model: if a user has rated the same item differently in different contexts, only the rating given in the context most similar to the current context is kept (line 6). The similarity between items is then computed according to Eq. 3 (lines 7–8); the resulting item similarities are shown in Table 2. Eq. 4 is used to predict item ratings (line 9), and the N items with the highest predicted ratings, I(N), are recommended to the user (lines 10–11).
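The steps above can be sketched end to end in Python. This is a simplified illustration under stated assumptions, not the authors' Java prototype: ratings are a dictionary keyed by (user, item, context), `ctx_sim` stands for any context similarity such as Eq. 12, and the `threshold` that defines F(C) is a hypothetical parameter, since the text does not say how the approximate context set is bounded. The sketch also builds the reduced matrix for all users at once rather than first building the item-context matrix of Table 1.

```python
from collections import defaultdict

def cacf_recommend(ratings, target, current_ctx, ctx_sim, threshold, candidates, n):
    """Simplified CACF sketch: context pre-filtering, 3D -> 2D reduction,
    then Slope one (Eqs. 3-4) to rank the target user's unrated candidates.

    ratings: {(user, item, ctx): rating}
    ctx_sim(ctx, current_ctx): context similarity in [0, 1], e.g. Eq. 12
    threshold: hypothetical cut-off; contexts scoring >= threshold form F(C)
    """
    # Lines 3-6: keep ratings from contexts in F(C); for each (user, item),
    # keep only the rating whose context is most similar to the current one.
    best = {}
    for (user, item, ctx), r in ratings.items():
        s = ctx_sim(ctx, current_ctx)
        if s >= threshold and s > best.get((user, item), (-1.0, None))[0]:
            best[(user, item)] = (s, r)
    matrix = defaultdict(dict)
    for (user, item), (_, r) in best.items():
        matrix[user][item] = r

    # Lines 7-8: accumulate rating differences for the deviations of Eq. 3.
    sums, counts = defaultdict(float), defaultdict(int)
    for row in matrix.values():
        for x, sx in row.items():
            for y, sy in row.items():
                if x != y:
                    sums[(x, y)] += sx - sy
                    counts[(x, y)] += 1

    # Line 9: predict ratings of unrated candidates with Eq. 4.
    own = matrix.get(target, {})
    if not own:
        return []  # the target user has no ratings left after filtering
    mean_u = sum(own.values()) / len(own)
    scores = {}
    for x in candidates:
        if x in own:
            continue  # only recommend items the user has not rated
        devs = [sums[(x, y)] / counts[(x, y)] for y in own if counts[(x, y)]]
        if devs:
            scores[x] = mean_u + sum(devs) / len(devs)

    # Lines 10-11: the N items with the highest predicted ratings form I(N).
    return sorted(scores, key=scores.get, reverse=True)[:n]
```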

Table 1. Item-context rating matrix
Table 2. Similarity between items

4 Experimentation and Results Analysis

To verify the effectiveness of the context-aware collaborative filtering (CACF) algorithm, a system prototype was developed in Java with the Eclipse 2015 development tool and a MySQL 5.7 database.

4.1 Data Sets

The data set was obtained by using crawler technology to collect data on 2,000 users from China's agricultural science and technology information website, China's rural website, China agricultural information website and other comprehensive websites. The data include each user's location, the time of system use, environmental information, the crop varieties of interest, browsing behavior and so on. Of these data, 80% were selected as the training set and the remaining 20% as the test set.

4.2 Evaluation Criteria

This paper adopts MAP (Mean Average Precision) and P@N (precision at N) as evaluation criteria. Both are commonly used to measure ranking accuracy: they focus on the accuracy of the top-ranked items, which is consistent with Top-N recommendation, and MAP additionally takes the order of the recommended results into account.

MAP measures precision averaged over the rank positions of all relevant items; the larger the MAP value, the higher the accuracy. MAP is calculated by Eq. 13.

$$ \text{MAP} = \frac{1}{\left| U \right|} \sum_{i=1}^{\left| U \right|} \frac{1}{\left| C_{i} \right|} \sum_{g=1}^{\left| C_{i} \right|} \frac{1}{\left| R_{ig} \right|} \sum_{h=1}^{\left| R_{ig} \right|} \frac{h}{r_{igh}} $$
(13)

Here \( \left| U \right| \) is the number of users; \( \left| C_{i} \right| \) is the number of context types of user \( u_{i} \); \( \left| R_{ig} \right| \) is the number of relevant items for \( u_{i} \) under context \( c_{g} \); and \( r_{igh} \) is the rank position of the h-th relevant item in the recommendation list, so that \( h / r_{igh} \) is the precision at that position. If the user preferences include no context information, \( \left| C_{i} \right| = 1 \).
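A minimal Python sketch of Eq. 13 follows, assuming each user contributes one (ranked list, relevant set) pair per context; the names are hypothetical. Relevant items that never appear in the ranked list contribute 0 here, a common convention that Eq. 13 leaves implicit.

```python
def average_precision(ranked: list, relevant: set) -> float:
    """Inner part of Eq. 13 for one user u_i and context c_g:
    (1/|R_ig|) * sum over h of h / r_igh, where r_igh is the 1-based rank
    of the h-th relevant item in the ranked recommendation list."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1              # this is the h-th relevant item found
            total += hits / rank   # precision at its rank, h / r_igh
    return total / len(relevant)   # unretrieved relevant items add 0

def mean_average_precision(per_user: list) -> float:
    """Eq. 13. per_user holds one non-empty list per user u_i, containing one
    (ranked_items, relevant_items) pair per context c_g (|C_i| pairs)."""
    user_scores = [sum(average_precision(r, rel) for r, rel in contexts) / len(contexts)
                   for contexts in per_user]
    return sum(user_scores) / len(user_scores)
```

For instance, a list ranking items a, b, c, d with a and c relevant gives (1/1 + 2/3)/2 ≈ 0.83 for that user and context.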

P@N evaluates the relevance of the first N items recommended to the user. In the experiment N = 15, and \( W_{15} \) denotes the number of relevant items among the top 15 recommended items, as in Eq. 14.

$$ {\text{P@}}15 = \frac{{W_{15} }}{15} $$
(14)
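Eq. 14 can be written in Python as the small sketch below, with a hypothetical function name; as in Eq. 14, the denominator stays N even if fewer than N items are recommended.

```python
def precision_at_n(recommended: list, relevant: set, n: int = 15) -> float:
    """Eq. 14: P@N = W_N / N, where W_N counts relevant items among the top N."""
    w_n = sum(1 for item in recommended[:n] if item in relevant)
    return w_n / n
```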

4.3 Results and Discussion

In this experiment, the training and test sets are first classified according to context information and the data are pre-filtered using the comprehensive context similarity; preference prediction and recommendation are then carried out.

The results of this experiment are shown in Figs. 2, 3 and 4. CACF is compared with the user-based collaborative filtering algorithm (UBCF) and the Slope one algorithm (SLOA). Fig. 2 compares the MAP values of the algorithms for numbers of item neighbors K = 20, 40, 60, 80 and 100. As K increased, the MAP values of all algorithms increased, reaching their maximum at K = 60, after which MAP began to decrease as the number of neighbors grew further. Fig. 3 compares the P@15 values of the algorithms for the same values of K. As K increased, the P@15 value of each algorithm first rose and then fell, again reaching its maximum at K = 60. This indicates that a suitable number of item neighbors improves recommendation accuracy. Fig. 4 therefore uses K = 60: CACF has the highest MAP value, 0.65, which is 25% and 12% higher than UBCF and SLOA respectively, and its P@15 value of 0.51 is 30% and 13% higher than UBCF and SLOA respectively. In summary, CACF achieves higher MAP and P@15 values than UBCF and SLOA, which shows that taking context information into account yields higher recommendation quality.

Fig. 2. The influence of the number of neighbors on MAP

Fig. 3. The influence of the number of neighbors on P@15

Fig. 4. The comparison of different algorithms on MAP and P@15

5 Conclusion

To address the complexity of agricultural field information and the personalized needs of agricultural users, this research proposes a recommendation algorithm that combines context awareness with item-based collaborative filtering. By acquiring agricultural context information such as location, time and environment, calculating context similarity and adopting the pre-filtering method, the 3D model is reduced to a “user-item” 2D model, and the classic Slope one algorithm is then used to predict preferences and recommend field information. Experiments show that the CACF algorithm is more effective and achieves higher recommendation quality than UBCF and SLOA. The next step will be to refine the user contextual information to further improve recommendation quality.