1 Introduction

Web 2.0 applications enable people to create and share information on the Internet; however, the volume of user-generated content has in turn caused information overload and hinders users from browsing and retrieving information [1]. If the problem is not properly addressed, users will be frustrated by the increasing number of online resources. Some Web 2.0 platforms therefore provide a tagging mechanism, namely social tagging, that allows users to annotate resources (e.g., websites, articles, photos, videos, and music) with free, preferred keywords to ease future access to the resources they have collected. For example, Del.icio.us, a social bookmarking website, enables individuals to bookmark any URL on the World Wide Web. CiteULike, a digital library, allows users to upload abstracts or full texts of research articles with relevant tags (or labels) and later retrieve the documents through their corresponding tags. Social tagging, which takes into account users’ notions of a specific resource [2], is helpful in organizing, browsing, and retrieving their own resources [3]. In other words, social tagging allows resources to be categorized in the way a particular user prefers and is therefore considered a substitute for taxonomy [4, 5].

Social tagging helps people manage their online resources; in addition, the tags from individual users represent their personal preferences and can improve the performance of personalized recommendation if properly utilized [6]. However, because tags are freely and voluntarily assigned by users, social tagging suffers from problems such as diverse and/or unchecked vocabulary and unwillingness to tag [7, 8]. The problem of diverse vocabulary may result from users’ forgetfulness and bounded rationality. Ebbinghaus hypothesized that memory retention declines over time and that people may lose about 70% of the information received two days earlier if they make no attempt to retain it [9]. Though the loss of information can be mitigated by constant recall, it persists when people fail to review frequently [10]. Furthermore, bounded rationality limits users’ ability to process information and prevents them from recalling all the tags they have used [11]. As a result, users tend either to reuse their most frequent terms or to use different terms each time they annotate similar resources. Thus, as the number of annotated resources increases, the tag space becomes vast and the resources related to a tag become heterogeneous. Both effects may frustrate users in accessing resources due to cognitive dissonance [12, 13].

To address the problems faced by social tagging, some studies have shifted their focus to tag recommender systems, which assist individual users in tagging resources and help the attached tags converge [3, 14, 15, 16, 17, 18]. Tag recommendation services are provided by some websites, such as Delicious, BibSonomy, and Last.fm, which implies a need in real-world situations. The task of a tag recommender system is to identify a set of tags that the focal user might consider relevant to a resource. Specifically, given a user u and a resource r, the task of traditional recommendation is to predict the class of preference(u, r), whereas that of tag recommendation is to predict the set tags(u, r) that the user u will assign to the resource r [7].

To make tag recommendations, previous research assumed that the number of tags was static; that is, users are limited to annotating resources with existing tags. For example, some studies followed collaborative filtering methods: they identified users whose tags or annotated resources are similar to those of the focal user and suggested the tags that these users assigned to similar resources [19]. Other research addresses the tag recommendation problem with content-based approaches. For example, when a resource is being annotated, some studies identify resources that share similar content with the focal resource and then recommend the top-ranked tags assigned to them [20]. Chen and Shin proposed several textual and social features for each tag used by each particular user and used them to construct a classifier that predicts the representative tags the focal user is interested in [21].

Though prior studies have shown the effectiveness of their proposed approaches in making tag recommendations, some limitations remain to be addressed. A resource is always suggested one or more previously used tags, whether or not they are relevant. However, the tags that people use to annotate resources may evolve over time. Some tags that receive less attention will be left behind, while new tags will emerge as new resources are annotated. Reasonably, such new tags should be relevant to the resource being annotated and related to the particular user’s topics of interest (the tags he or she has used to annotate resources) so that the user can better retrieve the resources later. That is, the new tags must to some degree conform to or be associated with the user’s practices. In a nutshell, a tag recommender system should make suggestions on the basis of existing tags and also be able to recommend new tags that are appropriate and associated with the user’s practices.

Nevertheless, prior research focused mostly on the reuse of existing tags and accordingly attempted to recommend tags that are popular among the referred users or frequently used to annotate similar resources. Generally, people annotate resources one by one, and each resource is assigned one or more tags. The assigned tags can be existing ones that the user applied to previous resources, new ones created by the user when no proper tag exists, or both. Therefore, this study intends to improve personalized tag recommendation by suggesting appropriate existing tags or new tags to the target user. Instead of multi-label classification, we adopt the content-based approach and model personalized tag recommendation as an incremental clustering problem. Incremental clustering assumes that objects (or resources) arrive in sequence and that each is incrementally clustered into either an appropriate existing category or a newly created category [22, 23]. This study extends the incremental clustering approach and proposes a progressive expansion-based tag (PET) recommendation technique. The PET technique assumes that the resources to be annotated are fed in sequence and that each will be assigned one or more existing categories (i.e., existing tags) and/or suggested appropriate new categories (i.e., new tags). In addition, when determining the appropriate existing tags for a resource, the PET technique considers the focal user’s topics of interest. For example, instead of identifying similar resources to make tag recommendations, PET measures the relevance between a new resource and a tag, that is, the content similarity between the resource to be annotated and all resources annotated with the tag. Furthermore, to suggest new tags, PET identifies the representative term(s) in the resource to be annotated by measuring not only the term frequency but also the relevance to existing tags.

The remainder of this study is organized as follows. In Sect. 2, we review the literature relevant to this study. In Sect. 3, we describe our proposed Progressive Expansion-based Tag (PET) recommendation technique. In Sect. 4, we describe the empirical evaluation, including the data collection and experiment design, followed by the evaluation results in Sect. 5. We conclude in Sect. 6.

2 Literature Review

In this section, we briefly review the research works relevant to our proposed progressive expansion-based tag recommendation technique, including prior research in tag recommender systems and an overview of incremental clustering.

2.1 Tag Recommender Systems

A tag recommender is one kind of recommender system. Instead of recommending objects such as books, music, or movies, a tag recommender suggests appropriate tags to users who are annotating objects in social media, especially on social bookmarking and media sharing websites. Such websites generally provide social tagging mechanisms that allow users to annotate objects with free keywords. For example, the social bookmarking website Del.icio.us enables individuals to bookmark any URL on the World Wide Web; the digital library CiteULike allows users to upload the abstract or full text of research articles with relevant tags (or labels); and the video sharing website YouTube allows users to upload their videos with tags. Though user-generated content is the core of Web 2.0, Internet users have been overloaded with great volumes of information, which hinders them from browsing and retrieving information [1]. Social tagging helps users manage and access their online resources; in addition, the tags assigned by an individual user may represent his or her notions of a resource and can facilitate personalized recommendation if properly utilized [2, 3, 6].

Tags allow users to annotate resources in the way they like, and tagging is therefore considered a substitute for a taxonomy of the user’s resources [4, 5]. Nevertheless, because tags are freely and voluntarily assigned by users, social tagging suffers from problems such as diverse and/or unchecked vocabulary and unwillingness to tag [7, 8]. Besides, users tend to reuse frequent tags or to create new tags, which diminishes the coherence or distinctness of the resources under a specific tag and adversely affects users’ resource search and access due to cognitive dissonance [12, 13]. To address these problems, prior research has attempted to develop tag recommender systems that support users in annotating resources and help the attached tags converge [3, 14, 15, 16, 17, 18]. Tag recommendation may shift the tagging process from generation to recognition, which reduces the user’s cognitive effort and time [20]. A tag recommender system follows some criteria to select from the tag space the tags most relevant to the resource the user is uploading. Specifically, given a user u and a resource r, the task of tag recommendation is to predict a set tags(u, r), drawn from a finite set of tags T, with which the user u may prefer to annotate the resource r [7].

Prior research broadly divides tag recommendation into content-based, collaborative filtering, and graph-based (or ranking-based) approaches according to the algorithms adopted [3, 7]. Content-based approaches focus on content analysis and are mainly applied to textual resources such as webpages and textual documents [21, 24, 25, 26, 27, 28, 29, 30, 31]. Instead of analyzing content, collaborative filtering approaches to tag recommendation resemble traditional collaborative filtering recommendation approaches, which make recommendations on the basis of the preferences of a referent group [19, 20, 32]. Finally, graph-based or ranking-based approaches are inspired by Web ranking. They make recommendations based on a ranking score computed from spectral attributes extracted from the underlying folksonomy data structure (i.e., the three-way relationship among users, resources, and tags) [7, 17, 33, 34].

Overall, prior research focused mostly on the reuse of existing tags and attempted to recommend tags that are popular among the referred users or frequently used to annotate similar resources. Though users’ interests may evolve over time, these studies seldom take the user’s topics of interest into consideration when making tag recommendations. Besides, people annotate resources one by one and often create new tags, in combination with existing tags, to annotate them. Tag recommendations should be made with consideration of the user’s interests, and that is what we intend to address in this study.

2.2 Incremental Clustering

Clustering analysis methods usually employ a batch-mode strategy to discover the structure hidden in the whole unlabeled data set at once. However, the sheer volume of data available for clustering analysis has made this memory-based approach impractical and thus raised the need for incremental clustering approaches, which process one object at a time and require less memory for data storage [35]. One well-known incremental clustering algorithm is sequential k-means [36], an incremental variant of Lloyd’s algorithm [37]. The sequential k-means algorithm aims to find a set of cluster means M that minimizes the cost function \( \sum_{o_{j} \in O} \min_{m \in M} \left\| o_{j} - m \right\|^{2} \). It randomly selects k data points as the initial cluster means M = (m1, m2, …, mk) and sets the size of each cluster, N = (n1, n2, …, nk), to 1. As an object oj arrives, the Euclidean distance between oj and each of the cluster means is calculated in sequence. Once the object oj is assigned to its closest cluster ci, the size of cluster ci (i.e., ni) is increased by 1 and the mean of cluster ci (i.e., mi) is updated to mi + (oj − mi)/ni.
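
To make the update rule concrete, the following is a minimal Python sketch of sequential k-means as described above. The function names and the choice to stream the same list that the initial means were sampled from are illustrative assumptions, not part of the cited algorithm descriptions.

```python
import random


def assign_and_update(means, sizes, obj):
    """Assign obj to its closest cluster mean and update that mean in place."""
    # Squared Euclidean distance gives the same nearest mean as the distance itself.
    dists = [sum((x - m) ** 2 for x, m in zip(obj, mean)) for mean in means]
    i = dists.index(min(dists))      # index of the closest cluster c_i
    sizes[i] += 1                    # n_i <- n_i + 1
    # m_i <- m_i + (o_j - m_i) / n_i
    means[i] = [m + (x - m) / sizes[i] for x, m in zip(obj, means[i])]
    return i


def sequential_kmeans(objects, k, seed=0):
    """Process objects one at a time; returns the final means and cluster sizes."""
    rng = random.Random(seed)
    # k randomly chosen data points serve as initial means, each with size 1.
    means = [[float(v) for v in p] for p in rng.sample(objects, k)]
    sizes = [1] * k
    for obj in objects:
        assign_and_update(means, sizes, obj)
    return means, sizes
```

Only the k means and k counters are kept in memory, which is the property that makes the method attractive when the data cannot be held in memory at once.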

Yang et al. [23, 38] addressed the news event detection problem by proposing INCR, a single-pass incremental clustering algorithm that produces nonhierarchical clusters incrementally for both retrospective and online detection. To support online detection, INCR was designed to process news documents sequentially. It employs an incremental IDF to reflect the effect of continuously incoming documents on term weighting and vector normalization during online detection. The incremental IDF is defined as \( idf(w,p) = \log_{2} \left( \frac{N(p)}{n(w,p)} \right) \), where w is the focal term, p is the current time point, N(p) is the number of documents accumulated up to the current time point (including the retrospective corpus if used), and n(w, p) is the document frequency of term w at time point p. Furthermore, INCR incorporates a time penalty, which can be a uniformly weighted time window (i.e., a window of the m documents before x) or a linearly decaying weight function, to adjust the similarity between a document x and any past cluster c. The similarity measure can be cosine similarity or any distance measure such as Euclidean distance. The adjusted similarity is defined as \( Similarity'(x,c) = \begin{cases} \left( 1 - \frac{i}{m} \right) \times Similarity(x,c) & \text{if } c \text{ has any member in the window} \\ 0 & \text{otherwise} \end{cases} \), where i is the number of documents between x and the most recent member document in c, and m is the size of the time window of documents before x. Finally, a document x is absorbed by the most similar past cluster if the similarity between the document and the cluster is larger than a pre-selected clustering threshold (tc); otherwise, the document becomes the seed of a new cluster.
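
The single-pass procedure can be sketched as follows. This is an illustrative reading of the description above, not the implementation of Yang et al.: documents are assumed to be sparse term-weight dictionaries, a cluster is assumed to be represented by the sum of its member vectors, and all function and field names are hypothetical.

```python
import math


def incremental_idf(n_docs, doc_freq):
    """idf(w, p) = log2(N(p) / n(w, p)), using counts accumulated up to time p."""
    return math.log2(n_docs / doc_freq)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def incr_cluster(docs, threshold, window):
    """Single-pass clustering with a linearly decaying time-window penalty."""
    clusters = []  # each: centroid (assumed: sum of members), members, last position
    for pos, x in enumerate(docs):
        best, best_sim = None, 0.0
        for c in clusters:
            gap = pos - c["last"] - 1      # i: documents between x and c's latest member
            if gap >= window:              # no member inside the window: similarity' = 0
                continue
            sim = (1 - gap / window) * cosine(x, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim > threshold:
            for t, v in x.items():         # absorb x into the most similar cluster
                best["centroid"][t] = best["centroid"].get(t, 0.0) + v
            best["members"].append(pos)
            best["last"] = pos
        else:                              # x becomes the seed of a new cluster
            clusters.append({"centroid": dict(x), "members": [pos], "last": pos})
    return [c["members"] for c in clusters]
```

With a short window, a document that matches an old cluster in content can still seed a new cluster, because the penalty drives the adjusted similarity to zero once the cluster's latest member falls outside the window.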

3 Progressive Expansion-Based Tag (PET) Recommendation Technique

This study proposes a Progressive Expansion-based Tag (PET) recommendation technique by revising an incremental clustering algorithm. The PET technique considers a focal user’s interests in order to recommend appropriate categories (tags) for the focal user’s resources. In addition, when the focal user’s own tags are less appropriate, PET recommends tags by identifying relevant ones among the tags that other users have assigned to the focal resource. As shown in Fig. 1, the overall process of the PET technique comprises four phases: feature extraction and selection, resource representation, candidate tag generation, and tag recommendation. The PET technique takes as inputs a focal user’s resource profile (i.e., resources with their respective annotated tags) and the resources to be annotated, and it produces a list of tags to be recommended. Because PET considers the user’s interests, we first group the resources in the user’s profile by their attached tags. Reasonably, two resources with the same tag may discuss a similar topic or share similar content. A set of important features is then selected and used to represent the resources in each tag cluster. Subsequently, an incremental clustering algorithm is applied to determine a set of appropriate tag clusters for each resource to be annotated. A resource is classified into a tag cluster if the content similarity between them exceeds a pre-specified threshold, and these tag clusters then become the candidates for recommendation. If a resource cannot be classified into any suitable tag cluster, PET turns to appropriate tags used by other users. In the following, we describe the preliminary design of the proposed PET technique.

Fig. 1. Overall process of progressive expansion-based tag recommendation technique

Feature Extraction and Selection:

In the feature extraction and selection phase, the resources in the user’s profile are grouped by their respective attached tags to form a set of tag clusters. A resource can belong to multiple groups because it may carry more than one tag. PET then extracts from the textual resources a set of representative features (i.e., nouns and noun phrases) for representing the resources. We adopted the rule-based part-of-speech tagger developed by Brill to syntactically tag each word in these resources [39]. Subsequently, we employed a parser to extract nouns and verbs from each syntactically tagged document. The global dictionary scheme was adopted, and the chi-square statistic was used to measure the weight of each feature for constructing the representative feature set of each cluster [40].
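
As an illustration of chi-square feature weighting, the sketch below scores each term against one tag cluster using the commonly used two-by-two contingency form of the statistic. The text above does not spell out the exact formula or how scores are combined under the global dictionary scheme, so the formula choice, the representation of documents as term sets, and the function names are all assumptions.

```python
def chi_square(a, b, c, d):
    """Chi-square for a term/cluster contingency table.

    a: documents in the cluster containing the term
    b: documents outside the cluster containing the term
    c: documents in the cluster without the term
    d: documents outside the cluster without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def top_features(cluster_docs, other_docs, k):
    """Rank terms by chi-square against one tag cluster; docs are sets of terms."""
    vocab = set().union(*cluster_docs, *other_docs)
    scores = {}
    for term in vocab:
        a = sum(term in doc for doc in cluster_docs)
        b = sum(term in doc for doc in other_docs)
        scores[term] = chi_square(a, b, len(cluster_docs) - a, len(other_docs) - b)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Note that the statistic measures dependence in either direction, so a term that never occurs in a cluster can score as highly as one that occurs only there.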

Resource Representation:

In the resource representation phase, the resources in each cluster are represented by the cluster’s set of representative features. In this study, we employed the TF×IDF measure as the representation scheme for the resources in each cluster.
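
A minimal sketch of this representation step is given below. The specific variant (raw term frequency multiplied by a base-2 logarithmic IDF) and the function name are assumptions made for illustration; the text above does not state which TF×IDF variant was used.

```python
import math
from collections import Counter


def tfidf_vectors(docs, features):
    """Represent each document (a list of tokens) over the selected features."""
    n = len(docs)
    # Document frequency of each selected feature.
    df = Counter(t for doc in docs for t in set(doc) if t in features)
    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in features)
        # Weight = term frequency x log2(N / document frequency).
        vectors.append({t: tf[t] * math.log2(n / df[t]) for t in tf})
    return vectors
```

For example, a feature that appears in every document of the collection receives weight 0, so it contributes nothing when resources are compared.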

Candidate Tag Generation:

The purpose of the candidate tag generation phase is to assess and identify the tags relevant to the resource to be annotated. This phase comprises two stages: tag cluster identification and new tag generation. At the tag cluster identification stage, this study revises the INCR algorithm [23, 38] to support multi-label classification. Specifically, the INCR algorithm assumes that each object belongs to one and only one cluster. In our study, however, a resource can belong to any number of tag clusters; that is, it may differ from all the resources in the focal user’s profile or belong to more than one tag cluster. We therefore adapt the INCR algorithm so that it can assign a resource to multiple tag clusters or create a new cluster for it if needed. We follow the INCR algorithm in employing a clustering threshold. The tag clusters whose similarity to a resource is higher than the clustering threshold are viewed as candidate tags for recommendation. However, when a resource is labeled as new, that is, when all of its similarities are lower than the clustering threshold, we try to identify suitable tags for recommendation from the annotated resources of other users. Thus, the task of new tag generation is to assess the suitability of the tags that other users have assigned to the focal resource. We rank those tags by considering their frequency of use across all resources, their relevance to the resources that the focal user has annotated, and their temporal distance to the resource to be annotated. The frequency TF is defined as the number of times a tag is used to annotate resources; the relevance TR is defined as the content similarity between a specific tag cluster (i.e., the resources that received the specific tag) and the resources in the focal user’s profile; and the temporal distance TD is defined as \( e^{ - \frac{\left| Now - Date(t_{i}) \right|}{Now - Date(T)} } \), where ti is the tag to be assessed, T is the set of all candidate tags, Date(ti) is the date on which tag ti was first used, and Date(T) is the date on which any of the candidate tags was first used. Finally, we define the ranking score of a specific tag ti as Score(ti) = TF × TR × TD.
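
The ranking score can be sketched as below. The sketch assumes that TF and TR have already been computed for each candidate tag, that dates are measured in days, and that Date(T) is the earliest first-use date among the candidates; the function names and the handling of a zero-length time span are hypothetical choices.

```python
import math
from datetime import date


def temporal_distance(now, first_used, earliest):
    """TD = exp(-|Now - Date(t_i)| / (Now - Date(T))), with dates in days."""
    span = (now - earliest).days
    if span == 0:
        return 1.0  # assumption: no decay when every candidate starts today
    return math.exp(-abs((now - first_used).days) / span)


def rank_new_tags(candidates, now):
    """candidates maps tag -> (TF, TR, first-use date); returns (tag, score) pairs."""
    earliest = min(first for _, _, first in candidates.values())
    scores = {
        tag: tf * tr * temporal_distance(now, first, earliest)
        for tag, (tf, tr, first) in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Under these assumptions TD ranges from 1 for a tag first used today down to 1/e for the oldest candidate, so more recently introduced tags are favored when frequency and relevance are equal.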

Tag Recommendation:

The final phase of the PET technique makes the tag recommendations. PET first recommends the tags identified at the tag cluster identification stage and, if needed, adds tags identified at the new tag generation stage to reach the required number of recommended tags. The candidate tags from the focal user’s profile are ordered by their similarities, and those from other users’ profiles are ordered by their ranking scores.
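
The two-stage ordering described above can be sketched as follows, assuming the similarities to the focal user's tag clusters and the ranking scores of other users' tags are already available as dictionaries. The function name and argument layout are illustrative only.

```python
def recommend(own_sims, other_scores, threshold, n):
    """Return up to n tags: own-profile candidates first, then other users' tags.

    own_sims:     tag -> similarity between the resource and that tag cluster
    other_scores: tag -> ranking score of a tag assigned by other users
    """
    # Multi-label step: every tag cluster above the clustering threshold is a candidate.
    own = sorted((t for t, s in own_sims.items() if s > threshold),
                 key=lambda t: -own_sims[t])
    recs = own[:n]
    # Fill the remaining slots with other users' tags in descending score order.
    for tag in sorted(other_scores, key=lambda t: -other_scores[t]):
        if len(recs) >= n:
            break
        if tag not in recs:
            recs.append(tag)
    return recs
```

When no tag cluster passes the threshold, the list consists entirely of tags drawn from other users, which corresponds to the case of a resource labeled as new.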

4 Empirical Evaluation

4.1 Data Collection

We adopted the MovieLens 20M database (ml-20m) as our evaluation corpus. This database contains 465,564 tag applications across 27,278 movies, created by 138,493 users who rated at least 20 movies between January 09, 1995 and March 31, 2015. In this database, the maximum, minimum, and average numbers of tags used by a user are 2,330, 1, and 58.1; the maximum, minimum, and average numbers of tags received by a movie are 197, 1, and 15.14; and the maximum, minimum, and average numbers of movies to which a specific tag was assigned are 1,093, 2, and 18.03. Because the tags assigned to the movies in the evaluation corpus are sparse, we applied the p-core scheme for tripartite hypergraphs to trim the corpus and keep its dense part for evaluation [41, 42]. We set the level k of the p-core scheme to 3 to ensure that each user, tag, and resource occurs at least 3 times in the evaluation corpus. After trimming, the evaluation corpus contains 7,801 users, 19,545 movies, and 364,804 tagging records. We also collected the synopsis of each annotated movie for the experiments: we implemented a crawler to gather the overview of each movie from the TheMovieDb website (https://www.themoviedb.org/) through the movie ID provided by the MovieLens database.
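
The trimming step can be illustrated with the sketch below, which repeatedly removes tagging records until every remaining user, movie, and tag occurs at least k times. Counting occurrences over individual (user, movie, tag) records is an assumption made here for simplicity, and the function name is hypothetical.

```python
from collections import Counter


def p_core(records, k):
    """Keep the densest part of a set of (user, resource, tag) records."""
    records = set(records)
    while True:
        users = Counter(u for u, _, _ in records)
        resources = Counter(r for _, r, _ in records)
        tags = Counter(t for _, _, t in records)
        kept = {
            (u, r, t) for u, r, t in records
            if users[u] >= k and resources[r] >= k and tags[t] >= k
        }
        if kept == records:   # fixed point: nothing left to prune
            return kept
        records = kept        # pruning may push other entities below k, so repeat
```

The loop is needed because removing one sparse record can push another user, movie, or tag below the level k, so pruning must continue until the set stops changing.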

4.2 Experiment Design

For each user in the evaluation corpus, we take his or her last annotated movie and its corresponding tags as testing examples, and all users’ tagging histories (i.e., all other annotated movies and their corresponding tags) as training examples. In this study, we implemented two popularity-based recommendation approaches, namely PAT and PUT, as the performance benchmarks. In PAT, the top-n tags most frequently used to annotate resources by all users are recommended; in PUT, the top-n tags most frequently used to annotate resources by the focal user are recommended. Furthermore, we adopted Precision, Recall, Hamming Loss, Mean Reciprocal Rank (MRR) [43], Average Precision (AP) [44], and Average Utility (AU) as the evaluation criteria. These criteria are defined as Precision = \( \frac{1}{\left| D \right|}\sum_{i = 1}^{\left| D \right|} \frac{\left| P_{i} \cap T_{i} \right|}{\left| P_{i} \right|} \), Recall = \( \frac{1}{\left| D \right|}\sum_{i = 1}^{\left| D \right|} \frac{\left| P_{i} \cap T_{i} \right|}{\left| T_{i} \right|} \), Hamming Loss = \( \frac{1}{\left| D \right|}\sum_{i = 1}^{\left| D \right|} \frac{\left| P_{i} \Delta T_{i} \right|}{\left| P_{i} \right|} \), MRR = \( \frac{1}{\left| D \right|}\sum_{i = 1}^{\left| D \right|} \sum_{j \in P_{i} \cap T_{i}} \frac{1 / Rank_{j}}{\left| P_{i} \cap T_{i} \right|} \), AP = \( \frac{1}{\left| D \right|}\sum_{i = 1}^{\left| D \right|} \sum_{j \in P_{i} \cap T_{i}} \frac{Precision_{j}}{\left| P_{i} \cap T_{i} \right|} \), and AU = \( \frac{1}{\left| D \right|}\sum_{i = 1}^{\left| D \right|} \sum_{j \in P_{i} \cap T_{i}} \frac{Precision_{j}}{\left| P_{i} \right|} \), where |D| is the number of target movies, Pi is the set of recommended tags for the target movie di, Ti is the set of true tags annotated to the target movie di, Δ is the symmetric difference (XOR) operation, Rankj is the rank of the recommended tag j, and Precisionj is the precision at the point where tag j is recommended. Finally, we set the clustering threshold for the incremental clustering algorithm to 0.05 and examine the overall effectiveness of our proposed PET and the benchmark techniques by averaging the recommendation performance across all users.
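
For reference, the criteria can be computed per target movie and then averaged as in the sketch below. It takes precision over the recommended tags and recall over the true tags (the conventional definitions), reads Precisionj as the fraction of correct tags among the top Rankj recommendations, and assigns 0 to MRR and AP when no recommended tag is correct; that zero-hit convention and the function names are assumptions.

```python
def evaluate(recommended, true_tags):
    """Metrics for one target movie; recommended is a ranked list, true_tags a set."""
    hits = [rank for rank, tag in enumerate(recommended, start=1) if tag in true_tags]
    n_hits, n_rec = len(hits), len(recommended)
    # Precision at the rank of each correct tag: (hits so far) / rank.
    prec_at = [(i + 1) / rank for i, rank in enumerate(hits)]
    return {
        "precision": n_hits / n_rec,
        "recall": n_hits / len(true_tags),
        "hamming_loss": len(set(recommended) ^ set(true_tags)) / n_rec,
        "mrr": sum(1 / rank for rank in hits) / n_hits if n_hits else 0.0,
        "ap": sum(prec_at) / n_hits if n_hits else 0.0,
        "au": sum(prec_at) / n_rec,
    }


def mean_metrics(cases):
    """Average each metric over all (recommended, true_tags) pairs, i.e. over D."""
    rows = [evaluate(rec, true) for rec, true in cases]
    return {name: sum(r[name] for r in rows) / len(rows) for name in rows[0]}
```

For instance, recommending the ranked list a, b, c when the true tags are a, c, and d gives precision and recall of 2/3, an AP of 5/6, and an AU of 5/9.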

5 Evaluation Results

We investigate the effectiveness of the evaluated techniques when the number of recommended tags is three and five. As shown in Table 1, our proposed PET outperforms the benchmarks (i.e., the PAT and PUT techniques) across all performance metrics when recommending three and five tags. Though PET performs better than the benchmarks, the rates it achieves are not satisfying: almost all of them are lower than 0.1, except for the recall rate. Furthermore, as the number of recommended tags increases, there is a tradeoff between the precision and recall rates. The evaluation results imply the difficulty of tag recommendation, which must identify relevant tags among thousands of candidate tags. Overall, the proposed PET technique performs better than the benchmark techniques, which make tag recommendations on the basis of tag popularity. Besides, the low performance rates may be caused by the sparse data, which remains a problem to be addressed in the study of tag recommendation.

Table 1. Comparative evaluation results

6 Conclusion

This study builds on the concept of incremental clustering to propose a progressive expansion-based tag recommendation technique. The PET technique can recommend appropriate tags for the resources to be annotated in consideration of the focal user’s preferences and tag usage practices. The preliminary evaluation results indicate that the proposed PET technique is more effective than the popularity-based tag recommendation approaches across all evaluation criteria. The progressive expansion approach can identify tags that meet the user’s needs in annotating online resources. However, this study has some limitations that need to be addressed and that in turn suggest future research directions. First, we adopted only one database (i.e., the MovieLens 20M database) to evaluate and compare the investigated techniques. More experimental datasets should be collected from other social bookmarking websites, such as BibSonomy, CiteULike, and Last.fm, for further empirical evaluations. Second, this study employed two popularity-based recommendation approaches as the performance benchmarks. Other approaches to tag recommendation should also be examined in the future. Finally, the experimental evaluations conducted in this study are preliminary, and more analyses of the effects of the proposed PET technique are required.