Introduction

In today’s world, recommender systems (RSs) (Bobadilla et al. 2013) play an extremely important role in our digital life (Lu et al. 2015). Many e-commerce platforms (such as Amazon or eBay), entertainment services (e.g. Netflix, Spotify) and social media (Instagram, Facebook, Twitter, etc.) incorporate recommendation into their functionalities. These systems analyse how users interact with their products and suggest ones that the users might be interested in.

A suitable context where an RS could help, and the research problem in which our proposal is framed, is the so-called publication venue recommendation (Wang et al. 2018). The general problem is stated as follows: a scientist who has recently written a scientific paper (the target article) wishes to select the most appropriate venue (journal, conference or scientific event; see Footnote 1) to which this article could be submitted for evaluation, trying to maximise its chances of acceptance. The author could consider different criteria for selecting a journal, among others the impact factor or position in a ranking, but one of the most important is that the topics of the target paper fit the journal scope (if the target article is about RS, the journal scope should contain RS). Once this fact is confirmed, the next step would be to verify that a number of papers dealing with the same topics have been published in the journal, i.e. that there is topical compatibility between some published papers and the target article (if the target article is about personalisation, tagging and book recommendation, the journal should have published articles dealing with these same topics). A final condition to verify would be that these related articles have been published recently, which would mean that, for that journal, their topics are hot ones. An RS might help the author to perform this task: given the target article, the RS would suggest some relevant journals to which it could be submitted. The use of these RSs is spreading widely, and most of the big scientific publishers are incorporating them into their web sites to support the selection of the best possible venue, although with some important limitations. An example is Springer Nature Journal Suggester (Footnote 2), among many others.

There are two main types of recommender systems (Adomavicius and Tuzhilin 2005): those based on collaborative filtering (CF) and those based on content-based recommendation (CBR) (Lops et al. 2019). While the first type generates suggestions based on user ratings for items they have consumed, the second offers recommendations based on the features (frequently textual descriptions) of these consumed items. In both cases, ratings or contents are stored in special structures called profiles (Gauch et al. 2007) and these represent the users’ interests. RSs exploit these profiles to generate useful suggestions for users.

Returning to our venue recommendation context, considering that the main source of information of an article is its own content, and assuming that it is difficult to find user ratings assigned to journals (beyond the obvious binary rating of having published or not), CBR seems to be the most plausible approach to tackle this problem.

As mentioned above, in CBR, as long as there is a textual description of the items, profiles are usually represented by a bag of words obtained by combining the texts of all the items the user has shown interest in. As CBR is primarily based on information retrieval (IR), the recommendation process consists of matching the active user’s profile (in IR terms, a representation of an information need, i.e. a query) against the textual representation of all the items in the collection (documents in the IR context). The degree of relevance of each item with respect to the profile is thus computed, and a ranking of relevant items to be recommended is subsequently generated.
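The matching step described here can be illustrated with a minimal sketch. It assumes TF-IDF weighting and cosine similarity, which are common choices in IR but are not stated in the text as the scheme actually used; the function names and the toy item texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenisation (a real system would also remove stopwords, stem, etc.)."""
    return text.lower().split()


def build_idf(token_lists):
    """Smoothed inverse document frequency computed over the item collection."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are ignored."""
    counts = Counter(tokens)
    return {term: tf * idf[term] for term, tf in counts.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_items(query_text, item_texts):
    """Score every item against the query (the profile) and sort by decreasing relevance."""
    token_lists = {item: tokenize(text) for item, text in item_texts.items()}
    idf = build_idf(list(token_lists.values()))
    query_vec = vectorize(tokenize(query_text), idf)
    scores = {item: cosine(query_vec, vectorize(tokens, idf))
              for item, tokens in token_lists.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection: two items described by short texts.
items = {
    "item_a": "recommender systems collaborative filtering user ratings",
    "item_b": "protein folding molecular dynamics simulation",
}
ranking = rank_items("content based recommender systems", items)
```

In this toy case the query shares terms only with `item_a`, so that item is ranked first and `item_b` receives a score of zero.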

This approach is used for most recommendation contexts (books, films, music, etc.). There are, however, other situations where items are associated with a group of documents rather than a single text, such as expert finding (Lin et al. 2017), whereby experts on a certain subject are recommended to any user who requires them according to a set of documents that define their expertise, or our problem at hand, publication venue recommendation. In this case, the venues (items) comprise a series of articles published there (documents). In these two cases, item profiles might be built to define their specific informational context and subsequently used in the recommendation stage. The general premise of this paper is that venues are described by means of profiles compiled from the articles published in them. The recommendation process then takes a target article as input and matches it against the venue profiles. As a result, a ranking of suitable journals, conferences or general scientific events is generated and presented to the author.

As we have already mentioned, a large proportion of the profile consists of (weighted) terms or keywords, i.e. the words included in the textual descriptions of items, each associated with a number that reflects the importance of the term in the item and/or in the entire collection. Other alternatives for profile construction have, however, been explored in the literature, such as the inclusion of topical and temporal dimensions.

Regarding such topic-based profiles and assuming that the topics covered by the documents in the collection are available, textual subprofiles could be created for each topic, incorporating the text of all the associated documents about this topic. Let us imagine a simple situation in the context of publication venue recommendation. Let us suppose that a journal has published articles about RS, IR, personalisation and applications. The journal’s profile would consequently consist of four subprofiles, each containing the texts of the articles in that research area (four different subprofiles containing articles dealing with RS, IR, personalisation and applications separately). Each journal, therefore, would be represented by several subprofiles that are more topically homogeneous, as articles are not mixed in one profile (heterogeneous) but grouped according to topics of interest. This method of organisation would clearly increase the interpretability of these (sub)profiles. This topical dimension can be automatically incorporated into the profiles by mining the texts in situations in which these categories are not clearly available, such as, for example, learning topic models using Latent Dirichlet Allocation (LDA) (de Campos et al. 2021; Jelodar et al. 2019) or by applying clustering algorithms (de Campos et al. 2020).

In situations where time is a feature of the documents that define items (such as any type of timestamp), this dimension can be incorporated into the recommendation in different ways (Campos et al. 2014). Firstly, temporal subprofiles may be built by dividing the timeline of the documents into periods, grouping the documents by period and building the corresponding subprofiles. For example, for a journal that has existed for 10 years, 10 different temporal subprofiles could be constructed, each from the articles published in the same year.

The homogeneity of the profiles (in this case temporal) is also present in the item profiling process. This approach could be considered as a generalisation of the well-known profiles based on long and short-term preferences. Secondly, but unrelated to profiles, another option is to include time in the recommendation process, by means of the application of a decay factor that penalises older items.
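As an illustration of the second option, the following minimal sketch applies a decay factor to a relevance score. The exponential form and the half-life parameter are assumptions made purely for illustration, since the text does not fix a particular decay function; the function names are hypothetical.

```python
def exponential_decay(age_in_years, half_life_years=5.0):
    """Weight in (0, 1] that halves every `half_life_years` (assumed functional form)."""
    return 0.5 ** (age_in_years / half_life_years)


def decayed_score(relevance, publication_year, current_year, half_life_years=5.0):
    """Penalise the relevance of an older document according to its age."""
    age = max(0, current_year - publication_year)
    return relevance * exponential_decay(age, half_life_years)


# A document published 10 years ago keeps a quarter of its relevance
# when the (assumed) half-life is 5 years.
score = decayed_score(0.8, publication_year=2010, current_year=2020)
```

With these assumed settings, a document published in the current year keeps its full score, while older documents contribute progressively less.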

Focusing henceforth on the publication venue recommendation problem, and bearing in mind these topical and temporal dimensions for building homogeneous subprofiles, a question immediately arises: would it make sense to combine both aspects in order to improve the quality of journal recommendation in terms of system effectiveness? Since the state of the art of general CBR suggests that such a combination is suitable, two innovative methods for carrying out this integration are presented and evaluated in this paper, and these represent its main contribution. More specifically, while the first method mines the topics of the whole article collection and creates journal subprofiles according to a temporal division, the second initially carries out a temporal division of the articles and then, for each temporal split, extracts its topics and builds the corresponding journal subprofiles. Our goal is, therefore, to determine whether this general mixture of time and topics is valuable in comparison with other non-hybrid alternatives, and which option is best. For this purpose, this study will address the following research questions:

  • RQ1: Does a temporal division of the scientific articles published in journals provide a reliable source for constructing high quality profiles?

  • RQ2: Can decay-based techniques which penalise older scientific articles be successfully incorporated?

  • RQ3: To what extent is building journal profiles based on latent topics in the article collection an added value for the venue recommendation problem?

  • RQ4: Is hybridisation, i.e. the combination of topical and temporal aspects for creating journal profiles, a good alternative for the problem at hand?

  • RQ5: Which method is the best form of hybridisation for the venue recommendation problem?

In order to answer these questions, we have designed a detailed set of experiments in which the performance of the different models is tested. Although the answers drawn from the results are specific to the publication venue recommendation problem, and therefore not totally generalisable to other content-based problems, the methodology described in this paper, i.e. the profiling proposals, may be applied to many types of item collections represented by text in the context of CBR.

The remainder of this paper is organised as follows: Sect. 2 discusses other related work; Sect. 3 introduces the different methods for creating (sub)profiles based on terms, topics and time, and for combining them; Sect. 4 focuses on the experimental part of the work, including results and discussion; and, finally, Sect. 5 details our conclusions and outlines our future lines of research.

Related work

Starting with the simplest form of representing items or user profiles, the method usually adopted is to compile a list of weighted terms which are automatically extracted from the document associated with them (Gauch et al. 2007). These terms are supposed to correctly represent the document subjects and the weights are responsible for measuring their importance in terms of the entire document collection and within each document. Some examples of the use of term-based profiles for recommendation reflect how widely they are used in CBR, and these include TV programme recommendation (Wartena et al. 2011), expert finding (de Campos et al. 2020), treatments for patients in a health RS (Bateja et al. 2018), tweet recommendation (Benzarti and Faiz 2016), and image recommendation (Karlsen et al. 2018).

An alternative to keywords for representing profiles is the use of tags, concepts, categories or topics. We could say that these higher-level features, although named in different ways, are near-synonymous terms that symbolise topics and try to capture the underlying semantics of the items. Since the profiles comprise more general concepts rather than just words, certain authors believe that this is beneficial for the quality of the recommendation (Firan et al. 2007), at least in the case of tags. There are a number of examples of systems that build tag-based profiles and use them for recommendation (Bogers 2018; Aliannejadi and Crestani 2018; Stakhiyevich and Huang 2019; Yan et al. 2020; Becerra et al. 2017). Tags are assigned to user profiles either manually or on the basis of a machine learning-based approach. The use of concepts is discussed in a number of papers (Ren et al. 2015; Narducci et al. 2016; Sharma et al. 2017; Simsek and Karagoz 2020). In most cases, the concepts are extracted from ontologies or concept graphs, given the information associated to items or users. With respect to topic-based profiles (Saraswatm et al. 2020), these comprise latent topics mined directly from document collections, typically using the LDA algorithm or extensions of it. Starting with user-associated texts, the profile is fed with the most probable topics associated to the words contained in them. Certain papers illustrate the use of topic-based profiles in CBR for a wide variety of problems (Chen et al. 2017; Huang and Wu 2019; de Campos et al. 2021; Fu et al. 2021; Khan et al. 2021). Although the underlying representation based on topics is very appropriate for representing profiles, it does, in fact, lack the interpretability offered by terms, tags or concepts. A translation from topics to human-understandable labels is needed, and this requires additional effort.

All these previous item or user profiles are monolithic in the sense that all the possible facets of interests are combined into a single profile. Another alternative is to consider profiles as comprising different subprofiles, each associated to a possible facet, thereby capturing the various underlying, non-explicit topics which are usually extracted by machine learning algorithms from the associated texts of users and items. These multi-faceted profiles are no longer flat although they may have different shapes: trees, representing personal data, expertise and interests (Pavan and Luca 2015); graphs of clusters capturing different facets from different sources (Zeng et al. 2002); two subprofiles to capture user interests and friends’ interests (Gulla et al. 2014); subprofiles comprising subsets of items rated by the user and which are used to improve the diversity of the recommendations (Kaya and Bridge 2019); different types of subprofiles, each containing keywords, concepts and tags (Narducci et al. 2013), or hierarchies of weighted topics (Kook 2005). Clustering is the usual technique for creating such multi-faceted profiles. This unsupervised learning is applied to the document collection resulting in clusters of documents or keywords, which will integrate the profiles as subprofiles. Each cluster would represent a concept in the entire collection. There are a number of papers which cover this methodology (Somlo et al. 2001; McGowan et al. 2002; Yeung et al. 2009; Amini et al. 2014; de Campos et al. 2020; de Campos et al. 2021).

In the research presented in this paper, topic-based profiles will comprise subprofiles which represent different concepts but rather than containing a list of topics, they contain terms, i.e. those from the documents associated with the topics.

Much has been published on temporal dynamics, i.e. the inclusion of time in recommendation, and this has mainly focused on CF (Campos et al. 2014). One of the most common approaches is to use decay functions to penalise old items and reward new ones (Ding and Li 2005; Yeniterzi and Callan 2015). A second alternative is to include time in the computation of item weights (Linda and Bharadwaj 2019). Another possibility is to integrate time into the rating matrix in CF and use it to find trusted relationships between users (Ngaffo et al. 2021). Another research line is to consider time frames: in the article (Ramos and Paraboni 2014), the authors propose a CBR system for tweets, where a specific time frame is learned for each user and only tweets within this personalised frame are recommended. The same idea has been used by other authors (Si et al. 2017) but for points-of-interest recommendation. One generalisation is the use of long and short-term profiles as another option to include time and take into account the users’ most recent interests in contrast to those which were acquired by interacting with the system some time ago (Li et al. 2011; Xiang et al. 2010). The time domain is included in our models simply by splitting the documents into time periods of equal size rather than using long and short-term profiles. Within each time period, the topic subprofiles are learnt.

This review of related work also examines the combination of topicality and temporality in profiles, which takes advantage of both dimensions in order to improve recommendation. This combination is performed using a wide range of methods, outlined in the papers mentioned below, but the most usual way is to apply a latent topic discovery algorithm to the available text collections, obtain the topics associated to each document and incorporate time by means of weights associated to topics. Other authors use decay functions (Wangwatcharakul and Wongthanavasu 2021) to mitigate the impact of old ratings. They also use item reviews in order to obtain the underlying topics in the collection and associate the rated items to the corresponding topics in the reviews in order to track how the topics evolve with time. Finally, they propose an optimisation method to make predictions. In an article about news recommendation (Li et al. 2014), the authors build long and short-term profiles. While long-term profiles comprise latent topics extracted by LDA from a collection of news and weighted by a time decay function to capture how they evolve in time, short-term profiles comprise topics occurring in documents in the most recent period of time. In another paper (Yin et al. 2015), the authors describe a method for the social media context that combines interests and temporal context. It is based on a latent class statistical mixture model that represents topic distributions not only from users’ interests (user-oriented topics) but also from a temporal context (time-oriented topics). They also compute the distribution of topics for items. With all of this information, they are able to model different users’ interests in different time periods. Liu (2015) considers the interaction of each user u with each item i at a given time. LDA is applied to extract topics from the set of textual representations of all these interactions, which are represented by a topic distribution. For a given user, once all of their interactions have been sorted chronologically, the assigned topics are modelled as a time series. Recommendation takes place when a Gaussian process predicts the value of each topic at a given time and similarities are computed between the predicted topic distribution and the distribution associated with each user. In another article (Neshati et al. 2017) about community question answering, the authors introduce a method for future expert finding which suggests the most suitable experts for the future. In order to do so, they first apply LDA in order to extract topics from documents associated to experts, and their corresponding timestamps, and calculate the probability of a future expert candidate for a given query. In the context of social media (Nishioka and Scherp 2016), users’ interests are extracted from social media streams. Profiles are then built using weighted topics, which are those obtained by applying LDA to the collection. Items are also indexed using concepts and matched to user profiles. Recommendation is carried out by computing a similarity between user profiles and item profiles. Publication times are also taken into account by means of decay functions, which penalise the older topics and are included in the topic weights. Zeng et al. (2018) build temporal user profiles by directly incorporating time into the LDA algorithm, thereby obtaining topic distributions for words and times, as in the case of timeSVD++ (Koren 2010). The last two papers, on expert finding, use concepts rather than topics from LDA. In Rybak et al. (2014), the profiles consist of weighted concepts, where the weights represent the degree of expertise in each concept. The temporal expertise profile is a set of single profiles which are computed at different periods of time, and a decay function is incorporated into the calculation of the concept weights. In Ziaimatin et al. (2012), while short-term profiles are built by extracting and weighting concepts from an ontology over given time periods, long-term profiles are built by detecting the concepts which are uniformly distributed in the short-term profiles.

The way in which temporality and topicality are combined in this paper is a contribution to the state of the art. In most cases where LDA is used, it has been applied globally to the entire document collection. In this research, LDA is applied locally only to those documents belonging to the same period of time when both dimensions are combined.

Finally, concerning the specific application field on which we focus, namely content-based publication venue recommendation, although the information within the profiles may vary (terms, noun phrases, n-grams or topics), each venue always has either a single profile (Medvet et al. 2014; Silva et al. 2015; Yang and Davison 2012) or as many subprofiles as published articles (Kang et al. 2015; Rollins et al. 2017; Pradhan and Pal 2020; Errami et al. 2007). Different topically homogeneous subprofiles are only considered in de Campos et al. (2022). Moreover, there are hardly any works that explicitly use temporal information: Alhoori and Furuta (2017), in the context of a collaborative filtering algorithm to recommend venues, propose a personal venue rating which considers the years when the articles published in a venue were added to a researcher’s personal collection. In Pradhan and Pal (2020), a similarity between venues which penalises older articles using a decay function (inverse log-weighting) is computed; this similarity is in turn used by a random walk with restart algorithm on a graph of venues. We have not found any work on publication venue recommendation that combines topical and temporal information.

Alternatives to profile construction based on terms, topics and time

Consider a researcher who wants to know the possible journals in which they could publish a recently written paper, and assume that journals are modelled by profiles built from the articles published in them, which act as the building blocks of the recommendation process. This section introduces the different alternatives for building such profiles, based on terms, topics and time. Although the publication venue recommendation problem is always kept in mind, these methods are presented in a more formal way in order to show how they could be generalised to other CBR contexts (for example, expert finding).

Term-based profiles

Formally, let \(I=\{i_1,\ldots ,i_r\}\) be the set of items to be recommended. Linked with each of these, \(i\in I\), is a set of \(n_i\) text documents \(D^i=\{d^i_1,d^i_2,\ldots ,d^i_{n_i}\}\). In the specific problem of publication venue recommendation, \(I\) would be the set of available venues and \(D^i\) the corresponding set of articles published in each of them.

Each item will also be represented by a profile that contains in one way or another the content of its related documents (the terms appearing in them). These profiles can basically be organised in one of two ways:

  • Monolithic profiles: where all the documents linked to each item i are concatenated to create a single document, \(d^i=\cup _{j=1}^{n_i} d^i_j\). This macro document will act as a unique profile \(p_{Mono}^i\) for item i, i.e. \(p_{{Mono}}^i=\{d^i\}\).

  • Atomic subprofiles: where for item i, its profile will comprise as many subprofiles as documents attached to it but in an isolated, unconcatenated way, i.e. \(p_{Atom}^i=\{d^i_1,d^i_2,\ldots ,d^i_{n_i}\}\) (each document is treated as a subprofile in itself).

The collection of items is then represented by a set of profiles, \({\mathcal {P}} = \{p^{i_1},\ldots ,p^{i_r}\}\), which will serve as retrieval units in this context of CBR systems. A graphical representation of this profile construction process is shown in Fig. 1.

Fig. 1 Building monolithic and atomic (sub)profiles
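The two organisational schemes can be sketched in a few lines. This is only an illustration under simplifying assumptions: documents are plain strings, concatenation is done with a single space, and the venue names and texts are hypothetical.

```python
def monolithic_profile(documents):
    """p_Mono: all documents of an item concatenated into one macro document."""
    return [" ".join(documents)]


def atomic_profile(documents):
    """p_Atom: every document of an item is kept as its own subprofile."""
    return list(documents)


# Hypothetical collection: each venue (item) is linked to its published articles.
venue_documents = {
    "journal_a": ["text of article one", "text of article two"],
    "journal_b": ["text of article three"],
}

mono = {venue: monolithic_profile(docs) for venue, docs in venue_documents.items()}
atom = {venue: atomic_profile(docs) for venue, docs in venue_documents.items()}
```

Here `mono["journal_a"]` holds a single retrieval unit, whereas `atom["journal_a"]` holds one retrieval unit per article.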

These two basic ways of classifying profiles as either monolithic or atomic correspond in the expert finding literature with the so-called profile-based methods and document-based methods, respectively (Balog et al. 2012). There is no general agreement about which method is preferable, and although document-based methods tend to be considered better than profile-based methods, profile-based methods perform better in certain cases (de Campos et al. 2015; Liu et al. 2005).

A main conceptual drawback of both monolithic and atomic profiles is that they do not exploit the valuable information that the articles provide about the different general concepts they deal with. In that sense, monolithic profiles present an extreme compaction that makes them totally heterogeneous (they integrate all the concepts in one single structure). Atomic subprofiles, on the other hand, lie at the opposite extreme, showing a radical but homogeneous decomposition.

We believe that there is room for improvement between these two extreme ways of building (sub)profiles and that we can create less extreme ways of organising the information related to each item. We will therefore use both approaches as baselines in our experiments, as they are the most basic ways of representing (sub)profiles and, as shown, are frequently found in the literature.

Topical profiles

A first alternative for organising an item’s subprofiles (beyond the two basic organisational schemes presented in the previous section) would be to build more homogeneous subprofiles around the different concepts or topics which can be identified in the entire collection of text documents associated to the items. The construction of subprofiles from a topical perspective can be based on a partition of the document collection by means of a clustering algorithm which uses the documents’ terms as features. This would identify the different clusters of documents according to their subjects, placing all the conceptually related documents in the same cluster. For an item i, each subprofile will correspond to the concatenation of the documents associated to i which are assigned to the same cluster, and this results in a set of topically homogeneous subprofiles. Note that while this clustering process is global, in that it is carried out with the entire document collection (not only with the documents associated to an item), the subprofile construction is local, as the subprofiles only contain the text of documents associated to the item i.

Although there are many ways of performing this clustering process, in this paper we will use LDA (Latent Dirichlet Allocation) (Blei et al. 2003) for this purpose. LDA finds k latent topics, \(x_1,x_2,\ldots ,x_k\), in a document collection (Footnote 3), where each topic \(x_l\) is characterised by a conditional probability distribution of terms, \(p(t \mid x_l)\), and determines for each document d a probability distribution of topics, \(p(x_l \mid d)\). For example, in the context of recommending Machine Learning journals, suppose that the articles published in all the journals deal with five different topics (to clarify the example, these will be called clustering, classification, regression, association, and feature selection). One article might deal only with classification (100%), a second one might mainly be about feature selection (70%) but may also discuss classification as a secondary topic (30%), a third one might mostly cover regression (90%) but also briefly touch on feature selection (10%), and a fourth one might cover all the topics equally (20% each), as it could be an introduction to Machine Learning.

The clustering generated by LDA obtains k clusters, one for each topic \(x_l\), where each document is assigned to its most probable topic. For each item i, there are therefore at most k subprofiles (Footnote 4), each containing the concatenated text of those documents associated to i whose main topic is \(x_l\). Figure 2 illustrates the entire process for generating the topical subprofiles.

Fig. 2 Building topical profiles
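To make the hard assignment concrete, the sketch below encodes the topic distributions \(p(x_l \mid d)\) of the four example articles and picks the most probable topic for each. The distributions are written by hand rather than learnt by LDA, the article identifiers are hypothetical, and the tie-breaking rule (lowest topic index wins) is an assumption, since the text does not say how ties are resolved.

```python
TOPICS = ["clustering", "classification", "regression", "association", "feature selection"]

# Hand-written p(x_l | d) for the four example articles, in the order of TOPICS.
doc_topic = {
    "article_1": [0.0, 1.0, 0.0, 0.0, 0.0],
    "article_2": [0.0, 0.3, 0.0, 0.0, 0.7],
    "article_3": [0.0, 0.0, 0.9, 0.0, 0.1],
    "article_4": [0.2, 0.2, 0.2, 0.2, 0.2],
}


def most_probable_topic(distribution):
    """Index of the highest-probability topic; ties go to the lowest index (assumption)."""
    return max(range(len(distribution)), key=lambda l: distribution[l])


assignments = {doc: TOPICS[most_probable_topic(dist)] for doc, dist in doc_topic.items()}
```

The fourth article shows a limitation of hard assignment: its distribution is uniform, so whichever cluster it ends up in is decided by the tie-breaking rule alone.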

In terms of a more formal description, given the set of all the documents \({\mathcal {D}}=\cup _{i=1}^r D^{i}\), each cluster, \({\mathcal {D}}_l\), \(l=1,\ldots ,k\), consists of the documents of the items which are associated to the l-th topic, \(x_l\) (documents where the most probable topic is \(x_l\)), that is to say:

$$\begin{aligned} {\mathcal {D}}_l = \{d^{i_j}_m \,\mid \, x_l=\arg \max _{s=1,\ldots ,k} p(x_s \mid d^{i_j}_m), \; j=1,\ldots ,r, \, m=1,\ldots ,n_{i_j}\} \end{aligned}$$
(1)

From these sets of documents relating to each of the k topics, the subprofiles of each item i must be constructed by grouping the documents in each global cluster that are associated to this item i, thereby obtaining a local cluster, \(D^i_l={\mathcal {D}}_l\cap D^i\). Each item is therefore assigned as many subprofiles as local clusters have been generated for it. The subprofiles of item i are then documents, \(d^{i,l}\), built by concatenating the documents within each local cluster \(D^i_l\), \(d^{i,l}=\cup _{d^i_j\in D^i_l} d^i_j\). In this case, the topical profile for an item i is \(p_{Top}^i=\{d^{i,1}, d^{i,2},\ldots ,d^{i,k}\}\).

Continuing with the example of recommending Machine Learning journals introduced above, all the articles published in all the journals would serve as input for LDA, together with a value of k (let us say 5 for the sake of continuity with the example). The output would be the assignment of the most probable topic to each article. Let us assume that one hypothetical journal entitled “Machine Learning Trends”, denoted as journal \(i_3\), contains six articles with the following assignments to their most probable topic: \(d^3_1 \rightarrow x_3\), \(d^3_2\rightarrow x_2\), \(d^3_3\rightarrow x_5\), \(d^3_4\rightarrow x_2\), \(d^3_5\rightarrow x_3\), \(d^3_6\rightarrow x_3\). Then, three different subprofiles would be built for this journal according to the distribution of its articles over three topics: \(D^3_2=\{d^3_2,d^3_4\}\), \(D^3_3=\{d^3_1,d^3_5,d^3_6\}\) and \(D^3_5 = \{d^3_3\}\), which together form the profile \(p^3_{Top}\) of journal \(i_3\).
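The construction of the local clusters for journal \(i_3\) can be reproduced with a short sketch. The topic assignments are those of the example; the document identifiers, placeholder texts and function names are hypothetical, and concatenation with a single space is a simplifying assumption.

```python
from collections import defaultdict

# Most probable topic index of each article of journal i_3, as in the example.
assignments = {"d1": 3, "d2": 2, "d3": 5, "d4": 2, "d5": 3, "d6": 3}

# Placeholder article texts (hypothetical).
texts = {doc_id: f"text of {doc_id}" for doc_id in assignments}


def local_clusters(doc_topics):
    """Group one item's documents by their most probable topic (the sets D^i_l)."""
    clusters = defaultdict(list)
    for doc_id, topic in doc_topics.items():
        clusters[topic].append(doc_id)
    return dict(sorted(clusters.items()))


def topical_profile(doc_topics, doc_texts):
    """One subprofile d^{i,l} per non-empty local cluster: the concatenated texts."""
    return {topic: " ".join(doc_texts[doc_id] for doc_id in doc_ids)
            for topic, doc_ids in local_clusters(doc_topics).items()}


clusters = local_clusters(assignments)
profile = topical_profile(assignments, texts)
```

Only topics 2, 3 and 5 appear as keys, which matches the observation that an item has at most k subprofiles rather than exactly k.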

With this method of creating venue subprofiles based on the underlying topics mined from the collection, the subprofiles are better structured and exhibit two properties: intra-homogeneity and extra-heterogeneity. Each single subprofile has very homogeneous content, since the documents included in it are all related to one another because they deal with the same topic. In the previous example, as articles \(d^3_1\), \(d^3_5\) and \(d^3_6\) all discuss, in one way or another, topic \(x_3\), they make up a homogeneous subprofile \(d^{3,3}\). The remaining articles, which concern different topics, are the basis for the other two subprofiles. The three subprofiles deal with different topics and are therefore heterogeneous with respect to one another. As a result of this division, a journal may be recommended on the strength of a single match between one of its subprofiles and the target article, which avoids the potential noise caused by mixing all of the articles in the same profile, as happens with monolithic profiles.

Temporal profiles

Another alternative for organising item subprofiles assumes that all the documents relating to any item have an assigned date (e.g. a publication date). We could then sort them accordingly and establish certain temporal divisions to exploit the temporal dimension. From each period, homogeneous subprofiles (in a temporal sense) could be built to represent an item by grouping together all the documents associated to that item which belong to the same period.

Continuing with the previous example of scientific venue recommendation from Sect. 3.2, imagine a journal about machine learning with an extensive record of published articles. As expected, there are periods in which a large number of papers deal with the same topic. After a while, new research topics appear and, as papers on them are published, they displace and possibly replace some of the existing topics. Initially, the majority of work published in journals of this kind dealt with neural networks, but after a few years researchers became more interested in support vector machines, and nowadays the focus has shifted towards deep learning, which is currently one of the most published topics. Splitting the time line into different intervals and building subprofiles for the corresponding journal in each period is an alternative and simple way of giving subprofiles a temporal perspective, and it reflects how the focus of the papers in the journal has changed over time.

In order to formalise this idea, let us consider the \(h+1\) time points \(t_0<t_1<\ldots <t_h\) and the h temporal intervals \([t_{u-1},t_u),\, u=1,\ldots ,h\). If date(d) is a function that returns the date of document d, then the h global temporal clusters \({\mathcal {T}}_u\) are defined as follows:

$$\begin{aligned} {\mathcal {T}}_u = \{d^{i_j}_m \,\mid \, t_{u-1}\le date(d^{i_j}_m)<t_u, \; j=1,\ldots ,r, \, m=1,\ldots ,n_{i_j}\}. \end{aligned}$$
(2)

The local temporal clusters for each item i are built as in the case of topical clusters by grouping the documents associated to i that belong to each global cluster \({\mathcal {T}}_u\), i.e. \(T^i_u={\mathcal {T}}_u\cap D^i\). Each subprofile for item i concatenates the documents in \(T^i_u\) to form a single document \(d^{i;u}=\cup _{d^i_j\in T^i_u} d^i_j\). Each item, therefore, is now represented by at most h temporal subprofiles (Footnote 5). The temporal profile for item i in this case is \(p_{Temp}^i=\{d^{i;1}, d^{i;2},\ldots ,d^{i;h}\}\).

Let us assume that our hypothetical journal entitled “Machine Learning Trends”, \(i_3\), contains six articles with the following publication dates: \(d^3_1 \rightarrow 2015\), \(d^3_2\rightarrow 2016\), \(d^3_3\rightarrow 2017\), \(d^3_4\rightarrow 2020\), \(d^3_5\rightarrow 2022\), \(d^3_6\rightarrow 2022\). Then four temporal subprofiles, corresponding to the four two-year intervals [2015, 2017), [2017, 2019), [2019, 2021), [2021, 2023), could be constructed by grouping documents by publication year: \(T^3_1=\{d^3_1,d^3_2\}\), \(T^3_2=\{d^3_3\}\), \(T^3_3 = \{d^3_4\}\), \(T^3_4=\{d^3_5,d^3_6\}\), respectively, which together make up the temporal profile \(p^3_{Temp}\) of journal \(i_3\).
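The assignment of documents to the half-open intervals \([t_{u-1},t_u)\) of Eq. (2) can be sketched as follows. The function name `temporal_interval` and the dictionary of dates are hypothetical and serve only to reproduce the example; a binary search over the time points is one possible way of locating the interval.

```python
from bisect import bisect_right
from collections import defaultdict


def temporal_interval(year, time_points):
    """Return u such that t_{u-1} <= year < t_u, or None if out of range."""
    u = bisect_right(time_points, year)
    if u == 0 or u == len(time_points):
        return None
    return u


time_points = [2015, 2017, 2019, 2021, 2023]  # t_0 < t_1 < ... < t_4
dates = {"d1": 2015, "d2": 2016, "d3": 2017,
         "d4": 2020, "d5": 2022, "d6": 2022}

# Local temporal clusters T^3_u of journal i_3, keyed by interval index u.
local_clusters = defaultdict(list)
for doc, year in dates.items():
    u = temporal_interval(year, time_points)
    if u is not None:
        local_clusters[u].append(doc)

print(dict(local_clusters))
```

Documents dated outside \([t_0,t_h)\) are simply left out in this sketch, which is a design choice of the example rather than something stated in the text.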

One advantage of this approach is that, in certain cases, articles dealing with the same underlying topics would tend to appear in the same temporal subprofiles, reflecting the natural evolution of topics over time. Another important advantage is that these subprofiles could be weighted so as to give more weight to those closer to the current time, thus giving more importance to journals which have recently published on the topic of the target article.

Following on from this second benefit, a temporal decay might also be considered to reduce the influence of older articles or subprofiles. This is a complementary way of introducing the temporal dimension into the journal recommendation problem. The underlying idea is that a user would be more interested in journals whose recent articles, rather than their “older” ones, are close to the paper to be published. In order to put this idea into practice, once a query has been submitted (the text of the article) and the score or Relevance Status Value (RSV) for each article or subprofile has been computed, a decay function (Larrain et al. 2015) is applied to modify the corresponding RSV according to the temporal distance to the year of the “newest” papers in the article collection.

Hybrid profiles: combining topical and temporal profiles

The next natural step is to combine both previous ways of constructing profiles by simultaneously exploiting the topical and temporal dimensions in order to obtain homogeneous subprofiles in terms of these two properties. This homogeneity could be reached in two different ways:

  • By first discovering the underlying global topics within the whole collection and creating their corresponding clusters (topical division), as explained in Sect. 3.2, and secondly by splitting them into temporal units (temporal division), from which the final subprofiles will be constructed. This approach is then a topical-temporal one.

  • By first splitting the collection into temporal units (temporal division), secondly by discovering the underlying topics within each single temporal partition (locally), and finally by building the final subprofiles from them. In this case, this is a temporal-topical approach. It should be noted that in each temporal division, the discovered topics would be different.

More specifically, for the topical–temporal approach, let \({\mathcal {D}}_l\) be defined as in Eq. (1), \(l=1,\ldots ,k\), and \({\mathcal {T}}_u\) be defined as in Eq. (2), \(u=1,\ldots ,h\). The global topical-temporal clusters \({\mathcal{D}\mathcal{T}}_{lu}\) are then defined as

$$\begin{aligned} {\mathcal{D}\mathcal{T}}_{lu} = {\mathcal {D}}_l \cap {\mathcal {T}}_u, \; l=1,\ldots ,k, \; u=1,\ldots ,h. \end{aligned}$$
(3)

Given an item i, its local clusters are obtained by joining the documents found in each global topical-temporal cluster which are associated to i: \(DT^i_{lu}={\mathcal{D}\mathcal{T}}_{lu} \cap D^i\). As in the previous cases, the documents within each local cluster are concatenated to form the corresponding subprofiles, \(d^{i,l;u}=\cup _{d^i_j\in DT^i_{lu}} d^i_j\). The topical-temporal profile for item i is \(p^i_{TopTemp}=\{d^{i,1;1},\ldots ,d^{i,k;h}\}\).

Continuing with the example presented in the previous sections about the journal recommendation problem, all the articles of all the journals would serve as input to an LDA algorithm, which would assign each article to one of the 5 topics. Temporal partitions would then be applied inside each cluster in order to create subprofiles that take periods of time into account. For the journal \(i_3\), whose articles were grouped into only three topics (\(x_2\), \(x_3\) and \(x_5\)) and four two-year intervals, we would obtain only five (out of twelve possible) topical-temporal subprofiles, namely \(DT^3_{2\,1}=\{d^3_2\}\), \(DT^3_{2\,3}=\{d^3_4\}\), \(DT^3_{3\,1}=\{d^3_1\}\), \(DT^3_{3\,4}=\{d^3_5,d^3_6\}\) and \(DT^3_{5\,2}=\{d^3_3\}\). Note that some combinations of temporal periods and topics might be empty, as in this example.
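The intersection of Eq. (3) amounts to grouping each document by the pair (topic, interval). The following sketch reproduces the five non-empty clusters of the example; the record layout and the name `topical_temporal_clusters` are assumptions made for illustration, and the topic labels are again entered by hand instead of coming from LDA.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical records for journal i_3: document -> (most probable topic, year).
docs = {
    "d1": (3, 2015), "d2": (2, 2016), "d3": (5, 2017),
    "d4": (2, 2020), "d5": (3, 2022), "d6": (3, 2022),
}
time_points = [2015, 2017, 2019, 2021, 2023]  # t_0 < t_1 < ... < t_4


def topical_temporal_clusters(docs, time_points):
    """Return the non-empty local clusters DT^i_{lu}, keyed by (topic l, interval u)."""
    clusters = defaultdict(list)
    for doc, (topic, year) in docs.items():
        u = bisect_right(time_points, year)
        if 0 < u < len(time_points):  # keep only dates inside [t_0, t_h)
            clusters[(topic, u)].append(doc)
    return dict(clusters)


dt = topical_temporal_clusters(docs, time_points)
for (topic, u), members in sorted(dt.items()):
    print(f"DT^3_({topic},{u}) = {members}")
```

Only keys with at least one document are created, which is why an item ends up with fewer subprofiles than the number of possible topic and interval combinations.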

The main feature of this approach is that LDA is applied globally to the whole collection of articles, reflecting the global topics present in the journals belonging to the collection. The division of subprofiles into time periods then places articles in their corresponding publication slot. This means that a topic can be located in specific time divisions, which makes this a more fine-grained approach. If a topic is dealt with in a journal in, let us say, only three articles, all from the same period, they will be allocated to the same subprofile, thus giving more precise information about when this topic was treated in the journal.

In the case of the temporal-topical approach, starting from each \({\mathcal {T}}_u\) as defined in Eq. (2), \(u=1,\ldots ,h\), as the document collection, we use LDA to obtain the k topics \(x_{u1},\ldots ,x_{uk}\) corresponding to this collection (Footnote 6). We then proceed in the same way as with the topical profiles, i.e. we obtain the clusters \({\mathcal{T}\mathcal{D}}_{ul}\) as the subset of documents of \({\mathcal {T}}_u\) where the most probable topic is \(x_{ul}\):

$$\begin{aligned} {\mathcal{T}\mathcal{D}}_{ul} = \{d^{i_j}_m\in {\mathcal {T}}_u \,\mid \, x_{ul}=\arg \max _{s=1,\ldots ,k} p(x_{us} \mid d^{i_j}_m), \; j=1,\ldots ,r, \, m=1,\ldots ,n_{i_j}\}, \\ l=1,\ldots ,k, \; u=1,\ldots ,h. \end{aligned}$$
(4)

We then obtain the clusters associated to each item i as \(TD^i_{ul}={\mathcal{T}\mathcal{D}}_{ul} \cap D^i\). Finally, the documents in \(TD^i_{ul}\) are concatenated to build the subprofiles, \(d^{i;u,l}=\cup _{d^i_j\in TD^i_{ul}} d^i_j\). The temporal-topical profile for item i is \(p^i_{TempTop}=\{d^{i;1,1},\ldots ,d^{i;h,k}\}\).

The difference with respect to the previous topical-temporal method is that the clustering is local to the articles published in each specific period of time, and therefore better reflects the topic distribution of the articles in that period. This could be a more precise approach but, at the same time, each LDA is run with a smaller number of articles and this might affect the quality of the mined topics.

It is worth noting that, in both cases, each item i will have at most \(k\cdot h\) subprofiles associated with it, because there might not be any document associated to i which deals with a specific topic in a given time period. We would also like to remark that the temporal decay can be applied in both the topical-temporal and the temporal-topical profiling methods.

Using the subprofiles for recommendation

Having presented the different types of (sub)profiles (Footnote 7) relating to each journal, a mechanism must now be designed to carry out the journal recommendations. For that purpose, we consider IR-based techniques to be the most appropriate for this content-based recommendation task.

The first step is to index the (sub)profiles (Footnote 8), so that they can later be accessed efficiently. This task will be performed by an IR system (IRS).

The second step is the recommendation itself. Given an active user, the objective is to obtain a ranking of relevant journals for her target article. This might be achieved by means of an IR model implemented in an IRS, which will be responsible for retrieving relevant journals with respect to the user’s query. More specifically, the textual content of the paper to be published by the user represents a query submitted to the IRS. Then, by means of the similarity measure on which the IR model is based, a similarity score between the target article and each subprofile is computed, and a ranking of journal (sub)profiles is finally generated (sorted in decreasing order of relevance to the query).

As mentioned in the previous paragraph, the ranking consists of subprofiles. However, since the active user requires journal recommendations, the subprofile ranking must then be transformed into a journal ranking. For this purpose, a final fusion process combines the scores of the subprofiles of each journal and generates a final journal ranking. This is performed by applying fusion methods (de Campos et al. 2017), which aggregate the scores of all of the subprofiles of a journal into a single relevance score for that journal. This fusion is not necessary for monolithic profiles, since in this case there is only one profile per journal.

Evaluation and results

In this section, we shall detail everything relating to the evaluation of the previously explained alternatives for item profiling and also the results obtained.

To recall the problem at hand, publication venue recommendation: given a target article (or at least an abstract, title and keywords), the aim is to recommend to the active user the most suitable venue for publishing such a paper, on the basis of how well it fits the scope of the journal.

The following sections will present all the details of the experimental design and also the results of the experiments.

Test collection

The test collection used in the experimentation is called PMSC-UGR (Albusac et al. 2018) and has been created by the authors from PubMed and Scopus. It originally contained 762,508 articles from 12,396 journals in the biomedical domain, with a title, abstract, keywords, citations and authors for each paper. Out of all the authors from this selection of papers, those who were unequivocally represented by their corresponding ORCID codes were finally selected, leaving a total of 20,406 authors.

Implementation details of the recommendation model

As mentioned in Sect. 3.5, an IRS is in charge of indexing and retrieving journal (sub)profiles. For the first task, we have therefore developed an indexing programme based on the Lucene library (Footnote 9). Before indexing, this piece of software removes stop words and performs stemming, so that only the resulting stems are indexed. With respect to the similarity measure required in the second task, the Lucene implementation of the Language Model with Jelinek-Mercer smoothing has been used.

The task of fusing the rankings of the subprofiles of the same journal has been implemented by means of the CombLgDCS fusion method (de Campos et al. 2017), which aggregates the scores of all of the journal subprofiles, decreasing them proportionally to the logarithm of their positions in the ranking.
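To give an idea of what a position-discounted fusion looks like, the sketch below sums, for each journal, the scores of its subprofiles after dividing each one by the logarithm of its position in the ranking. The exact formula of CombLgDCS is not reproduced in this paper, so the discount \(1/\log _2(position+1)\), the use of a plain sum and the function name `log_discounted_fusion` are all assumptions made for illustration; the actual definition is given in de Campos et al. (2017).

```python
import math
from collections import defaultdict


def log_discounted_fusion(ranking):
    """Fuse a subprofile ranking into a journal ranking.

    `ranking` is a list of (journal, score) pairs sorted by decreasing score,
    one entry per retrieved subprofile. Each score is divided by
    log2(position + 1), an assumed discount, and the results are summed
    per journal.
    """
    fused = defaultdict(float)
    for position, (journal, score) in enumerate(ranking, start=1):
        fused[journal] += score / math.log2(position + 1)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical subprofile ranking for one query.
subprofile_ranking = [("J1", 0.9), ("J2", 0.8), ("J1", 0.5), ("J3", 0.4)]
journal_ranking = log_discounted_fusion(subprofile_ranking)
print(journal_ranking)
```

With a discount of this kind, a journal with several moderately well-ranked subprofiles can overtake a journal with a single highly ranked one, which is the behaviour a fusion step is meant to allow.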

In order to discover the latent topics required for the topical, temporal-topical and topical-temporal models, and to obtain topically homogeneous subprofiles, an implementation of the LDA algorithm is required. In our case, the chosen Python implementation of LDA is the one provided by the Gensim library, with the default values for the hyper-parameters.

During this process, terms that appear in fewer than 750 documents, and also those that appear in more than \(90\%\) of the documents, are ignored. This decision was made in order to remove very rare and very common terms (the latter being almost stop words), which, if used as input for LDA, could distort the learned topics and also increase the complexity. Moreover, of the remaining terms, only the most frequent in the corpus, up to a maximum of 5000, have been considered as LDA input. These figures were chosen on the basis of previous experiments to set the most suitable size of the vocabulary. Nevertheless, the subprofiles built with this technique contain all the terms from the original documents (Footnote 10).
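The vocabulary pruning just described can be sketched in plain Python as follows. The function name `prune_vocabulary` and the toy corpus are hypothetical, and ranking the surviving terms by document frequency is an assumption, since the text only says "most frequent". Gensim's `Dictionary.filter_extremes` exposes comparable `no_below`, `no_above` and `keep_n` parameters, although the text does not state whether that method was used.

```python
from collections import Counter


def prune_vocabulary(tokenised_docs, min_docs, max_doc_ratio, max_terms):
    """Select the vocabulary used as LDA input.

    Drops terms found in fewer than `min_docs` documents or in more than
    `max_doc_ratio` of them, then keeps at most `max_terms` of the remaining
    terms, ranked by document frequency (an assumed criterion).
    """
    n_docs = len(tokenised_docs)
    doc_freq = Counter()
    for tokens in tokenised_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    kept = [term for term, df in doc_freq.items()
            if df >= min_docs and df <= max_doc_ratio * n_docs]
    kept.sort(key=lambda term: (-doc_freq[term], term))
    return kept[:max_terms]


# Toy corpus; the thresholds are scaled down accordingly.
toy_docs = [
    ["gene", "cell", "the"],
    ["gene", "the"],
    ["protein", "the"],
    ["gene", "cell", "the"],
]
vocabulary = prune_vocabulary(toy_docs, min_docs=2, max_doc_ratio=0.9, max_terms=2)
print(vocabulary)
```

With the values reported above, the call would be `prune_vocabulary(docs, 750, 0.9, 5000)`. The pruned vocabulary restricts only what LDA sees; the subprofiles themselves still contain the full text.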

Experimental design

For evaluation purposes, we have restricted the PMSC-UGR collection to those papers published between 2007 and 2016 which appear in journals with more than 100 papers in this period (Footnote 11), leaving the dataset with a total of 1002 journals. The collection has then been split into two partitions: the first comprises articles dating from 2007 to 2015 (a total of 276,679 papers), and this will be reserved for building the (sub)profiles and serve as the training set; and the second only contains articles from 2016 (32,864 articles) and this will be used as the test set. This holdout method is suitable for this evaluation and does not require cross-validation to obtain reliable results given the large number of articles in the test set. Each article from the test partition will be considered as a query to be submitted to the underlying IRS and this query will comprise the combined text of its title, abstract and keywords.

Starting from the training partition, the following types of (sub)profiles will be considered for our experiments, pointing out which are the baselines:

  • Monolithic profiles (Mono) (Baseline)

  • Atomic subprofiles (Atomic) (Baseline)

  • Topical subprofiles (Top)

  • Temporal subprofiles (Temp): These are built from four temporal partitions of two years each (2007–2008, 2009–2010, 2011–2012 and 2013–2014) and one of only one year (2015), i.e. \(h=5\).

  • Topical + Temporal (TopTemp): After applying a global LDA to the entire article collection, subprofiles are built in the five temporal partitions.

  • Temporal + Topical (TempTop): Subprofiles are built from the five temporal partitions after applying local LDA to the article collection in each partition.

In addition, 10 instances of randomised subprofiles have been created (\(Random1, \dots , Random10\)). Each comprises 5 random partitions of the articles from the training set, thereby replacing the 5 temporal subprofiles with 5 random subprofiles. The retrieval effectiveness of these data sets will also be measured in order to test whether the temporal divisions differ from randomness.

For the LDA algorithm on which the Top, TopTemp and TempTop approaches are based, it is necessary to set the k parameter, i.e. the number of latent topics to be discovered. This is not an easy task because the quality of the results could depend heavily on this value. In this research, we have considered three different values, two of which are related to medical categories or specialities:

  • number of comprehensive medical specialities extracted from the Medical School blog at St George’s University (Footnote 12), \(k=20\)

  • number of second-level categories of the MeSH thesaurus (Footnote 13), \(k=110\)

  • \(k=400\), in order to test a very large number of topics

It is apparent that these three values attempt to cover a wide range of topics, from a low to a relatively large number of them, in order to evaluate the performance of the different types of profiles according to the number of topics discovered by the LDA algorithm. The underlying idea is also to choose meaningful values of k relating to medical categories. In this case, and in order to not increase the complexity of the study, we have made the decision of using the same k values for all the different subprofile approaches.

Regarding the decay functions introduced in Sect. 3.3, these may be applied to the truly temporal approaches, i.e. Temp, TopTemp and TempTop, and also to Atomic, which supports a temporal treatment as well. From the wide range of decay functions found in the literature, we have implemented two to be tested:

  • linear: \(RSV_{LinearD} = \frac{NRSV}{1 + Penalty}\)

  • double square root (Footnote 14): \(RSV_{2SqrtD} = \frac{NRSV}{\sqrt{\sqrt{1 + Penalty}}}\)

where NRSV is the normalised RSV, computed by dividing the corresponding RSV by the maximum RSV in the ranking, and \(Penalty = 2015 - \textit{publication year}\), where 2015 is the year of the “newest” papers in the article collection. For individual articles in the atomic subprofiles, their publication year is used directly to compute the new score, whereas for the remaining temporal subprofiles, which contain articles spanning two years, the average publication year is used in the decay formulas. A third value of the decay parameter is “None”, which means that no temporal penalisation is applied to the RSV.
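The two decay formulas translate directly into code. The sketch below follows the definitions above; the function names and the constant `NEWEST_YEAR` are hypothetical labels chosen for the example.

```python
NEWEST_YEAR = 2015  # year of the newest papers in the training collection


def normalise(scores):
    """Divide every RSV by the maximum RSV in the ranking, giving the NRSV."""
    top = max(scores)
    return [score / top for score in scores]


def penalty(publication_year, newest_year=NEWEST_YEAR):
    """Temporal distance, in years, to the newest papers in the collection."""
    return newest_year - publication_year


def linear_decay(nrsv, pen):
    """RSV_LinearD = NRSV / (1 + Penalty)."""
    return nrsv / (1 + pen)


def double_sqrt_decay(nrsv, pen):
    """RSV_2SqrtD = NRSV / sqrt(sqrt(1 + Penalty))."""
    return nrsv / (1 + pen) ** 0.25


# An article from 2012 with NRSV = 1.0 has Penalty = 3.
pen = penalty(2012)
print(linear_decay(1.0, pen))       # 1 / 4
print(double_sqrt_decay(1.0, pen))  # 1 / 4**0.25, roughly 0.707
```

The same top-scored article from 2012 keeps about 71% of its score under 2Sqrt but only 25% under the linear decay, which illustrates why the former is described as the smoother penalisation.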

These three options are intended to study the behaviour of decay in three distinct situations: a rather smooth penalisation of older articles (2Sqrt), a strong penalisation (Linear) and no penalisation at all (None). We have not tried to find the best decay function; some other functions were also tested, but their results were broadly similar and are not included.

Evaluation measures

In order to evaluate the recommendation, and before presenting the evaluation measures, it is important to determine the ground truth: in this case, only one journal is relevant for each query (test target article) and this is the journal where the paper has actually been published. This is a very objective criterion, although it is also very conservative, as probably other recommended journals could also be appropriate for publishing the test target article.

We use the following evaluation measures, which are the most common in the literature for assessing the quality of the results obtained by venue recommendation methods:

  • Recall@X (R@X): This measures the ability to recommend the journal where the test target article was actually published within the first X journals of the ranking (recommended venues). In other words, we compute the proportion of test papers for which the actual publication venue is among the first X recommended venues. In previous work on venue recommendation (e.g. Luong et al. 2012; Medvet et al. 2014; Wang et al. 2018), this measure is also called accuracy@X. Two thresholds X are considered to compute the values of this measure: 1 and 5. \(X=1\) is considered because the number of relevant journals is 1 and so we would like to know how successful it would be to recommend only one journal. Since the user does not usually obtain only one recommendation but a number of alternative journals, \(X=5\) is also used as the threshold.

  • Mean Reciprocal Rank (MRR@Y): In this case, the idea is to reflect how high in the ranking the only relevant journal is recommended. It therefore computes the average of the inverse of the positions in the ranking of the journal where each test paper was published. The total number of results (journals) in the ranking is limited to Y (Footnote 15). We have only considered the top 40 positions in the ranking, i.e. \(Y=40\). In our specific case, where only one item (a journal) is relevant, this measure coincides with mean Average Precision, MAP@Y.
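Both measures are simple to compute when each query has a single relevant journal. The following sketch uses hypothetical function names and toy rankings; counting a reciprocal rank of zero when the true journal falls outside the top Y is the usual convention and is assumed here.

```python
def recall_at(rankings, truths, x):
    """Fraction of queries whose true journal is among the top-x recommendations."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:x])
    return hits / len(truths)


def mrr_at(rankings, truths, y):
    """Mean reciprocal rank of the true journal, counting 0 if it is below position y."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        top = ranked[:y]
        if truth in top:
            total += 1.0 / (top.index(truth) + 1)
    return total / len(truths)


# Three toy queries: the true journal is ranked 1st, ranked 3rd, and missing.
rankings = [["J1", "J2", "J3"], ["J2", "J3", "J1"], ["J3", "J1", "J2"]]
truths = ["J1", "J1", "J4"]

print(recall_at(rankings, truths, 1))  # 1/3
print(recall_at(rankings, truths, 5))  # 2/3
print(mrr_at(rankings, truths, 40))    # (1 + 1/3 + 0) / 3 = 4/9
```

Because there is exactly one relevant journal per query, R@1 is simply the proportion of test articles whose actual venue is ranked first.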

Results

Table 1 Results of the experiments (the best results for each column are shown in bold)

The results of our experiments are displayed in Table 1. Although the rankings of methods obtained for the different performance metrics are not identical, the trends are the same. In fact, if we calculate the Pearson correlation between these rankings, we always obtain correlation coefficients which are greater than 0.86 (if we exclude the results obtained by the random subprofiles and also those which use linear decay, which are both quite poor, then all the correlation coefficients are greater than 0.98). We can, therefore, comment on our results without referring to any specific metric.

If we first focus on RQ1, we can conclude that the use of the temporal dimension alone (Temp subprofiles, without decay) only very slightly improves the results obtained by the Mono baseline. This implies that the impact of dividing the monolithic profiles into several parts which are only based on temporal criteria is limited. In addition, the Atomic approach performs almost the same as Temp. Splitting the article collection into years and building the journal subprofiles upon them does not, therefore, offer any clear advantage. The random subprofiles, Random\(i\) (with the same number of subprofiles as Temp), perform almost identically to Mono and worse than Temp. This implies that the temporal dimension has a slightly positive effect on the results which is not attributable merely to the fact of creating several subprofiles.

When a decay factor is also used, very poor results are obtained for Temp (even worse than the baselines) with the linear version and much better results with the double square root version (2Sqrt), which is smoother than the linear one. Consequently, the performance of Temp clearly depends on the aid that decay functions can provide. This behaviour of the three decay methods (Linear < None < 2Sqrt) also persists when the temporal dimension is combined with the topical dimension (Footnote 16) or when it is applied to the other baseline, Atomic, so that the preferred version of decay is always 2Sqrt. Temp is also better than Atomic (using 2Sqrt decay). By way of conclusion, a well-designed decay function incorporated into the journal recommendation process can boost performance, and this therefore answers RQ2.

In terms of RQ3, the use of topical subprofiles (Top) clearly improves the results of the baselines Mono and Atomic, regardless of the number of topics selected. Since there are no major differences between the results of Top with different numbers of topics, this parameter does not seem to be critical for good behaviour. Top is also better than Temp without decay and has a similar performance to Temp with 2Sqrt. However, the contributions of Top and Temp to improved performance seem to be based on different premises, so that their combination could generate a kind of synergy. This is indeed the case, but it depends on the way the topical and temporal subprofiles are combined. When TopTemp subprofiles are used, i.e. first a topical division and then the temporal division (using the same topics in every time period), we do not observe any clear improvement over using only Top or only Temp (only a minuscule improvement is obtained). However, when we use the other proposed combination, TempTop subprofiles, i.e. first a temporal division and then the topical division (with the topics being specific to each time period), a clear improvement is apparent. Moreover, the TempTop results are always the best ones for all the metrics, regardless of the number of topics being considered. We think that this is due to the fact that topic identification, and the subsequent construction of subprofiles, is tailored to the set of articles included in each temporal partition, which is more precise and fully adapted to the content of such articles. By contrast, the TopTemp approach is more general and not so well fitted to the texts in each temporal partition. Hybridising topical and temporal profiles is, therefore, a very interesting approach, but only if temporal divisions are made first and journal subprofiles are then built based on topic discovery in each time-based partition, which answers RQ4.

The best overall results for the problem of journal recommendation are obtained using TempTop subprofiles with 110 topics and 2Sqrt decay (RQ5).

Finally, in order to verify these conclusions, a statistical significance test has been applied to the measure R@1. More specifically, we selected the McNemar test (McNemar 1947), a non-parametric test for paired data, as recommended in Dietterich (1998) for comparing machine learning algorithms. The significance threshold, \(\alpha \), has been set to 0.1. The test has first been run within families of subprofiles (for the three values of k and the best decay method, when applicable): TempTop with 2Sqrt, TopTemp with 2Sqrt and Top without decay. The idea is to determine first whether there are significant differences within each family. The results of these tests follow the same pattern for the TempTop and TopTemp families: there are no significant differences between the two best values of k, but there are with the worst one. In the case of TempTop, there are no differences between 110 and 20, but there are with 400. For TopTemp, there are no differences between 400 and 110, but there are with 20. In terms of Top, there are no differences between the three values of k. A final series of tests has been run between the best configurations of each family, including in this case the baselines (p values are shown in Table 2, where * means that there are significant differences): TempTop, 110, 2Sqrt; TopTemp, 400, 2Sqrt; Top, 20, None; Temp, 2Sqrt; Atomic, 2Sqrt; and Mono, None. The results show that there are significant differences between TempTop and the others, and also between Mono and the others, whereas there are no differences between TopTemp, Top and Temp. There are also differences between TopTemp and Atomic, but not between Atomic, Top and Temp. By way of summary, TempTop is clearly the best option for combining temporal and topical dimensions and Mono is the worst alternative.
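For readers unfamiliar with the McNemar test, the sketch below shows how it can be computed from paired R@1 outcomes. The text does not say which variant of the test was used, so the exact (binomial) two-sided version shown here, the function name `mcnemar_exact` and the toy data are assumptions made for illustration.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test for paired binary outcomes.

    `correct_a[i]` and `correct_b[i]` say whether methods A and B ranked the
    true journal first for test article i. Returns (b, c, p_value), where b
    and c count the discordant pairs (only A correct, only B correct).
    """
    b = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if a_ok and not b_ok)
    c = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if b_ok and not a_ok)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Probability of a split at least as uneven as (b, c) under a fair coin.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data: 20 queries both methods get right, 10 where only A is correct,
# 2 where only B is correct.
correct_a = [True] * 20 + [True] * 10 + [False] * 2
correct_b = [True] * 20 + [False] * 10 + [True] * 2

b, c, p_value = mcnemar_exact(correct_a, correct_b)
print(b, c, p_value, p_value < 0.1)
```

Only the discordant pairs influence the result, which is why the test suits this setting: two profiling methods are compared on exactly the same test articles.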

Table 2 p values of the McNemar’s tests

Conclusions and further research

In this paper, we have focused on testing how useful temporality, topicality and the combination of these are for the problem of building and using profiles in the context of journal recommendation. We have proposed five research questions and tried to give answers to them by means of a detailed experimental design. We have proposed two different ways of hybridising temporal and topical dimensions with the aim of improving the capacity of organising the available information and the performance of this specific type of recommender. A biomedical journal collection on publication venue recommendation was used to test our proposals, in conjunction with several state-of-the-art models, and these revealed that the combination of these two types of approaches is a good alternative although it is important to note that order matters in terms of performance. From our experiments, we can conclude that the best option for our problem at hand is to create temporal partitions and discover the latent topics starting from the papers in each partition using LDA and then to construct the profiles. It is important to mention that the number of topics for building the profiles is not a critical parameter. The application of a decay factor might be a valuable aid but it clearly depends on the quality of the penalising function. In our context, 2Sqrt helps to improve the performance of the recommendation with hybrid subprofiles.

The findings revealed in this paper might not be directly extrapolated to a generic content-based recommender system, although we think that they could be useful, and a good starting point, to improve it.

In terms of future lines of research, we plan to explore other methods of combining temporal and topical dimensions to obtain better subprofiles. One alternative is the use of temporal topic models (Blei and Lafferty 2006; Dieng et al. 2019). Another option is to use methods based on the aggregation or fusion (Wu 2012) of the topical and temporal rankings obtained individually by Top and Temp, respectively. Another research line is to design high-quality decay functions that boost the performance of hybridisation. Finding optimal values for the number of topics for each topic-based profiling model, and studying how they affect the performance of the recommendation model, is another interesting area of investigation. Finally, we also plan to study the most suitable ways of explaining the recommendations (Tintarev and Masthoff 2012) offered by our models.