
1 Introduction

Bibliographic datasets form a major topic in the Linked Open Data Cloud, accounting for a total of 12–13% of all datasets [15]. One of those datasets is SciGraph, which is published by Springer Nature and is the successor of Springer’s Linked Open Data Conference Portal [3]; it comprises 7.2M articles and 240k books published by Springer Nature, totaling 1B triples.

In this paper, we aim to provide users with recommendations of conferences to submit their publications to, utilizing SciGraph for information on past conferences and publications, and WikiCfP for information on upcoming conferences.

2 Related Work

The idea of building recommender systems for scholarly content goes back almost 20 years [2, 7]. More recently, Linked Open Data has been recognized as a valuable source for building recommender systems. In particular, content-based recommender systems, which focus on the items to be recommended and their interrelations, can benefit strongly from detailed descriptions of those items in open datasets [4, 5].

Similar to the task in this paper, several approaches have been proposed for the recommendation of research papers (see [1] for a comprehensive survey). Although those approaches share the same domain, our setup is slightly different: both the input data (i.e., authors, a textual abstract, and keywords) and the prediction target (conferences instead of individual papers) differ.

3 Approach

3.1 Datasets

The main dataset used to train the recommender system is SciGraph. For training, we use publications from the years 2013–2015, whereas publications from the year 2016 are used for evaluation. In total, SciGraph contains 240,396 books; however, only a fraction of those correspond to the proceedings of a single conference. Moreover, it contains 3,987,480 individual book chapters, again only a fraction of which correspond to papers published at conferences. Additionally, SciGraph provides a taxonomy of research topics, called Product Market Codes (PMCs). In total, 1,465 PMCs are included in the hierarchy and assigned to books; only 89 of them are related to computer science.

The second dataset we use is WikiCfP, a website which publishes calls for papers. Since there is no downloadable version of the data (although the CC-BY-SA license allows for reusing the dataset), we built a crawler to create a dataset of CfPs, containing names, acronyms, dates, locations, and submission deadlines (which we consider mandatory attributes), as well as links to the conference page, the conference series, the categorization in WikiCfP, and a textual description (which we consider optional attributes). Overall, we crawled data for 65,714 CfPs in July 2018. The crawled data was linked to SciGraph using string similarity between conference names, which resulted in 53.1% of the CfPs being linked to SciGraph.
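The linking step can be sketched as follows; this is a minimal illustration using Python's difflib for normalized string similarity, with an illustrative threshold rather than the exact measure and cut-off used in our pipeline.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lower-case a conference name and collapse whitespace for comparison."""
    return " ".join(name.lower().split())

def best_match(cfp_name: str, scigraph_confs: list[str], threshold: float = 0.8):
    """Return the most similar SciGraph conference name, or None if below the threshold."""
    best, best_sim = None, 0.0
    cfp_norm = normalize(cfp_name)
    for conf in scigraph_confs:
        sim = SequenceMatcher(None, cfp_norm, normalize(conf)).ratio()
        if sim > best_sim:
            best, best_sim = conf, sim
    return best if best_sim >= threshold else None

# Toy example: the CfP title is matched against candidate SciGraph conference names.
print(best_match("European Semantic Web Conference (ESWC)",
                 ["European Semantic Web Conference", "International Semantic Web Conference"]))
```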

3.2 Recommendation Techniques

We use three main families of recommendation techniques, i.e., recommendations based on authors, abstracts, and keywords. Furthermore, we use an ensemble strategy. Generally, the recommendation strategies either exploit some notion of similarity (e.g., recommending conferences which contain publications with similar abstracts), or cast the problem as a machine learning task (since our training set contains 742 conference series, we train a multi-label classifier with 742 classes).
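To illustrate the classification framing, the following is a minimal sketch using scikit-learn and toy data; the actual features, models, and the full set of 742 conference series are not reproduced here. A multi-label classifier is trained over conference series, and the ten most probable classes are returned as recommendations.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy training data; the real corpus has hundreds of thousands of abstracts.
abstracts = ["deep learning for image recognition",
             "ontology matching on the semantic web"]
conferences = [["ICANN"], ["ESWC"]]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(abstracts)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(conferences)

# One binary classifier per conference series (multi-label setup).
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

def recommend(abstract: str, k: int = 10):
    """Return the k conference series with the highest predicted probability."""
    probs = clf.predict_proba(vectorizer.transform([abstract]))[0]
    top = np.argsort(probs)[::-1][:k]
    return [(mlb.classes_[i], float(probs[i])) for i in top]
```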

Author-based recommendations are computed based on the authors of the submitted publication. Essentially, we count the number of papers per conference series which share at least one author with the given author list, and use that count as a score.
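A minimal sketch of this scoring scheme is given below; the data structures (pairs of conference series and author sets) are illustrative, not the actual SciGraph representation.

```python
from collections import Counter

def author_based_scores(input_authors, papers, k=10):
    """
    Score conference series by the number of training papers that share at
    least one author with the input author list.
    `papers` is an iterable of (conference_series, set_of_authors) pairs.
    """
    input_set = set(input_authors)
    scores = Counter()
    for series, authors in papers:
        if input_set & authors:  # at least one shared author
            scores[series] += 1
    return scores.most_common(k)
```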

Abstract-based recommendations compare the abstracts of publications in SciGraph with the abstract given by the user. Overall, we use two different approaches: the max strategy scores a conference by its single most similar publication and proposes the corresponding conference, while the concat strategy concatenates all abstracts related to a conference into a virtual document and compares the given abstract to those virtual documents.
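The two strategies can be sketched as follows, here with TF-IDF and cosine similarity standing in for the various similarity measures discussed below; function and variable names are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_conferences(query_abstract, abstracts, conf_of_abstract, strategy="max", k=10):
    """
    Rank conference series for a query abstract.
    abstracts: list of training abstracts; conf_of_abstract: parallel list of conference series.
    'max':    a conference is scored by its single most similar abstract.
    'concat': all abstracts of a conference are concatenated into one virtual document.
    """
    if strategy == "concat":
        virtual = {}
        for text, conf in zip(abstracts, conf_of_abstract):
            virtual[conf] = virtual.get(conf, "") + " " + text
        confs, corpus = list(virtual), list(virtual.values())
    else:
        confs, corpus = conf_of_abstract, abstracts

    vec = TfidfVectorizer()
    doc_matrix = vec.fit_transform(corpus)
    query_vec = vec.transform([query_abstract])
    sims = cosine_similarity(query_vec, doc_matrix).ravel()

    scores = {}
    for conf, sim in zip(confs, sims):
        scores[conf] = max(scores.get(conf, 0.0), sim)  # max aggregation (trivial for 'concat')
    return sorted(scores.items(), key=lambda item: -item[1])[:k]
```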

Different variants for generating these recommendations are used. We utilize standard TF-IDF, TF-IDF based on word n-grams, LSA and LSA based on word n-grams [10], and pLSA [6]. Furthermore, we utilize similarity based on word embeddings, namely word2vec [11], GloVe [13], and FastText [8], using both pre-trained embeddings and embeddings trained on the SciGraph collection of abstracts. While all of those approaches are based on similarities, we also tried directly predicting the conferences with a convolutional neural network (CNN), which takes the self-trained word2vec embeddings as word representations, as discussed in [9].
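As an example of the embedding-based variants, the following sketch trains word2vec on the abstract corpus with gensim and compares averaged word vectors; the hyperparameters are illustrative, and the CNN-based classifier is not shown here.

```python
import numpy as np
from gensim.models import Word2Vec

def train_embeddings(tokenized_abstracts):
    """Train word2vec on the corpus of tokenized abstracts; parameters are illustrative."""
    return Word2Vec(sentences=tokenized_abstracts, vector_size=300, window=5, min_count=2)

def doc_vector(tokens, model):
    """Average the embeddings of all in-vocabulary tokens (zero vector if none are known)."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def cosine(a, b):
    """Cosine similarity between two document vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```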

Keyword-based recommendations are based on Product Market Codes in SciGraph. Those Product Market Codes are defined by Springer Nature and resemble other categorization systems in computer science, such as the ACM Computing Classification System. A second keyword-based model uses a script to identify Computer Science Ontology (CSO) [14] terms in the abstract entered by the user.
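A naive version of such a term-identification script could look as follows; the actual script may perform more sophisticated normalization and matching, and the term set shown is a hand-picked toy subset of the CSO.

```python
import re

def match_cso_terms(abstract: str, cso_terms: set[str]):
    """
    Identify CSO terms mentioned in an abstract by matching normalized term
    strings against the normalized abstract text.
    """
    text = " " + re.sub(r"[^a-z0-9 ]", " ", abstract.lower()) + " "
    text = re.sub(r"\s+", " ", text)
    return {term for term in cso_terms if f" {term} " in text}

# Toy subset of CSO terms for illustration.
terms = {"semantic web", "ontology matching", "recommender systems"}
print(match_cso_terms("We present an approach to Ontology Matching for the Semantic Web.", terms))
```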

4 Evaluation

As sketched above, publication data from 2013–2015 were used as training data for the recommender system, whereas publications from 2016 were used for testing. For each publication in the test set, we try to predict the conference at which it has been published, and compare the result to the gold standard (i.e., the conference at which it has actually been published). We create 10 recommendations with each technique, and report recall@10 and mean average precision (MAP).
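Since each test paper has exactly one correct conference, the two metrics can be computed as sketched below; recommendation lists and gold labels are assumed to be parallel Python lists, and average precision then reduces to the reciprocal rank of the correct conference.

```python
def recall_at_k(recommendations, gold, k=10):
    """Fraction of test papers whose actual conference appears in the top-k list."""
    hits = sum(1 for recs, g in zip(recommendations, gold) if g in recs[:k])
    return hits / len(gold)

def mean_average_precision(recommendations, gold):
    """
    MAP over the test papers; with a single relevant conference per paper,
    average precision is 1/rank of the correct conference (0 if it is missed).
    """
    ap = []
    for recs, g in zip(recommendations, gold):
        ap.append(1.0 / (recs.index(g) + 1) if g in recs else 0.0)
    return sum(ap) / len(ap)
```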

Table 1 shows some basic statistics of the training and test set. In total, the recommender system is trained on 742 conference series and 555,798 papers written by 110,831 authors. As far as the abstracts are concerned, only a little more than 10% of all papers have an English-language abstract. The average length of an abstract is 136 words.

Table 1. Characteristics of the training and test set

Table 2 summarizes the results of the best performing models for recommendations based on authors, abstracts, and keywords. Generally, abstracts work better than authors, and keywords work better than abstracts. For abstracts, TF-IDF using single tokens yields a recall@10 of 0.461 and a MAP of 0.237. For TF-IDF with n-grams, we explored different variants: we varied the upper limit for n between 2 and 5, and evaluated the approach with the 500k and 1M most frequent n-grams, as well as with all n-grams. The best results were obtained when using the 1M most frequent n-grams of size 1 to 4, outperforming the standard TF-IDF approach.
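In terms of a concrete configuration, the best-performing n-gram variant corresponds to a setup like the following, shown here with scikit-learn's TfidfVectorizer as an assumed implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Best-performing configuration from our grid: word n-grams of size 1 to 4,
# restricted to the 1M most frequent n-grams (other limits tried: 500k, unlimited).
vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(1, 4), max_features=1_000_000)
```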

Table 2. Results of the best performing individual recommendation techniques. For each individual technique, we only report the results of the best performing strategy (max or concat).

In addition, we evaluated a few ensemble setups. These were built by combining recommendation lists of length 10, 100, and 1,000 produced by different base recommenders, and using logistic regression as a meta-learner [16] to generate a recommendation list of length 10, as in the setups above. We observe that combining two abstract-based techniques (TF-IDF and word2vec plus CNN, which were very diverse in their predictions) outperforms the two individual techniques in both recall@10 and MAP.
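A minimal sketch of such a stacking setup is shown below; the feature encoding (reciprocal ranks per base recommender) and the training of the meta-learner are illustrative and not necessarily identical to our implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ensemble_features(candidate, base_lists):
    """
    Encode a candidate conference by its reciprocal rank in each base
    recommender's list; 0 if the recommender did not rank it at all.
    """
    return [1.0 / (ranking.index(candidate) + 1) if candidate in ranking else 0.0
            for ranking in base_lists]

def rerank(base_lists, meta_model, k=10):
    """Score the union of all base candidates with the meta-learner and return the top-k."""
    candidates = sorted(set().union(*base_lists))
    X = np.array([ensemble_features(c, base_lists) for c in candidates])
    scores = meta_model.predict_proba(X)[:, 1]  # probability of being the correct conference
    order = np.argsort(scores)[::-1][:k]
    return [candidates[i] for i in order]

# meta_model = LogisticRegression().fit(X_train, y_train)  # trained on held-out rankings
```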

Building ensembles that incorporate SciGraph Product Market Codes yields no significantly better results than using those keywords alone, demonstrating that the keywords are in fact the most suitable indicator for recommending conferences. Generally, extending the base recommendation lists beyond 100 elements does not change the results much, because conferences predicted at a rank beyond 100 are unlikely to make it into the final result list of size 10.

The recall figures reported in Table 2 do not exceed 0.665, but this result should be considered in a broader context. In total, only 77% of all conferences in the test set are also contained in the training set, i.e., we do not have any training signal for the remaining conferences. Since we can only use previous proceedings publications for generating training features, the approaches discussed in this paper can only recommend conferences known from the training set; hence, the maximum recall we could reach with these methods, computed over the test publications, would be 0.815.

In general, we can see that the keyword-based models are the best performing ones. However, they are also the least user-friendly ones, since Product Market Codes are assigned by editors at Springer Nature (more recently, using automated tools [12]). While end users might be able to assign them with decent quality, the recommendation quality with user-assigned keywords might well be lower than the one based on editor-assigned Product Market Codes. Another possible issue is that by selecting up to seven keywords out of 1,465, one could easily create pseudo-keys for conferences (i.e., each conference can be uniquely identified by its keywords), so overfitting might also be an issue for those models.

Another observation we made in our experiments is that there is a strong bias towards conferences related to machine learning and neural networks. As the corpus is focused on computer science conferences and the training data stems from the past few years (an informal inspection of SciGraph showed that roughly half of the papers in the graph are related to artificial intelligence), this topic is over-represented in our training dataset. Hence, the system is likely to create more recommendations for such conferences.

5 Conclusion

In this paper, we have introduced a recommendation system for conferences, based on abstracts, authors, and keywords. The system can be used by authors searching for upcoming conferences to publish at. The recommendations are computed based on SciGraph, with submission deadlines added from WikiCfP.

We have observed that the best signal for creating recommendations are keywords, in particular Product Market Codes in SciGraph, which, however, are often not easy to select for lay users. With those keywords, a recall@10 of up to 0.665 and a MAP of up to 0.522 can be reached. Recommendations based on authors (recall@10 of 0.372 and MAP of 0.284) and abstracts (recall@10 up to 0.494, MAP up to 0.273) are clearly inferior, with the best results for the latter obtained using TF-IDF based on word n-grams. Moreover, the good results obtained with vector space embeddings pre-trained on other text categories (e.g., news articles or Wikipedia texts) could not be reproduced on a target corpus of abstracts of scientific texts from various research fields.