Joeran Beel, Stefan Langer, Bela Gipp, and Andreas Nürnberger. 2014. The Architecture and Datasets of Docear’s Research Paper Recommender System. In Proceedings of the 3rd International Workshop on Mining Scientific Publications (WOSP 2014) at the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014). Downloaded from http://www.docear.org.
The Architecture and Datasets of Docear’s
Research Paper Recommender System
Dept. of Computer Science
In the past few years, we have developed a research paper
recommender system for our reference management software
Docear. In this paper, we introduce the architecture of the
recommender system and four datasets. The architecture comprises multiple components, e.g. for crawling PDFs, generating user models, and calculating content-based recommendations. It supports researchers and developers in building their own research paper recommender systems, and is, to the best of our knowledge, the most comprehensive architecture that has been released in this field. The four datasets contain metadata of 9.4 million academic articles, including 1.8 million articles publicly available on the Web; the articles’ citation network; anonymized information on 8,059 Docear users; information about the users’ 52,202 mind-maps and personal libraries; and details on the 308,146 recommendations that the recommender system delivered. The datasets are a unique source of information that enables, for instance, research on collaborative filtering, content-based filtering, and the use of reference management and mind-mapping software.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – information filtering.
General Terms
Algorithms, Design, Experimentation
Keywords
Dataset, recommender system, mind-map, reference manager,
Researchers and developers in the field of recommender systems can benefit from publicly available architectures and datasets. Architectures help with understanding and building
recommender systems, and are available in various recommendation domains such as e-commerce, marketing, and engineering. Datasets support the evaluation of recommender systems by enabling researchers to evaluate their systems with the same data.
In this paper, we present the architecture of Docear’s research paper recommender system. In addition, we present four datasets
containing information about a large corpus of research articles, and Docear’s users, their mind-maps, and the recommendations they received. By publishing the recommender system’s architecture and datasets, we pursue three goals.
First, we want researchers to be able to understand, validate, and reproduce our research on Docear’s recommender system. In our previous papers, we often could not describe the
recommender system in detail due to space restrictions. This paper gives the information on Docear’s recommender system that is necessary to re-implement our approaches and to reproduce our findings.
Second, we want to support researchers when building their own research paper recommender systems. Docear’s architecture and datasets ease the process of designing one’s own system, estimating the required development times, determining the required hardware resources to run the system, and crawling full-text papers to use as recommendation candidates.
Third, we want to provide real-world data to researchers who have no access to such data. This is particularly important because the majority of researchers in the field of research paper recommender systems have no access to real-life recommender systems. Our datasets allow analyses beyond the analyses we have already...
Citations: dcr_doc_id_54421) (Figure 4). This allows weighting
schemes, such as TF-IDF, to be applied to citations, i.e