Rakuten Data Challenge
This paper presents an overview of the SIGIR 2020 eCom Rakuten Data Challenge. For this data challenge, we make available a multimodal dataset of 99 thousand product listings from the Rakuten France catalog. Each product in the dataset has a textual title, a (possibly empty) textual description, and an associated image. Two tasks are proposed: large-scale multimodal classification and cross-modal retrieval. Of the full dataset, around 85 thousand products and their corresponding product type categories are released as training data, while around 9.5 thousand and 4.5 thousand products are released as the test sets for the multimodal classification and cross-modal retrieval tasks, respectively. System performance is evaluated in two stages, first on 10% of the test data and then on the remaining 90%. Systems are evaluated using the macro-F1 score for the multimodal classification task and recall@1 for the cross-modal retrieval task. Sixteen independent teams submitted system outputs for the proposed tasks. The top performance obtained at the end of the second stage is 91.94% macro-F1 and 50.23% recall@1 for the two tasks, respectively.
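For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of how they are commonly computed. It is not the challenge's official scoring script. The function names `macro_f1` and `recall_at_1` are hypothetical, the macro average is assumed to run over every class that appears in either the gold labels or the predictions, and each retrieval query is assumed to have exactly one correct item.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Assumption: the average is taken over the union of classes seen in
    y_true and y_pred, so rare classes count as much as frequent ones.
    """
    tp = defaultdict(int)  # true positives per class
    fp = defaultdict(int)  # false positives per class
    fn = defaultdict(int)  # false negatives per class
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1
            fn[gold] += 1
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def recall_at_1(rankings, gold_items):
    """Fraction of queries whose top-ranked candidate is the correct item.

    Assumption: each query has exactly one correct item, and rankings[i]
    is the system's candidate list for query i, best candidate first.
    """
    hits = sum(
        1
        for ranking, gold in zip(rankings, gold_items)
        if ranking and ranking[0] == gold
    )
    return hits / len(gold_items)


if __name__ == "__main__":
    # Toy labels and rankings, invented purely for illustration.
    print(macro_f1(["a", "a", "b", "c"], ["a", "b", "b", "c"]))
    print(recall_at_1([["i1", "i2"], ["i3", "i2"], ["i9"]], ["i1", "i2", "i9"]))
```

Because macro-F1 weights every class equally, errors on infrequent product categories lower the score as much as errors on frequent ones, whereas recall@1 only credits a query when the correct item is ranked first.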