Call for Participation: Rakuten Data Challenge


Registration for the Rakuten Data Challenge, organized by Rakuten Institute of Technology, is now open. The Rakuten Multi-modal Classification and Retrieval challenge will be held as part of the 2020 SIGIR Workshop on eCommerce, in conjunction with the ACM SIGIR Conference on Research and Development in Information Retrieval. The conference, now in its 43rd year, is a forum for Information Retrieval (IR) and Natural Language Processing (NLP) research. The workshop brings together practitioners and researchers from academia and industry to discuss the challenges and approaches to product search and recommendation in the e-commerce domain.

This year’s competition tasks data scientists with predicting the category of thousands of products from their associated titles, descriptions, and images, and with predicting which images correspond to a held-out test set of titles and descriptions. The product data come from a sample e-commerce catalog released by Rakuten France. Standardizing product data, that is, describing products consistently, syndicating catalogs, and keeping data compliant, remains a major challenge for the e-commerce industry. The competition will contribute to a deeper understanding of machine learning’s application to e-commerce, further ongoing research on accurate and manageable categorization, and drive advances in industry.

We encourage participation in the data challenge and submission of system description papers. More information is available on the data challenge page, and detailed instructions are in this PDF file.

Challenge Evaluation Metric

  • Multi-modal classification task: We will use the macro-F1 score to evaluate product type code classification on held-out test samples. The score is the arithmetic average of the per-product-type-code F1 scores.
  • Cross-modal retrieval task: We will use recall at 1 (R@1) on held-out test samples. The score is the average of the per-sample scores, where a sample scores 1 if the image returned matches the title and 0 otherwise.
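The two metrics above can be sketched in plain Python; this is an illustrative implementation of the standard definitions, not the organizers' official scoring code, and the function names are our own.

```python
def macro_f1(y_true, y_pred):
    """Macro-F1: arithmetic mean of per-class F1 scores,
    computed over the classes present in the ground truth."""
    classes = sorted(set(y_true))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


def recall_at_1(true_image_ids, top1_retrieved_ids):
    """R@1: fraction of test samples whose top-ranked retrieved
    image is the correct one (score 1 per sample, else 0)."""
    hits = sum(1 for t, r in zip(true_image_ids, top1_retrieved_ids) if t == r)
    return hits / len(true_image_ids)
```

Note that macro-F1 weights every product type code equally regardless of how many products it contains, which rewards models that also perform well on rare categories.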

Challenge Stage 1 – Model Building (April 20 – June 26)

Participants build and test models on the training data. The leaderboard shows only the top model performances on a subset of the test set, according to the latest submissions. Each team can submit at most 4 times per day in this stage.

Challenge Stage 2 – Model Evaluation (June 27 – July 22)

The final leaderboard will freeze on July 22 and show the model performances on the entire test set according to the latest submissions. In this stage, each team can submit at most 4 times over the period that the evaluation is open.

System Description Paper

System description papers will be peer reviewed (single-blind) by the program committee. There is no specific constraint on the content, but it should cover implementation details such as data preprocessing (including token normalization, feature extraction, and any additional data used from external sources); model descriptions (including specific implementations, parameter tuning, etc.); and error analysis, if any. The suggested paper length is 4-6 pages; parameter tuning settings and similar information can be moved to an appendix. The deadline for paper submission is July 3, 2020.
