Rakuten Institute of Technology (RIT) has released the “Rakuten France: Multi-modal Product Dataset” on the Rakuten Data Release subpage. The data was used for the SIGIR 2020 E-Commerce Workshop Data Challenge, which focused on large-scale multi-modal (text and image) classification and cross-modal retrieval. The classification goal was to predict each product’s type code, as defined by the Rakuten France catalog, using a dataset of approximately 99K product listings.
The data was provided by Parantapa Goswami, Researcher at RIT Paris, and Pradipto Das, Lead Research Scientist at RIT Boston, who also organized the Data Challenge. It is publicly available in both English and Japanese.
SIGIR 2020 E-Commerce Workshop Data Challenge Information and Final Scoreboards: https://sigir-ecom.github.io/data-task.html
SIGIR 2020 E-Commerce Workshop Data Challenge Overview Paper: https://sigir-ecom.github.io/files/Rakuten_Data_Challenge.pdf