RIT Paris and Boston’s Collaborative Research at the European Conference on Information Retrieval 2021
Parantapa Goswami, researcher at RIT Paris, and Hesam Amoualian, formerly of RIT, shared their research findings at the 2021 European Conference on Information Retrieval (ECIR). This year marks the 43rd edition of the conference, reflecting its long-standing reputation as one of the premier European forums for original research in Information Retrieval. Its scope covers the theory, experimentation, and practice of retrieval, representation, management, and usage of textual, visual, audio, and multi-modal information. The event was planned to take place in person in Lucca, Italy, but moved to a virtual format this year.

“An E-Commerce Dataset in French for Multi-modal Product Categorization and Cross-Modal Retrieval” was written through a fruitful collaboration between members of RIT Paris and RIT Boston, who regularly work together to combine language and image machine learning models to extract structured information from product titles, descriptions, and images. The paper focuses on a large-scale, multi-modal dataset of 99K product listings from Rakuten France’s catalog, which was released as part of the SIGIR eCom’20 Workshop. The real-world dataset consists of 85K products and their product type categories as training data, and approximately 9.5K and 4.5K products as test sets for the two proposed tasks, multi-modal classification and cross-modal retrieval.

With leading scholars and researchers in the Information Retrieval field in attendance this year, Dr. Goswami and Dr. Amoualian were able to share real-world data from the 2020 workshop with the wider research community, as well as consider future improvements to RIT’s data preparation and data challenges.