Lei Chen and Hiro Miyake of RIT Boston attended this year's NAACL-HLT 2021 conference to present their research findings. NAACL is a top-tier conference in the field of computational linguistics and natural language processing (NLP).
Their paper “Label-Guided Learning for Item Categorization in e-Commerce” applies label-guided learning to the item categorization task, using the meaning of class labels such as food, clothes, and electronics to improve text classification performance. While label-guided learning has been shown to improve performance on several standard text classification datasets, very little research has explored its use in industry. Using real industry data from Rakuten's e-commerce platform, they were able to evaluate the benefits of label-guided learning and its real-world applicability. For example, they found that pre-trained embeddings specialized to specific categories performed better than embeddings trained on all available categories, demonstrating the potential of label-guided learning to improve item categorization systems in e-commerce and other industries.
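The core idea of label-guided learning can be illustrated with a minimal sketch: instead of pooling a text's word vectors uniformly, weight each word by its similarity to the class label embeddings, so label-relevant words dominate the document representation. The toy data and pooling rule below are illustrative assumptions in the spirit of LEAM-style label attention, not the paper's actual model or data.

```python
import numpy as np

# Toy stand-ins: random word vectors for a 5-token item title and
# embeddings for three hypothetical class labels (e.g. food/clothes/electronics).
rng = np.random.default_rng(0)
d = 8                               # embedding dimension (arbitrary)
words = rng.normal(size=(5, d))     # one word vector per token
labels = rng.normal(size=(3, d))    # one embedding per class label

def label_guided_pool(words, labels):
    """Pool word vectors with attention derived from word-label similarity."""
    # Cosine similarity between every word and every label
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    l = labels / np.linalg.norm(labels, axis=1, keepdims=True)
    sim = w @ l.T                        # shape (num_words, num_labels)
    # A word gets more weight if it is close to *some* class label
    scores = sim.max(axis=1)             # shape (num_words,)
    attn = np.exp(scores) / np.exp(scores).sum()   # softmax over words
    return attn @ words                  # label-aware document vector

doc_vec = label_guided_pool(words, labels)
print(doc_vec.shape)  # (8,)
```

A plain classifier would use `words.mean(axis=0)` here; the label-guided version lets the class labels themselves decide which tokens matter.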
To reach their results, they tested two specific models, the Label Embedding Attentive Model (LEAM) and the Label-Specific Attention Network (LSAN), finding that LSAN outperforms LEAM by taking advantage of context encoding and an adaptive combination of self-attention and label-attention. Furthermore, using hyperbolic embeddings, which learn hierarchical representations of symbolic data by embedding them into hyperbolic space, they conclude that such embeddings keep class labels sufficiently separated and can improve performance.
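Why hyperbolic space helps with label separation can be seen from the distance function of the Poincaré ball model, a standard formulation of hyperbolic embeddings: distances blow up near the boundary of the ball, so fine-grained categories placed toward the boundary stay far apart even when their Euclidean coordinates are close. The points below are made-up examples for illustration, not embeddings from the paper.

```python
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance between two points in the Poincare ball model."""
    uu = np.dot(u, u)                 # squared norms; must be < 1 inside the ball
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)        # squared Euclidean gap
    x = 1 + 2 * duv / max((1 - uu) * (1 - vv), eps)
    return np.arccosh(x)

# Two pairs with the same Euclidean gap (0.2)...
a, b = np.array([0.1, 0.0]), np.array([0.3, 0.0])   # near the origin
c, e = np.array([0.7, 0.0]), np.array([0.9, 0.0])   # near the boundary
# ...but the pair near the boundary is far more separated hyperbolically.
print(poincare_dist(a, b) < poincare_dist(c, e))  # True
```

This boundary-stretching property is what lets a deep category hierarchy keep its leaf labels well separated in only a few dimensions.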
The conference was well attended by researchers from major global companies, including Amazon, Apple, Microsoft, IBM, Alibaba, and ByteDance, all of which are investing significantly in their AI capabilities. Presenting Rakuten research results at such a conference helps to publicize the capabilities of Rakuten and can help recruit top talent in the future.
In the future, the researchers plan to extend their work by incorporating label-guided learning into BERT, which has proven to be a powerful backbone for many modern machine learning models. Also, although hyperbolic embedding was used to initialize the label embeddings, training proceeded in ordinary Euclidean space, so further improvements may be possible by training the model entirely in hyperbolic space.