Unsupervised Extraction of Attributes and Their Values from Product Description

Author: Keiji Shinzato and Satoshi Sekine


This paper describes an unsupervised method for extracting product attributes
and their values from an e-commerce product page. Previously, distant supervision has been applied for this task, but it is not applicable in domains where no reliable knowledge base (KB) is available. Instead, the proposed method automatically creates a KB from tables and itemizations embedded in the product’s pages. This KB is applied to annotate the pages automatically and the annotated corpus is used to train a model for the extraction. Because of the incompleteness of the KB, the annotated corpus is not as accurate as a manually annotated one. Our method tries to filter out sentences that are likely to include problematic annotations based on statistical measures and morpheme patterns induced from the entries in the KB. The experimental results show that the performance of our method achieves an average F score of approximately 58.2 points and that filters can improve the performance.

Copied! instagram