Accurate Product Attribute Extraction on the Field
In this paper we present a bootstrapping approach for attribute value extraction that minimizes the need for human intervention. Our approach automatically extracts attribute names and values from semi-structured text, generates a small labeled dataset, and bootstraps it by extracting new values from unstructured text. It is domain- and language-independent, relying only on existing semi-structured text to create the initial labeled dataset. We assess the impact of different machine learning approaches intended to increase the precision of the core approach without compromising coverage. We perform an extensive evaluation using e-commerce product data across different categories in two languages and hundreds of thousands of product pages. We show that our approach provides high precision and good coverage. In addition, we study the impact of different methods that address specific sources of error. Through error analysis we highlight how these methods complement each other, obtaining insights about the individual methods and the ensemble as a whole.
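As an illustration only, the first two steps summarized above (collecting attribute names and values from semi-structured text, then using them to label unstructured text) might be sketched as follows. This is a minimal sketch under stated assumptions, not the method evaluated in the paper: the "Name: Value" line format, the exact-match labeling, and the function names `seed_from_semi_structured` and `label_text` are all hypothetical. The step that extracts previously unseen values, which would require a model trained on the labeled output, is omitted.

```python
import re
from collections import defaultdict


def seed_from_semi_structured(spec_lines):
    """Collect attribute -> set of values from 'Name: Value' lines.

    Assumption: semi-structured text is given as one 'Name: Value'
    pair per line; lines without a colon or with an empty side are skipped.
    """
    seeds = defaultdict(set)
    for line in spec_lines:
        name, sep, value = line.partition(":")
        if sep and name.strip() and value.strip():
            seeds[name.strip().lower()].add(value.strip().lower())
    return seeds


def label_text(text, seeds):
    """Return sorted (attribute, value) pairs whose seed value occurs in text.

    Assumption: a case-insensitive whole-word match stands in for the
    labeling step; a real system would handle ambiguity and tokenization.
    """
    lowered = text.lower()
    found = []
    for attr, values in seeds.items():
        for value in values:
            pattern = r"\b" + re.escape(value) + r"\b"
            if re.search(pattern, lowered):
                found.append((attr, value))
    return sorted(found)


# Hypothetical input data, for illustration only.
specs = ["Color: Red", "Material: Cotton", "no separator here"]
seeds = seed_from_semi_structured(specs)
labels = label_text("A soft cotton shirt in red.", seeds)
```

The pairs in `labels` play the role of the small labeled dataset; in a full bootstrapping loop they would be used to train a tagger that proposes new values, which are then added back to the seed set.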