PUBLICATIONS

Accurate Word Segmentation using Transliteration and Language Model Projection

Authors: Masato Hagiwara and Satoshi Sekine

Aug 2013

Proc. of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), pp. 183-189, 2013.

ABSTRACT

Transliterated compound nouns not separated by whitespace pose difficulty for word segmentation (WS). Offline approaches have been proposed to split them using word statistics, but they rely on a static lexicon, limiting their use. We propose an online approach that integrates a source language model (LM) and/or back-transliteration and an English LM. Experiments on Japanese and Chinese WS show that the proposed models achieve significant improvement over the state of the art, reducing errors by 16% in Japanese.



Research Areas : #Language Program

Tags : #Natural Language Processing
