FlashNormalize: Programming by Examples for Text Normalization

IJCAI'15 Proceedings of the 24th International Conference on Artificial Intelligence

Publication

Several applications, including text-to-speech, require a normalized form of non-standard words in various domains, such as numbers, dates, and currencies, and in various human languages. The traditional approach of manually constructing a program for such a normalization task requires expertise in both programming and the target (human) language, and it does not scale to a large number of domain, format, and target-language combinations. We propose to learn programs for such normalization tasks through examples. We present a domain-specific programming language that offers appropriate abstractions for succinctly describing such normalization tasks, and then present a novel search algorithm that can effectively learn programs in this language from input-output examples. We also briefly describe domain-specific heuristics that guide users of our system toward providing representative examples for normalization tasks in a given domain. Our experiments show that we are able to effectively learn desired programs for a variety of normalization tasks.
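
To make the idea of learning a normalization program from input-output examples concrete, here is a minimal sketch in Python. It is not the paper's domain-specific language or its search algorithm: it assumes a tiny, hypothetical program space for short numeric dates (a separator plus a field order) and uses plain brute-force enumeration, keeping every candidate that agrees with all examples. The names `make_program`, `candidate_programs`, and `learn` are invented for illustration.

```python
from itertools import permutations

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def make_program(order, sep):
    """Build one candidate normalizer: split on `sep`, read fields in `order`."""
    def program(text):
        parts = text.split(sep)
        if len(parts) != len(order):
            return None  # this candidate does not apply to the input
        fields = dict(zip(order, parts))
        try:
            month, day = int(fields["month"]), int(fields["day"])
        except ValueError:
            return None
        if not (1 <= month <= 12 and 1 <= day <= 31):
            return None
        return f"{MONTHS[month - 1]} {day}"
    return program


def candidate_programs():
    """Enumerate the whole (hypothetical, deliberately tiny) program space."""
    for sep in "/-.":
        for order in permutations(("month", "day")):
            description = sep.join(order)
            yield description, make_program(order, sep)


def learn(examples):
    """Return every candidate consistent with all input-output examples."""
    return [(description, program)
            for description, program in candidate_programs()
            if all(program(inp) == out for inp, out in examples)]


# An unrepresentative example leaves the field order ambiguous.
ambiguous = learn([("5/5", "May 5")])
print([description for description, _ in ambiguous])

# A second, more representative example pins down a single program.
learned = learn([("5/5", "May 5"), ("3/4", "March 4")])
print([description for description, _ in learned])
print(learned[0][1]("12/25"))
```

The first call shows why representative examples matter: "5/5" is consistent with both month-first and day-first readings, so the learner cannot choose between them until it sees an example such as "3/4" that separates the two.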