Sampling Representative Phrase Sets for Text Entry Experiments

We propose a procedure for sampling representative phrases from any large corpus, so that text input researchers can use publicly accessible resources to curate their own stimuli for the tasks, domains, and languages they wish to target. The procedure grounds the notion of representativeness in information theory. Here, you can read the paper and download the code and data.
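To give a rough sense of what an information-theoretic notion of representativeness can look like, here is a minimal sketch. It is not the released code, and it is not necessarily the measure used in the paper. It assumes that a phrase set is "representative" when its character distribution has low Kullback-Leibler divergence from that of the full corpus, and it simply keeps the best of many random draws. The function names and parameters (`char_distribution`, `kl_divergence`, `sample_representative`, `trials`, `seed`) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def char_distribution(phrases):
    """Relative frequency of each character across a list of phrases."""
    counts = Counter(ch for phrase in phrases for ch in phrase)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def kl_divergence(p, q, epsilon=1e-9):
    """D(p || q) in bits; epsilon stands in for characters missing from q."""
    return sum(pv * math.log2(pv / q.get(ch, epsilon)) for ch, pv in p.items())


def sample_representative(corpus, set_size, trials=200, seed=0):
    """Assumed criterion: keep the random phrase set whose character
    distribution diverges least from the corpus distribution."""
    rng = random.Random(seed)
    corpus_dist = char_distribution(corpus)
    best_set, best_score = None, float("inf")
    for _ in range(trials):
        candidate = rng.sample(corpus, set_size)
        score = kl_divergence(char_distribution(candidate), corpus_dist)
        if score < best_score:
            best_set, best_score = candidate, score
    return best_set, best_score
```

A call such as `sample_representative(corpus, 50)` on a list of phrase strings returns the chosen phrases together with their divergence score. Characters are used here only for simplicity; the same structure would work with words or n-grams as the unit.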


If you have any questions about the downloads, please feel free to contact us. If there are datasets for which you would like representative phrase sets (e.g., general web, Wikipedia), please let us know as well. We will use this project page to post requested stimuli. Thank you for your interest.