He Wang, Dimitrios Lymberopoulos, and Jie Liu
Local search users today decide which business to visit based solely on distance information and on business ratings that can be sparse or stale. We believe that when users search for local businesses, such as bars or restaurants, they need to know more about the ambience of each business: how crowded it is, what type of music it plays and how loud that music is, and how loud the human chatter is. Unfortunately, this information doesn't exist today. In this paper, we propose to automatically crowdsource such rich, local business ambience metadata through real user check-in events. Every time a user checks into a business, the phone is in the user's hands, and the phone's sensors can sense the business environment. We leverage the phone's microphone during this time to infer the occupancy and human chatter levels, the music type, as well as the music and noise levels in the business. As people check in to businesses throughout the day, business metadata can be automatically updated over time, enabling a new generation of local search experiences. Using approximately 150 audio traces collected from real businesses of various types over a period of 3 months, we show that by properly extracting the temporal and frequency signatures of the audio signal, it is feasible to train models that can simultaneously infer occupancy, human chatter, music, and noise levels in a business with higher than 79% accuracy.
In 23rd International World Wide Web Conference (WWW)