BingNow 2.0: Real-Time Business Metadata Extraction

Real-time information about businesses, such as the current occupancy and music levels, as well as the type of music or exact song playing now, can be an important factor in the local search decision process. In this work, we propose to automatically crowdsource such rich, real-time business metadata through user check-in events.

Local search users today decide which business to visit based solely on generic or stale information, such as distance and business ratings. We believe that local search users can make an informed decision about where to go next only if they know more about the state of each business in the search results at the time of the query. For instance, real-time information about businesses, such as the current occupancy and music levels, as well as the type of music or exact song playing now, can be an important factor in the users’ decision process. With this type of information available in the search results, a young professional who wants to go out for drinks to socialize could focus on crowded bars playing loud music. In contrast, a user who wishes to have dinner with a family of five could focus on lightly occupied businesses playing low or no music at all.

Unfortunately, this type of business information (metadata) does not exist today. In this work, we propose to automatically crowdsource such rich, real-time business metadata through real user check-in events. Every time a user checks in to a business, the phone is in the user’s hands, and the phone’s sensors can sense the business environment. In particular, we leverage the phone’s microphone during this time to infer the occupancy level, the music type or exact song playing, and the music and noise levels in the business. The audio data recorded through the phone’s microphone captures all the different acoustic sources in the business (i.e., music, human chatter, noise). By properly analyzing the recorded audio in the temporal and frequency domains, we extract a set of features that capture the unique and subtle properties of human speech and music. Then, given labeled data traces from real businesses, we train machine learning models, such as decision trees, to predict the occupancy, human chatter, music, and noise levels in the business. As users check in to businesses throughout the day, this metadata is continuously refreshed. In that way, a real-time stream of rich business metadata can be automatically extracted.
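As a concrete illustration, the sketch below summarizes each audio trace with simple temporal features (zero-crossing rate, RMS energy) and frequency features (spectral centroid, MFCCs), then fits a decision tree on the labeled traces. The specific features, parameters, and the librosa/scikit-learn pipeline are illustrative assumptions for this sketch, not the exact implementation used in this work.

    # A minimal sketch of the audio pipeline described above, assuming a
    # librosa/scikit-learn stack. Feature set, sample rate, tree depth, and
    # function names are illustrative assumptions.
    import numpy as np
    import librosa
    from sklearn.tree import DecisionTreeClassifier

    def extract_features(path):
        """Summarize one audio trace with temporal and frequency features."""
        y, sr = librosa.load(path, sr=16000, mono=True)
        zcr = librosa.feature.zero_crossing_rate(y)               # temporal: signal texture
        rms = librosa.feature.rms(y=y)                            # temporal: loudness
        centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # frequency: brightness
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # frequency: spectral envelope
        # Collapse frame-level features into one fixed-length vector per trace.
        return np.hstack([zcr.mean(), rms.mean(), centroid.mean(),
                          mfcc.mean(axis=1), mfcc.std(axis=1)])

    def train_model(trace_paths, labels):
        """Fit a decision tree on labeled audio traces (e.g., occupancy levels)."""
        X = np.vstack([extract_features(p) for p in trace_paths])
        return DecisionTreeClassifier(max_depth=8).fit(X, labels)

In practice, one such model can be trained per target attribute (occupancy, human chatter, music, and noise levels), or a single multi-output model can predict all of them at once.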

Using approximately 150 audio traces collected from real businesses of various types over a period of three months, we show that by properly extracting the temporal and frequency signatures of the audio signal, we can train machine learning models to simultaneously infer the occupancy, human chatter, music, and noise levels in a business with accuracy higher than 79%.

This stream of rich business metadata can help search engines index and understand the physical world in real time, enabling the next generation of the local search experience. For instance, users could search for local businesses with queries such as: “Crowded bar playing loud hip-hop music now” or “Quiet Italian restaurant playing low classical music now”. Current search engines are unable to understand and serve this type of query, as they have no notion of what a “crowded business” or a “business playing loud hip-hop music” actually means. Even worse, they have no way to evaluate such properties in real time.

The following shows an example of how a search engine can leverage real-time business metadata to understand the physical world in real time and enable the next generation of the local search user experience.
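The sketch below shows one hypothetical way a search engine could represent per-business metadata and filter it against a structured query; the record schema, field names, and example values are assumptions made for illustration only.

    # A hypothetical sketch of filtering businesses on real-time metadata.
    # The record schema and field values are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class BusinessMetadata:
        name: str
        category: str       # e.g., "bar", "restaurant"
        occupancy: str      # e.g., "empty" | "light" | "crowded"
        music_level: str    # e.g., "none" | "low" | "loud"
        music_genre: str    # e.g., "hip-hop", "classical"

    def matches(record, filters):
        """Return True if a business's latest metadata satisfies every filter."""
        return all(getattr(record, field) == value
                   for field, value in filters.items())

    # Query: "Crowded bar playing loud hip-hop music now"
    index = [
        BusinessMetadata("The Vault", "bar", "crowded", "loud", "hip-hop"),
        BusinessMetadata("Trattoria Roma", "restaurant", "light", "low", "classical"),
    ]
    filters = {"category": "bar", "occupancy": "crowded",
               "music_level": "loud", "music_genre": "hip-hop"}
    results = [r for r in index if matches(r, filters)]  # -> [The Vault]

In a deployed system, each check-in inference would overwrite the corresponding business record, so the filter above would always evaluate against the freshest available metadata.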