Application-Driven Web Resource Location Classification and Detection

The rapid spread of the web into users' daily lives has made it increasingly important to capture location-specific information on the web, because most human activities occur near where a user is located. This is especially true in the increasingly popular mobile and local search environments. Correctly and effectively detecting locations from web resources has therefore become a key challenge for location-based web applications. Previous work has focused on deducing web locations from various geographical sources such as geographical names, postal codes, and telephone numbers. None of it, however, addresses the intrinsic differences between the needs of applications for different types of locations. Multiple locations may co-exist in a web resource, and a general-purpose algorithm that ignores their differences usually yields low detection precision. In this paper, we first explicitly distinguish three types of web resource locations to cater to different application needs: 1) provider location; 2) content location; and 3) serving location. We then describe a novel system that computes each of the three locations, employing a set of algorithms and different geographical sources. Experimental results on large samples of web data show that our solution outperforms previous approaches. Finally, we identify some promising web applications based on the three proposed locations.

Institution: Microsoft Research
