
Our analysis of the Bing search log from September 2007 to June 2009 shows that about 62% of the queries contain at least one concept term. More detailed analysis reveals that common web queries fall into a number of patterns. The following five basic patterns account for the majority of Bing queries during that period:
1. Single Entity (E)
2. Single Concept (C)
3. Single Entity + Attributes (E+A)
4. Single Concept + Attributes (C+A)
5. Single Concept + Keywords (C+K)
These patterns can be combined to form more complex patterns. In the paper, we focus on one of them:
Concept + Keywords + Concept (C+K+C)
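As a rough illustration of how these pattern labels relate to annotated queries, the sketch below derives a label from a query written in the notation used by the benchmark tables on this page: `(entity)`, `[concept]`, `<attribute>`, with unmarked words treated as keywords. The function name `pattern_of` and the regular expression are assumptions made for this example; this is not the query parser described in the paper, which has to detect these terms in raw, unannotated queries.

```python
import re

# One alternative per term kind in the page's annotation:
# (entity) -> E, [concept] -> C, <attribute> -> A, any other word -> K.
TOKEN = re.compile(
    r"\((?P<E>[^)]*)\)"
    r"|\[(?P<C>[^\]]*)\]"
    r"|<(?P<A>[^>]*)>"
    r"|(?P<K>[^\s()\[\]<>]+)"
)


def pattern_of(annotated_query: str) -> str:
    """Return an order-preserving pattern label such as 'E', 'C+A' or 'C+K+C'."""
    labels = []
    for match in TOKEN.finditer(annotated_query):
        kind = match.lastgroup  # name of the alternative that matched
        # Collapse runs of the same kind, so "in chicago" counts as one K.
        if not labels or labels[-1] != kind:
            labels.append(kind)
    return "+".join(labels)


print(pattern_of("(president washington) <quotes>"))  # E+A
print(pattern_of("[large companies] in chicago"))     # C+K
print(pattern_of("[politicians] commit [crimes]"))    # C+K+C
```

The label preserves word order, so a query such as `las vegas [outdoor activities]` comes out as `K+C`, although the benchmark groups it with the C+K queries.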
To evaluate the performance of online query processing, we created a set of benchmark queries that contain concepts, entities, and attributes, for example "politicians commit crimes" (C+K+C), "large companies in chicago" (C+K), and "president washington quotes" (E+A). The first six tables below list the queries we used, 10 for each pattern. The E, C, E+A, and C+A queries were randomly selected from Bing's search log, so their tables include a Ranking column that gives each query's rank by frequency.
One assumption we make in the paper is that the association between an entity term and a keyword can be estimated using simple two-way word association; the paper describes this in detail. Here we take the 10 benchmark C+K queries and replace the concept in each with entities of that concept to generate a set of E+K queries, and we highlight the pivot word we found in some of these E+K queries. The last table shows them.
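The substitution step can be sketched as follows. This is a minimal illustration, not the procedure from the paper: the map `CONCEPT_ENTITIES` and the function `expand_ck_query` are hypothetical names, and the map is filled by hand with the entities that appear for CK-3 and CK-9 in the pivot-word table. How entities are actually chosen for a concept is not specified on this page.

```python
import re

# Hypothetical hand-written map from a concept to some of its entities.
# The entities are copied from the CK-3 and CK-9 rows of the pivot-word table.
CONCEPT_ENTITIES = {
    "large companies": ["general electric", "american express", "microsoft corp"],
    "astronauts": ["john glenn", "neil armstrong", "michael collins"],
}

# A concept is written in square brackets, e.g. "[astronauts]".
CONCEPT = re.compile(r"\[([^\]]+)\]")


def expand_ck_query(ck_query, concept_entities=CONCEPT_ENTITIES):
    """Replace the bracketed concept of a C+K query with each known entity."""
    match = CONCEPT.search(ck_query)
    if match is None:
        return []  # no concept term to substitute
    entities = concept_entities.get(match.group(1), [])
    return [
        ck_query[:match.start()] + "(" + entity + ")" + ck_query[match.end():]
        for entity in entities
    ]


for ek_query in expand_ck_query("[astronauts] fly to the moon"):
    print(ek_query)
```

Each generated E+K query keeps the keywords unchanged and swaps only the concept term, which is what allows the entity-keyword association to be estimated query by query.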
- Single Entity (E)
These 10 queries were randomly selected from Bing's two-year search log. Freq. is the query's frequency and Ranking is its rank by frequency. The same applies to the tables below.

| # | E Queries | Freq. | Ranking |
|---|---|---|---|
| E-1 | (house beautiful) | 27285 | 83899 |
| E-2 | (borland) | 15628 | 146134 |
| E-3 | (witco) | 2366 | 911523 |
| E-4 | (hicksville) | 1408 | 1490513 |
| E-5 | (alan taylor) | 751 | 2691832 |
| E-6 | (condobolin) | 654 | 3060796 |
| E-7 | (george low) | 216 | 8569491 |
| E-8 | (pigmy love circus) | 199 | 9235273 |
| E-9 | (still bill) | 117 | 14965256 |
| E-10 | (kip hanrahan) | 99 | 17425678 |

- Single Concept (C)

| # | C Queries | Freq. | Ranking |
|---|---|---|---|
| C-1 | [cars] | 2286411 | 528 |
| C-2 | [online services] | 23729 | 96622 |
| C-3 | [american artists] | 12421 | 183182 |
| C-4 | [boutiques] | 10815 | 209807 |
| C-5 | [classic fairy tales] | 7154 | 314209 |
| C-6 | [british authors] | 3311 | 662012 |
| C-7 | [red sox players] | 1404 | 1493864 |
| C-8 | [italian composers] | 716 | 2813231 |
| C-9 | [media conglomerates] | 297 | 6384077 |
| C-10 | [mainstream movies] | 105 | 16564335 |

- Single Entity + Attributes (E+A)

| # | E+A Queries | Freq. | Ranking |
|---|---|---|---|
| EA-1 | (pedro infante) <music> | 658 | 3043603 |
| EA-2 | (franklin) <time> | 84 | 20166776 |
| EA-3 | (assurant) <employees> | 70 | 23591037 |
| EA-4 | (king arthur) <music> | 58 | 27870316 |
| EA-5 | (phil harris) <age> | 30 | 50799136 |
| EA-6 | (david beckham) <quote> | 28 | 53243346 |
| EA-7 | (chennai) <place> | 17 | 85110446 |
| EA-8 | (president washington) <quotes> | 16 | 87769744 |
| EA-9 | (aaron neville) <album> | 10 | 132744902 |
| EA-10 | (peanuts) <artist> | 10 | 132744902 |

- Single Concept + Attributes (C+A)

| # | C+A Queries | Freq. | Ranking |
|---|---|---|---|
| CA-1 | [famous people] <birthdays> | 2116 | 1012793 |
| CA-2 | [common allergies] <symptoms> | 1424 | 1474277 |
| CA-3 | [movies] <quotes> | 853 | 2389110 |
| CA-4 | [professional boxers] <champions> | 145 | 12310179 |
| CA-5 | [violinists] <tool> | 28 | 52969723 |
| CA-6 | [films] <soundtracks> | 21 | 68772582 |
| CA-7 | [images] <cameras> | 21 | 68772582 |
| CA-8 | [herbal supplements] <energy> | 13 | 109177365 |
| CA-9 | [nba players] <position> | 12 | 113769727 |
| CA-10 | [funds] <home page> | 11 | 126590746 |

- Single Concept + Keywords (C+K)

| # | C+K Queries |
|---|---|
| CK-1 | [east asian countries] with nuclear capability |
| CK-2 | [american cities] sigmod |
| CK-3 | [large companies] in chicago |
| CK-4 | las vegas [outdoor activities] |
| CK-5 | [international organizations] focus on environmental protection |
| CK-6 | horse [medical conditions] |
| CK-7 | [name brands] in chinese market |
| CK-8 | [football players] own goal |
| CK-9 | [astronauts] fly to the moon |
| CK-10 | [famous people] bribery |

- Concept + Keywords + Concept (C+K+C)

| # | C+K+C Queries |
|---|---|
| CKC-1 | [companies] buy [tech companies] |
| CKC-2 | [politicians] commit [crimes] |
| CKC-3 | [extreme sports] in [asian countries] |
| CKC-4 | [database conferences] in [european cities] |
| CKC-5 | [presidents] graduated from [universities] |
| CKC-6 | [rivers] flow into [seas] |
| CKC-7 | [actors] marry [actresses] |
| CKC-8 | [football stars] join [football teams] |
| CKC-9 | [cars] owned by [celebrities] |
| CKC-10 | [peoples] believe in [religions] |

- Pivot words

| # | Pivot Word in Each Query |
|---|---|
| CK-1 | (pr china) with nuclear capability; (republic korea) with nuclear capability; (north korea) with nuclear capability |
| CK-2 | (san diego) sigmod; (washington dc) sigmod; (ann arbor)¹ sigmod |
| CK-3 | (general electric) in chicago; (american express) in chicago; (microsoft corp) in chicago |
| CK-4 | las vegas (american football); las vegas (water sport); las vegas (nascar race) |
| CK-5 | (family health international) focus on ...; (world health organization) focus on ...; (the pan american health organization) focus on ... |
| CK-6 | horse (high blood pressure); horse (low blood sugar); horse (skin cancer) |
| CK-7 | (new balance) in chinese market; (calvin klein) in chinese market; (hewlett packard) in chinese market |
| CK-8 | (paul robinson) own goal; (graham alexander) own goal; (jonathan woodgate) own goal |
| CK-9 | (john glenn) fly to the moon; (neil armstrong) fly to the moon; (michael collins) fly to the moon |
| CK-10 | (james brown) bribery; (george bush) bribery; (bill clinton) bribery |

¹ Both are pivot words.
