One Click Access evaluation at NTCIR
This is the new homepage for NTCIR-10 1CLICK-2.
- New URL for the 1CLICK-2 homepage: http://research.microsoft.com/1CLICK/
IMPORTANT DATES (in Japan Time: UTC+9)
|April 30, 2012||sample queries and iUnits released|
|Aug 31, 2012||test queries released|
|Oct 31, 2012||run submissions due|
|Nov 2012-Jan 2013||iUnit match evaluation|
|Feb 01, 2013||very early draft overview released|
|Feb 28, 2013||evaluation results released|
|Mar 01, 2013||draft participant papers due|
|May 01, 2013||all camera ready papers due|
|Jun 18-21, 2013||NTCIR-10 at NII, Tokyo|
Current web search engines usually return a ranked list of URLs in response to a query. The user often has to visit several web pages and locate relevant parts within long web pages. However, for some classes of queries, the system should be able to gather and return relevant information directly to the user, to satisfy her immediately after her click on the search button ("one click access").
The 1CLICK task focuses on evaluating textual output based on information units (iUnits) rather than document relevance. Moreover, we require the systems to try to minimise the amount of text the user has to read or, equivalently, the time she has to spend in order to obtain the information. This type of information access is particularly important for mobile search. The systems are thus expected to search the web and return a multi-document summary of retrieved relevant webpages that fits a small screen.
The first round of 1CLICK at NTCIR-9 (1CLICK-1) dealt with Japanese queries only. There, based on a study of desktop and mobile query logs, we considered four query types: CELEBRITY, LOCAL, DEFINITION and QA.
For this second round of 1CLICK (1CLICK-2), we have expanded our language scope to English and Japanese. Moreover, we will use the following, more fine-grained query types:
ARTIST (10) user wants important facts about musicians, novelists etc. who produce works of art;
ACTOR (10) user wants important facts about actors, actresses, TV personalities etc.;
POLITICIAN (10) user wants important facts about politicians;
ATHLETE (10) user wants important facts about athletes;
FACILITY (15) user wants access and contact info of a particular landmark, facility etc.;
GEO (15) user wants access and contact information of entities with geographical constraints e.g. sushi restaurants near Tokyo station;
DEFINITION (15) user wants to look up an unfamiliar term, an idiom etc.;
QA (15) user wants to know factual (but not necessarily factoid) answers to a natural language question.
The number of queries for each query type is shown in parentheses. Thus, we will use a total of 100 queries for the Japanese task, and another 100 for the English task. The queries will be selected from real mobile query logs.
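The taxonomy and per-type query counts above can be written down as a small lookup table, which is handy for checking that a predicted label is legal. This is only an illustrative sketch: the names `QUERY_TYPE_COUNTS`, `is_valid_query_type` and `TOTAL_QUERIES_PER_LANGUAGE` are hypothetical and not part of any organiser-provided tool.

```python
# Query types and per-language query counts, as listed in the task description.
QUERY_TYPE_COUNTS = {
    "ARTIST": 10,
    "ACTOR": 10,
    "POLITICIAN": 10,
    "ATHLETE": 10,
    "FACILITY": 15,
    "GEO": 15,
    "DEFINITION": 15,
    "QA": 15,
}


def is_valid_query_type(label: str) -> bool:
    """Return True if label is one of the eight 1CLICK-2 query types."""
    return label in QUERY_TYPE_COUNTS


# Four types with 10 queries each plus four types with 15 queries each.
TOTAL_QUERIES_PER_LANGUAGE = sum(QUERY_TYPE_COUNTS.values())
```

Note that the 1CLICK-1 types CELEBRITY and LOCAL are not in this table, so `is_valid_query_type` rejects them.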
Registered 1CLICK participants must submit at least one run to the Main Tasks or the Query Classification Subtasks.
MAIN TASKS (Japanese, English)
This is similar to the 1CLICK-1 task described in the Overview: given a query, return a single textual output (X-string). The length of the X-string is limited as follows:
- For English, 1000 chars for DESKTOP run and 280 chars for MOBILE run; and
- For Japanese, 500 chars for DESKTOP run and 140 chars for MOBILE run.
Note that symbols (such as ',' and '(') are excluded in counting the number of characters.
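As a rough aid for checking X-string length before submission, the limits above can be sketched as follows. The page does not say exactly which characters count as symbols, so this sketch assumes they are the Unicode punctuation (P*) and symbol (S*) categories and that whitespace is counted; the organisers' actual rule may differ. The names `LIMITS`, `counted_length` and `fits_limit` are hypothetical.

```python
import unicodedata

# Length limits from the task description, keyed by (language, device).
LIMITS = {
    ("E", "D"): 1000,
    ("E", "M"): 280,
    ("J", "D"): 500,
    ("J", "M"): 140,
}


def counted_length(x_string: str) -> int:
    """Count the characters of an X-string, skipping symbols.

    Assumption: a "symbol" is any character whose Unicode general
    category starts with P (punctuation) or S (symbol). Whitespace,
    letters and digits are all counted.
    """
    return sum(
        1 for ch in x_string
        if unicodedata.category(ch)[0] not in ("P", "S")
    )


def fits_limit(x_string: str, language: str, device: str) -> bool:
    """Check an X-string against the limit for its language and device."""
    return counted_length(x_string) <= LIMITS[(language, device)]
```

Under these assumptions, `counted_length("Tokyo, (Japan)")` skips the comma and both parentheses but counts the space.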
We require systems to return important pieces of factual information and to minimise the amount of text the user has to read (i.e., the time she has to spend to obtain the information).
There are three types of Main Task runs:
Mandatory Runs: Organisers will provide baseline web search results and their page contents for each query. Participants must use only these contents to generate X-strings. Using the baseline data will enhance repeatability and comparability of 1CLICK experiments.
Oracle Runs (OPTIONAL): Organisers will provide the supporting URLs and their page contents for the gold standard iUnits for each query. Participants can use the data either wholly or partially to generate X-strings. If this data set is used in any way at all, the run is considered as an Oracle run.
Open Runs (OPTIONAL): Participants may choose to search the live web on their own to generate X-strings. Any run that does NOT use the oracle data but uses at least some privately-obtained web search results is considered as an Open run, even if it also uses the baseline data.
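The three definitions above amount to a simple precedence rule: any use of the oracle data makes a run an Oracle run; otherwise any use of privately obtained web search results makes it an Open run; otherwise it is a Mandatory run. A minimal sketch of that rule follows. The function name and its boolean parameters are hypothetical; the returned strings are the run type identifiers used in run file names.

```python
def run_type(uses_oracle_data: bool, uses_private_web_results: bool) -> str:
    """Classify a Main Task run by the data it relied on.

    Use of the baseline data does not affect the outcome: an Open run
    may also use it, and a Mandatory run uses nothing else.
    """
    if uses_oracle_data:
        # Any use of the oracle data, whole or partial, decides the type.
        return "ORCL"
    if uses_private_web_results:
        return "OPEN"
    return "MAND"
```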
QUERY CLASSIFICATION SUBTASKS (Japanese, English)
This is a relatively easy subtask: given a query, return its query type. The query type should be chosen from the taxonomy mentioned above. Main Task participants whose systems involve query classification are encouraged to participate in this subtask to "componentise" evaluation.
The input of Query Classification Subtasks is the same as that of MAIN tasks:
a query set file, which contains pairs of a query ID and a query string as explained below.
NOTE: For the Query Classification Subtasks, there are no run types such as Mandatory/Oracle/Open.
We will release a query set file, in which each line contains two fields: a query ID and a query string.
Note that we will not explicitly provide the query type information before the run submission deadline.
For ORACLE and MANDATORY runs, we will provide a set of HTML documents for each query, from which participants are expected to generate X-strings. In addition, an index file for each query will be released, which contains the title, URL, rank, and snippet (the summary of an HTML document presented under its title on the search engine results page) of the HTML documents. The rank and snippet of HTML documents are derived from Bing's Web search results.
Index files are named <queryID>-index.tsv, and each line is of the following format:
where <filename> is the name of the HTML document distributed by the organizers.
- Input files (Download via Microsoft SkyDrive)
- 1C2-E-SAMPLE.tsv (a query set file that contains five queries)
- 1C2-E-SAMPLE-000x-index.tsv (an index file for the query ID 1C2-E-SAMPLE-000x)
- 1C2-E-SAMPLE-000x folder (a folder that contains HTML documents for the query ID 1C2-E-SAMPLE-000x)
Each team can submit up to FOUR runs for Japanese and SIX runs for English. Note that only run files with higher priority will be evaluated if organizers cannot evaluate all the submissions due to lack of resources (see <integer> in file name specification below).
File name specification
The file name of each run must be of the following format:
- <teamID> is your registered team ID, e.g. MSRA
- <language> is either E (English) or J (Japanese)
- <runtype> is the identifier of a run type, which specifies DESKTOP (D) or MOBILE (M), and MANDATORY (MAND), ORACLE (ORCL), or OPEN. More formally, <runtype> must be of the following format: (D|M)-(MAND|ORCL|OPEN).
- D-MAND: a DESKTOP and MANDATORY run file,
- D-ORCL: a DESKTOP and ORACLE run file, and
- M-OPEN: a MOBILE and OPEN run file.
- <integer> is a unique integer for each team's run starting from 1, which represents the priority of a run file. Run files with a smaller <integer> will be evaluated with higher priority in the event that organizers do not have enough resources to evaluate all the submissions. Note that each team's highest-priority MANDATORY run will be evaluated regardless of its <integer>.
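The naming components above can be checked mechanically. The sketch below assumes the overall layout `<teamID>-<language>-<runtype>-<integer>.tsv`, which is inferred from the sample run file name MSRA-J-D-MAND-1.tsv on this page rather than from a stated rule, and it further assumes that team IDs contain no hyphen. The names `RUN_NAME`, `parse_run_name` and `evaluation_order` are hypothetical.

```python
import re

# Assumed layout: <teamID>-<language>-<runtype>-<integer>.tsv,
# with <runtype> of the form (D|M)-(MAND|ORCL|OPEN).
RUN_NAME = re.compile(
    r"^(?P<team>[^-]+)-(?P<language>[EJ])-(?P<device>[DM])-"
    r"(?P<kind>MAND|ORCL|OPEN)-(?P<priority>[1-9][0-9]*)\.tsv$"
)


def parse_run_name(name: str):
    """Split a run file name into its fields, or return None if malformed."""
    match = RUN_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["priority"] = int(fields["priority"])
    return fields


def evaluation_order(names):
    """Sort well-formed run names so that a smaller <integer> comes first."""
    valid = [n for n in names if parse_run_name(n) is not None]
    return sorted(valid, key=lambda n: parse_run_name(n)["priority"])
```

Malformed names are dropped by `evaluation_order` rather than raising an error, which is a design choice of this sketch only.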
Some example run names for a team "MSRA" would be:
Run file specification
All run files should be encoded in UTF-8. TAB is used as the separator.
Each run file begins with exactly one system description line, which should be in the following format:
SYSDESC[TAB]<brief one-sentence system description in English>
Please make sure the description text does not contain a newline symbol.
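A small helper can enforce the system description rules when a run file is created. This is a sketch covering only the first line of the file; the function names are hypothetical.

```python
def sysdesc_line(description: str) -> str:
    """Build the system description line that opens a run file."""
    if "\n" in description or "\r" in description:
        raise ValueError("system description must not contain a newline")
    # TAB separates the SYSDESC keyword from the description text.
    return "SYSDESC\t" + description


def write_run_header(path: str, description: str) -> None:
    """Start a new UTF-8 run file containing only the SYSDESC line."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(sysdesc_line(description) + "\n")
```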
Below the system description line, there must be an output line for each query. Each output line should contain an X-string. The required format is:
Again, X-strings should not contain any newline symbols. The nugget match evaluation interface will truncate each run before evaluation, ignoring punctuation marks etc.
Each output line should be followed by at least one SOURCE line. These lines represent the knowledge sources from which you generated the X-string. Each SOURCE line must be in the following format:
These lines will be used for investigating what kinds of knowledge sources the participating teams have utilized. <source> must be either a URL (for ORACLE and OPEN runs) or a filename (for MANDATORY runs).
- MSRA-J-D-MAND-1.tsv (Download via Microsoft SkyDrive)
a run file from team MSRA for a Japanese DESKTOP MANDATORY run with the highest priority (1)
- MSRA-E-D-OPEN-2.tsv (Download via Microsoft SkyDrive)
a run file from team MSRA for an English DESKTOP OPEN run with the second highest priority (2)
QUERY CLASSIFICATION SUBTASKS
File name specification
The file name of each run in Query Classification Subtasks must be of the following format:
where <teamID> is your registered team ID, and <integer> is a unique integer for each team's run starting from 1. You can submit as many Query Classification runs as you like.
Run file specification
Each line in the run file should contain the following two fields:
where <querytype> is a query type predicted by your system, which must be one of the following eight types: ARTIST, ACTOR, POLITICIAN, ATHLETE, FACILITY, GEO, DEFINITION, and QA.
We will follow and possibly extend the S-measure evaluation methodology as described in a CIKM11 paper and Overview of NTCIR-9 1CLICK. S-measure is like weighted nugget recall, but it encourages systems to
(a) present important pieces of information first; and
(b) minimise the amount of text the user has to read.
In addition, unnecessarily long X-strings will be penalised, using a nugget-precision-like metric, as described in an AIRS12 paper.
We also plan to devise new iUnit-based evaluation methods using ideas from a WSDM12 paper.
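To make the intuition behind S-measure concrete, here is a sketch of a position-discounted weighted recall. The linear discount `max(0, L - offset)`, the normalisation against an ideal output, and all names here are assumptions made for illustration; the authoritative definition is the one in the cited CIKM11 paper and the NTCIR-9 1CLICK overview.

```python
def s_measure_sketch(matched, ideal, L):
    """Position-discounted weighted recall in the spirit of S-measure.

    matched: (weight, offset) pairs for iUnits found in the X-string,
             where offset is the character position at which the
             iUnit has been read.
    ideal:   (weight, offset) pairs for the same computation on a
             hypothetical ideal output that presents important
             iUnits first and as concisely as possible.
    L:       discount parameter; an iUnit at offset L or later earns
             no credit.
    """
    def discounted_gain(units):
        return sum(weight * max(0, L - offset) for weight, offset in units)

    denominator = discounted_gain(ideal)
    if denominator <= 0:
        return 0.0
    return discounted_gain(matched) / denominator
```

In this sketch, an iUnit matched earlier in the X-string contributes more than the same iUnit matched later, which reflects goals (a) and (b) above.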
(ntcadm-1click at nii.ac.jp)
|Makoto Kato||Kyoto University, Japan|
|Tetsuya Sakai||MSRA, China|
|Virgil Pavlu||Northeastern University, USA|
|Takehiro Yamamoto||Kyoto University, Japan|
|Mayu Iwata||Osaka University, Japan|
|Zhicheng Dou||MSRA, China|
|Matthew Ekstrand-Abueg||Northeastern University, USA|
- Rajput, S., Ekstrand-Abueg, M., Pavlu, V. and Aslam, J.A.: Constructing test collections by inferring document relevance via extracted relevant information, ACM CIKM 2012.
- Pavlu, V., Rajput, S., Golbus, P.B. and Aslam, J.A.: IR System Evaluation using Nugget-based Test Collections, ACM WSDM 2012.
- Sakai, T. and Kato, M.P.: One Click One Revisited: Enhancing Evaluation based on Information Units, AIRS 2012.
- Sakai, T., Kato, M.P. and Song, Y.-I.: Overview of NTCIR-9 1CLICK, NTCIR-9 Proceedings, 2011.
- Sakai, T., Kato, M.P. and Song, Y.-I.: Click the Search Button and Be Happy: Evaluating Direct and Immediate Information Access, ACM CIKM 2011.