Feature List
Each query-url pair is represented by a 136-dimensional vector.
Feature List of Microsoft Learning to Rank Datasets

| feature id | feature description | stream | comments |
|---|---|---|---|
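| 1 | covered query term number | body | |
| 2 | covered query term number | anchor | |
| 3 | covered query term number | title | |
| 4 | covered query term number | url | |
| 5 | covered query term number | whole document | |
| 6 | covered query term ratio | body | |
| 7 | covered query term ratio | anchor | |
| 8 | covered query term ratio | title | |
| 9 | covered query term ratio | url | |
| 10 | covered query term ratio | whole document | |
| 11 | stream length | body | |
| 12 | stream length | anchor | |
| 13 | stream length | title | |
| 14 | stream length | url | |
| 15 | stream length | whole document | |
| 16 | IDF (inverse document frequency) | body | |
| 17 | IDF (inverse document frequency) | anchor | |
| 18 | IDF (inverse document frequency) | title | |
| 19 | IDF (inverse document frequency) | url | |
| 20 | IDF (inverse document frequency) | whole document | |
| 21 | sum of term frequency | body | |
| 22 | sum of term frequency | anchor | |
| 23 | sum of term frequency | title | |
| 24 | sum of term frequency | url | |
| 25 | sum of term frequency | whole document | |
| 26 | min of term frequency | body | |
| 27 | min of term frequency | anchor | |
| 28 | min of term frequency | title | |
| 29 | min of term frequency | url | |
| 30 | min of term frequency | whole document | |
| 31 | max of term frequency | body | |
| 32 | max of term frequency | anchor | |
| 33 | max of term frequency | title | |
| 34 | max of term frequency | url | |
| 35 | max of term frequency | whole document | |
| 36 | mean of term frequency | body | |
| 37 | mean of term frequency | anchor | |
| 38 | mean of term frequency | title | |
| 39 | mean of term frequency | url | |
| 40 | mean of term frequency | whole document | |
| 41 | variance of term frequency | body | |
| 42 | variance of term frequency | anchor | |
| 43 | variance of term frequency | title | |
| 44 | variance of term frequency | url | |
| 45 | variance of term frequency | whole document | |
| 46 | sum of stream length normalized term frequency | body | |
| 47 | sum of stream length normalized term frequency | anchor | |
| 48 | sum of stream length normalized term frequency | title | |
| 49 | sum of stream length normalized term frequency | url | |
| 50 | sum of stream length normalized term frequency | whole document | |
| 51 | min of stream length normalized term frequency | body | |
| 52 | min of stream length normalized term frequency | anchor | |
| 53 | min of stream length normalized term frequency | title | |
| 54 | min of stream length normalized term frequency | url | |
| 55 | min of stream length normalized term frequency | whole document | |
| 56 | max of stream length normalized term frequency | body | |
| 57 | max of stream length normalized term frequency | anchor | |
| 58 | max of stream length normalized term frequency | title | |
| 59 | max of stream length normalized term frequency | url | |
| 60 | max of stream length normalized term frequency | whole document | |
| 61 | mean of stream length normalized term frequency | body | |
| 62 | mean of stream length normalized term frequency | anchor | |
| 63 | mean of stream length normalized term frequency | title | |
| 64 | mean of stream length normalized term frequency | url | |
| 65 | mean of stream length normalized term frequency | whole document | |
| 66 | variance of stream length normalized term frequency | body | |
| 67 | variance of stream length normalized term frequency | anchor | |
| 68 | variance of stream length normalized term frequency | title | |
| 69 | variance of stream length normalized term frequency | url | |
| 70 | variance of stream length normalized term frequency | whole document | |
| 71 | sum of tf*idf | body | |
| 72 | sum of tf*idf | anchor | |
| 73 | sum of tf*idf | title | |
| 74 | sum of tf*idf | url | |
| 75 | sum of tf*idf | whole document | |
| 76 | min of tf*idf | body | |
| 77 | min of tf*idf | anchor | |
| 78 | min of tf*idf | title | |
| 79 | min of tf*idf | url | |
| 80 | min of tf*idf | whole document | |
| 81 | max of tf*idf | body | |
| 82 | max of tf*idf | anchor | |
| 83 | max of tf*idf | title | |
| 84 | max of tf*idf | url | |
| 85 | max of tf*idf | whole document | |
| 86 | mean of tf*idf | body | |
| 87 | mean of tf*idf | anchor | |
| 88 | mean of tf*idf | title | |
| 89 | mean of tf*idf | url | |
| 90 | mean of tf*idf | whole document | |
| 91 | variance of tf*idf | body | |
| 92 | variance of tf*idf | anchor | |
| 93 | variance of tf*idf | title | |
| 94 | variance of tf*idf | url | |
| 95 | variance of tf*idf | whole document | |
| 96 | boolean model | body | |
| 97 | boolean model | anchor | |
| 98 | boolean model | title | |
| 99 | boolean model | url | |
| 100 | boolean model | whole document | |
| 101 | vector space model | body | |
| 102 | vector space model | anchor | |
| 103 | vector space model | title | |
| 104 | vector space model | url | |
| 105 | vector space model | whole document | |
| 106 | BM25 | body | |
| 107 | BM25 | anchor | |
| 108 | BM25 | title | |
| 109 | BM25 | url | |
| 110 | BM25 | whole document | |
| 111 | LMIR.ABS | body | Language model approach for information retrieval (IR) with absolute discounting smoothing |
| 112 | LMIR.ABS | anchor | |
| 113 | LMIR.ABS | title | |
| 114 | LMIR.ABS | url | |
| 115 | LMIR.ABS | whole document | |
| 116 | LMIR.DIR | body | Language model approach for IR with Bayesian smoothing using Dirichlet priors |
| 117 | LMIR.DIR | anchor | |
| 118 | LMIR.DIR | title | |
| 119 | LMIR.DIR | url | |
| 120 | LMIR.DIR | whole document | |
| 121 | LMIR.JM | body | Language model approach for IR with Jelinek-Mercer smoothing |
| 122 | LMIR.JM | anchor | |
| 123 | LMIR.JM | title | |
| 124 | LMIR.JM | url | |
| 125 | LMIR.JM | whole document | |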
| 126 | Number of slashes in URL | | |
| 127 | Length of URL | | |
| 128 | Inlink number | | |
| 129 | Outlink number | | |
| 130 | PageRank | | |
| 131 | SiteRank | | Site-level PageRank |
| 132 | QualityScore | | The quality score of a web page, output by a web page quality classifier. |
| 133 | QualityScore2 | | The quality score of a web page, output by a web page quality classifier that measures the badness of a web page. |
| 134 | Query-url click count | | The click count of a query-url pair at a search engine over a period |
| 135 | url click count | | The click count of a url aggregated from user browsing data over a period |
| 136 | url dwell time | | The average dwell time of a url aggregated from user browsing data over a period |
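
The table has a regular layout: feature ids 1 to 125 are 25 descriptions, each computed over the same five streams in the same order, and ids 126 to 136 have no stream. A minimal Python sketch of a lookup built on that layout is shown here. The function name `describe_feature` and the list names are hypothetical helpers chosen for illustration; they are not part of the dataset or of any official tooling.

```python
# Layout read off the feature table: ids 1-125 are 25 descriptions x 5 streams,
# ids 126-136 are per-URL or click features with no stream.
STREAMS = ["body", "anchor", "title", "url", "whole document"]

STREAM_FEATURES = [
    "covered query term number",
    "covered query term ratio",
    "stream length",
    "IDF (inverse document frequency)",
    "sum of term frequency",
    "min of term frequency",
    "max of term frequency",
    "mean of term frequency",
    "variance of term frequency",
    "sum of stream length normalized term frequency",
    "min of stream length normalized term frequency",
    "max of stream length normalized term frequency",
    "mean of stream length normalized term frequency",
    "variance of stream length normalized term frequency",
    "sum of tf*idf",
    "min of tf*idf",
    "max of tf*idf",
    "mean of tf*idf",
    "variance of tf*idf",
    "boolean model",
    "vector space model",
    "BM25",
    "LMIR.ABS",
    "LMIR.DIR",
    "LMIR.JM",
]

NO_STREAM_FEATURES = [
    "Number of slashes in URL",
    "Length of URL",
    "Inlink number",
    "Outlink number",
    "PageRank",
    "SiteRank",
    "QualityScore",
    "QualityScore2",
    "Query-url click count",
    "url click count",
    "url dwell time",
]


def describe_feature(feature_id):
    """Return (description, stream) for a 1-based feature id; stream is None for ids 126-136."""
    if not 1 <= feature_id <= 136:
        raise ValueError(f"feature id must be in 1..136, got {feature_id}")
    if feature_id <= 125:
        # Each description spans five consecutive ids, one per stream.
        group, stream_index = divmod(feature_id - 1, 5)
        return STREAM_FEATURES[group], STREAMS[stream_index]
    return NO_STREAM_FEATURES[feature_id - 126], None
```

For example, `describe_feature(108)` returns `("BM25", "title")`, which matches row 108 of the table. A lookup like this can be handy for labelling feature-importance plots or for selecting all features of one stream.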
