ORCAS: Open Resource for Click Analysis in Search
ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries.
An index of datasets, SDKs, APIs and other open source code created by Microsoft researchers and shared with the broader academic community. We also maintain a collection highlighting some of the tools you’ll find here.
Tip-of-the-tongue (ToT) known-item retrieval is defined as “an item identification task in which the searcher has previously experienced an item but cannot recall a reliable identifier” (i.e., “It’s on the tip of my tongue…”). The…
The TREC Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the…
Phi-3-Mini-128K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters, trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense…
This framework aims to assist in the documentation of datasets to promote transparency and help dataset creators and consumers make informed decisions. You can read more about it in our paper: Open Datasheets: Machine-readable Documentation for…
Assistive software applications have been developed for a variety of day-to-day tasks, including currency recognition. BankNote-Net is an open dataset for assistive currency recognition.
Code supplementing AI for Good Lab’s work on long COVID sequelae using EHR data. This project aimed to discover long COVID sequelae (symptoms) from Electronic Health Records (EHR) data using causal impact analysis of time series…
This repository contains code for an active learning pipeline for detecting whales in high-resolution satellite imagery.
Mapping electrical infrastructure can support measuring access to electricity and can help identify opportunities to improve or extend existing power infrastructure.
This project automates the identification of buildings and solar panels in aerial imagery, aiding humanitarian mapping. In many developing regions, maps are often outdated or missing, hindering development planning and disaster response. Our work accelerates…