
Researcher tools: code, datasets, & models

This is an index of datasets, SDKs, APIs, and other open-source code created by Microsoft researchers and shared with the broader academic community. We also maintain a collection highlighting some of the tools you’ll find here.


Dataset, Source Code

MatterGen 

MatterGen is a generative model for inorganic materials design across the periodic table that can be fine-tuned to steer the generation towards a wide range of property constraints.

Dataset, Source Code

HeurAgenix 

HeurAgenix is an LLM-based framework designed to generate, evolve, evaluate, and select heuristic algorithms for solving combinatorial optimization problems. It uses large language models to autonomously handle various optimization…

Dataset, Source Code

MarS 

MarS is a financial market simulation engine powered by the Large Market Model (LMM), a generative foundation model.

Dataset, Source Code

MageBench 

MageBench is a benchmark for evaluating the reasoning and planning ability of large multimodal model agents. This benchmark currently includes three types of environments: WebUI, Sokoban, and Football, comprising a total of 483 different scenarios.…

Dataset, Source Code

TamGen 

This is the implementation of the paper “TamGen: Target-aware Molecule Generation for Drug Design Using a Chemical Language Model”.

Download

RAD-DINO model 

RAD-DINO is a vision transformer model trained to encode chest X-rays using the self-supervised learning method DINOv2. RAD-DINO is described in detail in RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision (F. Pérez-García, H. Sharma, S.…