An index of datasets, SDKs, APIs, and other open-source code created by Microsoft researchers and shared with the broader academic community. We also maintain a collection highlighting some of the tools you’ll find here.
MarS
MarS is a financial market simulation engine powered by the Large Market Model (LMM), a generative foundation model.
Reducio Variational Autoencoder (Reducio-VAE)
Reducio-VAE is a model for encoding videos into an extremely small latent space. It is part of Reducio-DiT, a highly efficient video generation method. Reducio-VAE encodes a 16-frame video clip to T/4 × H/32 × W/32…
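To make the stated downsampling factors concrete, here is a minimal sketch of the arithmetic only. The function name `reducio_latent_shape` and the 256×256 input resolution are illustrative assumptions, not part of the Reducio-VAE code, and the sketch covers only the T/4 × H/32 × W/32 factors quoted above, not anything in the truncated remainder of the description.

```python
def reducio_latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int]:
    """Latent grid size under the stated T/4 x H/32 x W/32 downsampling.

    Hypothetical helper for illustration; it is not taken from the model's code.
    """
    # Assumption: inputs divide evenly by the downsampling factors.
    if frames % 4 or height % 32 or width % 32:
        raise ValueError("this sketch assumes T divisible by 4 and H, W divisible by 32")
    return frames // 4, height // 32, width // 32


# A 16-frame clip; the 256x256 resolution is an assumed example value.
latent = reducio_latent_shape(16, 256, 256)
print(latent)  # (4, 8, 8)
```

Under these assumptions, a 16 × 256 × 256 clip maps to a 4 × 8 × 8 grid, which is 4 × 32 × 32 = 4096 times fewer spatiotemporal positions.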
TamGen
This is the implementation of the paper “TamGen: Target-aware Molecule Generation for Drug Design Using a Chemical Language Model”.
RAD-DINO model
RAD-DINO is a vision transformer model trained to encode chest X-rays using the self-supervised learning method DINOv2. RAD-DINO is described in detail in “RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision” (F. Pérez-García, H. Sharma, S.…
MAIRA-2 model
MAIRA-2 is a multimodal transformer designed for the generation of grounded or non-grounded radiology reports from chest X-rays. It is described in more detail in “MAIRA-2: Grounded Radiology Report Generation” (S. Bannur, K. Bouzid et al.,…
RadFact: An LLM-based Evaluation Metric for AI-generated Radiology Reporting
RadFact is a framework for the evaluation of model-generated radiology reports given a ground-truth report, with or without grounding. Leveraging the logical inference capabilities of large language models, RadFact is not a single number but a suite of…