Life science research tools

Tools that help advance research in human health issues

We collaborate with academic researchers around the world to develop innovative computing technologies and advance research in human health issues. Our collaboration projects apply advanced computing technologies (such as data analysis, imaging, sensor networks, and visualization) to provide insight into disease and human healthcare.

  • .NET Bio
    .NET Bio is an open-source library of common bioinformatics functions, intended to simplify the creation of life science applications.
  • 3D Molecule Viewer
    3D Molecule Viewer is a stand-alone, demo version of the C-ME application that InterKnowlogy built for the Scripps Research Institute. It is a Windows Presentation Foundation application built in C#. This stand-alone, source code version of the application does not have the Microsoft SharePoint dependency and allows you to open sample 3D Protein Data Bank (PDB) format files directly, spin them in 3D, zoom in on them, display them from different views, and more.
  • Biocharts
    The Biocharts tool enables biologists and modelers to construct high-level theories and models of biological systems, capturing biological hypotheses, inferred mechanisms, and experimental results within the same framework. Among the key features of the tool are convenient ways to represent several competing theories and the interactive nature of building and running the models using an intuitive, rigorous, scenario-based visual language. The main biological areas to which we are applying the tool are developmental biology and stem-cell research.
  • BL!P: BLAST in Pivot
    BL!P [blip], also known as BLAST in Pivot, is a tool that automates NCBI BLAST searches, fetches associated GenBank records, and converts this information into a Silverlight PivotViewer collection. Also, BL!P provides a user interface to create customized images for each BLAST match, allowing the user to further customize their data exploration experience.
  • CodaLab
    CodaLab is an open-source platform that uses the cloud to share medical datasets and algorithms. It provides an environment that enables researchers to compare the accuracy of image analysis algorithms against common datasets, and it is rapidly expanding to support algorithm development, workflow, and a flexible online environment for research.
  • Create Epitome
    This computational biology tool is available as an interactive program (created via Silverlight) and as a command-line program for Windows. Given a weighted set of amino-acid sequences, it creates a new amino-acid sequence that covers the input sequences. It works step-by-step, producing an output line after each step.
  • Disease Model Simulator
    This tool enables a user to investigate the theoretical disease model used in the paper “Host-pathogen time series data in wildlife support a transmission function between density and frequency dependence” by Matthew J. Smith, Sandra Telfer, Eva R. Kallio, Sarah Burthe, Alex R. Cook, Xavier Lambin, and Michael Begon, to be published in “Proceedings of the National Academy of Sciences of the United States of America.”
  • DNA Strand Displacement Simulator
    DNA Strand Displacement (DSD) is a programming language for designing and simulating computational circuits made of DNA, in which strand displacement is the main computational mechanism.
  • eLMM
    eLMM (eliminate confounding in eQTL studies with Linear Mixed Models) is a program for performing eQTL analysis in the presence of two confounders: population structure and expression heterogeneity.
  • Epistasis GWAS for 7 common diseases
    This data consists of the results of an SNP-pair epistasis genome-wide association study (GWAS) on the Wellcome Trust data. P values are based on a likelihood ratio test comparing the likelihood with a multiplicative term versus that for an additive linear model. 63.5 billion SNP pairs were evaluated for seven common diseases (type I diabetes, type II diabetes, coronary artery disease, hypertension, rheumatoid arthritis, Crohn’s disease, and bipolar disorder).
  • Epitope Predictor
    This tool computes the probability that a given k-mer is a T-cell epitope restricted to a given HLA allele. The tool can scan for 8-, 9-, 10-, and 11-mer epitopes across all common HLA alleles.
  • False Discovery Rate Calculator for 2x2 Contingency Tables
    False discovery rate (FDR) estimates the proportion of false positives among those tests that are deemed significant. This tool computes FDR for 2x2 contingency tables based on Fisher's exact test (FET).
  • FaST-LMM - Factored Spectrally Transformed Linear Mixed Models - Windows Binary Files
    Windows binary files only. FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a program for performing genome-wide association studies (GWAS) on large data sets. It runs on both Windows and Linux systems, and has been tested on data sets with over 120,000 individuals.
  • FaST-LMM - Factored Spectrally Transformed Linear Mixed Models - Linux Binary Files
    Linux binary files only. FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a program for performing genome-wide association studies (GWAS) on large data sets. It runs on both Windows and Linux systems, and has been tested on data sets with over 120,000 individuals.
  • FaST-LMM - Factored Spectrally Transformed Linear Mixed Models
    FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a program for performing genome-wide association studies (GWAS) on large data sets. It runs on both Windows and Linux systems, and has been tested on data sets with over 120,000 individuals.
  • FaSTLMM.Py
    FaST-LMM-Py extends the capabilities of FaST-LMM using Python. Its features include FaST-LMM-SELECT, which selects SNPs for FaST-LMM; FaST-LMM-SET, a new approach for set tests that can handle confounding; and FaST-LMM-EWASher, which performs epigenome-wide association analysis in the presence of confounders such as cell-type heterogeneity.
  • FaST-LMM-EWASher - R version
    This is an R version of FaST-LMM-EWASher, which performs epigenome-wide association analysis in the presence of confounders such as cell-type heterogeneity. A Python version of this software is also available as part of FaST-LMM-Py. This software is associated with the paper: Epigenome-wide association studies without the need for cell-type composition. Nature Methods 2014, doi:10.1038/NMETH.2815.
  • GeoS
    GeoS is a software tool for the semi-automatic segmentation of 3D medical images such as CT or MR scans.
  • HLA Assignment
    This tool takes lab data from a series of patients and determines (probabilistically) which HLA genes are responsible for a patient’s reaction.
  • HLA Completion
    HLA sequence typing sometimes yields uncertain results. For example, an allele may be identified as A6801/6802 or simply A02. This tool takes the uncertain information and (probabilistically) expands it to four-digit alleles, making use of linkage disequilibrium to inform the expansion.
  • PhyloDet
    PhyloDet is a scalable tree visualization that enables biologists to visualize multiple traits or attributes mapped to large evolutionary trees with thousands of leaf nodes representing the species. It also preserves branch lengths that indicate the genetic similarity between two species. PhyloDet is interactive, enabling users to set any node of the tree as the root and to show or hide any branches without re-layout of the tree. PhyloDet enables biologists to visualize complex interactions and how they relate to the evolutionary history of the species.
  • PhyloDOR
    By applying this tool to large studies of infected patients, researchers are now able to start decoding the complex rules that govern the HIV mutations, in the hope of one day creating a vaccine to which the virus is unable to develop resistance.
  • SIGMA: Large-Scale and Parallel Machine-Learning Tool Kit
    The goal of SIGMA is to provide a group of parallel machine-learning algorithms that can meet the requirements of research work and applications, typically with large-scale data or features. The tool kit includes more than 10 algorithms, which can run on a single multicore machine or on an HPC cluster with hundreds of machines and thousands of CPU cores. Release history: version 1.0 (2009/Oct/12): a basic version with algorithms; version 1.1 (2010/Feb/01): adds up to ten algorithms and passes our internal testing. Contact: Weizhu Chen (http://research.microsoft.com/en-us/people/wzchen/)
  • SPiM Player
    A stochastic pi-calculus simulator with graphical user interface. Binary (.exe) release only. Written in F#.
  • Synthesizing Biological Theories
    This tool enables biologists and modelers to construct high-level theories and models of biological systems, capturing biological hypotheses, inferred mechanisms, and experimental results within the same framework. The main components of the tool are a visual editor for constructing the model and visualizing its dynamic behavior, and an execution and analysis engine that can run the models in batch or interactive mode, and in deterministic or nondeterministic mode. During runtime, the execution engine communicates with the user-interface layer to enable the visualization of the system dynamics and to support user interaction with the model. Among the key features of the tool are an ability to represent a theory about a biological system, including biological hypotheses and mechanisms, and experimental results in the same framework; convenient ways to represent several competing theories with the aim of refuting some of the theories or suggesting experiments that can refute them; and the interactive nature of building and running the models using an intuitive, rigorous, scenario-based visual language.
  • Visual GEC
    Visual GEC is a tool for the design and simulation of transcriptional genetic circuits, or devices. The tool is based on Genetic Engineering of Living Cells (GEC), a programming language that allows logical interactions between potentially undetermined proteins and genes to be expressed in a modular manner. Programs can be translated by a compiler into sequences of standard biological parts.
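
To make the False Discovery Rate Calculator entry above more concrete, here is a minimal Python sketch of the two ideas it names: a two-sided Fisher's exact test (FET) p-value for a single 2x2 contingency table, followed by an FDR estimate across many such tests. The function names are hypothetical, and the FDR step uses the standard Benjamini-Hochberg procedure as a stand-in; the listing does not say which FDR method the tool itself implements, so this should not be read as the tool's algorithm.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance guards against float ties).
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed stand-in for the tool's FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative (made-up) tables, one per hypothesis being tested.
    tables = [(3, 0, 0, 3), (5, 1, 1, 5), (2, 2, 2, 2)]
    pvals = [fisher_exact_two_sided(*t) for t in tables]
    for table, p, q in zip(tables, pvals, benjamini_hochberg(pvals)):
        print(table, round(p, 4), round(q, 4))
```

Because FET p-values are discrete, Benjamini-Hochberg tends to be conservative on small tables; a tool built specifically for contingency tables may well use a different estimator.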