Trinity

A graph is an abstract data structure with high expressive power. Many real-life applications can be modeled as graphs, including biological networks, the semantic web and social networks, so a graph engine is important to many applications. Several systems currently serve this space, including Neo4j, HyperGraphDB and InfiniteGraph. Neo4j is a disk-based transactional graph database. HyperGraphDB is built on the Berkeley DB key/value store. InfiniteGraph is a distributed system for large graph data analysis. In 2009, Google announced Pregel as its large-scale graph processing platform. Pregel is a batch system; it does not support online query processing or graph serving.

What is Trinity?

Trinity is a graph database and graph computation platform built on a distributed memory cloud. At its heart is a distributed, RAM-based key-value store. Because all data is kept in memory, Trinity provides fast random data access, which makes it naturally suited to large graph processing. From a data management perspective, Trinity is a graph database; from a graph analytics perspective, it is a parallel graph computation platform. As a database, it provides data indexing, concurrent query processing and concurrency control. As a computation platform, it provides vertex-based parallel computation on large-scale graphs.
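To illustrate what vertex-based parallel computation means, here is a minimal single-machine sketch of one iteration (a "superstep") in that style, using PageRank as the example workload. The graph representation, the function name `pagerank_superstep` and the damping factor are illustrative assumptions; Python is used only for exposition and is not Trinity's API.

```python
from typing import Dict, List

# Hypothetical toy graph: vertex id -> list of out-neighbor ids.
Graph = Dict[int, List[int]]


def pagerank_superstep(graph: Graph, rank: Dict[int, float],
                       damping: float = 0.85) -> Dict[int, float]:
    """One vertex-centric iteration: every vertex sends rank / out_degree
    to its out-neighbors, then every vertex combines its incoming messages."""
    n = len(graph)
    inbox: Dict[int, float] = {v: 0.0 for v in graph}
    # Scatter phase: each vertex runs the same logic independently of the
    # others, which is what makes this style easy to parallelize.
    for v, out in graph.items():
        # Vertices with no out-links send nothing in this simplified version.
        if out:
            share = rank[v] / len(out)
            for u in out:
                inbox[u] += share
    # Gather phase: each vertex updates only its own value from its messages.
    return {v: (1.0 - damping) / n + damping * inbox[v] for v in graph}
```

For example, starting from a uniform rank of 1/3 on the directed 3-cycle `{0: [1], 1: [2], 2: [0]}`, one call leaves every rank at 1/3. In a distributed setting, messages addressed to vertices stored on other machines would have to be shipped over the network, which is why communication cost matters as much as memory access.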

Many real-life applications handle graphs with billions of nodes. For example, the Facebook social graph has over 800 million nodes and 104 billion edges, and the World Wide Web contains over 50 billion web pages and one trillion unique links. Data access on graphs has little locality: graph exploration incurs massive amounts of random data access, which makes large graph processing very challenging. To address this, Trinity keeps graphs resident in distributed memory storage, and it optimizes its use of main memory and network communication to deliver high performance for both online query processing and offline graph analytics.
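The random-access pattern behind graph exploration is easy to see in a k-hop neighborhood query, which collects every vertex within k edges of a starting vertex. The sketch below is a plain breadth-first search over an in-memory adjacency dictionary; the function name and the dictionary layout are assumptions for illustration, not Trinity code.

```python
from collections import deque
from typing import Dict, List, Set


def k_hop_neighborhood(graph: Dict[int, List[int]], source: int, k: int) -> Set[int]:
    """Return every vertex reachable from `source` in at most `k` hops
    (excluding `source` itself), using breadth-first search."""
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        v, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand vertices on the last allowed hop
        # Each lookup below is a random access into the adjacency store:
        # neighbors of neighbors are rarely stored next to each other.
        for u in graph.get(v, []):
            if u not in visited:
                visited.add(u)
                frontier.append((u, depth + 1))
    visited.discard(source)
    return visited
```

The frontier can grow by a factor of the average degree at every hop, so even a 3-hop query touches a large, scattered set of vertices. Keeping the whole graph in RAM makes each of those lookups a memory access rather than a disk seek.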

Features of Trinity 

  • Memory-based distributed key-value store: provides fast random data access.
  • Graph database.
  • Graph computation platform.
  • Flexible data model: Trinity supports a wide range of graph models, such as simple graphs, weighted simple graphs, hypergraphs and extended hypergraphs.
  • High performance: suitable for low-latency, high-throughput graph applications.
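One way to picture a single key-value store hosting several graph models is to store each model's vertices (or hyperedges) as a different record type under integer keys. The record layouts and the `linked_ids` helper below are hypothetical illustrations, not Trinity's actual schemas; the extended hypergraph model is omitted because its exact definition is not given here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class SimpleNode:
    """Simple graph: a node and the ids of its neighbors."""
    node_id: int
    neighbors: List[int] = field(default_factory=list)


@dataclass
class WeightedNode:
    """Weighted simple graph: each edge carries a numeric weight."""
    node_id: int
    edges: List[Tuple[int, float]] = field(default_factory=list)  # (neighbor id, weight)


@dataclass
class HyperEdge:
    """Hypergraph: one edge may connect any number of nodes."""
    edge_id: int
    members: List[int] = field(default_factory=list)


Record = Union[SimpleNode, WeightedNode, HyperEdge]


def linked_ids(record: Record) -> List[int]:
    """Return the ids a record points to, whatever graph model it belongs to."""
    if isinstance(record, SimpleNode):
        return list(record.neighbors)
    if isinstance(record, WeightedNode):
        return [u for u, _ in record.edges]
    if isinstance(record, HyperEdge):
        return list(record.members)
    raise TypeError(f"unsupported record type: {type(record).__name__}")


# Hypothetical store: records of every model share one integer key space.
store: Dict[int, Record] = {
    1: SimpleNode(1, [2, 3]),
    2: WeightedNode(2, [(1, 0.5), (3, 2.0)]),
    100: HyperEdge(100, [1, 2, 3]),
}
```

Because the store only sees keys and opaque records, supporting a new graph model in this sketch means defining a new record type, not changing the storage layer.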

Performance Highlights

  • Graph DB (online query processing):

–visiting 2.2 million users (3-hop neighborhood) on Facebook-scale graphs: <= 100 ms

–foundation for graph-based service, e.g., entity search

  • Computing platform (offline graph analytics):

–one iteration on a 1-billion-node graph: <= 60 s

–foundation for analytics, e.g., social analytics
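Restated as throughput, both highlights correspond to rates on the order of ten million vertices per second. Because the times are upper bounds, these rates are lower bounds, and the two figures measure different workloads, so they are not directly comparable:

```latex
\frac{2.2 \times 10^{6}\ \text{users}}{0.1\ \text{s}} = 2.2 \times 10^{7}\ \text{users/s},
\qquad
\frac{10^{9}\ \text{nodes}}{60\ \text{s}} \approx 1.7 \times 10^{7}\ \text{nodes/s}
```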

Architecture Overview

Trinity is built on top of a distributed memory storage layer called the memory cloud. Trinity organizes the memory of multiple machines into a globally addressable, distributed memory address space, which allows it to hold large graphs. Above the memory cloud is a distributed key-value store layer, which exposes key-value interfaces to the distributed memory storage. Trinity also provides a set of utility tools, such as a fast billion-node graph generator, the versatile Trinity Shell and management tools. On top of these modules, Trinity builds its graph database modules (e.g., a SPARQL query module and a subgraph matching module) and its computation platform modules.
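The idea of a single, globally addressable key space spread across machines can be illustrated with a toy two-level scheme: a key is hashed to one of a fixed number of "trunks", and an addressing table maps each trunk to the machine that owns it. The class name, the trunk indirection and the round-robin table are assumptions made for this sketch; it illustrates the concept rather than Trinity's actual implementation, and the "machines" are just dictionaries in one process.

```python
from typing import Dict, List, Optional


class MemoryCloudSketch:
    """Toy model of a globally addressable key-value space spread over
    several 'machines' (here just Python dicts in one process)."""

    def __init__(self, num_machines: int, num_trunks: int) -> None:
        # Each machine owns a private store; in a real system these would be
        # separate processes reached over the network.
        self.machines: List[Dict[int, bytes]] = [{} for _ in range(num_machines)]
        # Addressing table: trunk id -> machine id (round-robin assignment here).
        self.addressing_table: List[int] = [t % num_machines for t in range(num_trunks)]
        self.num_trunks = num_trunks

    def _machine_for(self, key: int) -> int:
        trunk = hash(key) % self.num_trunks  # step 1: key -> trunk
        return self.addressing_table[trunk]  # step 2: trunk -> machine

    def put(self, key: int, value: bytes) -> None:
        self.machines[self._machine_for(key)][key] = value

    def get(self, key: int) -> Optional[bytes]:
        return self.machines[self._machine_for(key)].get(key)
```

Callers use `put` and `get` without knowing which machine holds a key. The extra trunk level means that rebalancing would only require moving a trunk's data and updating its entry in `addressing_table`; keys never need to be rehashed. Python's `hash` is deterministic for integers, which keeps the mapping stable across runs.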

Project Contact

Bin Shao (binshao @ microsoft . com)
Haixun Wang (haixunw @ microsoft . com)