People Search in a Social Network
How to perform search efficiently in a web-scale social network is a challenging task. Consider the screenshot below. While logged in Facebook, a user performs a search in Bing. Bing checks the user's Facebook network to see if there is anything relevant.
Assuming on average, a person has 130 friends in Facebook. Searching within 2 hop means accessing 130+1302 nodes, and searching within 3 hop means accessing 130+1302+1303 nodes. The search must be conducted very efficiently.
In this demo, we simulate a Facebook social network where each person has 130 friends. The data is deployed on 8 machines.
The following screenshot shows that 2-hop query can be completed within 10 ms.
The following screenshot shows that 3-hop query can be completed within 100 ms.
For billion node graphs, the state-of-the-art approaches take months to build indices for subgraph matching. Unlike previous subgraph matching approaches that rely on sophisticated indices, Trinity perform subgraph matching queries without using any structural indices. With efficient graph exploration and massive parallel computing, the average response time for billion node graphs is under 500 ms.
For a 0.8 billion RDF graph, Trinity-RDF store can answer SPARQL queries within 50 ms. In contrast, the state-of-the-art approach's average response time is 255ms.
Trinity supports vertex-based computation under the BSP model as in Pregel. A computation task is expressed in multiple iterative super-steps and each vertex acts as an independent agent. During each super-step, each agent performs some computation and communication, independent of each other. It then waits for all agents to finish their computation and communication before the next super-step begins.
Trinity can partition billion node graphs within several hours. In contrast, the known best graph partitioning approach can only partition graphs with 22M nodes.
With sketch index, Trinity can answer shortest distance/path queries within 100 ms for billion node graphs.
Trinity is the infrastructure of Probase, a large-scale knowledgebase automatically acquired from the web. Probase has millions of nodes (representing concepts) and edges (represent relationships). Hypergraphs are more appropriate than simple graphs for modeling knowledge. Trinity is used for: 1) taxonomy building; 2) data integration (e.g. adding Freebase data into Probase); 3) querying Probase.
Microsoft Bing’s AEther project uses Trinity for managing AEther’s experimental data, which consists of large number of workflows, and the evolutions among the workflows.