Query-based Graph Cuboid Outlier Detection

Published by ACM - Association for Computing Machinery

Publication

Various projections or views of a heterogeneous information network can be modeled using the graph OLAP (On-line Analytical Processing) framework for effective decision making. Detecting anomalous projections of the network can help the analysts identify regions of interest from the graph specific to the projection attribute. While most previous studies on outlier detection in graphs deal with outlier nodes, edges or subgraphs, we are the first to propose detection of graph cuboid outliers. Further we perform this detection in a query sensitive way. Given a general subgraph query on a heterogeneous network, we study the problem of finding outlier cuboids from the graph OLAP lattice. A Graph Cuboid Outlier (GCOutlier) is a cuboid with exceptionally high density of matches for the query. The GCOutlier detection task is clearly challenging because: (1) finding matches for the query (subgraph isomorphism) is NPhard; (2) number of matches for the query can be very high; and (3) number of cuboids can be large. We provide an approximate solution to the problem by computing only a fraction of the total matches originating from a select set of candidate nodes and including a select set of edges, chosen smartly. We perform extensive experiments on synthetic datasets to showcase the execution time versus accuracy trade-off. Experiments on real datasets like Four Area and Delicious containing thousands of nodes reveal interesting GCOutliers.