Joel J. Pfeiffer, Jennifer Neville, and Paul N. Bennett
In network classification, a typical assumption is knowledge of all edges when computing the joint distribution of the instances in the network. That is, for an instance in the network, the neighbors of the instance and their attributes are known. Such settings include social networks such as Facebook where a person's friends are known, allowing for prediction of an attribute of the person given the description of their friends. However, in other domains, relationship information may not be available for all nodes in the network due to privacy or legal restrictions or because a cost is associated with determining the connections of a node. For example, it is unreasonable to expect to be able to access the phone records of the entire population when attempting to identify a handful of individuals involved in illegal or fraudulent activities.
We refer to this problem domain as Active Sampling, a domain where instances' labels and edges are acquired through an iterative process in order to identify a handful of instances in a network. In this work, we develop this problem domain formally and present methods estimating the probability of an instance being positively labeled using only the previously acquired samples. Furthermore, we extend our methods to allow for collective inference and learned priors and demonstrate the robustness of the techniques on two synthetic and two real-world datasets.
In Proceedings of the ICML 2012 Workshop on Mining and Learning with Graphs