Unsupervised machine learning aims to make predictions when labeled data is absent and supervised machine learning therefore cannot be applied. Such algorithms build on assumptions about how data and predictions relate to each other. Generative models are one technique for unsupervised problem settings: they specify these assumptions as a probabilistic process that generates the data.
This thesis investigates how to most effectively exploit input data with an underlying graph structure in unsupervised learning, for three important use cases. The first use case deals with localizing defective code regions in software, given the execution graph of code lines and transitions. The second use case exploits citation networks to quantify the influence of citations on the content of the citing publication. In the final use case, shared tastes of friends in a social network are identified, enabling the prediction of which of a user's items a particular friend would be interested in.
For each use case, prediction performance is evaluated on held-out test data, which is scarce in the respective domain. Comparing the results quantifies the circumstances under which each generative model best exploits the given graph structure.
Institution: Max-Planck-Institut für Informatik