Activity recognition is a key task in video content extraction; it is needed for applications such as monitoring and alerting, content-based indexing, and human-computer interaction. There are several approaches to this task. One is to compute local spatio-temporal features and then classify a video by their global distribution. In this work, we take a more structural approach: an activity is defined by a sequence of states, each of which characterizes the actor’s body pose and relations to objects of interest and to other actors. This talk will describe our recent work on activity recognition using the structural approach, including the task of learning the models.