The long-term goal of the Situated Interaction project is to enable a new generation of interactive systems that embed interaction and computation deeply into the natural flow of everyday tasks, activities, and collaborations. Example scenarios include human-robot interaction, e-home applications, interactive billboards, and systems that monitor, assist, and coordinate teams of experts through complex tasks and procedures.

Core research areas
Some of the core research areas under investigation are:
 -  situational awareness (e.g. conversational scene analysis, multimodal sensing and fusion);
 -  engagement, attention, and floor management in multi-participant interaction;
 -  mixed-initiative, multi-participant interaction and dialog control;
 -  situated grounding, robustness, and error handling;
 -  life-long learning and adaptation;
 -  spatio-temporal reasoning, behavior and intention recognition;
 -  high-resolution, coordinated behavioral models for embodied agents.

Publications
- Wang, W., Bohus, D., Kamar, E., Horvitz, E. (2012) - Crowdsourcing the Acquisition of Natural Language Corpora: Methods and Observations, to appear in SLT'12, Miami, USA
- Rosenthal, S., Bohus, D., Horvitz, E. (2012) - Value of Information with Streaming Evidence, Microsoft Research Technical Report MSR-TR-2012-99
- Vinyals, O., Bohus, D., Caruana, R. (2012) - Learning Speaker, Addressee and Overlap Detection Models from Multimodal Streams, to appear in ICMI'12, Santa Monica, USA
- Bohus, D., Kamar, E., Horvitz, E. (2012) - Towards Situated Collaboration, in NAACL Workshop on Future Directions and Challenges in Spoken Dialog Systems: Tools and Data, Montreal, Canada
- Bohus, D., Horvitz, E. (2011) - Decisions about Turns in Multiparty Conversation: From Perception to Action, in ICMI'11, Alicante, Spain
- Bohus, D., Horvitz, E. (2011) - Multiparty Turn Taking in Situated Dialog: Study, Lessons and Directions, in SIGdial'11
- Bohus, D., Horvitz, E. (2010) - On the Challenges and Opportunities of Physically Situated Dialog, in AAAI Fall Symposium on Dialog with Robots, Arlington, VA
- Bohus, D., Horvitz, E. (2010) - Facilitating Multiparty Dialog with Gaze, Gesture and Speech, in ICMI'10, Beijing, China
- Bohus, D., Horvitz, E. (2010) - Computational Models for Multiparty Turn-Taking, Microsoft Research Technical Report MSR-TR-2010-115
- Bohus, D., Horvitz, E. (2009) - Dialog in the Open World: Platform and Applications, in ICMI'09, Boston, MA (ICMI'09 Outstanding Paper Award)
- Bohus, D., Horvitz, E. (2009) - Learning to Predict Engagement with a Spoken Dialog System in Open-World Settings, in SIGdial'09, London, UK
- Bohus, D., Horvitz, E. (2009) - Models for Multiparty Engagement in Open-World Dialog, in SIGdial'09, London, UK (SIGdial'09 Best Paper Award)
- Bohus, D., Horvitz, E. (2009) - Open-World Dialog: Challenges, Directions and a Prototype, in IJCAI Workshop on Knowledge and Reasoning in Practical Dialog Systems, Pasadena, CA

Here is a project overview video:
Situated Interaction: Project Overview
Here are two videos illustrating multiparty engagement and interaction in the Receptionist and Questions (trivia) game domains:
Receptionist multiparty engagement: illustrates reasoning about engagement and interleaved conversations in the Receptionist domain. Notice the inferences about engagement states, actions, and intentions, as well as about high-level goals and activities. The red dot shows the system's gaze direction.
Questions game multiparty engagement: illustrates how a situated conversational agent can initiate engagement with bystanders to solicit help and draw them into playing a multi-participant questions game. Notice the inferences about engagement states, actions, and intentions, as well as about high-level goals and activities. The red dot shows the system's gaze direction.

Here is a set of videos illustrating the initial Receptionist prototype:
Basic interaction: illustrates a basic single-participant interaction with the system. Notice the various layers of scene analysis (the system tracks the user's face and pose and infers information about clothing, affiliation, task goals, etc.) and the natural engagement model (the system engages as the user approaches).
Scene inferences and grounding: the system infers the user's goals from scene analysis (the user is dressed formally, so is most likely an external visitor, so probably needs to register) but grounds this information through dialog. Notice also the grounding of the building number.
Attention modeling and engagement: the system monitors the user's attention (using information from the face detector and pose tracker) and engages the user accordingly.
Handling people waiting in line: the system monitors multiple users in the scene and acknowledges the presence of a waiting user with a quick glance (the red dot shows the system's gaze) and by briefly engaging them toward the end of the conversation.
Re-engagement: same as above, except that when the system turns back, the initial user is no longer paying attention. Knowing that a person is waiting in line, the system draws the user's attention and re-engages by saying "Excuse me!"
Multi-participant dialog: the system infers from the scene (and confirms through dialog) that the two participants are in a group together. The system then carries on a multi-participant conversation. Notice the gaze model (red dot), which is informed by who is speaking and by certain elements of the discourse structure.
Multi-participant dialog with side conversation: similar to the previous interaction, but at the end the users engage in a side conversation. The system understands that these utterances are not addressed to it and, after a while, interrupts the two users to convey the shuttle information. Notice also the touch-screen interaction, which serves as a fallback when speech recognition fails.
Multi-participant dialog with a third person waiting: illustrates how the system handles a waiting participant while interacting with a group of two users.