Attention Recognition: From Contextual Analysis of Head Poses to 3D Gaze Tracking Using Remote RGB-D Sensors

Jean-Marc Odobez, IDIAP, Switzerland

Gaze, and its discrete counterpart the visual focus of attention (VFOA), is acknowledged as
one of the most important non-verbal cues in human communication. However, its automatic estimation is
a highly challenging problem, in particular when large user mobility is expected and minimal intrusion
is required. In this talk, I will discuss the main challenges associated with this task and how we have addressed them.
I will first describe how we addressed VFOA recognition in meetings using Dynamic Bayesian Networks to
jointly model speech conversation, gaze (represented by head pose), and task context.
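To give a flavor of this kind of joint modeling, the sketch below casts VFOA recognition as a minimal dynamic Bayesian network (here reduced to an HMM): the hidden state is the focus target, the observation is the head-pan angle, and the speaking context biases the transition prior toward the current speaker. The target list, angles, and noise level are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Illustrative targets and their expected head-pan directions (degrees);
# these numbers are hypothetical, chosen only for the example.
targets = ["person_A", "person_B", "table"]
target_pan = np.array([-30.0, 30.0, 0.0])
sigma = 15.0  # assumed head-pose observation noise (degrees)

def obs_likelihood(pan):
    # Gaussian likelihood of the observed head pan under each focus target
    return np.exp(-0.5 * ((pan - target_pan) / sigma) ** 2)

def transition(speaker_idx, stay=0.8):
    # "Sticky" transitions, with extra probability mass toward the
    # current speaker (the conversational-context prior)
    n = len(targets)
    T = np.full((n, n), (1 - stay) / (n - 1))
    np.fill_diagonal(T, stay)
    T[:, speaker_idx] += 0.1
    return T / T.sum(axis=1, keepdims=True)

def forward(pans, speakers):
    # Standard HMM forward filter over the VFOA state
    belief = np.ones(len(targets)) / len(targets)
    estimates = []
    for pan, spk in zip(pans, speakers):
        belief = transition(spk).T @ belief   # predict
        belief *= obs_likelihood(pan)         # update with head pose
        belief /= belief.sum()
        estimates.append(targets[int(np.argmax(belief))])
    return estimates

# Head looks left while A speaks, then right while B speaks
print(forward([-28.0, -25.0, 29.0, 31.0], [0, 0, 1, 1]))
# → ['person_A', 'person_A', 'person_B', 'person_B']
```

The actual models in the talk are richer (they couple speech, task context, and gaze over time), but the structure is the same: a latent focus state filtered from noisy head-pose observations under a context-dependent prior.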

In the second part, I will present recent techniques we have investigated to perform 3D gaze tracking from RGB-D
(color and depth) cameras such as the Kinect, which can offer an alternative to the costly and/or intrusive
systems currently available. The methods will be illustrated with several examples from human-robot
and human-human interaction analysis, such as the automatic gaze coding of natural dyadic interactions.
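As a hedged geometric sketch of why depth helps (not the speaker's actual system): the depth map lets us back-project the eye pixel to a metric 3D position, so an estimated gaze direction becomes a 3D ray that can be intersected with scene surfaces (a screen, a tabletop, another person). The intrinsics and geometry below are hypothetical example values.

```python
import numpy as np

def deproject(u, v, depth, fx, fy, cx, cy):
    # Pinhole back-projection of pixel (u, v) with depth in metres
    # to a 3D point in the camera frame
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def gaze_target_on_plane(eye_3d, gaze_dir, plane_point, plane_normal):
    # Intersect the 3D gaze ray with a plane (e.g., a screen)
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    denom = gaze_dir @ plane_normal
    if abs(denom) < 1e-9:
        return None  # ray parallel to the plane
    t = ((plane_point - eye_3d) @ plane_normal) / denom
    return None if t < 0 else eye_3d + t * gaze_dir

# Hypothetical Kinect-like intrinsics; eye seen at the image centre, 0.8 m away
eye = deproject(320, 240, 0.8, fx=570.0, fy=570.0, cx=320.0, cy=240.0)
# Gaze straight ahead, screen plane at z = 1.0 m
hit = gaze_target_on_plane(eye, np.array([0.0, 0.0, 1.0]),
                           np.array([0.0, 0.0, 1.0]),
                           np.array([0.0, 0.0, 1.0]))
print(hit)  # → [0. 0. 1.]
```

Contrast this with a monocular RGB camera, where the eye position is only known up to scale: the metric depth is exactly what turns a 2D gaze estimate into a 3D fixation point without wearable hardware.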