HuMoUR: Markerless 3D Human Motion Understanding for Adaptive Robot Behavior


The goal of HuMoUR is to develop novel computer vision tools that estimate and understand human motion using a simple camera, and to use this information as demonstrations for teaching a general-purpose robotic assistant to perform new, complex manipulation tasks. In robotics, this learning paradigm is referred to as Learning from Demonstration: a non-expert teacher repeatedly executes a task so that the robot can learn the steps and the variability of the actions. Typical setups are confined to controlled laboratory facilities and consist of a manipulator arm teleoperated by the user through a haptic device. To bring this technology to the next stage of development and out of the laboratory, we believe it is paramount to contribute on both the sensing and the action fronts of the problem. HuMoUR will advance both of these fields.

Specifically, on the sensing side we will: (1) research novel markerless methodologies to capture 3D human pose and motion from monocular cameras. We will leverage current Deep Learning (DL) strategies to make these algorithms viewpoint-invariant and reliable on images acquired in the wild. One important aspect to be investigated is the integration of geometric priors within the DL formulations, in order to simultaneously exploit physical models and statistical evidence from the data; (2) explore the use of Convolutional and Recurrent Networks to design new motion prediction algorithms able to infer the future position of the human body; and (3) exploit pose and motion estimates to devise new strategies for 3D human action recognition.

The outcomes of the sensing modules will be key to endowing service robots with new features and learning possibilities. In this regard, we aim to (4) adapt existing reinforcement learning algorithms so that they can be carried out at end users' homes with demonstrations recorded by a single camera; (5) propose new planning strategies that account for robot adaptation to user requirements and for contacts between the robot and the environment or people; and (6) implement new protocols that ensure human safety in tasks involving close interaction with the robot. 3D human motion prediction algorithms will play an essential role in deploying such protocols.

We plan to demonstrate our developments in three main scenarios: (a) feeding a person and (b) brushing a person's hair, where stable spoon and brush trajectories need to be adapted to the pose of the head; and (c) helping to dress a person, where the perception algorithms will need to tackle strong body occlusions caused by clothes and the robot will need to react rapidly to sudden changes in the person's pose. The main objectives we pursue are commercially and socially relevant robotics technologies, as endorsed by our EPOs. In particular, the project responds to the demand for new technologies for the assistance of elderly and disabled people, one of the main pillars of the EU H2020 plan.

  • A. Agudo and F. Moreno-Noguer. A scalable, efficient, and accurate solution to non-rigid structure from motion. Computer Vision and Image Understanding, 2018, to appear.

