Most recent approaches to 3D pose estimation from RGB-D images address the problem with a two-stage pipeline. First, they learn a classifier (typically a random forest) to predict the position of each input pixel on the object surface. These estimates are then used to define an energy function that is minimized with respect to the object pose. In this paper, we focus on the first stage of the problem and propose a novel classifier based on a depth-aware Convolutional Neural Network. This classifier learns a scale-adaptive regression model that yields very accurate pixel-level predictions, so that the pose can then be estimated with a simple RANSAC-based scheme, with no need to optimize complex ad hoc energy functions. Our experiments on publicly available datasets show that our approach achieves remarkable improvements over state-of-the-art methods.
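To make the second stage concrete, the following is a minimal sketch of how a rigid pose could be recovered with RANSAC from per-pixel predictions. It is not the authors' implementation: the abstract only states that a simple RANSAC-based scheme is used. The sketch assumes that each pixel comes with a predicted 3D position in the object frame and a 3D position in the camera frame obtained from the depth map, and it fits the rigid transform with the standard Kabsch (SVD) solution. The function names `kabsch` and `ransac_pose` and the parameters `n_iters` and `inlier_thresh` are hypothetical.

```python
import numpy as np


def kabsch(obj_pts, cam_pts):
    """Least-squares rigid transform (R, t) such that cam ~ R @ obj + t."""
    obj_c = obj_pts.mean(axis=0)
    cam_c = cam_pts.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (obj_pts - obj_c).T @ (cam_pts - cam_c)
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cam_c - R @ obj_c
    return R, t


def ransac_pose(obj_pts, cam_pts, n_iters=200, inlier_thresh=0.01, seed=0):
    """Hypothetical RANSAC loop over 3D-3D correspondences.

    obj_pts: (N, 3) predicted object-frame coordinates, one per pixel.
    cam_pts: (N, 3) camera-frame points back-projected from depth.
    inlier_thresh is in the same units as the points (assumed metres).
    """
    rng = np.random.default_rng(seed)
    n = len(obj_pts)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        # Three correspondences are the minimal set for a rigid transform.
        idx = rng.choice(n, size=3, replace=False)
        R, t = kabsch(obj_pts[idx], cam_pts[idx])
        residuals = np.linalg.norm(obj_pts @ R.T + t - cam_pts, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found no consensus set")
    # Refit on all inliers of the best hypothesis.
    R, t = kabsch(obj_pts[best_inliers], cam_pts[best_inliers])
    return R, t, best_inliers
```

Under these assumptions, the more accurate the per-pixel predictions are, the larger the share of inlier correspondences, and the fewer iterations such a loop needs to reach a good hypothesis.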


computer vision.

Author keywords

Deep Learning

Scientific reference

L. Porzi, A. Penate-Sanchez, E. Ricci and F. Moreno-Noguer. Depth-aware convolutional neural networks for accurate 3D pose estimation in RGB-D images, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017, Vancouver, Canada, pp. 5777-5783.