[month] [year]

Rahul Sajnani – Reconstruction and canonicalization of objects

Rahul Sajnani received his MS Dual Degree in Electronics and Communication Engineering (ECE). His research work was supervised by Prof. Madhava Krishna. Here’s a summary of his research work on weakly-supervised dense reconstruction and canonicalization of objects:

To execute meaningful real-world tasks, robots must perceive and understand the objects that surround them. Manipulation, autonomous driving, and human-robot interaction are just a few examples where robots must reason about previously unseen instances of objects.
We present DRACO, a method for Dense Reconstruction And Canonicalization of Object shape from one or more RGB images. Canonical shape reconstruction, that is, estimating 3D object shape in a coordinate space canonicalized for scale, rotation, and translation parameters, is an emerging paradigm that holds promise for a multitude of robotic applications. Prior approaches either rely on painstakingly gathered dense 3D supervision or produce only sparse canonical representations, limiting real-world applicability. DRACO performs dense canonicalization using only weak supervision in the form of camera poses and semantic keypoints at train time. During inference, DRACO predicts dense object-centric depth maps in a canonical coordinate space, using only one or more RGB images of an object. Extensive experiments on canonical shape reconstruction and pose estimation show that DRACO is competitive with or superior to fully-supervised methods.
This thesis delves into understanding objects to capture their pose and shape from a single image. Canonicalization is the operation of mapping an input representation, such as a point cloud of an object, to a representation that is invariant with respect to some operation. For the scope of this thesis, we fix this operation to be a 3D transformation. Let’s take an example of an object, say a car. By finding a canonical frame for this car, we have its 3D reconstruction in a frame that is invariant to any 3D transformation. This is important for tasks such as registration, feature propagation, and annotation. Using this invariant 3D pose, we can register different poses of the car to a single consistent frame. This single consistent frame across multiple cars is called the canonical frame. Humans have an inherent understanding of this canonical frame. For instance, we know that the headlights of a car point in its forward direction. This forward direction is defined with respect to our mental canonical frame, and this allows us to estimate the pose of the car with respect to our view.
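The registration idea above can be illustrated with a small NumPy sketch. It is not the DRACO pipeline: it assumes the pose of each observation (rotation R, translation t, scale s) is already known, whereas in the thesis this information has to be inferred from images. The function names and the random "car" point cloud are hypothetical and exist only for illustration.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random proper rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the factorization
    if np.linalg.det(q) < 0:      # force det = +1 (rotation, not reflection)
        q[:, 0] = -q[:, 0]
    return q

def to_observed(points_canonical, R, t, s):
    """Place canonical points in an observed frame: x_obs = s * R @ x_can + t."""
    return s * points_canonical @ R.T + t

def to_canonical(points_observed, R, t, s):
    """Invert the similarity transform: x_can = R^T @ (x_obs - t) / s."""
    return (points_observed - t) @ R / s

rng = np.random.default_rng(0)
# Hypothetical stand-in for a car's point cloud in its canonical frame.
car_canonical = rng.uniform(-0.5, 0.5, size=(100, 3))

views = []
for _ in range(2):
    # Each view sees the same car under a different (assumed known) pose.
    R, t, s = random_rotation(rng), rng.normal(size=3), rng.uniform(0.5, 2.0)
    observed = to_observed(car_canonical, R, t, s)
    views.append(to_canonical(observed, R, t, s))

# Both views land in the same frame, regardless of how the car was posed.
registered = np.allclose(views[0], views[1])
print(registered)
```

Once every observation is mapped back this way, differently posed views of the same object coincide, which is what makes registration and annotation transfer straightforward in the canonical frame.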
Canonicalization, the process of mapping object instances to a category-level container, has emerged as a useful tool for category-level understanding. If we can map every object to the canonical orientation, we can extend datasets, automatically annotate parts, and register and manipulate objects. Our method, DRACO, estimates a canonical frame by predicting NOCS maps and reconstructs the object of interest from a monocular image. Previous methods either use painstakingly collected datasets to learn a dense canonical frame or weakly supervise canonicalization for sparse keypoints. Our method uses weak supervision in the form of semantic keypoints and relative camera motion to learn an object’s reconstruction and canonical frame, and performs on par with or better than fully-supervised methods. This makes it possible to estimate a canonical frame for in-the-wild objects and thereby canonicalize them. Extensive experiments on our new large-scale mixed reality dataset (20K images/category) demonstrate shape reconstruction and canonicalization quality on par with state-of-the-art fully-supervised approaches and with weakly-supervised baselines that use strictly more information. This thesis introduces concepts of single-view depth estimation for reconstructing objects from a monocular image and delves into canonicalization in a weakly-supervised manner.
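To make the NOCS (Normalized Object Coordinate Space) idea more concrete, here is a minimal sketch of how a point cloud could be normalized into a unit cube and encoded as colors, which is the kind of per-pixel value a NOCS map stores. The convention assumed here (tight bounding-box diagonal scaled to 1, box centered at 0.5) is a common one and may differ from the exact convention used in the thesis. The function names and the random point cloud are hypothetical.

```python
import numpy as np

def normalize_to_unit_cube(points):
    """Scale points so the tight bounding-box diagonal has length 1 and
    shift them so the box center sits at (0.5, 0.5, 0.5)."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    center = (lo + hi) / 2.0
    diagonal = np.linalg.norm(hi - lo)
    return (points - center) / diagonal + 0.5

def coords_to_colors(coords):
    """Encode coordinates in [0, 1]^3 as 8-bit RGB values."""
    return np.round(np.clip(coords, 0.0, 1.0) * 255).astype(np.uint8)

rng = np.random.default_rng(0)
# Hypothetical object point cloud with different extents along each axis.
points = rng.normal(size=(200, 3)) * np.array([2.0, 1.0, 0.5])

nocs = normalize_to_unit_cube(points)
colors = coords_to_colors(nocs)

# The same object, translated and uniformly rescaled, gets identical coordinates.
moved = normalize_to_unit_cube(3.0 * points + np.array([10.0, -4.0, 2.0]))
invariant = np.allclose(nocs, moved)

extent = nocs.max(axis=0) - nocs.min(axis=0)
print(invariant, np.linalg.norm(extent))
```

This normalization only removes translation and scale. Rotation is handled separately, by expressing the points in the category's canonical orientation before normalizing.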