Dr. Ravi Kiran Sarvadevabhatla and his student Pranav Gupta presented the paper "FLOAT: Factorized Learning of Object Attributes for Improved Multi-object Multi-part Scene Parsing" at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), held 19–24 June in New Orleans, Louisiana. The other authors of the paper are Rishubh Singh and Pradeep Shenoy from Google Research.
Research work as explained by the authors: Multi-object multi-part scene parsing is a challenging task that requires detecting multiple object classes in a scene and segmenting the semantic parts within each object. In this paper, we propose FLOAT, a factorized label-space framework for scalable multi-object multi-part parsing. Our framework involves independent dense prediction of object category and part attributes, which increases scalability and reduces task complexity compared to the monolithic label-space counterpart. In addition, we propose an inference-time ‘zoom’ refinement technique which significantly improves segmentation quality, especially for smaller objects and parts. Compared to the state of the art, FLOAT obtains an absolute improvement of 2.0% in mean IOU (mIOU) and 4.8% in segmentation-quality IOU (sqIOU) on the Pascal-Part-58 dataset. For the larger Pascal-Part-108 dataset, the improvements are 2.1% for mIOU and 3.9% for sqIOU. We incorporate previously excluded part attributes …
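To illustrate the factorization idea described above, here is a minimal sketch in NumPy. All names (`OBJECTS`, `PARTS`, `compose_labels`) are hypothetical and not the authors' actual code: instead of one classifier over a monolithic label space (one class per object-part pair, e.g. "cow-head", "dog-head"), two independent per-pixel predictions — object category and part attribute — are composed at the end, so the number of output classes grows additively rather than multiplicatively.

```python
import numpy as np

# Illustrative label sets (assumed, not from the paper's code).
OBJECTS = ["background", "cow", "dog"]
PARTS = ["none", "head", "torso", "leg"]

def compose_labels(obj_map, part_map):
    """Combine independent per-pixel object and part predictions
    into joint object-part labels (factorized label space)."""
    h, w = obj_map.shape
    labels = np.empty((h, w), dtype=object)
    for i in range(h):
        for j in range(w):
            obj = OBJECTS[obj_map[i, j]]
            part = PARTS[part_map[i, j]]
            # Background pixels carry no part label.
            labels[i, j] = "background" if obj == "background" else f"{obj}-{part}"
    return labels

# Toy 2x2 predictions from the two independent heads.
obj_map = np.array([[0, 1], [2, 2]])   # background, cow / dog, dog
part_map = np.array([[0, 1], [1, 3]])  # none, head / head, leg
print(compose_labels(obj_map, part_map))

# Monolithic label space: one class per (object, part) pair.
monolithic = (len(OBJECTS) - 1) * (len(PARTS) - 1) + 1   # 7 classes
# Factorized label space: two small, independent prediction heads.
factorized = len(OBJECTS) + len(PARTS)                   # 3 + 4 outputs
```

With many object classes sharing part vocabularies (heads, torsos, legs), the factorized heads stay small while the monolithic space grows with every new object-part combination, which is the scalability advantage the abstract refers to.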
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) is the premier annual computer vision event, comprising the main conference and several co-located workshops and short courses. With its high quality and low cost, it provides exceptional value for students, academics and industry researchers.
Conference page: https://cvpr2022.thecvf.com/
Project page: https://floatseg.github.io/