COCONut: Modernizing COCO Segmentation

In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segmentation systems. However, the COCO segmentation benchmark has seen comparatively slow improvement over the last decade. Originally equipped with coarse polygon annotations for thing instances, it gradually incorporated coarse superpixel annotations for stuff regions, which were subsequently heuristically amalgamated to yield panoptic segmentation annotations. These annotations, executed by different groups of raters, have resulted not only in coarse segmentation masks but also in inconsistencies between segmentation types. In this study, we undertake a comprehensive reevaluation of the COCO segmentation annotations. By enhancing the annotation quality and expanding the dataset to encompass 383K images with more than 5.18M panoptic masks, we introduce COCONut, the COCO Next Universal segmenTation dataset. COCONut harmonizes segmentation annotations across semantic, instance, and panoptic segmentation with meticulously crafted high-quality masks, and establishes a robust benchmark for all segmentation tasks. To our knowledge, COCONut stands as the inaugural large-scale universal segmentation dataset, verified by human raters. We anticipate that the release of COCONut will significantly contribute to the community’s ability to assess the progress of novel neural networks.