RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer

In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds on the previous state-of-the-art real-time detector, RT-DETR. It introduces a set of bag-of-freebies that improve flexibility and practicality, and it optimizes the training strategy to improve performance. To improve flexibility, we propose setting a distinct number of sampling points for features at different scales in the deformable attention, so that the decoder can extract multi-scale features selectively. To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator, which RT-DETR requires but YOLO detectors do not. This removes the deployment constraints typically associated with DETRs. For the training strategy, we propose dynamic data augmentation and scale-adaptive hyperparameter customization, which improve performance without loss of speed. Source code and pre-trained models will be available at https://github.com/lyuwenyu/RT-DETR.
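The two decoder changes can be illustrated with a small NumPy sketch of deformable attention for a single query. This is not the RT-DETRv2 implementation. It assumes three things: that "discrete sampling" means rounding each sampling location to the nearest pixel and clamping it to the map border, in contrast to the bilinear interpolation used by grid_sample; that sampling locations are normalized to [0, 1]; and that the per-level point counts (here 4, 2, 1) are free to choose. All helper names (bilinear_sample, discrete_sample, deform_attn_single_query) and the point counts are hypothetical.

```python
import numpy as np

def to_pixel(u, v, h, w):
    # Normalized [0, 1] coords -> pixel coords with pixel centers at integer
    # indices (the align_corners=False convention).
    return u * w - 0.5, v * h - 0.5

def bilinear_sample(feat, u, v):
    """grid_sample-style bilinear lookup on a (C, H, W) map; zeros outside."""
    c, h, w = feat.shape
    x, y = to_pixel(u, v, h, w)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(c)
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            wgt = (1 - abs(x - xi)) * (1 - abs(y - yi))  # bilinear weight
            if 0 <= xi < w and 0 <= yi < h:
                out += wgt * feat[:, yi, xi]
    return out

def discrete_sample(feat, u, v):
    """Assumed discrete variant: round to nearest pixel, clamp, then gather."""
    c, h, w = feat.shape
    x, y = to_pixel(u, v, h, w)
    xi = int(np.clip(np.floor(x + 0.5), 0, w - 1))
    yi = int(np.clip(np.floor(y + 0.5), 0, h - 1))
    return feat[:, yi, xi]

def deform_attn_single_query(feats, ref, offsets, weights, sampler):
    """feats[l]: (C, H_l, W_l); offsets[l]: (P_l, 2); weights[l]: (P_l,).
    P_l may differ per level; weights are normalized over all points."""
    out = np.zeros(feats[0].shape[0])
    for feat, offs, wts in zip(feats, offsets, weights):
        for (du, dv), a in zip(offs, wts):
            out += a * sampler(feat, ref[0] + du, ref[1] + dv)
    return out

rng = np.random.default_rng(0)
points_per_level = [4, 2, 1]  # hypothetical: fewer points on coarser levels
feats = [rng.standard_normal((1, s, s)) for s in (8, 4, 2)]
offsets = [rng.uniform(-0.1, 0.1, size=(p, 2)) for p in points_per_level]
logits = rng.standard_normal(sum(points_per_level))
attn = np.exp(logits) / np.exp(logits).sum()  # softmax over all 7 points
weights = np.split(attn, np.cumsum(points_per_level)[:-1])

ref = (0.5, 0.5)
out_bilinear = deform_attn_single_query(feats, ref, offsets, weights, bilinear_sample)
out_discrete = deform_attn_single_query(feats, ref, offsets, weights, discrete_sample)
print(out_bilinear, out_discrete)
```

At a pixel center the two samplers agree. Between pixels, the discrete version returns a single neighbor instead of a weighted blend. Under this assumption the lookup reduces to rounding plus an index gather, so no interpolation kernel is needed.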
