Sampling-invariant fully metric learning for few-shot object detection
Few-shot object detection (FSOD) aims to learn models that detect unseen objects from a few annotated exemplars. Despite great progress in FSOD, existing metric-based methods rely heavily on class prototypes extracted from limited training data and thus suffer severely from data shift, which can introduce inductive bias and produce inconsistent performance across multiple runs. Statistically, data shift occurs more frequently in low-shot scenarios than in many-shot scenarios. However, little research has examined how to exploit the intrinsic and robust properties of limited data to obtain stable results, rather than blindly pursuing high performance on a specific dataset. This motivates us to model the intrinsic properties of limited data in order to build a set of robust class prototypes. To this end, we propose a novel Sampling-Invariant Fully metric learning Network (SIF-Net) that enhances class prototypes in three aspects. Specifically, our model includes three novel modules: (1) a multi-scale context matching (MSCM) module that aggregates more accurate class concepts in a scale-matching manner, (2) a semantic drift correlation (SDC) module that recovers distorted class prototypes through object context, and (3) a fully metric learning (FML) module that encodes both class and spatial priors into class prototypes, a design that differs fundamentally from previous metric-based approaches. As a result, SIF-Net is robust to variations in few-shot data quality. Extensive experiments show that our fully metric-based framework outperforms other metric-based approaches and achieves state-of-the-art performance on the PASCAL VOC and MS COCO datasets.
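
The sensitivity of class prototypes to the choice of support samples can be illustrated with a minimal sketch. This is not the SIF-Net implementation: it assumes a prototype is simply the mean of K support embeddings, uses synthetic Gaussian features in place of real detector features, and the names `class_prototype` and `prototype_spread` are hypothetical.

```python
import random
import statistics


def class_prototype(support_features):
    """Average a list of equal-length feature vectors into one prototype."""
    dim = len(support_features[0])
    count = len(support_features)
    return [sum(vec[d] for vec in support_features) / count for d in range(dim)]


def prototype_spread(pool, shots, runs, rng):
    """Mean per-dimension std of the prototype over repeated K-shot draws."""
    # Each run simulates one random choice of K annotated exemplars.
    prototypes = [class_prototype(rng.sample(pool, shots)) for _ in range(runs)]
    dim = len(pool[0])
    per_dim_std = [
        statistics.pstdev([proto[d] for proto in prototypes]) for d in range(dim)
    ]
    return sum(per_dim_std) / dim


rng = random.Random(0)
# Assumption: 500 synthetic 8-dimensional embeddings stand in for one class.
pool = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]

spread_1_shot = prototype_spread(pool, shots=1, runs=200, rng=rng)
spread_10_shot = prototype_spread(pool, shots=10, runs=200, rng=rng)
print("1-shot prototype spread:", spread_1_shot)
print("10-shot prototype spread:", spread_10_shot)
```

Under these assumptions the 1-shot spread should be clearly larger than the 10-shot spread, since the standard deviation of a mean of K independent samples shrinks roughly as 1/sqrt(K). This is the kind of run-to-run inconsistency that motivates building more robust prototypes.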