Adversarial Transformers for Weakly Supervised Object Localization
2022
Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels, which offers better scalability and practicality than fully supervised methods. However, without pixel-level supervision, existing methods tend to generate rough localization maps, which hinders localization performance. To alleviate this problem, we propose an adversarial transformer network (ATNet), which aims to obtain a well-learned localization model with pixel-level pseudo labels. The proposed ATNet enjoys several merits. First, we design an object transformer ($G$) that can generate localization maps and pseudo labels effectively and dynamically, and a part transformer ($D$) that accurately discriminates detailed local differences between localization maps and pseudo labels. Second, we propose to train $G$ and $D$ via an adversarial process, where $G$ learns to generate more accurate localization maps that approach the pseudo labels to fool $D$. To the best of our knowledge, this is the first work to explore transformers with adversarial training to obtain a well-learned localization model for WSOL. Extensive experiments with four backbones on two standard benchmarks demonstrate that our ATNet achieves favorable performance against state-of-the-art WSOL methods. Besides, our adversarial training provides higher robustness against adversarial attacks.
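The adversarial process described above follows the standard generator/discriminator pattern: $D$ is trained to distinguish generated localization maps from pseudo labels, while $G$ is trained to fool $D$. A minimal sketch of that min-max objective is below; the function names, scalar scores, and binary cross-entropy formulation are illustrative assumptions, not the authors' actual loss design.

```python
import math

def bce(score, target):
    # Binary cross-entropy for a scalar score in (0, 1).
    # `score` is an assumed scalar output of the part transformer D.
    eps = 1e-7
    score = min(max(score, eps), 1 - eps)
    return -(target * math.log(score) + (1 - target) * math.log(1 - score))

def discriminator_loss(d_on_pseudo, d_on_generated):
    # D is trained to score pseudo labels as "real" (1) and
    # the localization maps produced by G as "fake" (0).
    return bce(d_on_pseudo, 1.0) + bce(d_on_generated, 0.0)

def generator_loss(d_on_generated):
    # G is trained so its localization maps are scored as "real",
    # i.e. indistinguishable from the pseudo labels.
    return bce(d_on_generated, 1.0)
```

As $G$'s maps approach the pseudo labels, `d_on_generated` rises toward 1, driving `generator_loss` down while `discriminator_loss` rises, which is the adversarial pressure that sharpens the rough localization maps.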