Adversarial Transformers for Weakly Supervised Object Localization
2022
Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels, which offers better scalability and practicality than fully supervised methods. However, without pixel-level supervision, existing methods tend to generate rough localization maps, which hinders localization performance. To alleviate this problem, we propose an adversarial transformer network (ATNet), which aims to obtain a well-learned localization model with pixel-level pseudo labels. The proposed ATNet enjoys several merits. First, we design an object transformer ($G$) that can generate localization maps and pseudo labels effectively and dynamically, and a part transformer ($D$) that accurately discriminates detailed local differences between localization maps and pseudo labels. Second, we propose to train $G$ and $D$ via an adversarial process, where $G$ learns to generate more accurate localization maps that approach the pseudo labels to fool $D$. To the best of our knowledge, this is the first work to explore transformers with adversarial training to obtain a well-learned localization model for WSOL. Extensive experiments with four backbones on two standard benchmarks demonstrate that our ATNet achieves favorable performance against state-of-the-art WSOL methods. Besides, our adversarial training provides higher robustness against adversarial attacks.
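The adversarial process described above follows the standard generator/discriminator pattern: $D$ is trained to distinguish generated localization maps from pseudo labels, while $G$ is trained to fool $D$. A minimal sketch of that min-max objective is below; the function names, scalar scores, and binary cross-entropy formulation are illustrative assumptions, not the authors' actual loss design.

```python
import math

def bce(score, target):
    # Binary cross-entropy for a scalar score in (0, 1).
    # `score` is an assumed scalar output of the part transformer D.
    eps = 1e-7
    score = min(max(score, eps), 1 - eps)
    return -(target * math.log(score) + (1 - target) * math.log(1 - score))

def discriminator_loss(d_on_pseudo, d_on_generated):
    # D is trained to score pseudo labels as "real" (1) and
    # the localization maps produced by G as "fake" (0).
    return bce(d_on_pseudo, 1.0) + bce(d_on_generated, 0.0)

def generator_loss(d_on_generated):
    # G is trained so its localization maps are scored as "real",
    # i.e. indistinguishable from the pseudo labels.
    return bce(d_on_generated, 1.0)
```

As $G$'s maps approach the pseudo labels, `d_on_generated` rises toward 1, driving `generator_loss` down while `discriminator_loss` rises, which is the adversarial pressure that sharpens the rough localization maps.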