Adversarial Transformers for Weakly Supervised Object Localization

2022 
Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels, which offers better scalability and practicality than fully supervised methods. However, without pixel-level supervision, existing methods tend to generate coarse localization maps, which limits localization performance. To alleviate this problem, we propose an adversarial transformer network (ATNet), which aims to obtain a well-learned localization model with pixel-level pseudo labels. The proposed ATNet enjoys several merits. First, we design an object transformer ($G$) that can generate localization maps and pseudo labels effectively and dynamically, and a part transformer ($D$) that accurately discriminates detailed local differences between localization maps and pseudo labels. Second, we propose to train $G$ and $D$ via an adversarial process, in which $G$ learns to generate more accurate localization maps that approach the pseudo labels in order to fool $D$. To the best of our knowledge, this is the first work to explore transformers with adversarial training to obtain a well-learned localization model for WSOL. Extensive experiments with four backbones on two standard benchmarks demonstrate that our ATNet achieves favorable performance against state-of-the-art WSOL methods. In addition, our adversarial training provides higher robustness against adversarial attacks.
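
To make the adversarial setup described above more concrete, below is a minimal PyTorch sketch, not the authors' implementation: the toy transformer modules, token shapes, dummy pseudo labels, and BCE adversarial losses are all placeholder assumptions used purely to illustrate how an object transformer $G$ and a part transformer $D$ could be trained against each other, with $D$ treating pseudo labels as "real" and generated maps as "fake".

```python
# Hedged sketch of adversarial training between an object transformer G and a
# part transformer D. All module definitions, shapes, and losses are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class ObjectTransformerG(nn.Module):
    """Stand-in for the object transformer G: predicts a localization map from patch tokens."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(dim, 1)

    def forward(self, tokens):                   # tokens: (B, N, dim)
        feats = self.encoder(tokens)
        return torch.sigmoid(self.head(feats))   # (B, N, 1) localization map

class PartTransformerD(nn.Module):
    """Stand-in for the part transformer D: scores a map as pseudo-label-like vs. generated."""
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Linear(1, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.cls = nn.Linear(dim, 1)

    def forward(self, loc_map):                  # loc_map: (B, N, 1)
        feats = self.encoder(self.embed(loc_map))
        return self.cls(feats.mean(dim=1))       # (B, 1) real/fake logit

G, D = ObjectTransformerG(), PartTransformerD()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    tokens = torch.randn(8, 49, 64)                 # dummy patch tokens (B, N, dim)
    pseudo = (torch.rand(8, 49, 1) > 0.5).float()   # dummy pixel-level pseudo labels

    # Update D: pseudo labels are labeled "real", generated maps "fake".
    with torch.no_grad():
        fake = G(tokens)
    loss_d = bce(D(pseudo), torch.ones(8, 1)) + bce(D(fake), torch.zeros(8, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Update G: produce maps that D scores as pseudo-label-like (i.e., fool D).
    loss_g = bce(D(G(tokens)), torch.ones(8, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

In practice the generator would also carry the standard image-level classification loss of WSOL alongside this adversarial term; the sketch isolates only the $G$-versus-$D$ game described in the abstract.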