TranSalNet: Visual saliency prediction using transformers

Jianxun Lou, Hanhe Lin, David Marshall, Dietmar Saupe, Hantao Liu

2021

Convolutional neural networks (CNNs) have significantly advanced computational modeling for saliency prediction. However, the inductive biases inherent in convolutional architectures limit their capacity to encode long-range context, which can make a saliency model less humanlike. Transformers have shown great potential for encoding long-range information through the self-attention mechanism. In this paper, we propose a novel saliency model that integrates transformer components into CNNs to capture long-range contextual information. Experimental results show that the new components improve performance, and the proposed model achieves promising results in predicting saliency.
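
To illustrate how self-attention lets every spatial position draw on every other position, regardless of distance, the following is a minimal sketch and not the architecture proposed in the paper. It assumes a CNN feature map already flattened into a list of per-position feature vectors, and it uses identity query/key/value projections in place of learned ones; the names `self_attention`, `softmax`, and `feature_map` are hypothetical and chosen only for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real model would learn these projections.
    tokens: list of N feature vectors, each a list of d floats.
    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this position to every position, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Output is a weighted mix of all positions' features.
        outputs.append(
            [sum(w_j * v[c] for w_j, v in zip(w, tokens)) for c in range(d)]
        )
    return outputs, weights


# Hypothetical 2x2 feature map with 2 channels, flattened to 4 positions.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
attended, attn = self_attention(feature_map)
```

Each row of `attn` is a probability distribution over all positions, so every output vector mixes information from the whole feature map, whereas a single convolution only sees a local neighborhood.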

Keywords:

Contextual information
Convolutional neural network
Artificial intelligence
Pattern recognition
Transformer
Encoding (memory)
Computer science
Mechanism (biology)
Visual saliency

