Learning Visual Features from Product Title for Image Retrieval

2020 
There is huge market demand for searching products by image on e-commerce sites. Visual features play the most important role in this content-based image retrieval task. Most existing methods extract visual features with models pre-trained on other large-scale, well-annotated datasets, e.g. the ImageNet dataset. However, because product images differ substantially from the images in ImageNet, a feature extractor trained on ImageNet is not effective at extracting the visual features of product images, and retraining the feature extractor on product images faces the lack of annotated labels. In this paper, we utilize easily accessible text information, that is, the product title, as a supervisory signal to learn features of the product image. Specifically, we use the n-grams extracted from the product title as the labels of the product image to construct a dataset for image classification. This dataset is then used to fine-tune a pre-trained model. Finally, the basic maximum activations of convolutions (MAC) feature is extracted from the fine-tuned model. As a result, we achieve fourth place in the Grand Challenge of AI Meets Beauty at 2020 ACM Multimedia using only a single ResNet-50 model, without any human annotations or pre-processing or post-processing tricks. Our code is available at: https://github.com/FangxiangFeng/AI-Meets-Beauty-2020.
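The label-construction step described above (treating title n-grams as classification targets) can be sketched as follows. This is a minimal illustration, not the authors' released code: the function names `title_ngrams` and `build_label_vocab`, the word-level tokenization, and the `min_count` frequency cutoff are all assumptions made for the example.

```python
from collections import Counter

def title_ngrams(title, n_max=3):
    # Word-level n-grams (n = 1..n_max) from one product title.
    # Tokenization by whitespace is an assumption for this sketch.
    words = title.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams

def build_label_vocab(titles, min_count=2):
    # Keep only n-grams that occur in at least `min_count` distinct titles;
    # each surviving n-gram becomes one class label for the image classifier.
    counts = Counter()
    for t in titles:
        counts.update(set(title_ngrams(t)))
    return sorted(g for g, c in counts.items() if c >= min_count)

titles = ["red lipstick matte", "matte red lipstick", "blue eyeliner pencil"]
vocab = build_label_vocab(titles, min_count=2)
```

Each product image would then be tagged with the vocabulary n-grams appearing in its title, yielding a multi-label classification dataset without any human annotation.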
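The MAC descriptor mentioned above is simply a channel-wise spatial maximum over the last convolutional feature map, followed by L2 normalization. A minimal NumPy sketch, assuming the feature map has already been computed by the (fine-tuned) ResNet-50 trunk:

```python
import numpy as np

def mac_feature(fmap):
    # MAC: for each channel, take the maximum activation over all spatial
    # positions, then L2-normalize the resulting vector.
    # fmap: conv activations of shape (C, H, W), e.g. (2048, 7, 7) for
    # ResNet-50 (shape is an assumption for this sketch).
    v = fmap.max(axis=(1, 2))                  # spatial max-pooling -> (C,)
    return v / (np.linalg.norm(v) + 1e-12)     # unit-norm retrieval descriptor

# Toy feature map with 2 channels on a 3x4 spatial grid.
fmap = np.arange(24, dtype=float).reshape(2, 3, 4)
desc = mac_feature(fmap)
```

Retrieval then reduces to nearest-neighbor search over these unit-norm vectors, where the inner product equals cosine similarity.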