Unsupervised Categorical Representation Learning for Package Arrival Time Prediction

2021 
Predicting the Estimated Time of package Arrival (ETA) is an essential task for Alibaba e-commerce platforms such as Taobao and Tmall, and it may influence the user experience of one billion customers. The main challenge in ETA prediction on Alibaba platforms is learning from high-dimensional categorical attributes, where it is equally important to obtain an appropriate representation for each feature and to describe the proximity among features. Although recent supervised end-to-end methods have achieved great improvements, unsupervised embedding methods for categorical attributes have not yet been well studied, especially for large-scale sparse datasets. In this paper, we propose Bayesian Graph Embedding (BGE) to learn dense representations for high-dimensional categorical attributes in an unsupervised way. Inspired by the ideas of Bayesian networks and graph embedding, we design an unsupervised algorithm that absorbs knowledge of prior dependencies and unobserved attributes, which end-to-end methods ignore. We propose a joint optimization objective to mine the proximity between categorical attributes in a way that is consistent with the Bayesian network. Moreover, we put forward a multi-tower model architecture for multi-task learning of the joint objective, which allows the dense representations of categorical attributes to be well exploited. The resulting embeddings can be applied to downstream tasks such as regression and classification with further fine-tuning. Extensive experiments are conducted on two datasets with more than two million samples, collected from Alibaba's real production environment. The experimental results demonstrate that the proposed approach outperforms both supervised and unsupervised baseline methods in effectiveness and efficiency on the e-commerce ETA prediction task.