TraDE: Transformers for Density Estimation.

Rasool Fakoor, Pratik Chaudhari, Jonas Mueller, Alexander J. Smola

2020

We present TraDE, an attention-based architecture for auto-regressive density estimation. In addition to a Maximum Likelihood loss, we employ a Maximum Mean Discrepancy (MMD) two-sample loss to ensure that samples from the estimate resemble the training data. Because it uses attention, the model need not retain conditional sufficient statistics beyond what is needed for each covariate. TraDE performs significantly better than existing approaches, such as differentiable flow-based estimators, on standard tabular and image-based benchmarks in terms of the log-likelihood on held-out data. TraDE also works well across a wide range of tasks, including assessing the quality of generated samples with classification methods, detecting out-of-distribution samples, and handling outliers in the training data.
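The MMD two-sample loss mentioned above compares training points with samples drawn from the estimate through a kernel. The sketch below shows the standard unbiased MMD² estimator and a hypothetical combined objective. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the kernel, its bandwidth, or how the two losses are weighted, so the Gaussian kernel, `bandwidth`, `mmd_weight`, and the helper names here are all assumptions. It is written in pure Python without gradients; a training loop would compute the same quantity in an autodiff framework.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).

    Assumption: the kernel choice is illustrative, not taken from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between samples xs ~ P and ys ~ Q."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples from each distribution")
    # Within-sample terms skip the diagonal (i == j) so the estimate is unbiased.
    k_xx = sum(rbf_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(rbf_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Cross term compares every training point with every generated sample.
    k_xy = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def combined_loss(nll: float, data: List[Vector], samples: List[Vector],
                  mmd_weight: float = 1.0, bandwidth: float = 1.0) -> float:
    """Hypothetical objective: negative log-likelihood plus a weighted MMD^2 penalty.

    `nll` is assumed to come from the density model; `mmd_weight` is an assumed
    hyperparameter, not a value reported in the paper.
    """
    return nll + mmd_weight * mmd_squared(data, samples, bandwidth)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
    close = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
    shifted = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(200)]
    print(mmd_squared(data, close))    # small, close to zero
    print(mmd_squared(data, shifted))  # much larger: the samples do not resemble the data
```

Samples that match the training distribution drive MMD² toward zero, while mismatched samples are penalized. Adding this term to the likelihood loss therefore pushes the model to generate samples that resemble the training data, which the likelihood term alone does not directly target.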

Keywords:

Differentiable function
Outlier
Maximum likelihood
Estimator
Covariate
Mathematics
Density estimation
Sufficient statistic
Mathematical optimization
Training set
Statistics
Classification methods
