NeurIPS 2018
Modelling Sparsity, Heterogeneity, Reciprocity And Community Structure In Temporal Interaction Data

The authors propose a novel class of network models for temporal dyadic interaction data.

Xenia Miscouridou, Francois Caron, Yee Whye Teh
The Lingering Of Gradients: How To Reuse Gradients Over Time

Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, the authors study a more refined complexity by taking into account the "lingering" of gradients: once a gradient is computed at $x_k$, the additional time to compute gradients at $x_{k+1}, x_{k+2}, \dots$ may be reduced. The authors show how this improves the running time of gradient descent and SVRG.

Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang
Quadratic Decomposable Submodular Function Minimization

The authors introduce a new convex optimization problem, termed quadratic decomposable submodular function minimization.

Pan Li, Niao He, Olgica Milenkovic
On The Global Convergence Of Gradient Descent For Over-parameterized Models Using Optimal Transport

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. For these problems, the authors study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions.

Lénaïc Chizat, Francis Bach
Leveraging The Exact Likelihood Of Deep Latent Variable Models

Deep latent variable models (DLVMs) combine the approximation abilities of deep neural networks and the statistical foundations of generative models. The purpose of this work is to study the general properties of the exact likelihood of DLVMs and to show how they can be leveraged in practice.

Pierre-Alexandre Mattei, Jes Frellsen
DVAE#: Discrete Variational Autoencoders With Relaxed Boltzmann Priors

Boltzmann machines are powerful distributions that have been shown to be an effective prior over binary latent variables in variational autoencoders (VAEs). The authors propose two approaches for relaxing Boltzmann machines to continuous distributions that permit training with importance-weighted bounds.

Arash Vahdat, Evgeny Andriyash, William Macready
Amortized Inference Regularization

The variational autoencoder (VAE) is a popular model for density estimation and representation learning.

Rui Shu, Hung Bui, Shengjia Zhao, Mykel J Kochenderfer, Stefano Ermon
GumBolt: Extending Gumbel Trick To Boltzmann Priors

Boltzmann machines (BMs) are appealing candidates for powerful priors in variational autoencoders (VAEs), as they are capable of capturing nontrivial and multi-modal distributions over discrete variables. Here, the authors propose the GumBolt, a model that extends the Gumbel trick to BM priors in VAEs.

Amir H Khoshaman, Mohammad Amin
Constrained Generation Of Semantically Valid Graphs Via Regularizing Variational Autoencoders

Deep generative models have achieved remarkable success in various data domains, including images, time series, and natural languages. In this work, the authors propose a regularization framework for variational autoencoders as a step toward semantic validity.

Tengfei Ma, Jie Chen, Cao Xiao
Invertibility Of Convolutional Generative Networks From Partial Measurements

In this work, the authors present new theoretical results on convolutional generative neural networks, in particular their invertibility (i.e., the recovery of input latent code given the network output).

Fangchang Ma, Ulas Ayaz, Sertac Karaman
Glow: Generative Flow With Invertible 1x1 Convolutions

Flow-based generative models are conceptually attractive due to tractability of the exact log-likelihood, tractability of exact latent-variable inference, and parallelizability of both training and synthesis. In this paper the authors propose Glow, a simple type of generative flow using invertible 1x1 convolutions.

Durk Kingma, Prafulla Dhariwal
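The building block named in the title is easy to illustrate: a 1x1 convolution is a learned channel-mixing matrix applied at every spatial position, and invertibility gives an exact inverse pass plus a cheap log-determinant for the change of variables. A toy numpy sketch (the matrix `W`, shapes, and function names are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def invertible_1x1_conv(x, W):
    """Mix channels at every spatial position with the matrix W,
    i.e. a 1x1 convolution over an array of shape (H, W, C)."""
    return x @ W.T

def log_det_jacobian(x, W):
    # Change-of-variables term of the flow: H * W * log|det W|.
    h, w, _ = x.shape
    return h * w * np.log(abs(np.linalg.det(W)))

# Illustrative setup: a random, almost surely invertible mixing matrix.
C = 4
W = rng.normal(size=(C, C)) + 2.0 * np.eye(C)
x = rng.normal(size=(8, 8, C))

y = invertible_1x1_conv(x, W)
x_rec = invertible_1x1_conv(y, np.linalg.inv(W))  # exact inverse pass
```

Because the transform is linear in the channels, inverting it is just applying the inverse matrix, and `x_rec` recovers `x` up to floating-point error.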
Multimodal Generative Models For Scalable Weakly-Supervised Learning

Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multi-modal input either do not learn a joint distribution or require additional computation to handle missing data.

Mike Wu, Noah Goodman
IntroVAE: Introspective Variational Autoencoders For Photographic Image Synthesis

The authors present a novel introspective variational autoencoder (IntroVAE) model for synthesizing high-resolution photographic images.

Huaibo Huang, Zhihang Li, Ran He, Zhenan Sun, Tieniu Tan
Towards Text Generation With Adversarially Learned Neural Outlines

Recent progress in deep generative models has been fueled by two paradigms -- autoregressive and adversarial models. The authors propose a combination of both approaches with the goal of learning generative models of text.

Sandeep Subramanian, Sai Rajeswar Mudumba, Alessandro Sordoni, Adam Trischler, Aaron Courville, Chris Pal
Unsupervised Image-to-Image Translation Using Domain-Specific Variational Information Bound

In this work, the authors propose an unsupervised image-to-image translation framework which maximizes a domain-specific variational information bound and learns a target domain-invariant representation of the two domains.

Hadi Kazemi, Sobhan Soleymani, Fariborz Taherkhani, Seyed Iranmanesh, Nasser Nasrabadi
Adversarial Scene Editing: Automatic Object Removal From Weak Supervision

While great progress has been made recently in automatic image manipulation, it has been limited to object-centric images like faces or structured scene datasets. In this work, the authors take a step towards general scene-level image editing by developing an automatic interaction-free object removal model.

Rakshith R Shetty, Mario Fritz, Bernt Schiele
Generalizing Point Embeddings Using The Wasserstein Space Of Elliptical Distributions

Embedding complex objects as vectors in low-dimensional spaces is a longstanding problem in machine learning. The authors propose an extension of that approach: embedding objects as elliptical probability distributions, namely distributions whose densities have elliptical level sets.

Boris Muzellec, Marco Cuturi
Banach Wasserstein GAN

Wasserstein Generative Adversarial Networks (WGANs) can be used to generate realistic samples from complicated image distributions.

Jonas Adler, Sebastian Lunz
A Convex Duality Framework For GANs

Farzan Farnia, David Tse
On The Convergence And Robustness Of Training GANs With Regularized Optimal Transport

Generative Adversarial Networks (GANs) are one of the most practical methods for learning data distributions. The authors establish a theoretical convergence guarantee to stationarity for a proposed class of GAN optimization algorithms.

Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, Jason Lee
On Gradient Regularizers For MMD GANs

The authors propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD).

Michael Arbel, Dougal Sutherland, Mikołaj Bińkowski, Arthur Gretton
PacGAN: The Power Of Two Samples In Generative Adversarial Networks

Generative adversarial networks (GANs) are a technique for learning generative models of complex data distributions from samples. The authors study a principled approach to handling mode collapse, which the authors call packing.

Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh
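The packing idea is mechanically simple: instead of judging samples one at a time, the discriminator receives m samples concatenated together, which makes mode collapse easier to detect. A minimal numpy sketch of the packing step (function name and shapes are ours, not the authors'):

```python
import numpy as np

def pack(samples, m):
    """Concatenate groups of m samples along the feature axis so the
    discriminator judges m samples jointly (toy version of packing)."""
    n, d = samples.shape
    groups = n // m
    return samples[: groups * m].reshape(groups, m * d)

x = np.arange(12.0).reshape(6, 2)   # 6 samples with 2 features each
packed = pack(x, m=3)               # 2 packed inputs of 3 samples each
```

A generator that collapses to few modes produces packed inputs with visibly repeated blocks, which a discriminator on packed samples can exploit.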
Are GANs Created Equal? A Large-Scale Study

Generative adversarial networks (GANs) are a powerful subclass of generative models. The authors conduct a neutral, multi-faceted large-scale empirical study on state-of-the-art models and evaluation measures.

Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet
Disconnected Manifold Learning For Generative Adversarial Networks

Natural images may lie on a union of disjoint manifolds rather than one globally connected manifold, and this can cause several difficulties for the training of common Generative Adversarial Networks (GANs). The authors show how using a collection of generators can address this problem, providing new insights into the success of such multi-generator GANs.

Mahyar Khayatkhoei, Maneesh K. Singh, Ahmed Elgammal
Hessian-based Analysis Of Large Batch Training And Robustness To Adversaries

Large batch size training of neural networks has been shown to incur accuracy loss when trained with the current methods. Here, the authors study large batch size training through the lens of the Hessian operator and robust optimization.

Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W Mahoney
Fast And Effective Robustness Certification

The authors present a new method and system, called DeepZ, for certifying neural network robustness based on abstract interpretation.

Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, Martin Vechev
Graphical Generative Adversarial Networks

The authors propose Graphical Generative Adversarial Networks (Graphical-GAN) to model structured data.

Chongxuan LI, Max Welling, Jun Zhu, Bo Zhang
Deep Defense: Training DNNs With Improved Adversarial Robustness

Despite the efficacy on a variety of computer vision tasks, deep neural networks (DNNs) are vulnerable to adversarial attacks, limiting their applications in security-critical systems. To address this problem, the authors propose a training recipe named "deep defense".

Ziang Yan, Yiwen Guo, Changshui Zhang
Learning To Repair Software Vulnerabilities With Generative Adversarial Networks

Jacob Harer, Onur Ozdemir, Tomo Lazovich, Christopher Reale, Rebecca Russell, Louis Kim, Peter Chin
Memory Replay GANs: Learning To Generate New Categories Without Forgetting

Previous works on sequential learning address the problem of forgetting in discriminative models. Addressing this problem, the authors propose Memory Replay GANs (MeRGANs), a conditional GAN framework that integrates a memory replay generator.

Chenshen Wu, Luis Herranz, Xialei Liu, Yaxing Wang, Joost Van De Weijer, Bogdan Raducanu
Unsupervised Attention-guided Image-to-Image Translation

Current unsupervised image-to-image translation techniques struggle to focus their attention on individual objects without altering the background or the way multiple objects interact within a scene.

Youssef Alami Mejjati, Christian Richardt, James Tompkin, Darren Cosker, Kwang In Kim
Conditional Adversarial Domain Adaptation

Adversarial learning has been embedded into deep networks to learn disentangled and transferable representations for domain adaptation.

Mingsheng Long, Zhangjie CAO, Jianmin Wang, Michael I. Jordan
Video-to-Video Synthesis

The authors study the problem of video-to-video synthesis.

Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Nikolai Yakovenko, Andrew Tao, Jan Kautz, Bryan Catanzaro
Generalized Zero-Shot Learning With Deep Calibration Network

Shichen Liu, Mingsheng Long, Jianmin Wang, Michael I. Jordan
Low-shot Learning Via Covariance-Preserving Adversarial Augmentation Networks

Deep neural networks suffer from over-fitting and catastrophic forgetting when trained with small data. In this work, the authors propose Covariance-Preserving Adversarial Augmentation Networks to overcome existing limits of low-shot learning.

Hang Gao, Zheng Shou, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Trading Robust Representations For Sample Complexity Through Self-supervised Visual Experience

Learning in small sample regimes is among the most remarkable features of the human perceptual system. The authors explore the idea of allowing artificial systems to learn representations of visual stimuli through weak supervision prior to downstream supervised tasks.

Andrea Tacchetti, Stephen Voinea, Georgios Evangelopoulos
TADAM: Task Dependent Adaptive Metric For Improved Few-shot Learning

Few-shot learning has become essential for producing models that generalize from few examples. The authors further propose a simple and effective way of conditioning a learner on the task sample set, resulting in learning a task-dependent metric space.

Boris Oreshkin, Pau Rodríguez López, Alexandre Lacoste
FishNet: A Versatile Backbone For Image, Region, And Pixel Level Prediction

The basic principles in designing convolutional neural network (CNN) structures for predicting objects on different levels, e.g., image-level, region-level, and pixel-level, are diverging.

Shuyang Sun, Jiangmiao Pang, Jianping Shi, Shuai Yi, Wanli Ouyang
Batch-Instance Normalization For Adaptively Style-Invariant Neural Networks

Real-world image recognition is often challenged by the variability of visual styles including object textures, lighting conditions, filter effects, etc. Extending this idea to general visual recognition problems, the authors present Batch-Instance Normalization (BIN) to explicitly normalize unnecessary styles from images.

Hyeonseob Nam, Hyo-Eun Kim
A^2-Nets: Double Attention Networks

Learning to capture long-range relations is fundamental to image/video recognition. In this work, the authors propose the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently.

Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
Pelee: A Real-Time Object Detection System On Mobile Devices

Robert Wang, Tanner Bohn, Charles Ling
PointCNN: Convolution On X-Transformed Points

The authors present a simple and general framework for feature learning from point clouds.

Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, Baoquan Chen
Deep Neural Networks With Box Convolutions

Box filters computed using integral images have been part of the computer vision toolset for a long time.

Egor Burkov, Victor Lempitsky
An Intriguing Failing Of Convolutional Neural Networks And The CoordConv Solution

Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate.

Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski
3D Steerable CNNs: Learning Rotationally Equivariant Features In Volumetric Data

The authors present a convolutional network that is equivariant to rigid body motions.

Maurice Weiler, Wouter Boomsma, Mario Geiger, Max Welling, Taco Cohen
Moonshine: Distilling With Cheap Convolutions

Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. The authors propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used.

Elliot J. Crowley, Gavin Gray, Amos Storkey
Kalman Normalization: Normalizing Internal Representations Across Network Layers

As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches, by normalizing the distribution of the internal representation for each hidden layer.

Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin
SplineNets: Continuous Neural Decision Graphs

The authors present SplineNets, a practical and novel approach for using conditioning in convolutional neural networks (CNNs).

Cem Keskin, Shahram Izadi
CapProNet: Deep Feature Learning Via Orthogonal Projections Onto Capsule Subspaces

In this paper, the authors formalize the idea behind capsule nets of using a capsule vector rather than a neuron activation to predict the label of samples. To this end, the authors propose to learn a group of capsule subspaces onto which an input feature vector is projected.

Liheng Zhang, Marzieh Edraki, Guo-Jun Qi
Which Neural Net Architectures Give Rise To Exploding And Vanishing Gradients?

The authors give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network N with ReLU activations.

Boris Hanin
Exact Natural Gradient In Deep Linear Networks And Its Application To The Nonlinear Case

Stochastic gradient descent (SGD) remains the method of choice for deep learning, despite the limitations arising for ill-behaved objective functions. In cases where it could be estimated, the natural gradient has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, and it has yet to find a practical implementation that would scale to very deep and large networks.

Alberto Bernacchia, Mate Lengyel, Guillaume Hennequin
Fast Approximate Natural Gradient Descent In A Kronecker Factored Eigenbasis

Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. In the present work the authors draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates.

Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent
Post: Device Placement With Cross-Entropy Minimization And Proximal Policy Optimization

Yuanxiang Gao, Li Chen, Baochun Li
Paraphrasing Complex Network: Network Compression Via Factor Transfer

Jangho Kim, Seonguk Park, Nojun Kwak
Learning Compressed Transforms With Low Displacement Rank

The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual.

Anna Thomas, Albert Gu, Tri Dao, Atri Rudra, Chris Ré
Knowledge Distillation By On-the-Fly Native Ensemble

Knowledge distillation is effective for training small, generalisable network models that meet low-memory and fast-execution requirements. In this work, the authors present an On-the-fly Native Ensemble (ONE) learning strategy for one-stage online distillation.

Xu Lan, Xiatian Zhu, Shaogang Gong
Scalable Methods For 8-bit Training Of Neural Networks

Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase. Additionally, as QNNs require batch-normalization to be trained at high precision, the authors introduce Range Batch-Normalization (BN), which has significantly higher tolerance to quantization noise and improved computational complexity.

Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry
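The range statistic behind Range BN can be sketched in a few lines: replace the standard deviation with the batch range (max minus min), which is cheap to compute and better behaved under low-precision arithmetic. A simplified numpy sketch (the paper's version additionally rescales by a batch-size-dependent constant, omitted here):

```python
import numpy as np

def range_bn(x, eps=1e-5):
    """Normalize each feature of a batch by its mean and its range
    (max - min) instead of the standard deviation (simplified sketch)."""
    mu = x.mean(axis=0)
    span = x.max(axis=0) - x.min(axis=0)
    return (x - mu) / (span + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))  # a toy batch of 32 examples, 3 features
y = range_bn(x)
```

After normalization each feature is zero-mean and confined to a unit-width interval, regardless of the input scale.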
Training Deep Models Faster With Robust, Approximate Importance Sampling

In theory, importance sampling speeds up stochastic gradient algorithms for supervised learning by prioritizing training examples. The authors propose a robust, approximate importance sampling procedure (RAIS) for stochastic gradient descent.

Tyler Johnson, Carlos Guestrin
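The generic importance sampling mechanism that RAIS builds on is easy to demonstrate: draw examples with non-uniform probabilities and reweight each draw by 1/(N p_i) so the gradient estimate stays unbiased. A toy numpy sketch with scalar "gradients" (not the authors' procedure, which approximates these probabilities robustly):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-example "gradients" and a sampling distribution that
# prioritizes the large-gradient examples.
g = np.array([0.1, 0.2, 5.0, 0.3])
p = np.abs(g) / np.abs(g).sum()
N = len(g)

def importance_sampled_mean(num_draws):
    idx = rng.choice(N, size=num_draws, p=p)
    # Reweighting each draw by 1/(N * p_i) keeps the estimator
    # unbiased for the plain average gradient g.mean().
    return np.mean(g[idx] / (N * p[idx]))

est = importance_sampled_mean(1000)
```

With sampling probabilities exactly proportional to the gradient magnitudes (and all gradients positive), every draw contributes the same value, so the estimator here has zero variance; that is the speedup importance sampling promises.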
Collaborative Learning For Deep Neural Networks

The authors introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost.

Guocong Song, Wei Chai
A Linear Speedup Analysis Of Distributed Deep Learning With Sparse And Quantized Communication

The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. In this paper, the authors study the convergence rate of distributed SGD for non-convex optimization with two communication reducing strategies: sparse parameter averaging and gradient quantization.

Peng Jiang, Gagan Agrawal
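Sparse communication of the kind analyzed here is often implemented with top-k sparsification: only the k largest-magnitude gradient entries are transmitted, the rest are zeroed. A generic numpy sketch (illustrative of this class of strategies, not the paper's exact scheme):

```python
import numpy as np

def topk_sparsify(grad, k):
    """Zero out all but the k largest-magnitude entries of a gradient
    vector before communicating it (a generic sparsification step)."""
    out = np.zeros_like(grad)
    idx = np.argsort(np.abs(grad))[-k:]
    out[idx] = grad[idx]
    return out

g = np.array([0.1, -2.0, 0.05, 3.0, -0.2])
sparse_g = topk_sparsify(g, k=2)  # only the -2.0 and 3.0 entries survive
```

Transmitting 2 of 5 entries cuts communication by more than half here; the convergence question the paper studies is how such compression interacts with the SGD update.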
Bayesian Distributed Stochastic Gradient Descent

The authors introduce Bayesian distributed stochastic gradient descent (BDSGD), a high-throughput algorithm for training deep neural networks on parallel clusters.

Michael Teng, Frank Wood
Regularizing By The Variance Of The Activations' Sample-Variances

Normalization techniques play an important role in supporting efficient and often more effective training of deep neural networks.

Etai Littwin, Lior Wolf
BML: A High-performance, Low-cost Gradient Synchronization Algorithm For DML Training

In distributed machine learning (DML), the network performance between machines significantly impacts the speed of iterative training. In this paper the authors propose BML, a new gradient synchronization algorithm with higher network performance and lower network cost than the current practice.

Songtao Wang, Dan Li, Yang Cheng, Jinkun Geng, Yanshu Wang, Shuai Wang, Shu-Tao Xia, Jianping Wu
L4: Practical Loss-based Stepsize Adaptation For Deep Learning

The authors propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. The authors demonstrate its capabilities by conclusively improving the performance of the Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant-stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including dense nets, CNNs, ResNets, and the recurrent Differentiable Neural Computer on classical datasets such as MNIST, Fashion-MNIST, CIFAR10, and others.

Michal Rolinek, Georg Martius
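The "fixed predicted progress" idea can be sketched for plain gradient descent: linearizing the loss, a step of size eta = alpha * L / (g . g) predicts a decrease of alpha * L. A toy numpy sketch (the fraction alpha and the plain-GD form are our simplifications; the paper applies the scheme on top of Adam and Momentum):

```python
import numpy as np

def loss_based_step(x, loss_fn, grad_fn, alpha=0.15):
    """One gradient step whose size is chosen so the linearized loss
    drops by a fixed fraction alpha of its current value:
    eta = alpha * L / (g . g)."""
    L = loss_fn(x)
    g = grad_fn(x)
    eta = alpha * L / (g @ g + 1e-12)  # guard against zero gradient
    return x - eta * g

# Toy quadratic L(x) = 0.5 * ||x||^2, whose gradient is x.
loss = lambda v: 0.5 * float(v @ v)
grad = lambda v: v
x0 = np.array([3.0, 4.0])
x1 = loss_based_step(x0, loss, grad, alpha=0.5)
```

Because the stepsize is tied to the current loss value, it shrinks automatically as the optimizer approaches a minimum, which is what removes the need to tune a constant stepsize.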
Synaptic Strength For Convolutional Neural Network

Convolutional Neural Networks (CNNs) are both computation and memory intensive, which has hindered their deployment on mobile devices. Inspired by the relevant concept in the neuroscience literature, the authors propose Synaptic Pruning: a data-driven method to prune connections between input and output feature maps with a newly proposed class of parameters called Synaptic Strength.

CHEN LIN, Zhao Zhong, Wu Wei, Junjie Yan
ChannelNets: Compact And Efficient Convolutional Neural Networks Via Channel-Wise Convolutions

Convolutional neural networks (CNNs) have shown great capability of solving various artificial intelligence tasks. In this work, the authors propose to compress deep models by using channel-wise convolutions, which replace dense connections among feature maps with sparse ones in CNNs.

Hongyang Gao, Zhengyang Wang, Shuiwang Ji
Frequency-Domain Dynamic Pruning For Convolutional Neural Networks

Deep convolutional neural networks have demonstrated their power in a variety of applications. Considering that there is spatial redundancy within most filters in a CNN, the authors propose a frequency-domain dynamic pruning scheme to exploit the spatial correlations.

Zhenhua Liu, Jizheng Xu, Xiulian Peng, Ruiqin Xiong
TETRIS: TilE-matching The TRemendous Irregular Sparsity

Compressing neural networks by pruning weights with small magnitudes can significantly reduce the computation and storage cost.

Yu Ji, Ling Liang, Lei Deng, Youyang Zhang, Youhui Zhang, Yuan Xie
Heterogeneous Bitwidth Binarization In Convolutional Neural Networks

Recent work has shown that fast, compact low-bitwidth neural networks can be surprisingly accurate. However, modern hardware allows efficient designs where each arithmetic instruction can have a custom bitwidth, motivating heterogeneous binarization, where every parameter in the network may have a different bitwidth. In this paper, the authors show that it is feasible and useful to select bitwidths at the parameter granularity during training.

Joshua Fromm, Shwetak Patel, Matthai Philipose
HitNet: Hybrid Ternary Recurrent Neural Network

Peiqi Wang, Xinfeng Xie, Lei Deng, Guoqi Li, Dongsheng Wang, Yuan Xie
A General Method For Amortizing Variational Filtering

The authors introduce the variational filtering EM algorithm, a simple, general-purpose method for performing variational inference in dynamical latent variable models using information from only past and present variables.

Joe Marino, Milan Cvitkovic, Yisong Yue
Multiple Instance Learning For Efficient Sequential Data Classification On Resource-constrained Devices

The authors study the problem of fast and efficient classification of sequential data (such as time series) on tiny devices, which is critical for various IoT-related applications like audio keyword detection or gesture detection.

Don Dennis, Chirag Pabbaraju, Harsha Simhadri, Prateek Jain
Navigating With Graph Representations For Fast And Scalable Decoding Of Neural Language Models

Neural language models (NLMs) have recently gained renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks. However, NLMs are very computationally demanding, largely due to the computational cost of the decoding process, which consists of a softmax layer over a large vocabulary. The authors observe that in the decoding of many NLP tasks, only the probabilities of the top-K hypotheses need to be calculated precisely, and K is often much smaller than the vocabulary size. This paper proposes a novel softmax layer approximation algorithm, called Fast Graph Decoder (FGD), which quickly identifies, for a given context, a set of K words that are most likely to occur according to an NLM.

Minjia Zhang, Wenhan Wang, Xiaodong Liu, Jianfeng Gao, Yuxiong He
Representer Point Selection For Explaining Deep Neural Networks

The authors propose to explain the predictions of a deep neural network, by pointing to the set of what the authors call representer points in the training set, for a given test point prediction.

Chih-Kuan Yeh, Joon Kim, Ian En-Hsu Yen, Pradeep Ravikumar
Interpreting Neural Network Judgments Via Minimal, Stable, And Symbolic Corrections

The authors present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output.

Xin Zhang, Armando Solar-Lezama, Rishabh Singh
DropMax: Adaptive Variational Softmax

The authors propose DropMax, a stochastic version of softmax classifier which at each iteration drops non-target classes according to dropout probabilities adaptively decided for each instance.

Hae Beom Lee, Juho Lee, Saehoon Kim, Eunho Yang, Sung Ju Hwang
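The dropped-class softmax can be sketched directly: sample a Bernoulli keep-mask over the non-target classes and renormalize over the survivors. A toy numpy version with a fixed keep probability (an assumption for illustration; in the paper the dropout probabilities are learned adaptively per instance):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropmax(logits, target, keep_prob):
    """Softmax over the target class plus a random subset of the other
    classes: each non-target class is kept with probability keep_prob."""
    mask = rng.random(len(logits)) < keep_prob
    mask[target] = True  # the true class is never dropped
    z = np.where(mask, np.exp(logits - logits.max()), 0.0)
    return z / z.sum()

p = dropmax(np.array([2.0, 1.0, 0.5, -1.0]), target=0, keep_prob=0.5)
```

Dropping non-target classes at each iteration trains the classifier against a different random subset of competitors, which is the ensembling effect the method exploits.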
Out-of-Distribution Detection Using Multiple Semantic Label Representations

Deep Neural Networks are powerful models that attained remarkable results on a variety of tasks. The authors propose to use multiple semantic dense representations instead of a sparse representation as the target label.

Gabi Shalev, Yossi Adi, Yossi Keshet
End-to-end Symmetry Preserving Inter-atomic Potential Energy Model For Finite And Extended Systems

Machine learning models are changing the paradigm of molecular modeling, which is a fundamental tool for material science, chemistry, and computational biology. Here the authors develop Deep Potential - Smooth Edition (DeepPot-SE), an end-to-end machine learning-based potential energy surface (PES) model, which is able to efficiently represent the PES for a wide variety of systems with the accuracy of ab initio quantum mechanics models.

Linfeng Zhang, Jiequn Han, Han Wang, Wissam Saidi, Roberto Car, Weinan E
Deep Predictive Coding Network With Local Recurrent Processing For Object Recognition

Inspired by "predictive coding" - a theory in neuroscience, the authors develop a bi-directional and dynamic neural network with local recurrent processing, namely predictive coding network (PCN). Feedback and feedforward connections enable adjacent layers to interact locally and recurrently to refine representations towards minimization of layer-wise prediction errors.

Kuan Han, Haiguang Wen, Yizhen Zhang, Di Fu, Eugenio Culurciello, Zhongming Liu
SLAYER: Spike Layer Error Reassignment In Time

Configuring deep Spiking Neural Networks (SNNs) is an exciting research avenue for low power spike event based computation. In this paper, the authors introduce a new general backpropagation mechanism for learning synaptic weights and axonal delays which overcomes the problem of non-differentiability of the spike function and uses a temporal credit assignment policy for backpropagating error to preceding layers.

Sumit Bam Shrestha, Garrick Orchard
DeepPINK: Reproducible Feature Selection In Deep Neural Networks

Deep learning has become increasingly popular in both supervised and unsupervised machine learning thanks to its outstanding empirical performance. In this paper, the authors describe a method to increase the interpretability and reproducibility of DNNs by incorporating the idea of feature selection with controlled error rate.

Yang Lu, Yingying Fan, Jinchi Lv, William Stafford Noble
Learning Long-range Spatial Dependencies With Horizontal Gated Recurrent Units

Progress in deep learning has spawned great successes in many engineering applications. The authors introduce a visual challenge, Pathfinder, and describe a novel recurrent neural network architecture called the horizontal gated recurrent unit (hGRU) to learn intrinsic horizontal connections -- both within and across feature columns.

Drew Linsley, Junkyung Kim, Vijay Veerabadran, Charles Windolf, Thomas Serre
Neural Interaction Transparency (NIT): Disentangling Learned Interactions For Improved Interpretability

Neural networks are known to model statistical interactions, but they entangle the interactions at intermediate hidden layers for shared representation learning.

Michael Tsang, Hanpeng Liu, Sanjay Purushotham, Pavankumar Murali, Yan Liu
A Bridging Framework For Model Optimization And Deep Propagation

Optimizing task-related mathematical models is one of the most fundamental methodologies in the statistics and learning areas. However, generically designed schematic iterations may struggle to investigate complex data distributions in real-world applications.

Risheng Liu, Shichao Cheng, Xiaokun Liu, Long Ma, Xin Fan, Zhongxuan Luo
The Importance Of Sampling In Meta-Reinforcement Learning

The authors interpret meta-reinforcement learning as the problem of learning how to quickly find a good sampling distribution in a new environment.

Bradly Stadie, Ge Yang, Rein Houthooft, Peter Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever
Latent Alignment And Variational Attention

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. The authors further propose methods for reducing the variance of gradients to make these approaches computationally feasible.

Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander Rush
Variational Memory Encoder-Decoder

Introducing variability while maintaining coherence is a core task in learning to generate utterances in conversation. The authors empirically compare the proposed model against other recent approaches on various conversational datasets.

Hung Le, Truyen Tran, Thin Nguyen, Svetha Venkatesh
Relational Recurrent Neural Networks

Memory-based neural networks model temporal data by leveraging an ability to remember information for long periods.

Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, Timothy Lillicrap
Learning To Reason With Third Order Tensor Products

The authors combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data.

Imanol Schlag, Jürgen Schmidhuber
Reversible Recurrent Neural Networks

Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of RNN models which can be trained.

Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse
Breaking The Activation Function Bottleneck Through Adaptive Parameterization

Standard neural network architectures are non-linear only by virtue of a simple element-wise activation function, making them both brittle and excessively large. The authors present an adaptive LSTM that advances the state of the art on the Penn Treebank and Wikitext-2 word-level language modeling tasks while using fewer parameters and converging in half as many iterations.

Sebastian Flennerhag, Hujun Yin, John Keane, Mark Elliot
BRITS: Bidirectional Recurrent Imputation For Time Series

Time series are widely used as signals in many classification/regression tasks. In this paper, the authors propose BRITS, a novel method based on recurrent neural networks for missing value imputation in time series data.

Wei Cao, Dong Wei, Jian Li, Hao Zhou, Lei Li, Yitan Li
Complex Gated Recurrent Neural Networks

Moritz Wolter, Angela Yao
Middle-Out Decoding

Despite being virtually ubiquitous, sequence-to-sequence models are challenged by their lack of diversity and inability to be externally controlled. To address this issue, the authors propose a novel middle-out decoder architecture that begins from an initial middle-word and simultaneously expands the sequence in both directions.

Shikib Mehri, Leonid Sigal
Recurrently Controlled Recurrent Networks

Recurrent neural networks (RNNs) such as long short-term memory and gated recurrent units are pivotal building blocks across a broad spectrum of sequence modeling problems. This paper proposes a recurrently controlled recurrent network (RCRN) for expressive and powerful sequence encoding.

Yi Tay, Anh Tuan Luu, Siu Cheung Hui
Tree-to-tree Neural Networks For Program Translation

Xinyun Chen, Chang Liu, Dawn Song
HOUDINI: Lifelong Learning As Program Synthesis

The authors present a neurosymbolic framework for the lifelong learning of algorithmic tasks that mix perception and procedural reasoning.

Lazar Valkov, Dipak Chaudhari, Akash Srivastava, Charles Sutton, Swarat Chaudhuri
Neural Guided Constraint Logic Programming For Program Synthesis

Synthesizing programs using example input/outputs is a classic problem in artificial intelligence. The authors present a method for solving Programming By Example (PBE) problems by using a neural model to guide the search of a constraint logic programming system called miniKanren.

Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William Byrd, Matthew Might, Raquel Urtasun, Richard Zemel
Embedding Logical Queries On Knowledge Graphs

Learning low-dimensional embeddings of knowledge graphs is a powerful approach used to predict unobserved or missing edges between entities. Here the authors introduce a framework to efficiently make predictions about conjunctive logical queries -- a flexible but tractable subset of first-order logic -- on incomplete knowledge graphs.

Will Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, Jure Leskovec
Expanding Holographic Embeddings For Knowledge Completion

Neural models operating over structured spaces such as knowledge graphs require a continuous embedding of the discrete elements of this space (such as entities) as well as the relationships between them. The authors propose a new family of embeddings for knowledge graphs that interpolate between a method with high model complexity and one, namely Holographic embeddings (HolE), with low dimensionality and high training efficiency.

Yexiang Xue, Yang Yuan, Zhitian Xu, Ashish Sabharwal
On The Dimensionality Of Word Embedding

In this paper, the authors provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embedding, the authors propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings.

Zi Yin, Yuanyuan Shen
Flexible Neural Representation For Physics Prediction

Humans have a remarkable capacity to understand the physical dynamics of objects in their environment, flexibly capturing complex structures and interactions at multiple levels of detail. Inspired by this ability, the authors propose a hierarchical particle-based object representation that covers a wide variety of types of three-dimensional objects, including both arbitrary rigid geometrical shapes and deformable materials.

Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Josh Tenenbaum, Daniel Yamins
Content Preserving Text Generation With Attribute Controls

In this work, the authors address the problem of modifying textual attributes of sentences. To ensure that the model generates content-compatible sentences, the authors introduce a reconstruction loss which interpolates between auto-encoding and back-translation loss components.

Lajanugen Logeswaran, Honglak Lee, Samy Bengio
Recurrent Relational Networks

This paper is concerned with learning to solve tasks that require a chain of interdependent steps of relational inference, like answering complex questions about the relationships between objects, or solving puzzles where the smaller elements of a solution mutually constrain each other. The authors introduce the recurrent relational network, a general-purpose module that operates on a graph representation of objects and generalizes the relational network of Santoro et al.

Rasmus Palm, Ulrich Paquet, Ole Winther
GLoMo: Unsupervised Learning Of Transferable Relational Graphs

Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations.

Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William Cohen, Russ Salakhutdinov, Yann LeCun
Predictive Uncertainty Estimation Via Prior Networks

Estimating how uncertain an AI system is in its predictions is important for improving the safety of such systems. This work proposes a new framework for modeling predictive uncertainty, called Prior Networks (PNs), which explicitly models distributional uncertainty.

Andrey Malinin, Mark Gales
Adversarial Multiple Source Domain Adaptation

The authors propose new generalization bounds and algorithms under both classification and regression settings for unsupervised multiple source domain adaptation.

Han Zhao, Shanghang Zhang, Guanhang Wu, José M. F. Moura, Joao P Costeira, Geoffrey Gordon
Adversarial Examples That Fool Both Computer Vision And Time-Limited Humans

Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich.

Gamaleldin Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alexey Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein
A Simple Cache Model For Image Recognition

Training large-scale image recognition models is computationally expensive. The authors propose to extract extra class-relevant information from a trained model using a simple key-value cache memory, improving its classification performance at test time.

Emin Orhan
Co-teaching: Robust Training Of Deep Neural Networks With Extremely Noisy Labels

Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can eventually memorize these noisy labels during training. In this paper, the authors therefore propose a new deep learning paradigm called 'Co-teaching' for combating noisy labels.

Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, Masashi Sugiyama
Improved Network Robustness With Adversary Critic

Ideally, what confuses a neural network should also confuse humans. To address this gap in perception, the authors propose a novel approach for learning robust classifiers.

Alexander Matyasko, Lap-Pui Chau
Unsupervised Learning Of Object Landmarks Through Conditional Image Generation

The authors propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision.

Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi
Multi-Task Learning As Multi-Objective Optimization

In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. The authors therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently.

Ozan Sener, Vladlen Koltun
Deep Anomaly Detection Using Geometric Transformations

The authors consider the problem of anomaly detection in images, and present a new detection technique.

Izhak Golan, Ran El-Yaniv
Practical Deep Stereo (PDS): Toward Applications-friendly Deep Stereo Matching

End-to-end deep-learning networks recently demonstrated extremely good performance for stereo matching.

Stepan Tulyakov, Anton Ivanov, François Fleuret
VideoCapsuleNet: A Simplified Network For Action Detection

The recent advances in Deep Convolutional Neural Networks (DCNNs) have shown extremely good results for video human action classification; however, action detection is still a challenging problem. In this work, the authors present a more elegant solution for action detection based on the recently developed capsule network.

Kevin Duarte, Yogesh Rawat, Mubarak Shah
With Friends Like These, Who Needs Adversaries?

The vulnerability of deep image classification networks to adversarial attack is now well known, but less well understood.

Saumya Jetley, Nick Lord, Philip Torr
Multi-Task Zipping Via Layer-wise Neuron Sharing

Future mobile devices are anticipated to perceive, understand and react to the world on their own by running multiple correlated deep neural networks on-device. The authors propose Multi-Task Zipping (MTZ), a framework to automatically merge correlated, pre-trained deep neural networks for cross-model compression.

Xiaoxi He, Zimu Zhou, Lothar Thiele
Learning Versatile Filters For Efficient Convolutional Neural Networks

This paper introduces versatile filters for constructing efficient convolutional neural networks.

Yunhe Wang, Chang Xu, Chunjing XU, Chao Xu, Dacheng Tao
Evolutionary Stochastic Gradient Descent For Optimization Of Deep Neural Networks

The authors propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks.

Xiaodong Cui, Wei Zhang, Zoltán Tüske, Michael Picheny
Structure-Aware Convolutional Neural Networks

Convolutional neural networks (CNNs) are inherently subject to invariable filters that can only aggregate local inputs with the same topological structures.

Jianlong Chang, Jie Gu, Lingfeng Wang, GAOFENG MENG, SHIMING XIANG, Chunhong Pan
Global Gated Mixture Of Second-order Pooling For Improving Deep Convolutional Neural Networks

In most existing deep convolutional neural networks (CNNs) for classification, global average (first-order) pooling (GAP) has become a standard module for summarizing the activations of the last convolutional layer into a final representation for prediction.

Qilong Wang, Zilin Gao, Jiangtao Xie, Wangmeng Zuo, Peihua Li
Hybrid Macro/Micro Level Backpropagation For Training Deep Spiking Neural Networks

Spiking neural networks (SNNs) are positioned to enable spatio-temporal information processing and ultra-low power event-driven neuromorphic hardware. The authors present a hybrid macro/micro level backpropagation (HM2-BP) algorithm for training multi-layer SNNs.

Yingyezhe Jin, Wenrui Zhang, Peng Li
Deep Neural Nets With Interpolating Function As Output Activation

The authors replace the output layer of deep neural nets, typically the softmax function, with a novel interpolating function, and propose end-to-end training and testing algorithms for this new architecture.

Bao Wang, Xiyang Luo, Zhen Li, Wei Zhu, Zuoqiang Shi, Stanley Osher
Neural Edit Operations For Biological Sequences

The evolution of biological sequences, such as proteins or DNAs, is driven by the three basic edit operations: substitution, insertion, and deletion. Motivated by the recent progress of neural network models for biological tasks, the authors implement two neural network architectures that can treat such edit operations.

Satoshi Koide, Keisuke Kawano, Takuro Kutsuna
Improved Expressivity Through Dendritic Neural Networks

A typical biological neuron, such as a pyramidal neuron of the neocortex, receives thousands of afferent synaptic inputs on its dendrite tree and sends the efferent axonal output downstream.In typical artificial neural networks, dendrite trees are modeled as linear structures that funnel weighted synaptic inputs to the cell bodies.

Xundong Wu, Xiangwen Liu, Wei Li, Qing Wu
Neural Proximal Gradient Descent For Compressive Imaging

Recovering high-resolution images from limited sensory data typically leads to a serious ill-posed inverse problem, demanding inversion algorithms that effectively capture the prior information.

Morteza Mardani, Qingyun Sun, David Donoho, Vardan Papyan, Hatef Monajemi, Shreyas Vasanawala, John Pauly
Sigsoftmax: Reanalysis Of The Softmax Bottleneck

Softmax is an output activation function for modeling categorical probability distributions in many applications of deep learning. However, a recent study revealed that softmax can be a bottleneck of representational capacity of neural networks in language modeling (the softmax bottleneck).

Sekitoshi Kanai, Yasuhiro Fujiwara, Yuki Yamanaka, Shuichi Adachi
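
The sigsoftmax idea can be illustrated concretely: the softmax bottleneck stems from the log of the softmax numerator being linear in the logits, and sigsoftmax breaks that linearity by multiplying exp(z) with sigmoid(z) before normalizing. A minimal NumPy sketch (the function names here are illustrative, not taken from the paper's code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

def sigsoftmax(z):
    # numerator exp(z) * sigmoid(z); its log, z + log(sigmoid(z)),
    # is nonlinear in z, unlike the purely linear log-numerator of softmax
    g = np.exp(z - z.max()) / (1.0 + np.exp(-z))
    return g / g.sum()

p = sigsoftmax(np.array([1.0, 2.0, -0.5]))
```

Like softmax, the result is a valid categorical distribution (non-negative, sums to one), so it can be dropped in as an output activation.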
Visualizing The Loss Landscape Of Neural Nets

The authors explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods.

Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein
Clebsch–Gordan Nets: A Fully Fourier Space Spherical Convolutional Neural Network

Recent work by Cohen et al. has achieved state-of-the-art results for learning spherical images in a rotation invariant way by using ideas from group representation theory and noncommutative harmonic analysis.

Risi Kondor, Zhen Lin, Shubhendu Trivedi
Adaptive Sampling Towards Fast Graph Representation Learning

Graph Convolutional Networks (GCNs) have become a crucial tool for learning representations of graph vertices.

Wenbing Huang, Tong Zhang, Yu Rong, Junzhou Huang
NAIS-Net: Stable Deep Networks From Non-Autonomous Differential Equations

This paper introduces Non-Autonomous Input-Output Stable Network (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system.

Marco Ciccone, Marco Gallieri, Jonathan Masci, Christian Osendorfer, Faustino Gomez
Scaling Provable Adversarial Defenses

Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbations; however, these methods currently apply only to relatively small feedforward networks. First, the authors present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular and can be implemented automatically, analogously to automatic differentiation.

Eric Wong, Frank Schmidt, Jan Hendrik Metzen, J. Zico Kolter
Lipschitz Regularity Of Deep Neural Networks: Analysis And Efficient Estimation

Deep neural networks are notorious for being sensitive to small, well-chosen perturbations, and estimating the regularity of such architectures is of utmost importance for safe and robust practical applications. For sequential neural networks, the authors propose an improved algorithm named SeqLip that takes advantage of the linear computation graph to split the computation per pair of consecutive layers.

Aladin Virmaux, Kevin Scaman
Training DNNs With Hybrid Block Floating Point

The wide adoption of DNNs has given birth to unrelenting computing requirements, forcing datacenter operators to adopt domain-specific accelerators to train them. The authors identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic.

Mario Drumond, Tao LIN, Martin Jaggi, Babak Falsafi
Mesh-TensorFlow: Deep Learning For Supercomputers

Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and implement, particularly on large clusters.

Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman
Thwarting Adversarial Examples: An $L_0$-Robust Sparse Fourier Transform

The authors give a new algorithm for approximating the Discrete Fourier transform of an approximately sparse signal that is robust to worst-case $L_0$ corruptions.

Mitali Bafna, Jack Murtagh, Nikhil Vyas
Bayesian Adversarial Learning

Deep neural networks are known to be vulnerable to adversarial attacks, raising serious security concerns for practical deployment. In this work, a novel robust training framework, Bayesian Robust Learning, is proposed to alleviate this issue: a distribution is placed on the adversarial data-generating distribution to account for the uncertainty of the adversarial data-generating process.

Lincoln Ye, Zhanxing Zhu
Dendritic Cortical Microcircuits Approximate The Backpropagation Algorithm

Deep learning has seen remarkable developments over recent years, many of them inspired by neuroscience. Here, the authors introduce a multilayer neuronal network model with simplified dendritic compartments, in which error-driven synaptic plasticity adapts the network towards a global desired output.

João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn
Learning A Latent Manifold Of Odor Representations From Neural Responses In Piriform Cortex

A major difficulty in studying the neural mechanisms underlying olfactory perception is the lack of obvious structure in the relationship between odorants and the neural activity patterns they elicit.

Anqi Wu, Stan Pashkovski, Sandeep Datta, Jonathan W Pillow
Size-Noise Tradeoffs In Generative Networks

This paper investigates the ability of generative networks to convert their input noise distributions into other distributions. Firstly, the authors demonstrate a construction that allows ReLU networks to increase the dimensionality of their noise distribution by implementing a "space-filling" function based on iterated tent maps.

Bolton Bailey, Matus Telgarsky
On Neuronal Capacity

The authors define the capacity of a learning machine to be the logarithm of the number (or volume) of the functions it can implement.

Pierre Baldi, Roman Vershynin
Learning Overparameterized Neural Networks Via Stochastic Gradient Descent On Structured Data

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, the authors study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization.

Yuanzhi Li, Yingyu Liang
Deep, Complex, Invertible Networks For Inversion Of Transmission Effects In Multimode Optical Fibres

The authors use complex-weighted, deep networks to invert the effects of multimode optical fibre distortion on a coherent input image. The authors generated experimental data based on collections of optical fibre responses to greyscale input images generated with coherent light, by measuring only image amplitude (not amplitude and phase, as is typical) at the output of 1 m and 10 m long, 105 µm diameter multimode fibre.

Oisín Moran, Piergiorgio Caramazza, Daniele Faccio, Rod Murray-Smith
Learning Towards Minimum Hyperspherical Energy

Neural networks are a powerful class of nonlinear functions that can be trained end-to-end for various applications. While the over-parametrization of many neural networks gives them the ability to fit complex functions and strong representation power for handling challenging tasks, it also leads to highly correlated neurons that can hurt generalization and incur unnecessary computation cost.

Weiyang Liu, Rongmei Lin, Zhen Liu, Lixin Liu, Zhiding Yu, Bo Dai, Le Song
Soft-Gated Warping-GAN For Pose-Guided Person Image Synthesis

Despite remarkable advances in image synthesis research, existing works often fail in manipulating images under the context of large geometric transformations.Synthesizing person images conditioned on arbitrary poses is one of the most representative examples where the generation quality largely relies on the capability of identifying and modeling arbitrary transformations on different body parts.

Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, Jian Yin
Deep Attentive Tracking Via Reciprocative Learning

Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data. In this paper, the authors propose a reciprocative learning algorithm to exploit visual attention for training deep classifiers.

Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang
Robot Learning In Homes: Improving Generalization And Reducing Dataset Bias

Data-driven approaches to solving robotic tasks have gained a lot of traction in recent years. However, most existing policies are trained on large-scale datasets collected in curated lab settings.

Abhinav Gupta, Adithyavairavan Murali, Dhiraj Prakashchand Gandhi, Lerrel Pinto
A Flexible Model For Training Action Localization With Varying Levels Of Supervision

Spatio-temporal action detection in videos is typically addressed in a fully-supervised setup with manual annotation of training videos required at every frame. In this work the authors propose a unifying framework that can handle and combine varying types of less demanding weak supervision.

Guilhem Chéron, Jean-Baptiste Alayrac, Ivan Laptev, Cordelia Schmid
Bilinear Attention Networks

Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. In this paper, the authors propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly.

Jin-Hwa Kim, Jaehyun Jun, Byoung-Tak Zhang
MacNet: Transferring Knowledge From Machine Comprehension To Sequence-to-Sequence Models

Machine Comprehension (MC) is one of the core problems in natural language processing, requiring both understanding of the natural language and knowledge about the world. The authors propose MacNet: a novel encoder-decoder supplementary architecture to the widely used attention-based sequence-to-sequence models.

Boyuan Pan, Yazheng Yang, Hao Li, Zhou Zhao, Yueting Zhuang, Deng Cai, Xiaofei He
Diffusion Maps For Textual Network Embedding

Textual network embedding leverages rich text information associated with the network to learn low-dimensional vectorial representations of vertices. Rather than using typical natural language processing (NLP) approaches, recent research exploits the relationship of texts on the same edge to graphically embed text.

Xinyuan Zhang, Yitong Li, Dinghan Shen, Lawrence Carin
FRAGE: Frequency-Agnostic Word Representation

Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks.

Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu
A Retrieve-and-Edit Framework For Predicting Structured Outputs

Tatsunori Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang
Unsupervised Text Style Transfer Using Language Models As Discriminators

Binary classifiers are employed as discriminators in GAN-based unsupervised style transfer models to ensure that transferred sentences are similar to sentences in the target domain. In this paper, the authors propose a technique of using a target domain language model as the discriminator to provide richer, token-level feedback during the learning process.

Zichao Yang, Zhiting Hu, Chris Dyer, Eric Xing, Taylor Berg-Kirkpatrick
Unsupervised Cross-Modal Alignment Of Speech And Text Embedding Spaces

Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision. The proposed framework learns the individual speech and text embedding spaces, and attempts to align the two spaces via adversarial training, followed by a refinement procedure.

Yu-An Chung, Wei-Hung Weng, Schrasing Tong, Jim Glass
GradiVeQ: Vector Quantization For Bandwidth-Efficient Gradient Aggregation In Distributed CNN Training

Data parallelism can boost the training speed of convolutional neural networks (CNN), but could suffer from significant communication costs caused by gradient aggregation. In this paper, the authors empirically demonstrate the strong linear correlations between CNN gradients, and propose a gradient vector quantization technique, named GradiVeQ, to exploit these correlations through principal component analysis (PCA) for substantial gradient dimension reduction.

Mingchao Yu, Zhifeng Lin, Krishna Narra, Songze Li, Youjie Li, Nam Sung Kim, Alex Schwing, Murali Annavaram, Salman Avestimehr
Gradient Sparsification For Communication-Efficient Distributed Optimization

Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures.

Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang
Bayesian Multi-domain Learning For Cancer Subtype Discovery From Next-generation Sequencing Count Data

Precision medicine aims for personalized prognosis and therapeutics by utilizing recent genome-scale high-throughput profiling techniques, including next-generation sequencing (NGS). Compared to the number of involved molecules and the system complexity, the number of available samples for studying complex diseases such as cancer is often limited, especially considering disease heterogeneity.

Ehsan Hajiramezanali, Siamak Zamani Dadaneh, Alireza Karbalayghareh, Mingyuan Zhou, Xiaoning Qian
Parsimonious Quantile Regression Of Financial Asset Tail Dynamics Via Sequential Learning

The authors propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns.

Xing Yan, Weizhong Zhang, Lin Ma, Wei Liu, Qi Wu
Global Geometry Of Multichannel Sparse Blind Deconvolution On The Sphere

Yanjun Li, Yoram Bresler
Phase Retrieval Under A Generative Prior

The authors introduce a novel deep-learning-inspired formulation of the phase retrieval problem, which asks to recover a signal $y_0 \in \mathbb{R}^n$ from $m$ quadratic observations, under structural assumptions on the underlying signal.

Paul Hand, Oscar Leong, Vlad Voroninski
Theoretical Linear Convergence Of Unfolded ISTA And Its Practical Weights And Thresholds

In recent years, unfolding iterative algorithms as neural networks has become an empirical success in solving sparse recovery problems. In this work, the authors study unfolded ISTA (Iterative Shrinkage Thresholding Algorithm) for sparse signal recovery.

Xiaohan Chen, Jialin Liu, Zhangyang Wang, Wotao Yin
Modern Neural Networks Generalize On Small Data Sets

In this paper, the authors use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. The authors then demonstrate this in practice by applying large neural networks, with hundreds of parameters per training observation, to a collection of 116 real-world data sets from the UCI Machine Learning Repository.

Matthew Olson, Adi Wyner, Richard Berk
Co-regularized Alignment For Unsupervised Domain Adaptation

Abhishek Kumar, Prasanna Sattigeri, Kahini Wadhawan, Leonid Karlinsky, Rogerio Feris, Bill Freeman, Gregory Wornell
Neural Networks Trained To Solve Differential Equations Learn General Representations

The authors introduce a technique based on the singular vector canonical correlation analysis (SVCCA) for measuring the generality of neural network layers across a continuously-parametrized set of tasks.

Martin Magill, Faisal Qureshi, Hendrick De Haan
Spectral Filtering For General Linear Dynamical Systems

The authors give a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification, and without assumptions on the spectral radius of the system's transition matrix.

Elad Hazan, HOLDEN LEE, Karan Singh, Cyril Zhang, Yi Zhang
Where Do You Think You're Going?: Inferring Beliefs About Dynamics From Behavior

Inferring intent from observed behavior has been studied extensively within the frameworks of Bayesian inverse planning and inverse reinforcement learning. The authors demonstrate in simulation and in a user study with 12 participants that this approach enables us to more accurately model human intent, and can be used in a variety of applications, including offering assistance in a shared autonomy framework and inferring human preferences.

Sid Reddy, Anca Dragan, Sergey Levine
Point Process Latent Variable Models Of Larval Zebrafish Behavior

A fundamental goal of systems neuroscience is to understand how neural activity gives rise to natural behavior. The authors develop a new class of probabilistic models to tackle this challenge in the study of larval zebrafish, an important model organism for neuroscience.

Anuj Sharma, Scott Linderman, Robert Johnson, Florian Engert
Provably Correct Automatic Sub-Differentiation For Qualified Programs

The Cheap Gradient Principle (Griewank, 2008) -- the computational cost of computing a $d$-dimensional vector of partial derivatives of a scalar function is nearly the same (often within a factor of $5$) as that of simply computing the scalar function itself -- is of central importance in optimization; it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions, which are subsequently used in black-box gradient-based optimization procedures.

Sham Kakade, Jason Lee
Neural Ordinary Differential Equations

The authors introduce a new family of deep neural network models.

Ricky Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud
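
In the neural-ODE view, the hidden state is defined by a differential equation dh/dt = f(h, t, θ) and a forward pass is one call to a black-box ODE solver. A hedged sketch with a fixed-step Euler solver and a hypothetical one-layer dynamics function (the paper uses adaptive solvers and the adjoint method for gradients; none of the names below come from its code):

```python
import numpy as np

def dynamics(h, t, theta):
    # hypothetical dynamics network: a single tanh layer, theta = (W, b)
    W, b = theta
    return np.tanh(W @ h + b)

def odeint_euler(f, h0, t0, t1, theta, steps=100):
    # fixed-step Euler integration of dh/dt = f(h, t, theta)
    h, t = np.asarray(h0, dtype=float), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t, theta)
        t += dt
    return h

rng = np.random.default_rng(0)
theta = (0.1 * rng.standard_normal((4, 4)), np.zeros(4))
h1 = odeint_euler(dynamics, rng.standard_normal(4), 0.0, 1.0, theta)
```

Shrinking the step count trades accuracy for compute, which is exactly the knob the continuous-depth formulation exposes.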
Information Constraints On Auto-Encoding Variational Bayes

Parameterizing the approximate posterior of a generative model with neural networks has become a common theme in recent machine learning research. The authors propose a framework for learning representations that relies on Auto-Encoding Variational Bayes and whose search space is constrained via kernel-based measures of independence.

Romain Lopez, Jeff Regier, Michael I. Jordan, Nir Yosef
Robustness Of Conditional GANs To Noisy Labels

The authors study the problem of learning conditional generators from noisy labeled samples, where the labels are corrupted by random noise.

Kiran Thekumparampil, Ashish Khetan, Zinan Lin, Sewoong Oh
Bias And Generalization In Deep Generative Models: An Empirical Study

In high dimensional settings, density estimation algorithms rely crucially on their inductive bias. In this paper the authors propose a framework to systematically investigate bias and generalization in deep generative models of images by probing the learning algorithm with carefully designed training datasets.

Shengjia Zhao, Hongyu Ren, Arianna Yuan, Jiaming Song, Noah Goodman, Stefano Ermon
Probabilistic Neural Programmed Networks For Scene Generation

In this paper the authors address the text to scene image generation problem. The authors propose PNP-Net, a variational auto-encoder framework that addresses these three challenges: it flexibly composes images with a dynamic network structure, learns a set of distribution transformers that can compose distributions based on semantics, and decodes samples from these distributions into realistic images.

Zhiwei Deng, Jiacheng Chen, YIFANG FU, Greg Mori
Step Size Matters In Deep Learning

Training a neural network with the gradient descent algorithm gives rise to a discrete-time nonlinear dynamical system. To elucidate the effects of the step size, the authors analyze the Lyapunov stability of different solutions of this dynamical system and show the relationship between the step size of the algorithm and the solutions that can be obtained with it.

Kamil Nar, Shankar Sastry
Neural Tangent Kernel: Convergence And Generalization In Neural Networks

At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. The authors prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function (which maps input vectors to output vectors) follows the so-called kernel gradient associated with a new object, which the authors call the Neural Tangent Kernel (NTK).
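For a finite-width network, the empirical counterpart of this kernel is simply the Gram matrix of parameter gradients, K(x, x') = <grad_theta f(x), grad_theta f(x')>. A hypothetical numpy sketch for a one-hidden-layer scalar network (the 1/sqrt(m) scaling mirrors the NTK parameterization; the rest of the setup is illustrative):

```python
import numpy as np

def empirical_ntk(xs, w, a):
    """Empirical NTK of f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j * x):
    K(x, x') = <grad_theta f(x), grad_theta f(x')> over theta = (a, w)."""
    m = w.shape[0]

    def grad_theta(x):
        pre = w * x                      # pre-activations, shape (m,)
        act = np.maximum(pre, 0.0)       # relu
        dact = (pre > 0).astype(float)   # relu derivative
        g_a = act / np.sqrt(m)           # d f / d a_j
        g_w = a * dact * x / np.sqrt(m)  # d f / d w_j
        return np.concatenate([g_a, g_w])

    G = np.stack([grad_theta(x) for x in xs])  # one gradient row per input
    return G @ G.T                             # Gram matrix of gradients

rng = np.random.default_rng(0)
m = 2000  # width; the empirical NTK concentrates as m grows
w, a = rng.normal(size=m), rng.normal(size=m)
xs = np.array([-1.0, -0.3, 0.5, 1.2])
K = empirical_ntk(xs, w, a)  # symmetric positive semi-definite
```

Being a Gram matrix, K is always symmetric and positive semi-definite, which is what makes the kernel-gradient view of training well defined.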

Arthur Jacot-Guillarmod, Clement Hongler, Franck Gabriel
How Does Batch Normalization Help Optimization?

Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry
Towards Robust Detection Of Adversarial Examples

Although recent progress is substantial, deep learning methods remain vulnerable to maliciously generated adversarial examples. In this paper, the authors present a novel training procedure and a thresholding test strategy towards robust detection of adversarial examples.

Tianyu Pang, Chao Du, Yinpeng Dong, Jun Zhu
Training Neural Networks Using Features Replay

Zhouyuan Huo, Bin Gu, Heng Huang
Faster Neural Networks Straight From JPEG

The simple, elegant approach of training convolutional neural networks (CNNs) directly from RGB pixels has enjoyed overwhelming empirical success. But can more performance be squeezed out of networks by using different input representations?

Lionel Gueguen, Alex Sergeev, Ben Kadlec, Rosanne Liu, Jason Yosinski
Hierarchical Graph Representation Learning With Differentiable Pooling

Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction.

Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, Jure Leskovec
Bayesian Model-Agnostic Meta-Learning

Due to the inherent model uncertainty, learning to infer the Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning. In this paper, the authors propose a novel Bayesian model-agnostic meta-learning method.

Jaesik Yoon, Taesup Kim, Ousmane Dia, Sungwoong Kim, Yoshua Bengio, Sungjin Ahn
Evolved Policy Gradients

The authors propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms.

Rein Houthooft, Richard Chen, Phillip Isola, Bradly Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel
BourGAN: Generative Networks With Metric Embeddings

This paper addresses the mode collapse for generative adversarial networks (GANs).

Chang Xiao, Peilin Zhong, Changxi Zheng
Scalable End-to-End Autonomous Vehicle Testing Via Rare-event Simulation

While recent developments in autonomous vehicle (AV) technology highlight substantial progress, the authors lack tools for rigorous and scalable testing. The authors implement a simulation framework that can test an entire modern autonomous driving system, including, in particular, systems that employ deep-learning perception and control algorithms.

Matthew O'Kelly, Aman Sinha, Hong Namkoong, Russ Tedrake, John Duchi
Generalisation Of Structural Knowledge In The Hippocampal-entorhinal System

A central problem in understanding intelligence is the concept of generalisation. The authors propose that generalising structural knowledge requires representations of the structure of the world.

James Whittington, Timothy Muller, Shirley Mark, Caswell Barry, Tim Behrens
Task-Driven Convolutional Recurrent Models Of The Visual System

Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. The authors extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition.

Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J DiCarlo, Daniel Yamins
Neural-Symbolic VQA: Disentangling Reasoning From Vision And Language Understanding

The authors marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning.

Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Josh Tenenbaum
Extracting Relationships By Multi-Domain Matching

Yitong Li, Michael Murias, Geraldine Dawson, David Carlson
Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. However, humans are often reminded of past memories or mental states which are associated with the current mental state. The authors consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state.

Rosemary Ke, Anirudh Goyal ALIAS PARTH GOYAL, Olexa Bilaniuk, Jonathan Binas, Mike Mozer, Chris Pal, Yoshua Bengio
Beauty-in-averageness And Its Contextual Modulations: A Bayesian Statistical Account

Understanding how humans perceive the likability of high-dimensional objects such as faces is an important problem in both cognitive science and AI/ML. This statistical coding cost account explains both beauty-in-averageness (BiA), where facial blends generally have higher likelihood than their "parent" faces, and ugliness-in-averageness (UiA), where the preceding context or task restricts face representation to a task-relevant subset of features, thus redefining statistical typicality and encoding cost within that subspace.

Chaitanya Ryali, Angela J Yu
Stimulus Domain Transfer In Recurrent Models For Large Scale Cortical Population Prediction On Video

To better understand the representations in visual cortex, the authors need to generate better predictions of neural activity in awake animals presented with their ecological input: natural video.

Fabian Sinz, Alexander Ecker, Paul Fahey, Edgar Walker, Erick Cobos, Emmanouil Froudarakis, Dimitri Yatsenko, Zachary Pitkow, Jacob Reimer, Andreas Tolias
A Probabilistic Population Code Based On Neural Samples

Sensory processing is often characterized as implementing probabilistic inference: networks of neurons compute posterior beliefs over unobserved causes given the sensory inputs.

Sabyasachi Shivkumar, Richard Lange, Ankani Chattoraj, Ralf Haefner
CpSGD: Communication-efficient And Differentially-private Distributed SGD

Distributed stochastic gradient descent is an important subroutine in distributed learning. Several recent works have focused on reducing the communication cost or introducing privacy guarantees, but none of the proposed communication-efficient methods is known to be privacy-preserving, and none of the known privacy mechanisms is known to be communication-efficient.

Naman Agarwal, Ananda Theertha Suresh, Felix Xinnan Yu, Sanjiv Kumar, Brendan McMahan
Bounded-Loss Private Prediction Markets

Rafael Frongillo, Bo Waggoner
Ex Ante Coordination And Collusion In Zero-sum Multi-player Extensive-form Games

Recent milestones in equilibrium computation, such as the success of Libratus, show that it is possible to compute strong solutions to two-player zero-sum games in theory and practice. The reasons for this are closely related to the fact that the team can be represented as a single player with imperfect recall.

Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm
Model-Agnostic Private Learning

The authors design differentially private learning algorithms that are agnostic to the learning model, assuming access to a limited amount of unlabeled public data.

Raef Bassily, Abhradeep Guha Thakurta, Om Thakkar
Adversarially Robust Generalization Requires More Data

Machine learning models are often susceptible to adversarial perturbations of their inputs. To better understand this phenomenon, the authors study adversarially robust learning from the viewpoint of generalization.

Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Madry
Probabilistic Pose Graph Optimization Via Bingham Distributions And Tempered Geodesic MCMC

The authors introduce Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algorithm for initializing pose graph optimization problems, arising in various scenarios such as SFM (structure from motion) or SLAM (simultaneous localization and mapping).

Tolga Birdal, Umut Simsekli, Mustafa Onur Eken, Slobodan Ilic
A Statistical Recurrent Model On The Manifold Of Symmetric Positive Definite Matrices

In a number of disciplines, the data (e.g., graphs, manifolds) to be analyzed are non-Euclidean in nature. In this work, the authors study the setting where the data (or measurements) are ordered, longitudinal or temporal in nature and live on a Riemannian manifold -- this setting is common in a variety of problems in statistical machine learning, vision and medical imaging.

Rudrasis Chakraborty, Chun-Hao Yang, Xingjian Zhen, Monami Banerjee, Derek Archer, David Vaillancourt, Vikas Singh, Baba Vemuri
Designing By Training: Acceleration Neural Network For Fast High-Dimensional Convolution

The high-dimensional convolution is widely used in various disciplines but has a serious performance problem due to its high computational complexity. Over the decades, people have taken a hand-crafted approach to designing fast algorithms for the Gaussian convolution.

Longquan Dai, Liang Tang, Yuan Xie, Jinhui Tang
Greedy Hash: Towards Fast Optimization For Accurate Hash Coding In CNN

Shupeng Su, Chao Zhang, Kai Han, Yonghong Tian
Generative Probabilistic Novelty Detection With Adversarial Autoencoders

Novelty detection is the problem of identifying whether a new data point is considered to be an inlier or an outlier. The authors assume that training data is available to describe only the inlier distribution.

Stanislav Pidhorskyi, Ranya Almohsen, Gianfranco Doretto
Joint Sub-bands Learning With Clique Structures For Wavelet Domain Super-Resolution

Convolutional neural networks (CNNs) have recently achieved great success in single-image super-resolution (SISR). To solve these problems, the authors propose the Super-Resolution CliqueNet (SRCliqueNet) to reconstruct the high resolution (HR) image with better textural details in the wavelet domain.

Zhisheng Zhong, Tiancheng Shen, Yibo Yang, Zhouchen Lin, Chao Zhang
Compact Generalized Non-local Network

The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos.

Kaiyu Yue, Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding, Fuxin Xu
BinGAN: Learning Compact Binary Descriptors With A Regularized GAN

In this paper, the authors propose a novel regularization method for Generative Adversarial Networks that allows the model to learn discriminative yet compact binary representations of image patches (image descriptors).

Maciej Zieba, Piotr Semberecki, Tarek El-Gaaly, Tomasz Trzcinski
RenderNet: A Deep Convolutional Network For Differentiable Rendering From 3D Shapes

Traditional computer graphics rendering pipelines are designed for procedurally generating 2D images from 3D shapes with high performance.

Thu H Nguyen-Phuoc, Chuan Li, Stephen Balaban, Yongliang Yang
LF-Net: Learning Local Features From Images

The authors present a novel deep architecture and a training strategy to learn a local feature pipeline from scratch, using collections of images without the need for human supervision.

Yuki Ono, Eduard Trulls, Pascal Fua, Kwang Yi
Unsupervised Learning Of Shape And Pose With Differentiable Point Clouds

The authors address the problem of learning accurate 3D shape and camera pose from a collection of unlabeled category-specific images.

Eldar Insafutdinov, Alexey Dosovitskiy
Modelling And Unsupervised Learning Of Symmetric Deformable Object Categories

The authors propose a new approach to model and learn, without manual supervision, the symmetries of natural objects, such as faces or flowers, given only images as input.

James Thewlis, Hakan Bilen, Andrea Vedaldi
Multi-View Silhouette And Depth Decomposition For High Resolution 3D Object Representation

The authors consider the problem of scaling deep generative shape models to high resolution. Drawing motivation from the canonical view representation of objects, the authors introduce a novel method for the fast up-sampling of 3D objects in voxel space through networks that perform super-resolution on the six orthographic depth projections.

Edward Smith, Scott Fujimoto, David Meger
Image Inpainting Via Generative Multi-column Convolutional Neural Networks

In this paper, the authors propose a generative multi-column network for image inpainting.

Yi Wang, Xin Tao, Xiaojuan Qi, Xiaoyong Shen, Jiaya Jia
Beyond Grids: Learning Graph Representations For Visual Recognition

The authors propose learning graph representations from 2D feature maps for visual recognition.

Yin Li, Abhinav Gupta
Foreground Clustering For Joint Segmentation And Localization In Videos And Images

This paper presents a novel framework in which video/image segmentation and localization are cast into a single optimization problem that integrates information from low level appearance cues with that of high level localization cues in a very weakly supervised manner.

Abhishek Sharma
LinkNet: Relational Embedding For Scene Graph

Objects and their relationships are critical contents for image understanding. In this paper, the authors present a novel method that improves scene graph generation by explicitly modeling inter-dependency among all object instances.

Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon
Context-aware Synthesis And Placement Of Object Instances

Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem. In this paper, the authors propose an end-to-end trainable neural network for the task of inserting an object instance mask of a specified class into the semantic label map of an image.

Donghoon Lee, Ming-Yu Liu, Ming-Hsuan Yang, Sifei Liu, Jinwei Gu, Jan Kautz
Geometry-Aware Recurrent Neural Networks For Active Visual Recognition

The authors present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining a one-to-one mapping between 3D physical locations in the world scene and latent feature locations.

Ricson Cheng, Ziyan Wang, Katerina Fragkiadaki
See And Think: Disentangling Semantic Scene Completion

Semantic scene completion predicts volumetric occupancy and object category of a 3D scene, which helps intelligent agents to understand and interact with the surroundings. In this work, the authors propose a disentangled framework, sequentially carrying out 2D semantic segmentation, 2D-3D reprojection and 3D semantic scene completion.

Shice Liu, YU HU, Yiming Zeng, Qiankun Tang, Beibei Jin, Yinhe Han, Xiaowei Li
Active Matting

Image matting is an ill-posed problem. To this end, the authors propose an active matting method with recurrent reinforcement learning.

Xin Yang, Ke Xu, Shaozhe Chen, Shengfeng He, Baocai Yin, Rynson Lau
A Unified Feature Disentangler For Multi-Domain Image Translation And Manipulation

The authors present a novel and unified deep learning framework which is capable of learning domain-invariant representation from data across multiple domains.

Alexander H. Liu, Yen-Cheng Liu, Yu-Ying Yeh, Yu-Chiang Frank Wang
Turbo Learning For CaptionBot And DrawingBot

The authors study in this paper the problems of both image captioning and text-to-image generation, and present a novel turbo learning approach to jointly training an image-to-text generator (a.k.a. CaptionBot) and a text-to-image generator (a.k.a. DrawingBot).

Qiuyuan Huang, Pengchuan Zhang, Dapeng Wu, Lei Zhang
Dialog-based Interactive Image Retrieval

Existing methods for interactive image retrieval have demonstrated the merit of integrating user feedback, improving retrieval results. In this paper, the authors introduce a new approach to interactive image search that enables users to provide feedback via natural language, allowing for more natural and effective interaction.

Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, Rogerio Feris
Hybrid Retrieval-Generation Reinforced Agent For Medical Image Report Generation

Generating long and coherent reports to describe medical images poses challenges to bridging visual patterns with informative human linguistic descriptions.

Yuan Li, Xiaodan Liang, Zhiting Hu, Eric Xing
Sequential Context Encoding For Duplicate Removal

Duplicate removal is a critical step to accomplish a reasonable amount of predictions in prevalent proposal-based object detection frameworks. In this work, the authors design a new two-stage framework to effectively select the appropriate proposal candidate for each object.

Lu Qi, Shu Liu, Jianping Shi, Jiaya Jia
Hybrid Knowledge Routed Modules For Large-scale Object Detection

The dominant object detection approaches treat the recognition of each region separately and overlook crucial semantic correlations between objects in one scene. Particularly, the authors present Hybrid Knowledge Routed Modules (HKRM) that incorporate reasoning routed by two kinds of knowledge forms: an explicit knowledge module for structured constraints summarized with linguistic knowledge.

ChenHan Jiang, Hang Xu, Xiaodan Liang, Liang Lin
SNIPER: Efficient Multi-Scale Training

The authors present SNIPER, an algorithm for performing efficient multi-scale training in instance level visual recognition tasks.

Bharat Singh, Mahyar Najibi, Larry Davis
Revisiting Multi-Task Learning With ROCK: A Deep Residual Auxiliary Block For Visual Detection

Taylor Mordan, Nicolas Thome, Gilles Henaff, Matthieu Cord
MetaAnchor: Learning To Detect Objects With Customized Anchors

The authors propose a novel and flexible anchor mechanism named MetaAnchor for object detection frameworks.

Tong Yang, Xiangyu Zhang, Zeming Li, Wenqiang Zhang, Jian Sun
Learning Hierarchical Semantic Image Manipulation Through Structured Representations

Understanding, reasoning, and manipulating semantic concepts of images has been a fundamental research problem for decades. In this work, the authors present a novel hierarchical framework for semantic image manipulation.

Seunghoon Hong, Xinchen Yan, Honglak Lee, Thomas Huang
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, And Camera Pose Estimation

Holistic 3D indoor scene understanding refers to jointly recovering the i) object bounding boxes, ii) room layout, and iii) camera pose, all in 3D. In this paper, the authors propose an end-to-end model that simultaneously solves all three tasks in real-time given only a single RGB image.

Siyuan Huang, Siyuan Qi, Yinxue Xiao, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
3D-Aware Scene Manipulation Via Inverse Graphics

The authors aim to obtain an interpretable, expressive, and disentangled scene representation that contains comprehensive structural and textural information for each object.

Shunyu Yao, Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, Bill Freeman, Josh Tenenbaum
FD-GAN: Pose-guided Feature Distilling GAN For Robust Person Re-identification

Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li
Sequence-to-Segment Networks For Segment Detection

Detecting segments of interest from an input sequence is a challenging problem which often requires not only good knowledge of individual target segments, but also contextual understanding of the entire input sequence and the relationships between the target segments. To address this problem, the authors propose the Sequence-to-Segment Network (S$^2$N), a novel end-to-end sequential encoder-decoder architecture.

Zijun Wei, Boyu Wang, Minh Hoai Nguyen, Jianming Zhang, Zhe Lin, Xiaohui Shen, Radomir Mech, Dimitris Samaras
Stacked Semantics-Guided Attention Model For Fine-Grained Zero-Shot Learning

Zero-Shot Learning (ZSL) is generally achieved via aligning the semantic relationships between the visual features and the corresponding class semantic descriptions. However, using the global features to represent fine-grained images may lead to sub-optimal results since they neglect the discriminative differences of local regions.

Yunlong Yu, Zhong Ji, Yanwei Fu, Jichang Guo, Yanwei Pang, Zhongfei (Mark) Zhang
DeepExposure: Learning To Expose Photos With Asynchronously Reinforced Adversarial Learning

Accurate exposure is the key to capturing high-quality photos in computational photography, especially for mobile phones that are limited by the sizes of their camera modules. A local exposure for each sub-image is learned sequentially by a policy network, while the reward of learning is designed globally to strike a balance of overall exposures.

Runsheng Yu, Wenyu Liu, Yasen Zhang, Zhi Qu, Deli Zhao, Bo Zhang
Self-Erasing Network For Integral Object Attention

Recently, adversarial erasing for weakly-supervised object attention has been deeply studied due to its capability in localizing integral object regions. To tackle such an issue as well as promote the quality of object attention, the authors introduce a simple yet effective Self-Erasing Network (SeeNet) to prohibit attentions from spreading to unexpected background regions.

Andrew Hou, PengTao Jiang, Yunchao Wei, Ming-Ming Cheng
Searching For Efficient Multi-Scale Architectures For Dense Image Prediction

The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks.

Maxwell Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Bo Chen, Jon Shlens
DifNet: Semantic Segmentation By Diffusion Networks

Deep Neural Networks (DNNs) have recently shown state-of-the-art performance on semantic segmentation tasks; however, they still suffer from poor boundary localization and spatially fragmented predictions. The proposed DifNet consistently improves over baseline models of the same depth and an equivalent number of parameters, and also achieves promising performance on the Pascal VOC and Pascal Context datasets.

Peng Jiang, Fanglin Gu, Yunhai Wang, Changhe Tu, Baoquan Chen
Learning A High Fidelity Pose Invariant Model For High-resolution Face Frontalization

Face frontalization refers to the process of synthesizing the frontal view of a face from a given profile. This paper proposes a High Fidelity Pose Invariant Model (HF-PIM) to produce photographic and identity-preserving results.

Jie Cao, Yibo Hu, Hongwen Zhang, Ran He, Zhenan Sun
Attention In Convolutional LSTM For Gesture Recognition

Convolutional long short-term memory (LSTM) networks have been widely used for action/gesture recognition, and different attention mechanisms have also been embedded into the LSTM or the convolutional LSTM (ConvLSTM) networks.

Liang Zhang, Guangming Zhu, Lin Mei, Peiyi Shen, Syed Afaq Ali Shah, Mohammed Bennamoun
Partially-Supervised Image Captioning

Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. The authors propose a novel algorithm for training sequence models, such as recurrent neural networks, on partially-specified sequences, which the authors represent using finite state automata.

Peter Anderson, Stephen Gould, Mark Johnson
Learning To Specialize With Knowledge Distillation For Visual Question Answering

Visual Question Answering (VQA) is a notoriously challenging problem because it involves various heterogeneous tasks defined by questions within a unified framework. The authors present a principled algorithm to learn specialized models with knowledge distillation under a multiple choice learning (MCL) framework, where training examples are assigned dynamically to a subset of models for updating network parameters.

Jonghwan Mun, Kimin Lee, Jinwoo Shin, Bohyung Han
Chain Of Reasoning For Visual Question Answering

Reasoning plays an essential role in Visual Question Answering (VQA). This paper proposes a novel reasoning model for addressing these problems.

Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong
Learning Conditioned Graph Structures For Interpretable Visual Question Answering

Visual Question Answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Nonetheless, very few methods rely on higher-level image representations, which can capture semantic and spatial relationships.

Will Norcliffe-Brown, Stathis Vafeias, Sarah Parisot
Out Of The Box: Reasoning With Graph Convolution Nets For Factual Visual Question Answering Medhini Narasimhan, Svetlana Lazebnik, Alex Schwing
Overcoming Language Priors In Visual Question Answering With Adversarial Regularization

Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training -- e.g., overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image.

Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee
Non-Local Recurrent Network For Image Restoration

Many classic methods have shown non-local self-similarity in natural images to be an effective prior for image restoration. In this paper, the authors propose a non-local recurrent network (NLRN) as the first attempt to incorporate non-local operations into a recurrent neural network (RNN) for image restoration.

Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, Thomas Huang
Neural Nearest Neighbors Networks

Non-local methods exploiting the self-similarity of natural signals have been well studied, for example in image analysis and restoration. To overcome the non-differentiability of hard KNN selection, the authors propose a continuous deterministic relaxation of KNN selection that maintains differentiability with respect to the pairwise distances.
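The core idea, replacing a hard nearest-neighbour argmin with a temperature-controlled softmax, can be sketched generically (this is an illustrative soft selection, not the paper's exact formulation):

```python
import numpy as np

def soft_knn_weights(query, keys, temperature=1.0):
    """Continuous relaxation of nearest-neighbour selection: a softmax
    over negative squared distances instead of a hard argmin. The weights
    are differentiable in query and keys, and approach hard 1-NN
    selection as the temperature goes to zero."""
    d2 = np.sum((keys - query) ** 2, axis=1)  # squared distances
    logits = -d2 / temperature
    logits -= logits.max()                    # numerical stability
    w = np.exp(logits)
    return w / w.sum()

keys = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]])
query = np.array([0.9, 1.1])
w_soft = soft_knn_weights(query, keys, temperature=1.0)   # smooth weights
w_hard = soft_knn_weights(query, keys, temperature=1e-3)  # nearly one-hot
```

Because the weights vary smoothly with the distances, gradients can flow through the selection step, which is what lets such a relaxation sit inside an end-to-end trained network.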

Tobias Plötz, Stefan Roth
Training Deep Learning Based Denoisers Without Ground Truth Data

Recently developed deep-learning-based denoisers often outperform state-of-the-art conventional denoisers, such as BM3D. In this article, the authors propose a method based on Stein's unbiased risk estimator (SURE) for training deep neural network denoisers using only noisy images.
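For Gaussian noise of known variance, SURE gives an unbiased estimate of the mean-squared error to the unseen clean signal from the noisy image alone; the divergence term it requires is commonly approximated with a Monte Carlo probe. A hedged numpy sketch of this estimator (the setup is illustrative, not the authors' exact training recipe):

```python
import numpy as np

def mc_divergence(f, y, eps=1e-4, rng=None):
    """Monte Carlo estimate of the divergence sum_i d f_i(y) / d y_i
    using a single Rademacher probe vector (a common practical choice)."""
    rng = np.random.default_rng(rng)
    b = rng.choice([-1.0, 1.0], size=y.shape)
    return b @ (f(y + eps * b) - f(y)) / eps

def sure(f, y, sigma, rng=None):
    """Stein's unbiased risk estimate for a denoiser f applied to
    y = clean + N(0, sigma^2 I): estimates E||f(y) - clean||^2
    without ever seeing the clean signal."""
    n = y.size
    r = f(y) - y
    return r @ r - n * sigma ** 2 + 2 * sigma ** 2 * mc_divergence(f, y, rng=rng)

# Sanity check on a linear "denoiser" f(y) = 0.5 * y: its true divergence
# is 0.5 * n, and the Rademacher estimate is exact for diagonal maps.
shrink = lambda y: 0.5 * y
y = np.random.default_rng(1).normal(size=64)
div = mc_divergence(shrink, y, rng=2)
```

Minimizing `sure(f, y, sigma)` over the denoiser's parameters then stands in for the unavailable supervised loss.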

Shakarim Soltanayev, Se Young Chun
Adversarial Regularizers In Inverse Problems

Inverse problems in medical imaging and computer vision are traditionally solved using purely model-based methods. The authors propose a new framework for applying data-driven approaches to inverse problems, using a neural network as a regularization functional.

Sebastian Lunz, Carola Schoenlieb, Ozan Öktem
Densely Connected Attention Propagation For Reading Comprehension

The authors propose DecaProp (Densely Connected Attention Propagation), a new densely connected neural architecture for reading comprehension (RC).

Yi Tay, Anh Tuan Luu, Siu Cheung Hui, Jian Su
Layer-Wise Coordination Between Encoder And Decoder For Neural Machine Translation

Neural Machine Translation (NMT) has achieved remarkable progress with the rapid evolution of model structures. In this paper, the authors propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of hidden representations of the encoder and decoder together layer by layer, gradually from low level to high level.

Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, Tie-Yan Liu
E-SNLI: Natural Language Inference With Natural Language Explanations

In order for machine learning to garner widespread public adoption, models must be able to provide interpretable and robust explanations for their decisions, as well as learn from human-provided explanations at train time. The authors further implement models that incorporate these explanations into their training process and output them at test time.

Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom
The Challenge Of Realistic Music Generation: Modelling Raw Audio At Scale

Realistic music generation is a challenging task. When building generative models of music that are learnt from data, typically high-level representations such as scores or MIDI are used that abstract away the idiosyncrasies of a particular performance.

Sander Dieleman, Aaron Van Den Oord, Karen Simonyan
Fully Neural Network Based Speech Recognition On Mobile And Embedded Devices

Real-time automatic speech recognition (ASR) on mobile and embedded devices has been of great interest for many years. The authors present real-time speech recognition on smartphones or embedded systems by employing recurrent neural network (RNN) based acoustic models, RNN based language models, and beam-search decoding.

Jinhwan Park, Yoonho Boo, Iksoo Choi, Sungho Shin, Wonyong Sung
Transfer Learning From Speaker Verification To Multispeaker Text-To-Speech Synthesis

The authors describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training.

Ye Jia, Yu Zhang, Ron Weiss, Quan Wang, Jonathan Shen, Fei Ren, ZF Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu
SING: Symbol-to-Instrument Neural Generator

Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. In this work, the authors study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. The authors present a lightweight neural audio synthesizer for the original task of generating musical notes given desired instrument, pitch and velocity.

Alexandre Defossez, Neil Zeghidour, Nicolas Usunier, Leon Bottou, Francis Bach
Neural Voice Cloning With A Few Samples

Sercan Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
GroupReduce: Block-Wise Low-Rank Approximation For Neural Language Model Shrinking

Patrick Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh
Dialog-to-Action: Conversational Question Answering Over A Large-Scale Knowledge Base

The authors present an approach to map utterances in conversation to logical forms, which will be executed on a large-scale knowledge base.

Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, Jian Yin
Generating Informative And Diverse Conversational Responses Via Adversarial Information Maximization

Responses generated by neural conversational models tend to lack informativeness and diversity. The authors present Adversarial Information Maximization (AIM), an adversarial learning framework that addresses these two related but distinct problems.

Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, Xiujun Li, Chris Brockett, Bill Dolan
Answerer In Questioner's Mind: Information Theoretic Approach To Goal-Oriented Visual Dialog

Sang-Woo Lee, Yu-Jung Heo, Byoung-Tak Zhang
Trajectory Convolution For Action Recognition

How to leverage the temporal dimension is a key question in video analysis. In this work, the authors propose a new CNN architecture, TrajectoryNet, which incorporates trajectory convolution, a new operation for integrating features along the temporal dimension, to replace the existing temporal convolution.

Yue Zhao, Yuanjun Xiong, Dahua Lin
Video Prediction Via Selective Sampling

(1) Video prediction can be approached as a stochastic process. (2) Decoupling combined loss functions into dedicated sub-networks encourages them to work collaboratively.

Jingwei Xu, Bingbing Ni, Xiaokang Yang
Unsupervised Learning Of Artistic Styles With Archetypal Style Analysis

The authors introduce an unsupervised learning approach to automatically discover, summarize, and manipulate artistic styles from large collections of paintings.

Daan Wynen, Cordelia Schmid, Julien Mairal
Attacks Meet Interpretability: Attribute-steered Detection Of Adversarial Samples

Adversarial sample attacks perturb benign inputs to induce DNN misbehaviors. The authors propose a novel adversarial sample detection technique for face recognition models, based on interpretability.

Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang
Speaker-Follower Models For Vision-and-Language Navigation

Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers.

Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, LP Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell
Neural Code Comprehension: A Learnable Representation Of Code Semantics

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation, treating it like sentences written in a natural language.

Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler
MiME: Multilevel Medical Embedding Of Electronic Health Records For Predictive Healthcare

Deep learning models exhibit state-of-the-art performance for many predictive healthcare tasks using electronic health records (EHR) data. The authors propose Multilevel Medical Embedding (MiME), which learns the multilevel embedding of EHR data while jointly performing auxiliary prediction tasks that rely on this inherent EHR structure without the need for external labels.

Edward Choi, Cao Xiao, Walter Stewart, Jimeng Sun
Distilled Wasserstein Learning For Word Embedding And Topic Modeling

The authors propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics.

Hongteng Xu, Wenlin Wang, Wei Liu, Lawrence Carin
Learning To Optimize Tensor Programs

The authors introduce a learning-based framework to optimize tensor programs for deep learning workloads.

Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
Learning To Solve SMT Formulas

The authors present a new approach for learning to solve SMT formulas.

Mislav Balunovic, Pavol Bielik, Martin Vechev
Data Center Cooling Using Model-predictive Control

Despite impressive recent advances in reinforcement learning (RL), its deployment in real-world physical systems is often complicated by unexpected events, limited data, and the potential for expensive failures. In this paper, the authors describe an application of RL "in the wild" to the task of regulating temperatures and airflow inside a large-scale data center (DC).

Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, Greg Imwalle
Bayesian Inference Of Temporal Task Specifications From Demonstrations

When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task. Inspired by this, the authors present Bayesian specification inference, a probabilistic model for inferring task specification as a temporal logic formula.

Ankit Shah, Pritish Kamath, Julie A Shah, Shen Li
Training Deep Neural Networks With 8-bit Floating Point Numbers

The state-of-the-art hardware platforms for training deep neural networks are moving from traditional single precision (32-bit) computations towards 16 bits of precision - in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations.

Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan
Snap ML: A Hierarchical Framework For Machine Learning

The authors describe a new software framework for fast training of generalized linear models.

Celestine Dünner, Thomas Parnell, Dimitrios Sarigiannis, Nikolas Ioannou, Andreea Anghel, Gummadi Ravi, Madhusudanan Kandasamy , Haris Pozidis
Learning Filter Widths Of Spectral Decompositions With Wavelets

Time series classification methods using deep neural networks, such as convolutional neural networks (CNNs), operate on a spectral decomposition of the time series computed in a preprocessing step. The authors propose the wavelet deconvolution (WD) layer as an efficient alternative to this preprocessing step that eliminates a significant number of hyperparameters.

Haidar Khan, Bulent Yener
A Likelihood-Free Inference Framework For Population Genetic Data Using Exchangeable Neural Networks

An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models.

Jeffrey Chan, Valerio Perrone, Jeffrey Spence, Paul Jenkins, Sara Mathieson, Yun Song
Latent Gaussian Activity Propagation: Using Smoothness And Structure To Separate And Localize Sounds In Large Noisy Environments

The authors present an approach for simultaneously separating and localizing multiple sound sources using recorded microphone data.

Daniel Johnson, Daniel Gorelik, Ross E Mawhorter, Kyle Suver, Weiqing Gu, Steven Xing, Cody Gabriel, Peter Sankhagowit
Inferring Latent Velocities From Weather Radar Data Using Gaussian Processes

Archived data from the US network of weather radars hold detailed information about bird migration over the last 25 years, including very high-resolution partial measurements of velocity. This paper presents a Gaussian process (GP) model to reconstruct high-resolution full velocity fields across the entire US.

Rico Angell, Dan Sheldon
Bayesian Nonparametric Spectral Estimation

Spectral estimation (SE) aims to identify how the energy of a signal (e.g., a time series) is distributed across different frequencies. In this context, the authors propose a joint probabilistic model for signals, observations and spectra, where SE is addressed as an inference problem.

Felipe Tobar
Learning A Warping Distance From Unlabeled Time Series Using Sequence Autoencoders

Measuring similarities between unlabeled time series trajectories is an important problem in many domains such as medicine, economics, and vision. In this paper, the authors propose an end-to-end framework, autowarp, that optimizes and learns a good metric given unlabeled trajectories.

Abubakar Abid, James Zou
Precision And Recall For Time Series

Classical anomaly detection is principally concerned with point-based anomalies, those anomalies that occur at a single point in time. Motivated by this observation, the authors present a new mathematical model to evaluate the accuracy of time series classification algorithms.

Nesime Tatbul, Tae Jun Lee, Stan Zdonik, Mejbah Alam, Justin Gottschlich
Deep Generative Markov State Models

The authors propose a deep generative Markov State Model (DeepGenMSM) learning framework for inference of metastable dynamical systems and prediction of trajectories.

Hao Wu, Andreas Mardt, Luca Pasquali, Frank Noe
Doubly Robust Bayesian Inference For Non-Stationary Streaming Data With $\beta$-Divergences

The authors present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with $\beta$-divergences.

Jeremias Knoblauch, Jack E Jewson, Theo Damoulas
Regularization Learning Networks: Deep Learning For Tabular Datasets

Despite their impressive performance, Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. The authors propose that applying a different regularization coefficient to each weight might boost the performance of DNNs by allowing them to make more use of the more relevant inputs.

Ira Shavitt, Eran Segal
Generative Modeling For Protein Structures

Analyzing the structure and function of proteins is a key part of understanding biology at the molecular and cellular level. In addition, a major engineering challenge is to design new proteins in a principled and methodical way.

Namrata Anand, Possu Huang
Geometry Based Data Generation

The authors propose a new type of generative model for high-dimensional data that learns a manifold geometry of the data, rather than density, and can generate points evenly along this manifold.

Ofir Lindenbaum, Jay Stanley, Guy Wolf, Smita Krishnaswamy
Learning Concave Conditional Likelihood Models For Improved Analysis Of Tandem Mass Spectra

The most widely used technology to identify the proteins present in a complex biological sample is tandem mass spectrometry, which quickly produces a large collection of spectra representative of the peptides (i.e., protein subsequences) present in the original sample.

John T Halloran, David M Rocke
Generalizing Tree Probability Estimation Via Bayesian Networks

Probability estimation is one of the fundamental tasks in statistics and machine learning.

Cheng Zhang, Frederick A Matsen IV
Learning Temporal Point Processes Via Reinforcement Learning

Social goods, such as healthcare, smart city, and information networks, often produce ordered event data in continuous time. To alleviate the risk of model misspecification in MLE, the authors propose to generate samples from the generative model and monitor the quality of the samples in the process of training until the samples and the real data are indistinguishable.

Shuang Li, Benjamin Xiao, Shixiang Zhu, Nan Du, Yao Xie, Le Song
Exponentially Weighted Imitation Learning For Batched Historical Data

The authors consider deep policy learning with only batched historical trajectories. To solve this problem, the authors propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action spaces.

Qing Wang, Jiechao Xiong, Lei Han, Peng Sun, Han Liu, Tong Zhang
Evidential Deep Learning To Quantify Classification Uncertainty

Deterministic neural nets have been shown to learn effective predictors on a wide range of machine learning problems. Orthogonally to Bayesian neural nets that indirectly infer prediction uncertainty through weight uncertainties, the authors propose explicit modeling of the same using the theory of subjective logic.

Murat Sensoy, Lance Kaplan, Melih Kandemir
Explanations Based On The Missing: Towards Contrastive Explanations With Pertinent Negatives

In this paper the authors propose a novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network.

Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, Payel Das
Towards Robust Interpretability With Self-Explaining Neural Networks

Most recent work on interpretability of complex machine learning models has focused on estimating a-posteriori explanations for previously trained models around specific predictions. The authors propose three desiderata for explanations in general -- explicitness, faithfulness, and stability -- and show that existing methods do not satisfy them.

David Alvarez Melis, Tommi Jaakkola
Model Agnostic Supervised Local Explanations

Model interpretability is an increasingly important component of practical machine learning. One of the main challenges in interpretability is designing explanation systems that can capture aspects of each of these explanation types, in order to develop a more thorough understanding of the model.

Gregory Plumb, Denali Molitor, Ameet Talwalkar
To Trust Or Not To Trust A Classifier

The authors propose a new score, called the {\it trust score}, which measures the agreement between the classifier and a modified nearest-neighbor classifier on the testing example.

Heinrich Jiang, Been Kim, Melody Guan, Maya Gupta
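The agreement idea behind the trust score lends itself to a compact sketch. The snippet below is a hypothetical toy version, not the authors' exact estimator (which additionally filters low-density points): it scores a prediction by the ratio of the distance to the nearest training point of any other class over the distance to the nearest training point of the predicted class.

```python
import numpy as np

def trust_score(X_train, y_train, x_test, predicted_label):
    """Toy trust score: distance to the nearest training point of any
    *other* class divided by the distance to the nearest training point
    of the *predicted* class. Scores above 1 suggest the prediction
    agrees with a nearest-neighbor view of the data."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    d_pred = dists[y_train == predicted_label].min()
    d_other = dists[y_train != predicted_label].min()
    return d_other / d_pred
```

On two well-separated clusters, predicting the correct cluster yields a score well above 1, while predicting the wrong one yields a score below 1.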
Predict Responsibly: Improving Fairness And Accuracy By Learning To Defer

In many machine learning applications, there are multiple decision-makers involved, both automated and human. The authors propose a learning algorithm which accounts for potential biases held by external decision-makers in a system.

David Madras, Toni Pitassi, Richard Zemel
Hunting For Discriminatory Proxies In Linear Regression Models

A machine learning model may exhibit discrimination when used to make decisions involving people. In this paper the authors formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies.

Samuel Yeom, Anupam Datta, Matt Fredrikson
Empirical Risk Minimization Under Fairness Constraints

The authors address the problem of algorithmic fairness: ensuring that sensitive information does not unfairly influence the outcome of a classifier. The authors present an approach based on empirical risk minimization, which incorporates a fairness constraint into the learning problem.

Michele Donini, Luca Oneto, Shai Ben-David, John Shawe-Taylor, Massimiliano Pontil
Approximation Algorithms For Stochastic Clustering

The authors consider stochastic settings for clustering, and develop provably-good (approximation) algorithms for a number of these notions.

David Harris, Shi Li, Aravind Srinivasan, Khoa Trinh, Thomas Pensyl
Re-evaluating Evaluation

Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort.

David Balduzzi, Karl Tuyls, Julien Perolat, Thore Graepel
Does Mitigating ML's Impact Disparity Require Treatment Disparity?

Following precedent in employment discrimination law, two notions of disparity are widely discussed in papers on fairness and ML. In this paper, the authors show that: (i) when sensitive and (nominally) nonsensitive features are correlated, disparate learning processes (DLPs) will indirectly implement treatment disparity, undermining the policy desiderata they are designed to address; (ii) when group membership is partly revealed by other features, DLPs induce within-class discrimination; and (iii) in general, DLPs provide suboptimal trade-offs between accuracy and impact parity.

Zachary Lipton, Julian McAuley, Alexandra Chouldechova
Enhancing The Accuracy And Fairness Of Human Decision Making

Societies often rely on human experts to take a wide variety of decisions affecting their members, from jail-or-release decisions taken by judges and stop-and-frisk decisions taken by police officers to accept-or-reject decisions taken by academics.

Isabel Valera, Adish Singla, Manuel Gomez Rodriguez
The Price Of Fair PCA: One Extra Dimension

The authors investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations.

Samira Samadi, Tao (Uthaipon) Tantipongpipat, Jamie Morgenstern, Mohit Singh, Santosh Vempala
Practical Methods For Graph Two-Sample Testing

Hypothesis testing for graphs has been an important tool in applied research fields for more than two decades, and still remains a challenging problem as one often needs to draw inference from few replicates of large graphs.

Debarghya Ghoshdastidar, Ulrike Von Luxburg
Topkapi: Parallel And Fast Sketches For Finding Top-K Frequent Elements

Identifying the top-K frequent items is one of the most common and important operations in large data processing systems. As a result, several solutions have been proposed to solve this problem approximately.

Ankush Mandal, He Jiang, ANSHUMALI Shrivastava, Vivek Sarkar
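For context, a minimal sketch of the classic Frequent (Misra-Gries) summary, one of the standard sequential building blocks in this problem area; this is an illustrative baseline, not the Topkapi algorithm itself. It keeps at most k-1 counters and guarantees that any item occurring more than n/k times in a stream of length n survives in the summary.

```python
def misra_gries(stream, k):
    """Frequent (Misra-Gries) summary: maintain at most k-1 counters.
    Any item with true frequency > n/k is guaranteed to remain."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement all counters; drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The counts in the summary are underestimates; a second pass (or a companion sketch) is typically used to recover exact counts for the surviving candidates.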
KDGAN: Knowledge Distillation With Generative Adversarial Networks

The authors propose a three-player game named KDGAN consisting of a classifier, a teacher, and a discriminator.

Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi
Modeling Dynamic Missingness Of Implicit Feedback For Recommendation

Implicit feedback is widely used in collaborative filtering methods for recommendation. To model and exploit the dynamics of missingness, the authors propose a latent variable named ``user intent'' to govern the temporal changes of item missingness, and a hidden Markov model to represent such a process.

Menghan Wang, Mingming Gong, Xiaolin Zheng, Kun Zhang
Gamma-Poisson Dynamic Matrix Factorization Embedded With Metadata Influence

The authors propose mGDMF, a conjugate Gamma-Poisson model for dynamic matrix factorization that incorporates metadata influence, to effectively and efficiently model massive, sparse and dynamic data in recommendation.

Trong Dinh Thac Do, Longbing Cao
Non-metric Similarity Graphs For Maximum Inner Product Search

In this paper the authors address the problem of Maximum Inner Product Search (MIPS), which is currently the computational bottleneck in a large number of machine learning applications. The authors propose to solve the MIPS problem with the usage of similarity graphs, i.e., graphs where each vertex is connected to the vertices that are the most similar in terms of some similarity function.

Stanislav Morozov, Artem Babenko
Norm-Ranging LSH For Maximum Inner Product Search

Neyshabur and Srebro proposed SIMPLE-LSH, which is the state-of-the-art hashing based algorithm for maximum inner product search (MIPS).

Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng
A Dual Framework For Low-rank Tensor Completion

One of the popular approaches for low-rank tensor completion is to use the latent trace norm regularization.The experiments illustrate the efficacy of the proposed algorithm on several real-world datasets across applications.

Madhav Nimishakavi, Pratik Kumar Jawanpuria, Bamdev Mishra
Low-Rank Tucker Decomposition Of Large Tensors Using TensorSketch

The authors propose two randomized algorithms for low-rank Tucker decomposition of tensors.

Osman Malik, Stephen Becker
Dropping Symmetry For Fast Symmetric Nonnegative Matrix Factorization

Symmetric nonnegative matrix factorization (NMF)---a special but important class of the general NMF---is demonstrated to be useful for data analysis and in particular for various clustering tasks. Unfortunately, designing fast algorithms for Symmetric NMF is not as easy as for the nonsymmetric counterpart, the latter admitting the splitting property that allows efficient alternating-type algorithms.

Zhihui Zhu, Xiao Li, Kai Liu, Qiuwei Li
Semidefinite Relaxations For Certifying Robustness To Adversarial Examples

Despite their impressive performance on diverse tasks, neural networks fail catastrophically in the presence of adversarial inputs, imperceptibly but adversarially perturbed versions of natural inputs. In this paper, the authors propose a new semidefinite relaxation for certifying robustness that applies to arbitrary ReLU networks.

Aditi Raghunathan, Jacob Steinhardt, Percy Liang
Privacy Amplification By Subsampling: Tight Analyses Via Couplings And Divergences

Differential privacy comes equipped with multiple analytical tools for the design of private data analyses.

Borja Balle, Gilles Barthe, Marco Gaboardi
Differentially Private Testing Of Identity And Closeness Of Discrete Distributions

The authors study the fundamental problems of identity testing (goodness of fit), and closeness testing (two sample test) of distributions over $k$ elements, under differential privacy.

Jayadev Acharya, Ziteng Sun, Huanyu Zhang
Differentially Private K-Means With Constant Multiplicative Error

The authors design new differentially private algorithms for the Euclidean k-means problem, both in the centralized model and in the local model of differential privacy.

Uri Stemmer, Haim Kaplan
Local Differential Privacy For Evolving Data

There are now several large scale deployments of differential privacy used to collect statistical information about users.

Matthew Joseph, Aaron Roth, Jonathan Ullman, Bo Waggoner
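A minimal sketch of the local model such deployments use, via classic randomized response; this is an illustrative baseline, not the authors' evolving-data protocol. Each user privatizes a single bit before sending it, and the aggregator debiases the noisy mean.

```python
import math
import random

def randomized_response(bit, epsilon):
    """epsilon-locally-private release of one bit: answer truthfully
    with probability e^eps / (e^eps + 1), otherwise flip the bit."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def debiased_mean(private_bits, epsilon):
    """Unbiased estimate of the true mean from privatized bits."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (sum(private_bits) / len(private_bits) - (1 - p)) / (2 * p - 1)
```

Privacy holds against anyone who sees a user's message, since each reported bit is plausibly either value; accuracy comes only from aggregating many users.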
Adversarial Attacks On Stochastic Bandits

The authors study adversarial attacks that manipulate the reward signals to control the actions chosen by a stochastic multi-armed bandit algorithm.

Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Jerry Zhu
Distributed Learning Without Distress: Privacy-Preserving Empirical Risk Minimization

Distributed learning allows a group of independent data owners to collaboratively learn a model over their data sets without exposing their private data. The authors present a distributed learning approach that combines differential privacy with secure multi-party computation.

Bargav Jayaraman, Lingxiao Wang, David Evans, Quanquan Gu
A Spectral View Of Adversarially Robust Features

Given the apparent difficulty of learning models that are robust to adversarial perturbations, the authors propose tackling the simpler problem of developing adversarially robust features.

Shivam Garg, Vatsal Sharan, Brian Zhang, Gregory Valiant
Efficient Formal Safety Analysis Of Neural Networks

Neural networks are increasingly deployed in real-world safety-critical domains such as autonomous driving, aircraft collision avoidance, and malware detection. In this paper, the authors present a new efficient approach for rigorously checking different safety properties of neural networks that significantly outperforms existing approaches by multiple orders of magnitude.

Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, Suman Jana
Contamination Attacks And Mitigation In Multi-Party Machine Learning

Machine learning is data hungry; the more data a model has access to in training, the more likely it is to perform well at inference time.

Jamie Hayes, Olga Ohrimenko
Explaining Deep Learning Models -- A Bayesian Non-parametric Approach

Understanding and interpreting how machine learning (ML) models make decisions has been a major challenge. While recent research has proposed various technical approaches to provide some clues as to how an ML model makes individual predictions, they cannot provide users with an ability to inspect a model as a complete entity.

Wenbo Guo, Sui Huang, Yunzhe Tao, Xinyu (X.Y.) Xing, Lin Lin
Data-Driven Clustering Via Parameterized Lloyd's Families

Clustering points in metric spaces is a long-studied area of research.

Maria-Florina Balcan, Travis Dick, Colin White
Manifold Structured Prediction

Structured prediction provides a general framework to deal with supervised problems where the outputs have semantically rich structure. Specifically, the authors study a structured prediction approach to manifold-valued regression.

Alessandro Rudi, Carlo Ciliberto, GianMaria Marconi, Lorenzo Rosasco
Loss Surfaces, Mode Connectivity, And Fast Ensembling Of DNNs

The loss functions of deep neural networks are complex and their geometric properties are not well understood. The authors introduce a training procedure to discover these high-accuracy pathways between modes.

Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, Andrew Wilson
Masking: A New Perspective Of Noisy Supervision

It is important to learn various types of classifiers given training data with noisy labels. In this paper, the authors propose a human-assisted approach called 'Masking' that conveys human cognition of invalid class transitions and naturally speculates the structure of the noise transition matrix.

Bo Han, Jiangchao Yao, Gang Niu, Mingyuan Zhou, Ivor Tsang, Ya Zhang, Masashi Sugiyama
Supervising Unsupervised Learning

The authors introduce a framework to transfer knowledge acquired from a repository of (heterogeneous) supervised datasets to new unsupervised datasets.

Vikas Garg, Adam Kalai
One-Shot Unsupervised Cross Domain Translation Sagie Benaim, Lior Wolf
Neural Architecture Search With Bayesian Optimisation And Optimal Transport

Bayesian Optimisation (BO) refers to a class of methods for global optimisation of a function f which is only accessible via point evaluations.

Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, Eric Xing
Adapted Deep Embeddings: A Synthesis Of Methods For K-Shot Inductive Transfer Learning Tyler Scott, Karl Ridgeway, Mike Mozer
Completing State Representations Using Spectral Learning

A central problem in dynamical system modeling is state discovery, that is, finding a compact summary of the past that captures the information needed to predict the future. Predictive State Representations (PSRs) enable clever spectral methods for state discovery; however, while consistent in the limit of infinite data, these methods often suffer from poor performance in the low data regime.

Nan Jiang, Alex Kulesza, Satinder Singh
Data-Efficient Hierarchical Reinforcement Learning

Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios.

Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, Sergey Levine
The Cluster Description Problem - Complexity Results, Formulations And Approximations

Consider the situation where you are given an existing $k$-way clustering $\pi$.

Ian Davidson, Antoine Gourru, S Ravi
Balanced Policy Evaluation And Learning

The authors present a new approach to the problems of evaluating and learning personalized decision policies from observational data of past contexts, decisions, and outcomes.

Nathan Kallus
Exponentiated Strongly Rayleigh Distributions

Strongly Rayleigh (SR) measures are discrete probability distributions over the subsets of a ground set. The authors introduce in this paper Exponentiated Strongly Rayleigh (ESR) measures, which sharpen (or smoothen) the negative dependence property of SR measures via a single parameter (the exponent) that can intuitively be understood as an inverse temperature.

Zelda Mariet, Suvrit Sra, Stefanie Jegelka
Parsimonious Bayesian Deep Networks

Combining Bayesian nonparametrics and a forward model selection strategy, the authors construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model.

Mingyuan Zhou
Stein Variational Gradient Descent As Moment Matching

Stein variational gradient descent (SVGD) is a non-parametric inference algorithm that evolves a set of particles to fit a given distribution of interest.

Qiang Liu, Dilin Wang
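The standard SVGD update is short enough to sketch. Below is a minimal NumPy version with a fixed-bandwidth RBF kernel (the bandwidth h and step size are illustrative choices): each particle follows a kernel-smoothed score term that pulls it toward high-density regions, plus a repulsive kernel-gradient term that keeps the particle set spread out.

```python
import numpy as np

def svgd_step(particles, grad_logp, step=0.05, h=1.0):
    """One SVGD update with RBF kernel k(x, y) = exp(-||x - y||^2 / (2h)):
    phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad_logp(x_j)
                             + grad_{x_j} k(x_j, x_i) ]."""
    n = len(particles)
    diffs = particles[:, None, :] - particles[None, :, :]  # diffs[i, j] = x_i - x_j
    K = np.exp(-(diffs ** 2).sum(-1) / (2 * h))            # (n, n) kernel matrix
    drift = K @ grad_logp(particles)                       # attraction toward density
    repulse = (K[:, :, None] * diffs).sum(axis=1) / h      # inter-particle repulsion
    return particles + step * (drift + repulse) / n
```

With the score of a standard normal, repeated updates move a tight cluster of particles toward the mean while the repulsion term spreads them to cover the target.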
Hamiltonian Variational Auto-Encoder

Variational Auto-Encoders (VAE) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBO).

Anthony L Caterini, Arnaud Doucet, Dino Sejdinovic
Predictive Approximate Bayesian Computation Via Saddle Points

Approximate Bayesian computation (ABC) is an important methodology for Bayesian inference when the likelihood function is intractable. In this paper, the authors introduce an optimization-based ABC framework that addresses these deficiencies.

Yingxiang Yang, Bo Dai, Negar Kiyavash, Niao He
Importance Weighting And Variational Inference

Recent work used importance sampling ideas for better variational bounds on likelihoods.

Justin Domke, Dan Sheldon
Orthogonally Decoupled Variational Gaussian Processes

Gaussian processes (GPs) provide a powerful non-parametric framework for reasoning over functions. Despite appealing theory, their superlinear computational and memory complexities have presented a long-standing challenge.

Hugh Salimbeni, Ching-An Cheng, Byron Boots, Marc Deisenroth
ATOMO: Communication-efficient Learning Via Atomic Sparsification

Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients.

Hongyi Wang, Scott Sievert, Shengchao Liu, Micky Charles, Dimitrios Papailiopoulos, Stephen Wright
Sparsified SGD With Memory

Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms. Various recent works proposed to use quantization or sparsification techniques to reduce the amount of data that needs to be communicated, for instance by only sending the most significant entries of the stochastic gradient (top-k sparsification).

Sebastian Stich, Jean-Baptiste Cordonnier, Martin Jaggi
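The memory (error-feedback) mechanism is simple enough to sketch. Here is a minimal NumPy version of top-k sparsification with a residual memory, illustrating the general idea rather than the authors' exact algorithm: add the accumulated residual to the gradient, transmit only the k largest-magnitude entries, and remember everything that was not sent.

```python
import numpy as np

def sparsify_with_memory(grad, memory, k):
    """Top-k sparsification with error feedback: the untransmitted
    remainder is carried over to the next round instead of being lost."""
    corrected = grad + memory
    idx = np.argsort(np.abs(corrected))[-k:]  # indices of the k largest entries
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]              # the part that gets communicated
    new_memory = corrected - sparse           # the part that stays local
    return sparse, new_memory
```

By construction, the transmitted vector and the memory always sum to the corrected gradient, so no signal is permanently discarded, only delayed.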
SEGA: Variance Reduction Via Gradient Sketching

The authors propose a novel randomized first order optimization method---SEGA (SkEtched GrAdient method)---which progressively throughout its iterations builds a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient provided at each iteration by an oracle.

Filip Hanzely, Konstantin Mishchenko, Peter Richtarik
Non-monotone Submodular Maximization In Exponentially Fewer Iterations

In this paper the authors consider parallelization for applications whose objective can be expressed as maximizing a non-monotone submodular function under a cardinality constraint.

Eric Balkanski, Adam Breuer, Yaron Singer
Stochastic Primal-Dual Method For Empirical Risk Minimization With O(1) Per-Iteration Complexity

Regularized empirical risk minimization problems with linear predictors appear frequently in machine learning. In this paper, the authors propose a new stochastic primal-dual method to solve this class of problems.

Conghui Tan, Tong Zhang, Shiqian Ma, Ji Liu
Rest-Katyusha: Exploiting The Solution's Structure Via Scheduled Restart Schemes

The authors propose a structure-adaptive variant of the state-of-the-art stochastic variance-reduced gradient algorithm Katyusha for regularized empirical risk minimization.

Junqi Tang, Mohammad Golbabaee, Francis Bach, Mike E Davies
Inexact Trust-region Algorithms On Riemannian Manifolds

The authors consider an inexact variant of the popular Riemannian trust-region algorithm for structured big-data minimization problems.The proposed algorithm approximates the gradient and the Hessian in addition to the solution of a trust-region sub-problem.

Hiroyuki Kasai, Bamdev Mishra
On Markov Chain Gradient Descent

Stochastic gradient methods are the workhorse (algorithms) of large-scale optimization problems in machine learning, signal processing, and other computational sciences and engineering. To obtain these results, the authors introduce a new technique that varies the mixing levels of the Markov chains.

Tao Sun, Yuejiao Sun, Wotao Yin
Gradient Descent Meets Shift-and-Invert Preconditioning For Eigenvector Computation

Shift-and-invert preconditioning, as a classic acceleration technique for the leading eigenvector computation, has received much attention again recently, owing to fast least-squares solvers for efficiently approximating matrix inversions in power iterations. The authors present a novel convergence analysis for the constant step-size setting that achieves a rate of $\tilde{O}(\sqrt{\frac{\lambda_{1}}{\lambda_{1}-\lambda_{p+1}}})$, where $\lambda_{i}$ denotes the $i$-th largest eigenvalue of the given real symmetric matrix and $p$ is the multiplicity of $\lambda_{1}$.

Zhiqiang Xu
Global Non-convex Optimization With Discretized Diffusions

An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. This allows us to design diffusions suitable for globally optimizing convex and non-convex functions not covered by the existing Langevin theory.

Murat A Erdogdu, Lester Mackey, Ohad Shamir
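The Euler discretization of the Langevin diffusion mentioned above has a standard one-line update: a gradient step plus Gaussian noise. In this minimal sketch the double-well objective, step size, and inverse temperature are all illustrative choices, not the paper's setup:

```python
import numpy as np

def langevin_euler(grad_f, x0, step=0.01, beta=5.0, iters=5000, seed=0):
    """Euler discretization of the Langevin diffusion
    dX_t = -grad f(X_t) dt + sqrt(2/beta) dW_t: a gradient step plus
    Gaussian noise whose scale is set by the inverse temperature beta."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    xs = np.empty(iters)
    for k in range(iters):
        x += -step * grad_f(x) + np.sqrt(2.0 * step / beta) * rng.standard_normal()
        xs[k] = x
    return xs

# Double-well f(x) = (x^2 - 1)^2 with minima at -1 and +1; the noise term
# is what lets iterates cross the barrier at x = 0 that would trap plain
# gradient descent.
xs = langevin_euler(lambda x: 4.0 * x * (x * x - 1.0), x0=-1.0)
```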
A Theory On The Absence Of Spurious Solutions For Nonconvex And Nonsmooth Optimization

The authors study the set of continuous functions that admit no spurious local optima (i.e., local optima that are not global optima).

Cedric Josz, Yi Ouyang, Richard Zhang, Javad Lavaei, Somayeh Sojoudi
The Limit Points Of (Optimistic) Gradient Descent In Min-Max Optimization

Motivated by applications in Optimization, Game Theory, and the training of Generative Adversarial Networks, the convergence properties of first-order methods in min-max problems have received extensive study.

Constantinos Daskalakis, Ioannis Panageas
Porcupine Neural Networks: Approximating Neural Network Landscapes

Neural networks have been used prominently in several machine learning and statistics applications. In particular, for two-layer neural networks the authors introduce Porcupine Neural Networks (PNNs) whose weight vectors are constrained to lie over a finite set of lines.

Soheil Feizi, Hamid Javadi, Jesse Zhang, David Tse
Adding One Neuron Can Eliminate All Bad Local Minima

One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima. In this paper, the authors study the landscape of neural networks for binary classification tasks.

SHIYU LIANG, Ruoyu Sun, Jason Lee, R. Srikant
Improving Explorability In Variational Inference With Annealed Variational Objectives

Chin-Wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville
Sequential Attend, Infer, Repeat: Generative Modelling Of Moving Objects

The authors present Sequential Attend, Infer, Repeat (SQAIR), an interpretable deep generative model for image sequences. It can reliably discover and track objects through the sequence; it can also conditionally generate future frames, thereby simulating expected motion of objects.

Adam Kosiorek, Hyunjik Kim, Yee Whye Teh, Ingmar Posner
Delta-encoder: An Effective Sample Synthesis Method For Few-shot Object Recognition

Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, the authors propose a simple yet effective method for few-shot (and one-shot) object recognition.

Eli Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Abhishek Kumar, Rogerio Feris, Raja Giryes, Alex Bronstein
Joint Active Feature Acquisition And Classification With Variable-Size Set Encoding

Hajin Shim, Sung Ju Hwang, Eunho Yang
PCA Of High Dimensional Random Walks With Comparison To Neural Network Training

Joseph Antognini, Jascha Sohl-Dickstein
Insights On Representational Similarity In Neural Networks With Canonical Correlation

Ari Morcos, Maithra Raghu, Samy Bengio
Adversarial Vulnerability For Any Classifier

Despite achieving impressive performance, state-of-the-art classifiers remain highly vulnerable to small, imperceptible, adversarial perturbations. In this paper, the authors study the phenomenon of adversarial perturbations under the assumption that the data is generated with a smooth generative model.

Alhussein Fawzi, Hamza Fawzi, Omar Fawzi
Sanity Checks For Saliency Maps

Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data.

Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
MetaGAN: An Adversarial Approach To Few-Shot Learning

In this paper, the authors propose a conceptually simple and general framework called MetaGAN for few-shot learning problems.

Ruixiang ZHANG, Tong Che, Zoubin Ghahramani, Yoshua Bengio, Yangqiu Song
Deep Generative Models With Learnable Knowledge Constraints

The broad set of deep generative models (DGMs) has achieved remarkable advances.

Zhiting Hu, Zichao Yang , Russ Salakhutdinov, LIANHUI Qin, Xiaodan Liang, Haoye Dong, Eric Xing
Learning Attractor Dynamics For Generative Memory

A central challenge faced by memory systems is the robust retrieval of a stored pattern in the presence of interference due to other stored patterns and noise.

Yan Wu, Greg Wayne, Karol Gregor, Timothy Lillicrap
Fast Deep Reinforcement Learning Using Online Adjustments From The Past

The authors propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer. EVA shifts the value predicted by a neural network with an estimate of the value function found by prioritised sweeping over experience tuples from the replay buffer near the current state.

Steven Hansen, Alexander Pritzel, Pablo Sprechmann, Andre Barreto, Charles Blundell
Blockwise Parallel Decoding For Deep Autoregressive Models

Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. To overcome this limitation, the authors propose a novel blockwise parallel decoding scheme in which the authors make predictions for multiple time steps in parallel, then back off to the longest prefix validated by a scoring model.

Mitchell Stern, Noam Shazeer, Jakob Uszkoreit
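The accept-longest-validated-prefix rule described in the abstract can be rendered as a simplified sequential sketch. In the paper the verification of all block positions happens in a single parallel model call; the toy scorer and proposer below are illustrative stand-ins:

```python
def blockwise_decode(score_next, propose_block, prompt, k=4, max_len=12):
    """Sequential sketch of blockwise parallel decoding: a fast proposal
    model guesses a block of k tokens; the scoring model validates them,
    we keep the longest matching prefix, and we back off to the scorer's
    own single token when nothing is accepted."""
    seq = list(prompt)
    while len(seq) < max_len:
        accepted = 0
        for tok in propose_block(seq, k):
            if score_next(seq) == tok:   # scorer agrees with this position
                seq.append(tok)
                accepted += 1
            else:
                break                    # reject the rest of the block
        if accepted == 0:
            seq.append(score_next(seq))  # guaranteed progress each round
    return seq

# Toy models: the "slow" scorer always emits the current position; the
# "fast" proposer guesses a whole block of positions at once.
score_next = lambda seq: len(seq)
propose_block = lambda seq, k: [len(seq) + i for i in range(k)]
out = blockwise_decode(score_next, propose_block, [0], k=4, max_len=8)
```

When the proposer agrees with the scorer, a whole block is committed per round instead of one token, which is the source of the speedup.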
Automatic Program Synthesis Of Long Programs With A Learned Garbage Collector

Amit Zohar, Lior Wolf
The Global Anchor Method For Quantifying Linguistic Shifts And Domain Adaptation

Language is dynamic, constantly evolving and adapting with respect to time, domain or topic. In this paper, the authors introduce the global anchor method for detecting corpus-level language shifts.

Zi Yin, Vin Sachidananda, Balaji Prabhakar
End-to-End Differentiable Physics For Learning And Control

The authors present a differentiable physics engine that can be integrated as a module in deep neural networks for end-to-end learning.

Filipe De Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, J. Zico Kolter
Neural Arithmetic Logic Units

Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training.

Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, Phil Blunsom
Reinforced Continual Learning

Most artificial intelligence models are limited in their ability to solve new tasks faster, without forgetting previously acquired knowledge. In this work, a novel approach for continual learning is proposed, which searches for the best neural architecture for each coming task via carefully designed reinforcement learning strategies.

Ju Xu, Zhanxing Zhu
Poison Frogs! Targeted Clean-Label Poisoning Attacks On Neural Networks

Ali Shafahi, Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, Tom Goldstein
Generalizing To Unseen Domains Via Adversarial Data Augmentation

Riccardo Volpi, Hong Namkoong, Ozan Sener, John Duchi, Vittorio Murino, Silvio Savarese
On The Local Hessian In Back-propagation

Back-propagation (BP) is the foundation for successfully training deep neural networks. Meanwhile, BP often works well when combined with ``designing tricks'' like orthogonal initialization, batch normalization, and skip connections.

Huishuai Zhang, Wei Chen, Tie-Yan Liu
Lipschitz-Margin Training: Scalable Certification Of Perturbation Invariance For Deep Neural Networks

High sensitivity of neural networks against malicious perturbations on inputs causes security concerns. From the relationship between the Lipschitz constants and prediction margins, the authors present a computationally efficient calculation technique to lower-bound the size of adversarial perturbations that can deceive networks, and that is widely applicable to various complicated networks.

Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama
Tangent: Automatic Differentiation Using Source-code Transformation For Dynamically Typed Array Programming

The need to efficiently calculate first- and higher-order derivatives of increasingly complex models expressed in Python has stressed or exceeded the capabilities of available tools. The authors implement and demonstrate these ideas in the Tangent software library for Python, the first AD framework for a dynamic language that uses SCT.

Bart Van Merrienboer, Dan Moldovan, Alexander Wiltschko
Simple, Distributed, And Accelerated Probabilistic Programming

The authors describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem.

Dustin Tran, Matthew Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, Rif A. Saurous
Power-law Efficient Neural Codes Provide General Link Between Perceptual Bias And Discriminability

Michael Morais, Jonathan W Pillow
DropBlock: A Regularization Method For Convolutional Networks

Deep neural networks often work well when they are over-parameterized and trained with a massive amount of noise and regularization, such as weight decay and dropout. In this paper, the authors introduce DropBlock, a form of structured dropout, where units in a contiguous region of a feature map are dropped together.

Golnaz Ghiasi, Tsung-Yi Lin, Quoc V Le
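A minimal sketch of the structured-dropout idea above: zeroing contiguous square regions of a feature map rather than independent units. The block-seed sampling and rescaling details here are simplified assumptions, not the paper's exact scheme:

```python
import numpy as np

def dropblock(fmap, block_size=2, drop_prob=0.1, rng=None):
    """Sketch of structured dropout: zero contiguous block_size x block_size
    regions of a 2-D feature map instead of independent units, then rescale
    the survivors as in inverted dropout."""
    rng = rng or np.random.default_rng(0)
    h, w = fmap.shape
    mask = np.ones_like(fmap, dtype=float)
    gamma = drop_prob / block_size**2        # per-cell rate for block seeds
    for i, j in zip(*np.nonzero(rng.random((h, w)) < gamma)):
        mask[i:i + block_size, j:j + block_size] = 0.0
    kept = max(mask.mean(), 1e-8)
    return fmap * mask / kept

y = dropblock(np.ones((8, 8)), block_size=2, drop_prob=0.25)
```

Dropping whole regions removes semantically correlated activations that unit-wise dropout leaves partially intact.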
Learning Sparse Neural Networks Via Sensitivity-driven Regularization

The ever-increasing number of parameters in deep neural networks poses challenges for memory-limited applications. The authors define a sensitivity for each parameter (i.e., its relevance to the network output) and introduce a regularization term that gradually lowers the absolute value of parameters with low sensitivity.

Enzo Tartaglione, Skjalg Lepsøy, Attilio Fiandrotti, Gianluca Francini
Critical Initialisation For Deep Signal Propagation In Noisy Rectifier Neural Networks

Arnu Pretorius, Elan Van Biljon, Steve Kroon, Herman Kamper
The Streaming Rollout Of Deep Networks - Towards Fully Model-parallel Execution

Deep neural networks, and in particular recurrent networks, are promising candidates to control autonomous agents that interact in real-time with the physical world. In this study, the authors present a theoretical framework to describe rollouts, the level of model-parallelization they induce, and demonstrate differences in solving specific tasks.

Volker Fischer, Jan Koehler, Thomas Pfeil
The Spectrum Of The Fisher Information Matrix Of A Single-Hidden-Layer Neural Network

An important factor contributing to the success of deep learning has been the remarkable ability to optimize large neural networks using simple first-order optimization algorithms like stochastic gradient descent. In this work, the authors extend a recently-developed framework for studying spectra of nonlinear random matrices to characterize an important measure of curvature, namely the eigenvalues of the Fisher information matrix.

Jeffrey Pennington, Pratik Worah
Learning Optimal Reserve Price Against Non-myopic Bidders

The authors consider the problem of learning optimal reserve price in repeated auctions against non-myopic bidders, who may bid strategically in order to gain in future rounds even if the single-round auctions are truthful. The authors introduce algorithms that obtain small regret against non-myopic bidders either when the market is large, i.e., no bidder appears in a constant fraction of the rounds, or when the bidders are impatient, i.e., they discount future utility by some factor mildly bounded away from one.

Jinyan Liu, Zhiyi Huang, Xiangning Wang
Beyond Log-concavity: Provable Guarantees For Sampling Multi-modal Distributions Using Simulated Tempering Langevin Monte Carlo

A key task in Bayesian machine learning is sampling from distributions that are only specified up to a partition function (i.e., constant of proportionality).

HOLDEN LEE, Andrej Risteski, Rong Ge
On Binary Classification In Extreme Regions

In pattern recognition, a random label $Y$ is to be predicted based upon observing a random vector $X$ valued in $\mathbb{R}^d$ with $d>1$ by means of a classification rule with minimum probability of error. Beyond theoretical results, numerical experiments are presented in order to illustrate the relevance of the approach developed.

Hamid JALALZAI, Stephan Clémençon, Anne Sabourin
PAC-Bayes Bounds For Stable Algorithms With Instance-dependent Priors

PAC-Bayes bounds have been proposed to get risk estimates based on a training sample.

Omar Rivasplata, Csaba Szepesvari, John Shawe-Taylor, Emilio Parrado-Hernandez, Shiliang Sun
Improved Algorithms For Collaborative PAC Learning

The authors study a recent model of collaborative PAC learning where $k$ players with $k$ different tasks collaborate to learn a single classifier that works for all tasks.

Huy Nguyen, Lydia Zakynthinou
Data Amplification: A Unified And Competitive Approach To Property Estimation

Estimating properties of discrete distributions is a fundamental problem in statistical learning. The authors design the first unified, linear-time, competitive property estimator that, for a wide class of properties and for all underlying distributions, uses just $2n$ samples to achieve the performance attained by the empirical estimator with $n\sqrt{\log n}$ samples.

Yi HAO, Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu
Mean-field Theory Of Graph Neural Networks In Graph Partitioning

A theoretical performance analysis of the graph neural network (GNN) is presented.

Tatsuro Kawamoto, Masashi Tsubaki, Tomoyuki Obuchi
Statistical Mechanics Of Low-rank Tensor Decomposition

Often, large, high dimensional datasets collected across multiple modalities can be organized as a higher order tensor.

Jonathan Kadmon, Surya Ganguli
Plug-in Estimation In High-Dimensional Linear Inverse Problems: A Rigorous Analysis

Estimating a vector $\mathbf{x}$ from noisy linear measurements $\mathbf{Ax+w}$ often requires use of prior knowledge or structural constraints on $\mathbf{x}$ for accurate reconstruction. Several recent works have considered combining linear least-squares estimation with a generic or plug-in ``denoiser'' function that can be designed in a modular manner based on the prior knowledge about $\mathbf{x}$.

Alyson Fletcher, Parthe Pandit, Sundeep Rangan, Subrata Sarkar, Phil Schniter
The Description Length Of Deep Learning Models

Deep learning models often have more parameters than observations, and still perform well. This is sometimes described as a paradox.

Léonard Blier, Yann Ollivier
Deepcode: Feedback Codes Via Deep Learning

The design of codes for communicating reliably over a statistically well defined channel is an important endeavor involving deep mathematical research and wide-ranging practical applications.

Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
Binary Rating Estimation With Graph Side Information

Kwangjun Ahn, Kangwook Lee, Hyunseung Cha, Changho Suh
Optimization Of Smooth Functions With Noisy Observations: Local Minimax Rates

The authors consider the problem of global optimization of an unknown non-convex smooth function with noisy zeroth-order feedback. The authors propose a local minimax framework to study the fundamental difficulty of optimizing smooth functions with adaptive function evaluations.

Yining Wang, Sivaraman Balakrishnan, Aarti Singh
Parameters As Interacting Particles: Long Time Convergence And Asymptotic Error Scaling Of Neural Networks

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods.

Grant Rotskoff, Eric Vanden-Eijnden
Towards Understanding Acceleration Tradeoff Between Momentum And Asynchrony In Nonconvex Stochastic Optimization

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Therefore, the authors propose to analyze the algorithm through a simpler but nontrivial nonconvex problem --- streaming PCA.

Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao
Asymptotic Optimality Of Adaptive Importance Sampling

François Portier, Bernard Delyon
Inference Aided Reinforcement Learning For Incentive Mechanism Design In Crowdsourcing

Zehong Hu, Yitao Liang, Jie Zhang, Zhao Li, Yang Liu
A Game-Theoretic Approach To Recommendation Systems With Strategic Content Providers

The authors introduce a game-theoretic approach to the study of recommendation systems with strategic content providers.

Omer Ben-Porat, Moshe Tennenholtz
A Mathematical Model For Optimal Decisions In A Representative Democracy

Direct democracy, where each voter casts one vote, fails when the average voter competence falls below 50%. Representative democracy, where voters choose representatives to vote, can be an elixir in such situations.

Malik Magdon-Ismail, Lirong Xia
Universal Growth In Production Economies

The authors study a simple variant of the von Neumann model of an expanding economy, in which multiple producers make goods according to their production function.

Simina Branzei, Ruta Mehta, Noam Nisan
Convex Elicitation Of Continuous Properties

A property or statistic of a distribution is said to be elicitable if it can be expressed as the minimizer of some loss function in expectation. Recent work shows that continuous real-valued properties are elicitable if and only if they are identifiable, meaning the set of distributions with the same property value can be described by linear constraints.

Jessie Finocchiaro, Rafael Frongillo
Contextual Pricing For Lipschitz Buyers

The authors investigate the problem of learning a Lipschitz function from binary feedback. The problem is motivated by \textit{contextual dynamic pricing}, where a firm must sell a stream of differentiated products to a collection of buyers with non-linear valuations for the items and observes only whether the item was sold or not at the posted price.

Jieming Mao, Renato Leme, Jon Schneider
Learning In Games With Lossy Feedback

The authors consider a game-theoretical multi-agent learning problem where the feedback information can be lost during the learning process and rewards are given by a broad class of games known as variationally stable games. The authors propose a simple variant of the classical online gradient descent algorithm, called reweighted online gradient descent (ROGD), and show that in variationally stable games, if each agent adopts ROGD, then almost sure convergence to the set of Nash equilibria is guaranteed, even when the feedback loss is asynchronous and arbitrarily correlated among agents.

Zhengyuan Zhou, Panayotis Mertikopoulos, Susan Athey, Nicholas Bambos, Peter W Glynn, Yinyu Ye
Multiplicative Weights Updates With Constant Step-Size In Graphical Constant-Sum Games

Yun Kuen Cheung
Solving Large Sequential Games With The Excessive Gap Technique

There has been tremendous recent progress on equilibrium-finding algorithms for zero-sum imperfect-information extensive-form games, but there has been a puzzling gap between theory and practice. Experiments with first-order methods have only been conducted on small- and medium-sized games because those methods are complicated to implement in this setting, and because CFR variants have been enhanced extensively for over a decade, they perform well in practice.

Christian Kroer, Gabriele Farina, Tuomas Sandholm
Practical Exact Algorithm For Trembling-hand Equilibrium Refinements In Games

Nash equilibrium strategies have the known weakness that they do not prescribe rational play in situations that are reached with zero probability according to the strategies themselves, for example, if players have made mistakes. In this paper, the authors design an exact polynomial-time algorithm for finding trembling-hand equilibria in zero-sum extensive-form games.

Gabriele Farina, Nicola Gatti, Tuomas Sandholm
Improving Online Algorithms Via ML Predictions

In this work the authors study the problem of using machine-learned predictions to improve performance of online algorithms.

Manish Purohit, Zoya Svitkina, Ravi Kumar
Variance-Reduced Stochastic Gradient Descent On Streaming Data

The authors present an algorithm STRSAGA for efficiently maintaining a machine learning model over data points that arrive over time, quickly updating the model as new training data is observed.

Ellango Jothimurugesan, Ashraf Tahmasbi, Phillip Gibbons, Srikanta Tirthapura
Regret Bounds For Robust Adaptive Control Of The Linear Quadratic Regulator

The authors consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, the authors present the first provably polynomial-time algorithm that achieves sub-linear regret on this problem.

Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, Stephen Tu
PAC-learning In The Presence Of Adversaries

The existence of evasion attacks during the test phase of machine learning algorithms represents a significant challenge to both their deployment and understanding.

Daniel Cullina, Arjun Nitin Bhagoji, Prateek Mittal
Tight Bounds For Collaborative PAC Learning Via Multiplicative Weights

The authors study the collaborative PAC learning problem recently proposed in Blum et al.~\cite{BHPQ17}, in which $k$ players want to learn a target function collaboratively, such that the learned function approximates the target function well on all players' distributions simultaneously.

Jiecao Chen, Qin Zhang, Yuan Zhou
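The multiplicative-weights machinery named in the title, in its generic textbook form; this is not the paper's specific collaborative learning procedure, only the update it builds on:

```python
def multiplicative_weights(losses_per_round, eta=0.5):
    """Textbook multiplicative-weights update: keep one weight per expert
    and shrink the weight of every expert in proportion to its loss."""
    w = [1.0] * len(losses_per_round[0])
    for losses in losses_per_round:
        w = [wi * (1.0 - eta * li) for wi, li in zip(w, losses)]
        total = sum(w)
        w = [wi / total for wi in w]   # renormalize to a distribution
    return w

# Expert 0 never errs, expert 1 errs every round: mass moves to expert 0.
w = multiplicative_weights([[0.0, 1.0]] * 5, eta=0.5)
```

The same reweighting, applied over players' distributions instead of experts, is what drives the collaborative-learning analysis.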
Understanding Weight Normalized Deep Neural Networks With Rectified Linear Units

This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight normalized deep neural networks.

Yixi Xu, Xiao Wang
Generalization Bounds For Uniformly Stable Algorithms

Uniform stability of a learning algorithm is a classical notion of algorithmic stability introduced to derive high-probability bounds on the generalization error (Bousquet and Elisseeff, 2002).

Vitaly Feldman, Jan Vondrak
Minimax Statistical Learning With Wasserstein Distances

As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, the authors describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space.

Jaeho Lee, Maxim Raginsky
Differentially Private Uniformly Most Powerful Tests For Binomial Data

The authors derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance.

Jordan Awan, Aleksandra Slavković
Sketching Method For Large Scale Combinatorial Inference

The authors present computationally efficient algorithms to test various combinatorial structures of large-scale graphical models.

Will Wei Sun, Junwei Lu, Han Liu
An Improved Analysis Of Alternating Minimization For Structured Multi-Response Regression

Multi-response linear models aggregate a set of vanilla linear models by assuming correlated noise across them, which has an unknown covariance structure. In this work, the authors present a resampling-free analysis for the alternating minimization algorithm applied to the multi-response regression.

Sheng Chen, Arindam Banerjee
MixLasso: Generalized Mixed Regression Via Convex Atomic-Norm Regularization

The authors consider a generalization of mixed regression where the response is an additive combination of several mixture components. In this work, the authors study a novel convex estimator \emph{MixLasso} for the estimation of generalized mixed regression, based on an atomic norm specifically constructed to regularize the number of mixture components.

Ian En-Hsu Yen, Wei-Cheng Lee, Kai Zhong, Sung-En Chang, Pradeep Ravikumar, Shou-De Lin
A Theory-Based Evaluation Of Nearest Neighbor Models Put Into Practice

In the $k$-nearest neighborhood model ($k$-NN), the authors are given a set of points $P$ and answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and the accuracy of the computed $k$-NN is not explicit.

Hendrik Fichtenberger, Dennis Rohde
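The $k$-NN query the abstract defines, in its brute-force exact form under the Euclidean metric (the paper studies approximate algorithms; this is only the reference definition):

```python
import math

def knn(P, q, k):
    """Exact k-nearest-neighbor query under the Euclidean metric:
    brute force, sorting all of P by distance to q."""
    return sorted(P, key=lambda p: math.dist(p, q))[:k]

P = [(0, 0), (1, 0), (5, 5), (0, 2)]
knn(P, (0, 0), 2)   # → [(0, 0), (1, 0)]
```

Approximate $k$-NN structures trade exactness of this answer for sublinear query time, which is precisely the accuracy relation the paper formalizes.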
Sharp Bounds For Generalized Uniformity Testing

The authors study the problem of generalized uniformity testing of a discrete probability distribution: given samples from a probability distribution p over a discrete domain Ω of unknown size, the authors want to distinguish, with probability at least 2/3, between the case that p is uniform on some subset of Ω and the case that p is ε-far, in total variation distance, from any such uniform distribution.

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart
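The distance in the problem statement (how far p is, in total variation, from the nearest distribution uniform on some subset) can be computed by brute force when p is fully known. The paper's point is testing this quantity from samples; the helper below only illustrates the target, using the fact that for a fixed subset size the best subset is the largest masses:

```python
def tv_to_uniform_subset(p):
    """Minimum total variation distance between a fully known distribution
    p and any distribution uniform on a subset of the domain. For a fixed
    subset size m, the optimal subset consists of the m largest masses
    (a simple exchange argument), so we only scan over m."""
    p = sorted(p, reverse=True)
    n = len(p)
    best = 1.0
    for m in range(1, n + 1):
        q = [1.0 / m] * m + [0.0] * (n - m)
        best = min(best, 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q)))
    return best
```

A distribution counts as "uniform on a subset" exactly when this distance is zero; ε-far means the distance exceeds ε for every subset.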
Testing For Families Of Distributions Via The Fourier Transform

The authors study the general problem of testing whether an unknown discrete distribution belongs to a specified family of distributions.

Alistair Stewart, Ilias Diakonikolas, Clement Canonne
High Dimensional Linear Regression Using Lattice Basis Reduction

The authors consider a high dimensional linear regression problem where the goal is to efficiently recover an unknown vector $\beta^*$ from $n$ noisy linear observations $Y = X\beta^* + W \in \mathbb{R}^n$, for known $X \in \mathbb{R}^{n \times p}$ and unknown $W \in \mathbb{R}^n$. The authors propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction algorithm.

Ilias Zadik, David Gamarnik
$\ell_1$-regression With Heavy-tailed Distributions

In this paper, the authors consider the problem of linear regression with heavy-tailed distributions. To address the challenge that both the input and output could be heavy-tailed, the authors propose a truncated minimization problem, and demonstrate that it enjoys an $O(\sqrt{d/n})$ excess risk, where $d$ is the dimensionality and $n$ is the number of samples.

Lijun Zhang, Zhi-Hua Zhou
Constant Regret, Generalized Mixability, And Mirror Descent

The authors consider the setting of prediction with expert advice; a learner makes predictions by aggregating those of a group of experts.

Zakaria Mhammedi, Robert Williamson
Stochastic Composite Mirror Descent: Optimal Bounds With High Probabilities

The authors study stochastic composite mirror descent, a class of scalable algorithms able to exploit the geometry and composite structure of a problem.

Yunwen Lei, Ke Tang
Uniform Convergence Of Gradients For Non-Convex Learning And Optimization

The authors investigate 1) the rate at which refined properties of the empirical risk---in particular, gradients---converge to their population counterparts in standard non-convex learning tasks, and 2) the consequences of this convergence for optimization. The authors propose vector-valued Rademacher complexities as a simple, composable, and user-friendly tool to derive dimension-free uniform convergence bounds for gradients in non-convex learning problems.

Dylan Foster, Ayush Sekhari, Karthik Sridharan
Fast Rates Of ERM And Stochastic Approximation: Adaptive To Error Bound Conditions

Error bound conditions (EBC) are properties that characterize the growth of an objective function when a point is moved away from the optimal set.

Mingrui Liu, Xiaoxuan Zhang, Lijun Zhang, Jing Rong, Tianbao Yang
Nearly Tight Sample Complexity Bounds For Learning Mixtures Of Gaussians Via Sample Compression Schemes

The authors prove that $\Theta(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance.

Hassan Ashtiani, Shai Ben-David, Nick Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan
Early Stopping For Nonparametric Testing

Early stopping of iterative algorithms is an algorithmic regularization method to avoid over-fitting in estimation and classification.

Meimei Liu, Guang Cheng
Chaining Mutual Information And Tightening Generalization Bounds

Bounding the generalization error of learning algorithms has a long history, which yet falls short in explaining various generalization successes including those of deep learning. In this paper, the authors introduce a technique to combine chaining and mutual information methods, to obtain a generalization bound that is both algorithm-dependent and that exploits the dependencies between the hypotheses.

Amir Asadi, Emmanuel Abbe, Sergio Verdu
Dimensionality Reduction Has Quantifiable Imperfections: Two Geometric Bounds

In this paper, the authors investigate dimensionality reduction (DR) maps in an information retrieval setting from a quantitative topology point of view. The authors therefore propose a new measure based on Wasserstein distance that comes with similar theoretical guarantee.

Kry Lui, Gavin Weiguang Ding, Ruitong Huang, Robert McCann
Minimax Estimation Of Neural Net Distance

An important class of distance metrics proposed for training generative adversarial networks (GANs) is the integral probability metric (IPM), in which the neural net distance captures the practical GAN training via two neural networks.

Kaiyi Ji, Yingbin Liang
Quantifying Learning Guarantees For Convex But Inconsistent Surrogates

The authors study consistency properties of machine learning methods based on minimizing convex surrogates.

Kirill Struminsky, Simon Lacoste-Julien, Anton Osokin
Learning Signed Determinantal Point Processes Through The Principal Minor Assignment Problem

Symmetric determinantal point processes (DPP) are a class of probabilistic models that encode the random selection of items that have a repulsive behavior.

Victor-Emmanuel Brunel
Data-dependent PAC-Bayes Priors Via Differential Privacy

The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and (data) distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors. Using this flexibility, however, is difficult, especially when the data distribution is presumed to be unknown.

Gintare Karolina Dziugaite, Dan Roy
Computationally And Statistically Efficient Learning Of Causal Bayes Nets Using Path Queries

Causal discovery from empirical data is a fundamental problem in many scientific domains. In this paper the authors first propose a polynomial time algorithm for learning the exact correctly-oriented structure of the transitive reduction of any causal Bayesian network with high probability, by using interventional path queries.

Kevin Bello, Jean Honorio
PAC-Bayes Tree: Weighted Subtrees With Guarantees

The authors present a weighted-majority classification approach over subtrees of a fixed tree, which provably achieves excess-risk of the same order as the best tree-pruning.

Stan D Nguyen, Samory Kpotufe
A Loss Framework For Calibrated Anomaly Detection

Given samples from a probability distribution, anomaly detection is the problem of determining if a given point lies in a low-density region.

Aditya Menon, Robert Williamson
On Oracle-Efficient PAC RL With Rich Observations

The authors study the computational tractability of PAC reinforcement learning with rich observations.

Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert Schapire
Adversarial Risk And Robustness: General Definitions And Implications For The Uniform Distribution

The authors study adversarial perturbations when the instances are uniformly distributed over {0,1}^n.

Dimitrios Diochnos, Saeed Mahloujifar, Mohammad Mahmoody
Learning From Discriminative Feature Feedback

The authors consider the problem of learning a multi-class classifier from labels as well as simple explanations that the authors call "discriminative features". The authors present an efficient online algorithm for learning from such feedback and give tight bounds on the number of mistakes made during the learning process.

Sanjoy Dasgupta, Sivan Sabato, Nick Roberts, Akansha Dey
How To Start Training: The Effect Of Initialization And Architecture

The authors identify and study two common failure modes for early training in deep ReLU nets.

Boris Hanin, David Rolnick
Bilevel Distance Metric Learning For Robust Image Recognition

Metric learning, aiming to learn a discriminative Mahalanobis distance matrix M that can effectively reflect the similarity between data samples, has been widely studied in various image recognition problems. In this paper, the authors integrate both feature extraction and metric learning into one joint optimization framework and propose a new bilevel distance metric learning model.

Jie Xu, Lei Luo, Cheng Deng, Heng Huang
Deep Non-Blind Deconvolution Via Generalized Low-Rank Approximation

In this paper, the authors present a deep convolutional neural network to capture the inherent properties of image degradation, which can handle different kernels and saturated pixels in a unified framework.

Wenqi Ren, Jiawei Zhang, Lin Ma, Jinshan Pan, Xiaochun Cao, Wangmeng Zuo, Wei Liu, Ming-Hsuan Yang
Unsupervised Depth Estimation, 3D Face Rotation And Replacement

The authors present an unsupervised approach for learning to estimate three dimensional (3D) facial structure from a single image while also predicting 3D viewpoint transformations that match a desired pose and facial geometry. The authors achieve this by inferring the depth of facial keypoints of an input image in an unsupervised manner, without using any form of ground-truth depth information.

Joel Ruben Antony Moniz, Christopher Beckham, Simon Rajotte, Sina Honari, Chris Pal
Neighbourhood Consensus Networks

The authors address the problem of finding reliable dense correspondences between a pair of images.

Ignacio Rocco, Mircea Cimpoi, Relja Arandjelović, Akihiko Torii, Tomas Pajdla, Josef Sivic
Recurrent Transformer Networks For Semantic Correspondence

The authors present recurrent transformer networks (RTNs) for obtaining dense correspondences between semantically similar images.

Seungryong Kim, Stephen Lin, SANG RYUL JEON, Dongbo Min, Kwanghoon Sohn
Deep Network For The Integrated 3D Sensing Of Multiple People In Natural Images

The authors present MubyNet -- a feed-forward, multitask, bottom up system for the integrated localization, as well as 3d pose and shape estimation, of multiple people in monocular images.

Andrei Zanfir, Elisabeta Marinoiu, Mihai Zanfir, Alin-Ionut Popa, Cristian Sminchisescu
A Neural Compositional Paradigm For Image Captioning

Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as introduction of irrelevant semantics, lack of diversity in the generated captions, and inadequate generalization performance. In this paper, the authors present an alternative paradigm for image captioning, which factorizes the captioning procedure into two stages: (1) extracting an explicit semantic representation from the given image; and (2) constructing the caption based on a recursive compositional procedure in a bottom-up manner.

Bo Dai, Sanja Fidler, Dahua Lin
Visual Memory For Robust Path Following

Humans routinely retrace a path in a novel environment both forwards and backwards despite uncertainty in their motion. In this paper, the authors present an approach for doing so.

Ashish Kumar, Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik
Learning To Exploit Stability For 3D Scene Parsing

Human scene understanding uses a variety of visual and non-visual cues to perform inference on object types, poses, and relations. The authors then present a novel architecture for 3D scene parsing named Prim R-CNN, learning to predict bounding boxes as well as their 3D size, translation, and rotation.

Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu
Learning To Decompose And Disentangle Representations For Video Prediction

Jun-Ting Hsieh, Bingbin Liu, De-An Huang, Li Fei-Fei, Juan Carlos Niebles
Weakly Supervised Dense Event Captioning In Videos

Dense event captioning aims to detect and describe all events of interest contained in a video.

Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang
Text-Adaptive Generative Adversarial Networks: Manipulating Images With Natural Language

This paper addresses the problem of manipulating images using natural language descriptions. In this paper, the authors propose the text-adaptive generative adversarial network (TAGAN) to generate semantically manipulated images while preserving text-irrelevant contents.

Seonghyeon Nam, Yunji Kim, Seon Joo Kim
A Probabilistic U-Net For Segmentation Of Ambiguous Images

Many real-world vision problems suffer from inherent ambiguities. To this end the authors propose a generative segmentation model based on a combination of a U-Net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible hypotheses.

Simon Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus Maier-Hein, Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger
Forward Modeling For Partial Observation Strategy Games - A StarCraft Defogger

The authors formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. The authors propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability to capture basic game rules and high-level dynamics.

Gabriel Synnaeve, Zeming Lin, Jonas Gehring, Dan Gant, Vegard Mella, Vasil Khalidov, Nicolas Carion, Nicolas Usunier
Adversarial Text Generation Via Feature-Mover's Distance

Generative adversarial networks (GANs) have achieved significant success in generating real-valued data. Instead of using the standard GAN objective, the authors propose to improve text-generation GAN via a novel approach inspired by optimal transport.

Liqun Chen, Shuyang Dai, Chenyang Tao, Haichao Zhang, Zhe Gan, Dinghan Shen, Yizhe Zhang, Guoyin Wang, Ruiyi Zhang, Lawrence Carin
Virtual Class Enhanced Discriminative Embedding Learning

Recently, learning discriminative features to improve recognition performance has gradually become a primary goal of deep learning, and numerous remarkable works have emerged. In this paper, the authors propose a novel yet extremely simple method, Virtual Softmax, to enhance the discriminative property of learned features by injecting a dynamic virtual negative class into the original softmax.

Binghui Chen, Weihong Deng, Haifeng Shen
Learning Pipelines With Limited Data And Domain Knowledge: A Study In Parsing Physics Problems

As machine learning becomes more widely used in practice, the authors need new methods to build complex intelligent systems that integrate learning with existing software, and with domain knowledge encoded as rules. As a case study, the authors present such a system that learns to parse Newtonian physics problems in textbooks.

Mrinmaya Sachan, Kumar Avinava Dubey, Tom Mitchell, Dan Roth, Eric Xing
Pipe-SGD: A Decentralized Pipelined SGD Framework For Distributed Deep Net Training

Distributed training of deep nets is an important technique to address some of the present day computing challenges like memory consumption and computational demands.

Youjie Li, Mingchao Yu, Songze Li, Salman Avestimehr, Nam Sung Kim, Alex Schwing
MULAN: A Blind And Off-Grid Method For Multichannel Echo Retrieval

This paper addresses the general problem of blind echo retrieval, i.e., given M sensors measuring in the discrete-time domain M mixtures of K delayed and attenuated copies of an unknown source signal, can the echo location and weights be recovered?

Helena Peic Tukuljac, Antoine Deleforge, Remi Gribonval
Diminishing Returns Shape Constraints For Interpretability And Regularization

The authors investigate machine learning models that can provide diminishing returns and accelerating returns guarantees to capture prior knowledge or policies about how outputs should depend on inputs.

Maya Gupta, Dara Bahri, Andy Cotter, Kevin Canini
Fairness Through Computationally-Bounded Awareness

The authors study the problem of fair classification within the versatile framework of Dwork et al.

Michael Kim, Omer Reingold, Guy Rothblum
On Preserving Non-discrimination When Combining Expert Advice

The authors study the interplay between sequential decision making and avoiding discrimination against protected groups, when examples arrive online and do not follow distributional assumptions.

Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, Nati Srebro
Fairness Behind A Veil Of Ignorance: A Welfare Analysis For Automated Decision Making

The authors draw attention to an important, yet largely overlooked aspect of evaluating fairness for automated decision making systems---namely, risk and welfare considerations.

Hoda Heidari, Claudio Ferrari, Krishna Gummadi, Andreas Krause
Inequity Aversion Improves Cooperation In Intertemporal Social Dilemmas

Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas.

Edward Hughes, Joel Leibo, Matthew Phillips, Karl Tuyls, Edgar Dueñez-Guzman, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin McKee, Raphael Koster, Heather Roff, Thore Graepel
On Misinformation Containment In Online Social Networks

Widespread online misinformation can cause public panic and serious economic damage. Motivated by realistic scenarios, the authors present the first analysis of the misinformation containment problem for the case when an arbitrary number of cascades are allowed.

Amo Tong, Ding-Zhu Du, Weili Wu
Found Graph Data And Planted Vertex Covers

A typical way in which network data is recorded is to measure all interactions involving a specified set of core nodes, which produces a graph containing this core together with a potentially larger set of fringe nodes that link to the core. Interactions between nodes in the fringe, however, are not present in the resulting graph data.

Austin Benson, Jon Kleinberg
Inferring Networks From Random Walk-Based Node Similarities

Digital presence in the world of online social media entails significant privacy risks.

Jeremy Hoskins, Cameron Musco, Christopher Musco, Babis Tsourakakis
Fast Greedy MAP Inference For Determinantal Point Process To Improve Recommendation Diversity

The determinantal point process (DPP) is an elegant probabilistic model of repulsion with applications in various machine learning tasks including summarization and search. To overcome the computational challenge, in this paper, the authors propose a novel algorithm to greatly accelerate the greedy MAP inference for DPP.

Laming Chen, Guoxin Zhang, Eric Zhou
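The greedy MAP heuristic that the paper accelerates can be sketched as the naive baseline below, which repeatedly adds the item with the largest log-determinant gain; this is only an illustration of the baseline it speeds up, not the paper's fast incremental algorithm, and the toy kernel `L` is made up:

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Naive greedy MAP inference for a DPP with PSD kernel L:
    at each step, add the item j maximizing log det L[S+j, S+j].
    (The paper's contribution is accelerating exactly this loop.)"""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            S = selected + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(S, S)])
            if sign > 0 and logdet > best_val:
                best_j, best_val = j, logdet
        if best_j is None:  # no item gives a valid (positive-det) gain
            break
        selected.append(best_j)
    return selected

# toy example: similarity kernel over 5 items
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
L = X @ X.T + np.eye(5)  # PSD kernel
subset = greedy_dpp_map(L, 2)
```

Each step of this baseline costs a fresh determinant; the speedup in the paper comes from updating the marginal gains incrementally instead of recomputing them.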
Unorganized Malicious Attacks Detection

Recommender systems have attracted much attention during the past decade. The unorganized malicious attack style occurs in many real applications, yet relevant study remains open.

Ming Pang, Wei Gao, Min Tao, Zhi-Hua Zhou
Scalable Robust Matrix Factorization With Nonconvex Loss

Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often outperforms standard matrix factorization using the $\ell_2$-loss, particularly when outliers are present.

Quanming Yao, James Kwok
Differentially Private Robust Low-Rank Approximation

In this paper, the authors study the following robust low-rank matrix approximation problem: given a matrix $A \in \mathbb{R}^{n \times d}$, find a rank-$k$ matrix $B$, while satisfying differential privacy, such that $\|A - B\|_p \leq \alpha \, \mathsf{OPT}_k(A) + \tau$, where $\|M\|_p$ is the entry-wise $\ell_p$-norm and $\mathsf{OPT}_k(A) := \min_{\mathsf{rank}(X) \leq k} \|A - X\|_p$.

Raman Arora, Vladimir Braverman, Jalaj Upadhyay
Differential Privacy For Growing Databases

The large majority of differentially private algorithms focus on the static setting, where queries are made on an unchanging database.

Rachel Cummings, Sara Krehbiel, Kevin A Lai, Tao (Uthaipon) Tantipongpipat
Efficient Neural Network Robustness Certification With General Activation Functions

Finding minimum distortion of adversarial examples and thus certifying robustness in neural network classifiers is known to be a challenging problem. To address this issue, in this paper the authors introduce CROWN, a general framework to certify robustness of neural networks with general activation functions.

Huan Zhang, Lily Weng, Pin-Yu Chen, Cho-Jui Hsieh, Luca Daniel
Spectral Signatures In Backdoor Attacks

A recent line of work has uncovered a new form of data poisoning: so-called backdoor attacks.

Brandon Tran, Jerry Li, Aleksander Madry
Constructing Unrestricted Adversarial Examples With Generative Models

Adversarial examples are typically constructed by perturbing an existing data point within a small matrix norm, and current defense methods are focused on guarding against this type of attack. In this paper, the authors propose a new class of adversarial examples that are synthesized entirely from scratch using a conditional generative model, without being restricted to norm-bounded perturbations.

Yang Song, Rui Shu, Nate Kushman, Stefano Ermon
Sublinear Time Low-Rank Approximation Of Distance Matrices

Let $P = \{ p_1, p_2, \ldots, p_n \}$ and $Q = \{ q_1, q_2, \ldots, q_m \}$ be two point sets in an arbitrary metric space. Let $A$ represent the $m \times n$ pairwise distance matrix with $A_{i,j} = d(p_i, q_j)$.

Ainesh Bakshi, David Woodruff
Leveraged Volume Sampling For Linear Regression

Suppose an n x d design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested.

Michal Derezinski, Manfred Warmuth, Daniel Hsu
Revisiting $(\epsilon, \gamma, \tau)$-similarity Learning For Domain Adaptation

Similarity learning is an active research area in machine learning that tackles the problem of finding a similarity function tailored to an observable data sample in order to achieve efficient classification. In this paper, the authors propose to extend the theoretical analysis of similarity learning to the domain adaptation setting, a particular situation occurring when the similarity is learned and then deployed on samples following different probability distributions.

Sofiane Dhouib, Ievgen Redko
Differential Properties Of Sinkhorn Approximation For Learning With Wasserstein Distance

Applications of optimal transport have recently gained remarkable attention as a result of the computational advantages of entropic regularization.

Giulia Luise, Alessandro Rudi, Massimiliano Pontil, Carlo Ciliberto
Algorithms And Theory For Multiple-Source Adaptation

The authors present a number of novel contributions to the multiple-source adaptation problem.

Judy Hoffman, Mehryar Mohri, Ningshan Zhang
Synthesize Policies For Transfer And Adaptation Across Tasks And Environments

The ability to transfer in reinforcement learning is key to building an agent of general artificial intelligence. The authors propose a novel compositional neural network architecture which depicts a meta rule for composing policies from environment and task embeddings.

Hexiang Hu, Liyu Chen, Boqing Gong, Fei Sha
Acceleration Through Optimistic No-Regret Dynamics

The authors consider the problem of minimizing a smooth convex function by reducing the optimization to computing the Nash equilibrium of a particular zero-sum convex-concave game.

Jun-Kun Wang, Jacob Abernethy
Efficient Online Portfolio With Logarithmic Regret

The authors study the decades-old problem of online portfolio management and propose the first algorithm with logarithmic regret that is not based on Cover's Universal Portfolio algorithm and admits much faster implementation.

Haipeng Luo, Chen-Yu Wei, Kai Zheng
Almost Optimal Algorithms For Linear Stochastic Bandits With Heavy-Tailed Payoffs

In linear stochastic bandits, it is commonly assumed that payoffs have sub-Gaussian noise. In this paper, under a weaker assumption on the noise, the authors study the problem of linear stochastic bandits with heavy-tailed payoffs (LinBET), where the distributions have finite moments of order $1+\epsilon$, for some $\epsilon \in (0,1]$.

Han Shao, Xiaotian Yu, Irwin King, Michael Lyu
A Smoothed Analysis Of The Greedy Algorithm For The Linear Contextual Bandit Problem

Bandit learning is characterized by the tension between long-term exploration and short-term exploitation.

Sampath Kannan, Jamie Morgenstern, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu
Exploration In Structured Reinforcement Learning

The authors address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that could be potentially exploited to minimize the exploration rates of suboptimal (state, action) pairs.

Jungseul Ok, Alexandre Proutiere, Damianos Tranos
Near Optimal Exploration-Exploitation In Non-Communicating Markov Decision Processes

While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable).

Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
A Block Coordinate Ascent Algorithm For Mean-Variance Optimization

Tengyang Xie, Bo Liu, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, Daesub Yoon
Learning Safe Policies With Expert Guidance

The authors propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify.

Jessie Huang, Fa Wu, Doina Precup, Yang Cai
M-Walk: Learning To Walk Over Graphs Using Monte Carlo Tree Search

Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, Jianfeng Gao
Is Q-Learning Provably Efficient?

Model-free reinforcement learning (RL) algorithms directly parameterize and update value functions or policies, bypassing the modeling of the environment.

Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan
Variational Inverse Control With Events: A General Framework For Data-Driven Reward Definition

The design of a reward function often poses a major practical challenge to real-world applications of reinforcement learning.

Justin Fu, Avi Singh, Dibya Ghosh, Larry Yang, Sergey Levine
An Off-policy Policy Gradient Theorem Using Emphatic Weightings

Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient.

Ehsan Imani, Eric Graves, Martha White
Near-Optimal Time And Sample Complexities For Solving Markov Decision Processes With A Generative Model

In this paper the authors consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP), provided that its transition function can only be accessed through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time.

Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye
Monte-Carlo Tree Search For Constrained POMDPs

Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. In this paper, the authors present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and only requires a black-box simulator of the environment.

Jongmin Lee, Geon-hyeong Kim, Pascal Poupart, Kee-Eung Kim
Equality Of Opportunity In Classification: A Causal Approach

The Equalized Odds (for short, EO) is one of the most popular measures of discrimination used in the supervised learning setting.

Justin Zhang, Elias Bareinboim
Confounding-Robust Policy Improvement

The authors study the problem of learning personalized decision policies from observational data while accounting for possible unobserved confounding in the data-generating process.

Nathan Kallus, Angela Zhou
Causal Discovery From Discrete Data Using Hidden Compact Representation

Causal discovery from a set of observations is one of the fundamental problems across several disciplines. In this paper the authors make an attempt to find a way to solve this problem by assuming a two-stage causal process: the first stage maps the cause to a hidden variable of a lower cardinality, and the second stage generates the effect from the hidden representation.

Ruichu Cai, Jie Qiao, Kun Zhang, Zhenjie Zhang, Zhifeng Hao
Dirichlet Belief Networks For Topic Structure Learning

Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied.

He Zhao, Lan Du, Wray Buntine, Mingyuan Zhou
Approximate Knowledge Compilation By Online Collapsed Importance Sampling

The authors introduce collapsed compilation, a novel approximate inference algorithm for discrete probabilistic graphical models.

Tal Friedman, Guy Van Den Broeck
Proximal Graphical Event Models

Event datasets include events that occur irregularly over the timeline and are prevalent in numerous domains. The authors introduce proximal graphical event models (PGEM) as a representation of such datasets.

Debarun Bhattacharjya, Shankar Subramanian, Tian Gao
Dynamic Network Model From Partial Observations

Can evolving networks be inferred and modeled without directly observing their nodes and edges? The authors propose a novel framework providing a non-parametric dynamic network model, based on a mixture of coupled hierarchical Dirichlet processes, learned from data capturing cascade node infection times.

Elahe Ghalebi, Baharan Mirzasoleiman, Radu Grosu, Jure Leskovec
HOGWILD!-Gibbs Can Be PanAccurate

Constantinos Daskalakis, Nishanth Dikkala, Siddhartha Jayanti
DAGs With NO TEARS: Continuous Optimization For Structure Learning

Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. In this paper, the authors introduce a fundamentally different strategy: the authors formulate the structure learning problem as a purely continuous optimization problem over real matrices that avoids this combinatorial constraint entirely.

Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric Xing
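The device that makes the continuous formulation possible is a smooth function of the weighted adjacency matrix that vanishes exactly on DAGs, the trace-of-matrix-exponential characterization $h(W) = \mathrm{tr}(e^{W \circ W}) - d$. A small sketch (the truncated Taylor series `mat_exp` is a stand-in for a proper matrix exponential):

```python
import numpy as np

def mat_exp(M, terms=30):
    """Truncated Taylor series for the matrix exponential
    (adequate for tiny matrices; a stand-in for scipy.linalg.expm)."""
    E = np.eye(M.shape[0])
    P = np.eye(M.shape[0])
    for k in range(1, terms):
        P = P @ M / k
        E = E + P
    return E

def acyclicity(W):
    """h(W) = tr(exp(W ∘ W)) - d: zero iff W encodes a DAG.
    The Hadamard square keeps entries nonnegative, so any directed
    cycle contributes strictly positive trace."""
    d = W.shape[0]
    return np.trace(mat_exp(W * W)) - d

W_dag = np.array([[0.0, 1.0], [0.0, 0.0]])  # single edge 0 -> 1: a DAG
W_cyc = np.array([[0.0, 1.0], [1.0, 0.0]])  # 2-cycle
print(acyclicity(W_dag))  # 0.0
print(acyclicity(W_cyc))  # ≈ 1.086, i.e. 2*cosh(1) - 2
```

Because h and its gradient are smooth, the DAG constraint can be enforced with standard continuous optimization machinery (e.g. an augmented Lagrangian), which is the strategy the paper takes.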
Mean Field For The Stochastic Blockmodel: Optimization Landscape And Convergence Issues

Variational approximation has been widely used in large-scale Bayesian inference recently, the simplest kind of which involves imposing a mean field assumption to approximate complicated latent structures.

Soumendu Sundar Mukherjee, Purnamrita Sarkar, Y. X. Rachel Wang, Bowei Yan
Coupled Variational Bayes Via Optimization Embedding

Variational inference plays a vital role in learning graphical models, especially on large-scale datasets. In this paper, the authors propose coupled variational Bayes, which exploits the primal-dual view of the ELBO with the variational distribution class generated by an optimization procedure, termed optimization embedding.

Bo Dai, Hanjun Dai, Niao He, Weiyang Liu, Zhen Liu, Jianshu Chen, Lin Xiao, Le Song
Stochastic Nonparametric Event-Tensor Decomposition

Tensor decompositions are fundamental tools for multiway data analysis. To address these issues, the authors formulate event-tensors, to preserve the complete temporal information for multiway data, and propose a novel Bayesian nonparametric decomposition model.

Shandian Zhe, Yishuai Du
GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference With GPU Acceleration

Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware. The authors present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM).

Jacob Gardner, Geoff Pleiss, Kilian Weinberger, David Bindel, Andrew Wilson
Heterogeneous Multi-output Gaussian Process Prediction

The authors present a novel extension of multi-output Gaussian processes for handling heterogeneous outputs.

Pablo Moreno-Muñoz, Antonio Artés, Mauricio Álvarez
Probabilistic Matrix Factorization For Automated Machine Learning

In order to achieve state-of-the-art performance, modern machine learning techniques require careful data pre-processing and hyperparameter tuning. In this paper, the authors propose to solve this meta-learning task by combining ideas from collaborative filtering and Bayesian optimization.

Nicolo Fusi, Rishit Sheth, Melih Elibol
Stochastic Expectation Maximization With Variance Reduction

Expectation-Maximization (EM) is a popular tool for learning latent variable models, but the vanilla batch EM does not scale to large data sets because the whole data set is needed at every E-step. In this paper, the authors propose a variance reduced stochastic EM (sEM-vr) algorithm inspired by variance reduced stochastic gradient descent algorithms.

Jianfei Chen, Jun Zhu, Yee Whye Teh, Tong Zhang
Generative Neural Machine Translation

The authors introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences.

Harshil Shah, David Barber
Sparse Covariance Modeling In High Dimensions With Gaussian Processes

This paper studies statistical relationships among components of high-dimensional observations varying across non-random covariates.

Rui Li, Kishan KC, Feng Cui, Justin Domke, Anne Haake
Variational Learning On Aggregate Outputs With Gaussian Processes

While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. The authors propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account.

Ho Chung Law, Dino Sejdinovic, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, Kenji Fukumizu
Learning Invariances Using The Marginal Likelihood

In many supervised learning tasks, learning what changes do not affect the prediction target is as crucial to generalisation as learning what does. Data augmentation is a common way to enforce a model to exhibit an invariance: training data is modified according to an invariance designed by a human and added to the training data. The authors argue that invariances should be incorporated into the model structure, and learned using the marginal likelihood, which can correctly reward the reduced complexity of invariant models.

Mark Van Der Wilk, Matthias Bauer, Ti John, James Hensman
Dirichlet-based Gaussian Processes For Large-scale Calibrated Classification

This paper studies the problem of deriving fast and accurate classification algorithms with uncertainty quantification. To this aim, the authors propose a novel regression approach where the labels are obtained through the interpretation of classification labels as the coefficients of a degenerate Dirichlet distribution.

Dimitrios Milios, Raffaello Camoriano, Pietro Michiardi, Lorenzo Rosasco, Maurizio Filippone
Regret Bounds For Meta Bayesian Optimization With An Unknown Gaussian Process Prior

Bayesian optimization usually assumes that a Bayesian prior is given.

Zi Wang, Beomjoon Kim, Leslie Kaelbling
Efficient High Dimensional Bayesian Optimization With Additivity And Quadrature Fourier Features

The authors develop an efficient and provably no-regret Bayesian optimization (BO) algorithm for optimization of black-box functions in high dimensions.

Mojmir Mutny, Andreas Krause
Adversarially Robust Optimization With Gaussian Processes

In this paper, the authors consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and the authors require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or when one is interested in finding an entire region of good inputs rather than only a single point.

Ilija Bogunovic, Jonathan Scarlett, Stefanie Jegelka, Volkan Cevher
Multi-objective Maximization Of Monotone Submodular Functions With Cardinality Constraint

The authors consider the problem of multi-objective maximization of monotone submodular functions subject to cardinality constraint, often formulated as $\max_{|A|=k}\min_{i\in\{1,\dots,m\}} f_i(A)$.

Rajan Udwani
Variational PDEs For Acceleration On Manifolds And Application To Diffeomorphisms

Ganesh Sundaramoorthi, Anthony Yezzi
Zeroth-order (Non)-Convex Stochastic Optimization Via Conditional Gradient And Gradient Updates

In this paper, the authors propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization.

Krishnakumar Balasubramanian, Saeed Ghadimi
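The basic ingredient of such zeroth-order schemes is a gradient estimate built from function evaluations alone, for instance via Gaussian smoothing. A hedged sketch of a generic estimator of this kind (the smoothing radius and sample count here are illustrative, not the paper's choices):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, samples=2000, seed=0):
    """Zeroth-order gradient estimate via Gaussian smoothing:
    average (f(x + mu*u) - f(x)) / mu * u over random Gaussian
    directions u. Uses only function values, never analytic gradients."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(samples):
        u = rng.normal(size=x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / samples

# sanity check on f(x) = ||x||^2, whose true gradient is 2x
x = np.array([1.0, 2.0])
g = zo_gradient(lambda v: float(np.dot(v, v)), x)
```

Such estimates are unbiased for a smoothed version of f but have high variance, which is why the sample complexity analysis in papers of this kind is the interesting part.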
Computing Higher Order Derivatives Of Matrix And Tensor Expressions

Optimization is an integral part of most machine learning systems and most numerical optimization schemes rely on the computation of derivatives. Here, the authors close this fundamental gap and present an algorithmic framework for computing matrix and tensor derivatives that extends seamlessly to higher order derivatives.

Soeren Laue, Matthias Mitterreiter, Joachim Giesen
How SGD Selects The Global Minima In Over-parameterized Learning: A Dynamical Stability Perspective

The question of which global minima are accessible by a stochastic gradient descent (SGD) algorithm with specific learning rate and batch size is studied from the perspective of dynamical stability. The concept of non-uniformity is introduced, which, together with sharpness, characterizes the stability property of a global minimum and hence the accessibility of a particular SGD algorithm to that global minimum.

Lei Wu, Chao Ma, Weinan E
The Effect Of Network Width On The Performance Of Large-batch Training

Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training.

Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitrios Papailiopoulos, Paraschos Koutris
COLA: Decentralized Linear Learning

Decentralized machine learning is a promising emerging paradigm in view of global challenges of data ownership and privacy. The authors consider learning of linear classification and regression models, in the setting where the training data is decentralized over many user devices, and the learning algorithm must run on-device, on an arbitrary communication network, without a central coordinator. The authors propose COLA, a new decentralized training algorithm with strong theoretical guarantees and superior practical performance.

Lie He, An (Yatao) Bian, Martin Jaggi
Distributed Stochastic Optimization Via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. In this paper, the authors propose an efficient distributed stochastic optimization method by combining adaptivity with variance reduction techniques.

Ashok Cutkosky, Róbert Busa-Fekete
Non-Ergodic Alternating Proximal Augmented Lagrangian Algorithms With Optimal Rates

The authors develop two new non-ergodic alternating proximal augmented Lagrangian algorithms (NEAPAL) to solve a class of nonsmooth constrained convex optimization problems.

Quoc Tran Dinh
Breaking The Span Assumption Yields Fast Finite-Sum Minimization

In this paper, the authors show that SVRG and SARAH can be modified to be fundamentally faster than all of the other standard algorithms that minimize the sum of $n$ smooth functions, such as SAGA, SAG, SDCA, and SDCA without duality. Moreover, the authors present lower bound results that show this speedup is optimal, and provide analysis to help explain why this speedup exists.

Robert Hannah, Yanli Liu, Daniel O'Connor, Wotao Yin
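For reference, the SVRG update this speedup builds on keeps a snapshot iterate whose full gradient corrects each stochastic gradient. A minimal sketch on a least-squares sum (illustrative only; the paper's modification of SVRG is not shown, and all data and hyperparameters are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimizer of (1/2n)||Ax - b||^2

def grad_i(x, i):
    """Gradient of the i-th summand f_i(x) = 0.5 * (a_i @ x - b_i)^2."""
    return A[i] * (A[i] @ x - b[i])

def svrg(epochs=50, m=200, lr=0.01):
    x = np.zeros(d)
    for _ in range(epochs):
        x_snap = x.copy()
        full_grad = A.T @ (A @ x_snap - b) / n  # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced stochastic gradient: unbiased, and its variance
            # shrinks as x and x_snap both approach the optimum
            g = grad_i(x, i) - grad_i(x_snap, i) + full_grad
            x -= lr * g
    return x

x = svrg()
print(np.linalg.norm(x - x_star))
```

The correction term is what gives SVRG its linear convergence on strongly convex sums, the property whose constants the paper improves.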
Optimization For Approximate Submodularity

The authors consider the problem of maximizing a submodular function when given access to its approximate version. The authors describe a technique, which they call the sampled-mean approximation, that yields strong guarantees for maximization of submodular functions from approximate surrogates under cardinality and intersection of matroid constraints.

Yaron Singer, Avinatan Hassidim
Submodular Maximization Via Gradient Ascent: The Case Of Deep Submodular Functions

The authors study the problem of maximizing deep submodular functions (DSFs) subject to a matroid constraint.

Wenruo Bai, William Stafford Noble, Jeff Bilmes
Maximizing Induced Cardinality Under A Determinantal Point Process

Determinantal point processes (DPPs) are well-suited to recommender systems where the goal is to generate collections of diverse, high-quality items.

Jennifer Gillenwater, Alex Kulesza, Sergei Vassilvitskii, Zelda Mariet
Efficient Algorithms For Non-convex Isotonic Regression Through Submodular Optimization

The authors consider the minimization of submodular functions subject to ordering constraints. The authors propose new discretization schemes that lead to simple and efficient algorithms based on zeroth-, first-, or higher-order oracles; these algorithms also lead to improvements without isotonic constraints.

Francis Bach
Revisiting Decomposable Submodular Function Minimization With Incidence Relations

The authors introduce a new approach to decomposable submodular function minimization (DSFM) that exploits incidence relations.

Pan Li, Olgica Milenkovic
Coordinate Descent With Bandit Sampling

Coordinate descent methods minimize a cost function by updating a single decision variable (corresponding to one coordinate) at a time. The authors propose a new adaptive method for coordinate descent.

Farnood Salehi, Patrick Thiran, L. Elisa Celis
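The idea of adaptive coordinate selection can be caricatured with an epsilon-greedy bandit over coordinates, rewarding each coordinate with the objective decrease its last update achieved. This is a simplified sketch (the paper's estimator and regret analysis differ; the quadratic objective and all hyperparameters below are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50
B = rng.normal(size=(d, d))
Q = B.T @ B / d + 0.1 * np.eye(d)  # positive definite Hessian of f(x) = 0.5 x^T Q x

def f(x):
    return 0.5 * x @ Q @ x

x = np.ones(d)
f0 = f(x)
est = np.full(d, np.inf)  # optimistic init: try every coordinate at least once
for t in range(5000):
    # epsilon-greedy bandit: mostly pick the coordinate with the largest
    # recently observed decrease, sometimes explore uniformly at random
    j = rng.integers(d) if rng.random() < 0.2 else int(np.argmax(est))
    g = Q[j] @ x                       # partial derivative along coordinate j
    decrease = g * g / (2 * Q[j, j])   # decrease achieved by an exact coordinate step
    x[j] -= g / Q[j, j]                # exact minimization along coordinate j
    est[j] = decrease                  # bandit reward feedback
print(f(x) / f0)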
Accelerated Stochastic Matrix Inversion: General Theory And Speeding Up BFGS Rules For Faster Second-Order Optimization

The authors present the first accelerated randomized algorithm for solving linear systems in Euclidean spaces.

Robert Gower, Filip Hanzely, Peter Richtarik, Sebastian Stich
Stochastic Cubic Regularization For Fast Nonconvex Optimization

This paper proposes a stochastic variant of a classic algorithm---the cubic-regularized Newton method [Nesterov and Polyak].

Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeff Regier, Michael I. Jordan
On The Local Minima Of The Empirical Risk

Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan
Stochastic Nested Variance Reduced Gradient Descent For Nonconvex Optimization

The authors study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions.

Dongruo Zhou, Pan Xu, Quanquan Gu
NEON2: Finding Local Minima Via First-Order Oracles

The authors propose a reduction for non-convex optimization that can (1) turn a stationary-point-finding algorithm into a local-minimum-finding one, and (2) replace the Hessian-vector product computations with only gradient computations.

Zeyuan Allen-Zhu, Yuanzhi Li
How Much Restricted Isometry Is Needed In Nonconvex Matrix Recovery?

When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP) --- i.e.

Richard Zhang, Cedric Josz, Somayeh Sojoudi, Javad Lavaei
Escaping Saddle Points In Constrained Optimization

In this paper, the authors study the problem of escaping from saddle points in smooth nonconvex optimization problems subject to a convex set $\mathcal{C}$.

Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie
Analysis Of Krylov Subspace Solutions Of Regularized Non-Convex Quadratic Problems

The authors provide convergence rates for Krylov subspace solutions to the trust-region and cubic-regularized (nonconvex) quadratic problems.

Yair Carmon, John Duchi
SPIDER: Near-Optimal Non-Convex Optimization Via Stochastic Path-Integrated Differential Estimator

In this paper, the authors propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost.

Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang
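The core recursion maintains a gradient estimate that is updated along the optimization path and refreshed periodically with a full gradient. A hedged sketch on a least-squares objective (the paper's normalized-step variant and theory are omitted; data and hyperparameters are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

def batch_grad(x, idx):
    """Minibatch gradient of f(x) = (1/2n)||Ax - b||^2."""
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)

def spider(T=300, q=50, batch=25, lr=0.1):
    x = np.zeros(d)
    x_prev = x.copy()
    v = np.zeros(d)
    for t in range(T):
        if t % q == 0:
            v = A.T @ (A @ x - b) / n  # periodic full-gradient refresh
        else:
            idx = rng.integers(n, size=batch)
            # path-integrated update: correct v by the gradient change of the
            # same minibatch between consecutive iterates
            v = v + batch_grad(x, idx) - batch_grad(x_prev, idx)
        x_prev = x.copy()
        x = x - lr * v
    return x

x = spider()
print(np.linalg.norm(x - x_star))
```

Because the estimator tracks the moving gradient rather than re-estimating it from scratch, its error stays proportional to the distance traveled since the last refresh, which is the source of the reduced computational cost.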
Natasha 2: Faster Non-Convex Optimization Than SGD

The authors design a stochastic algorithm to find $\varepsilon$-approximate local minima of any smooth nonconvex function at rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.

Zeyuan Allen-Zhu
Zeroth-Order Stochastic Variance Reduction For Nonconvex Optimization

Sijia Liu, Bhavya Kailkhura, Pin-Yu Chen, Paishun Ting, Shiyu Chang, Lisa Amini
Structured Local Minima In Sparse Blind Deconvolution

Blind deconvolution is a ubiquitous problem of recovering two unknown signals from their convolution.

Yuqian Zhang, Henry Kuo, John Wright
Algorithmic Regularization In Learning Deep Homogeneous Models: Layers Are Automatically Balanced

The authors study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions including feed-forward fully connected and convolutional deep neural networks with linear, ReLU or Leaky ReLU activation.

Simon Du, Wei Hu, Jason Lee
Are ResNets Provably Better Than Linear Predictors?

A residual network (or ResNet) is a standard deep neural net architecture, with state-of-the-art performance across numerous applications.

Ohad Shamir
Adaptive Methods For Nonconvex Optimization

Adaptive gradient methods that rely on scaling gradients down by the square root of exponential moving averages of past squared gradients, such as RMSProp, Adam, and Adadelta, have found wide application in optimizing the nonconvex problems that arise in deep learning.

Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar
Alternating Optimization Of Decision Trees, With Application To Learning Sparse Oblique Trees

Learning a decision tree from data is a difficult optimization problem.

Miguel A. Carreira-Perpinan, Pooya Tavallali
GILBO: One Metric To Measure Them All

The authors propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund).

Alex Alemi, Ian Fischer
Isolating Sources Of Disentanglement In Variational Autoencoders

The authors decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. The authors use this to motivate the beta-TCVAE (Total Correlation Variational Autoencoder) algorithm, a refinement and plug-in replacement of the beta-VAE for learning disentangled representations, requiring no additional hyperparameters during training.

Ricky Chen, Xuechen Li, Roger Grosse, David Duvenaud
On GANs And GMMs

A longstanding problem in machine learning is to find unsupervised methods that can learn the statistical structure of high dimensional signals.

Eitan Richardson, Yair Weiss
Assessing Generative Models Via Precision And Recall

Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison.

Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly
Gather-Excite: Exploiting Feature Context In Convolutional Neural Networks

While the use of bottom-up local operators in convolutional neural networks (CNNs) matches well some of the statistics of natural images, it may also prevent such models from capturing contextual long-range feature interactions. In this work, the authors propose a simple, lightweight approach for better context exploitation in CNNs.

Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Andrea Vedaldi
Uncertainty-Aware Attention For Reliable Interpretation And Prediction

Attention mechanisms are effective both in focusing deep learning models on relevant features and in interpreting them. To overcome this limitation, the authors introduce the notion of input-dependent uncertainty to the attention mechanism, such that it generates attention for each feature with varying degrees of noise based on the given input, to learn larger variance on instances it is uncertain about.

Jay Heo, Hae Beom Lee, Saehoon Kim, Juho Lee, Kwang Joon Kim, Eunho Yang, Sung Ju Hwang
Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks

The authors introduce the Recurrent Marginal Structural Network - a sequence-to-sequence architecture for forecasting a patient's expected response to a series of planned treatments.

Bryan Lim, Ahmed M. Alaa, Mihaela Van Der Schaar
Backpropagation With Callbacks: Foundations For Efficient And Expressive Differentiable Programming

Training of deep learning models depends on gradient descent and end-to-end differentiation.

Fei Wang, James Decker, Xilun Wu, Gregory Essertel, Tiark Rompf
Recurrent World Models Facilitate Policy Evolution

A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations.

David Ha, Jürgen Schmidhuber
Long Short-term Memory And Learning-to-learn In Networks Of Spiking Neurons

Recurrent networks of spiking neurons (RSNNs) underlie the astounding computing and learning capabilities of the brain. One reason is that RSNNs in the brain are not randomly connected or designed according to simple rules, and they do not start learning as a tabula rasa network.

Guillaume Bellec, Darjan Salaj, Anand Subramoney, Robert Legenstein, Wolfgang Maass
Distributed Weight Consolidation: A Brain Segmentation Case Study

Collecting the large datasets needed to train deep neural networks can be very difficult, particularly for the many applications for which sharing and pooling data is complicated by practical, ethical, or legal concerns. In this paper, the authors introduce distributed weight consolidation (DWC), a continual learning method to consolidate the weights of separate neural networks, each trained on an independent dataset.

Patrick McClure, Charles Zheng, Jakub Kaczmarzyk, John Rogers-Lee, Satra Ghosh, Dylan Nielson, Peter A Bandettini, Francisco Pereira
Learning To Play With Intrinsically-Motivated, Self-Aware Agents

Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. The authors seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation.

Nick Haber, Damian Mrowca, Stephanie Wang, Li Fei-Fei, Daniel Yamins
Gradient Descent For Spiking Neural Networks

Most large-scale network models use neurons with static nonlinearities that produce analog output, despite the fact that information processing in the brain is predominantly carried out by dynamic neurons that produce discrete pulses called spikes. Here, the authors present a gradient descent method for optimizing spiking network models by introducing a differentiable formulation of spiking dynamics and deriving the exact gradient calculation.

Ben Huh, Terrence J Sejnowski
Demystifying Excessively Volatile Human Learning: A Bayesian Persistent Prior And A Neural Approximation

Understanding how humans and animals learn about statistical regularities in stable and volatile environments, and utilize these regularities to make predictions and decisions, is an important problem in neuroscience and psychology. Because exact Bayesian inference in this setting, an example of switching state space models, is computationally intense, a number of approximately Bayesian and heuristic algorithms have been proposed to account for learning/prediction in the brain.

Chaitanya Ryali, Gautam Reddy, Angela J Yu
Temporal Alignment And Latent Gaussian Process Factor Inference In Population Spike Trains

The authors introduce a novel scalable approach to identifying common latent structure in neural population spike-trains, which allows for variability both in the trajectory and in the rate of progression of the underlying computation.

Lea Duncker, Maneesh Sahani
Information-based Adaptive Stimulus Selection To Optimize Communication Efficiency In Brain-Computer Interfaces

Stimulus-driven brain-computer interfaces (BCIs), such as the P300 speller, rely on using a sequence of sensory stimuli to elicit specific neural responses as control signals, while a user attends to relevant target stimuli that occur within the sequence. In current BCIs, the stimulus presentation schedule is typically generated in a pseudo-random fashion.

Boyla Mainsah, Dmitry Kalika, Leslie Collins, Siyuan Liu, Chandra Throckmorton
Model-based Targeted Dimensionality Reduction For Neuronal Population Data

Summarizing high-dimensional data using a small number of parameters is a ubiquitous first step in the analysis of neuronal population activity. Recently developed methods use "targeted" approaches that work by identifying multiple, distinct low-dimensional subspaces of activity that capture the population response to individual experimental task variables, such as the value of a presented stimulus or the behavior of the animal.

Mikio Aoi, Jonathan W Pillow
Objective And Efficient Inference For Couplings In Neuronal Networks

Inferring directional couplings from the spike data of networks is desired in various scientific fields such as neuroscience. Here, the authors apply a recently proposed objective procedure to the spike data obtained from the Hodgkin-Huxley type models and in vitro neuronal networks cultured in a circular structure.

Yu Terada, Tomoyuki Obuchi, Takuya Isomura, Yoshiyuki Kabashima
The Emergence Of Multiple Retinal Cell Types Through Efficient Coding Of Natural Movies

One of the most striking aspects of early visual processing in the retina is the immediate parcellation of visual information into multiple parallel pathways, formed by different retinal ganglion cell types each tiling the entire visual field.

Samuel Ocko, Jack Lindsey, Surya Ganguli, Stephane Deny
Benefits Of Over-parameterization With EM

Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find stationary points of the log-likelihood objective. The goal of this article is to present theoretical and empirical evidence that over-parameterization can help EM avoid spurious local optima in the log-likelihood.

Ji Xu, Daniel Hsu, Arian Maleki
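As a toy illustration of the kind of over-parameterization studied (an assumed setup, not the paper's experiments): EM for a two-component Gaussian mixture where the mixing weights are estimated alongside the means rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Data from a balanced two-component Gaussian mixture: N(-2, 1) and N(+2, 1)
z = rng.random(1000) < 0.5
x = np.where(z, -2.0, 2.0) + rng.normal(size=1000)

def em(x, iters=100, estimate_weights=True):
    mu = np.array([-0.5, 0.5])   # crude initialization of the component means
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        log_p = -0.5 * (x[:, None] - mu[None, :]) ** 2 + np.log(w)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stabilization
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update means, and (optionally) the weights too
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
        if estimate_weights:      # the "over-parameterized" variant
            w = r.mean(axis=0)
    return mu, w

mu, w = em(x)
print(np.sort(mu))
```

Freeing the weights adds parameters the true balanced model does not need; the article's point is that this extra freedom can steer EM away from spurious optima rather than hurt it.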
On Coresets For Logistic Regression

Coresets are one of the central methods to facilitate the analysis of large data. To deal with intractable worst-case instances the authors introduce a complexity measure $\mu(X)$, which quantifies the hardness of compressing a data set for logistic regression.

Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David Woodruff
On Learning Markov Chains

The problem of estimating an unknown discrete distribution from its samples is a fundamental tenet of statistical learning.

Yi HAO, Alon Orlitsky, Venkatadheeraj Pichapati
Contextual Stochastic Block Models

The authors provide the first information-theoretically tight analysis for inference of latent community structure given a sparse graph along with high-dimensional node covariates, correlated with the same latent communities.

Yash Deshpande, Subhabrata Sen, Andrea Montanari, Elchanan Mossel
Estimators For Multivariate Information Measures In General Probability Spaces

Arman Rahimzamani, Himanshu Asnani, Pramod Viswanath, Sreeram Kannan
Blind Deconvolutional Phase Retrieval Via Convex Programming

The authors consider the task of recovering two real or complex $m$-vectors from phaseless Fourier measurements of their circular convolution.

Ali Ahmed, Alireza Aghasi, Paul Hand
Entropy Rate Estimation For Markov Chains With Large State Space

Entropy estimation is one of the prototypical problems in distribution property testing.

Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, Tiancheng Yu
Bandit Learning In Concave N-Person Games

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games.

Mario Bravo, David Leslie, Panayotis Mertikopoulos
Depth-Limited Solving For Imperfect-Information Games

A fundamental challenge in imperfect-information games is that states do not have well-defined values. This paper introduces a principled way to conduct depth-limited solving in imperfect-information games by allowing the opponent to choose among a number of strategies for the remainder of the game at the depth limit.

Noam Brown, Tuomas Sandholm, Brandon Amos
The Physical Systems Behind Optimization Algorithms

Lin Yang, Raman Arora, Vladimir Braverman, Tuo Zhao
The Nearest Neighbor Information Estimator Is Adaptively Near Minimax Rate-Optimal

The authors analyze the Kozachenko–Leonenko (KL) fixed k-nearest neighbor estimator for the differential entropy.

Jiantao Jiao, Weihao Gao, Yanjun Han
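The estimator in question has a compact form: it combines the log-distance from each sample to its k-th nearest neighbor with digamma correction terms and the log-volume of the unit ball. A sketch for the 1-D Gaussian case (brute-force distances, a numerically differentiated digamma, and k=3 are all assumptions of this example):

```python
import numpy as np
from math import lgamma, log, pi

def digamma(x, h=1e-5):
    # numerical digamma via central difference of log-gamma
    return (lgamma(x + h) - lgamma(x - h)) / (2 * h)

def kl_entropy(samples, k=3):
    """Kozachenko-Leonenko k-NN differential entropy estimate, in nats."""
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    # pairwise distances; eps[i] = distance from x[i] to its k-th nearest neighbor
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    eps = np.sort(dist, axis=1)[:, k - 1]
    log_ball = (d / 2) * log(pi) - lgamma(d / 2 + 1)  # log volume of unit d-ball
    return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(eps))

rng = np.random.default_rng(0)
h_hat = kl_entropy(rng.normal(size=(2000, 1)), k=3)
print(h_hat, 0.5 * log(2 * pi * np.e))  # estimate vs. true N(0,1) entropy
```

The paper's contribution is the minimax analysis of exactly this fixed-k estimator, not the formula itself.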
Robust Learning Of Fixed-Structure Bayesian Networks

The authors investigate the problem of learning Bayesian networks in a robust model where an $\epsilon$-fraction of the samples are adversarially corrupted. In this work, the authors study the fully observable discrete case where the structure of the network is given.

Yu Cheng, Ilias Diakonikolas, Daniel Kane, Alistair Stewart
Information-theoretic Limits For Community Detection In Network Models

The authors analyze the information-theoretic limits for the recovery of node labels in several network models.

Chuyang Ke, Jean Honorio
Generalizing Graph Matching Beyond Quadratic Assignment Model

Graph matching has received persistent attention over decades, which can be formulated as a quadratic assignment problem (QAP). The authors show that a large family of functions, which the authors define as Separable Functions, can approximate discrete graph matching in the continuous domain asymptotically by varying the approximation controlling parameters.

Tianshu Yu, Junchi Yan, Yilin Wang, Wei Liu, Baoxin Li
Improving Simple Models With Confidence Profiles

In this paper, the authors propose a new method called ProfWeight for transferring information from a pre-trained deep neural network that has a high test accuracy to a simpler interpretable model or a very shallow network of low complexity and a priori low test accuracy.

Amit Dhurandhar, Karthikeyan Shanmugam, Ronny Luss, Peder A Olsen
Online Learning With An Unknown Fairness Metric

The authors consider the problem of online learning in the linear contextual bandits setting, but in which there are also strong individual fairness constraints governed by an unknown similarity metric. This is intended to represent the interventions of a regulator who "knows unfairness when he sees it" but nevertheless cannot enunciate a quantitative fairness metric over individuals.

Stephen Gillen, Christopher Jung, Michael Kearns, Aaron Roth
Legendre Decomposition For Tensors

The authors present a novel nonnegative tensor decomposition method, called Legendre decomposition, which factorizes an input tensor into a multiplicative combination of parameters.

Mahito Sugiyama, Hiroyuki Nakahara, Koji Tsuda
The Price Of Privacy For Low-rank Factorization

In this paper, the authors study what price one has to pay to release \emph{differentially private low-rank factorization} of a matrix.

Jalaj Upadhyay
Empirical Risk Minimization In Non-interactive Local Differential Privacy Revisited

In this paper, the authors revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. Then, the authors propose player-efficient algorithms with $1$-bit communication complexity and $O(1)$ computation cost for each player.

Di Wang, Marco Gaboardi, Jinhui Xu
Differentially Private Change-Point Detection

The change-point detection problem seeks to identify distributional changes at an unknown change-point k* in a stream of data. The authors study the statistical problem of change-point detection through the lens of differential privacy.

Sara Krehbiel, Rachel Cummings, Wanrong Zhang, Yajun Mei, Rui Tuo
Scalable Laplacian K-modes

The authors advocate Laplacian K-modes for joint clustering and density mode finding, and propose a concave-convex relaxation of the problem, which yields a parallel algorithm that scales up to large datasets and high dimensions.

Imtiaz Ziko, Eric Granger, Ismail Ben Ayed
Geometrically Coupled Monte Carlo Sampling

Monte Carlo sampling in high-dimensional, low-sample settings is important in many machine learning tasks. The authors improve current methods for sampling in Euclidean spaces by avoiding independence, and instead consider ways to couple samples.

Mark Rowland, Krzysztof Choromanski, François Chalus, Aldo Pacchiano, Tamas Sarlos, Richard E Turner, Adrian Weller
Continuous-time Value Function Approximation In Reproducing Kernel Hilbert Spaces

Motivated by the success of reinforcement learning (RL) for discrete-time tasks such as AlphaGo and Atari games, there has been a recent surge of interest in using RL for continuous-time control of physical systems (cf. many challenging tasks in OpenAI Gym and DeepMind Control Suite). Since discretization of time is susceptible to error, it is methodologically more desirable to handle the system dynamics directly in continuous time. However, very few techniques exist for continuous-time RL and they lack flexibility in value function approximation. In this paper, the authors propose a novel framework for model-based continuous-time value function approximation in reproducing kernel Hilbert spaces. The resulting framework is so flexible that it can accommodate any kind of kernel-based approach, such as Gaussian processes and kernel adaptive filters, and it allows one to handle uncertainties and nonstationarity without prior knowledge about the environment or what basis functions to employ. The authors demonstrate the validity of the presented framework through experiments.

Motoya Ohnishi, Masahiro Yukawa, Mikael Johansson, Masashi Sugiyama
Faster Online Learning Of Optimal Threshold For Consistent F-measure Optimization

In this paper, the authors consider online F-measure optimization (OFO). To advance OFO, the authors propose an efficient online algorithm based on simultaneously learning a posterior probability of class and learning an optimal threshold by minimizing a stochastic strongly convex function with unknown strong convexity parameter.

Xiaoxuan Zhang, Mingrui Liu, Xun Zhou, Tianbao Yang
Reducing Network Agnostophobia

Agnostophobia, the fear of the unknown, can be experienced by deep learning engineers while applying their networks to real-world applications. The authors also introduce a new evaluation metric that focuses on comparing the performance of multiple approaches in scenarios where such unseen classes or unknowns are encountered.

Akshay Raj Dhamija, Manuel Günther, Terrance Boult
Life-Long Disentangled Representation Learning With Cross-Domain Latent Homologies

The authors propose a novel algorithm for unsupervised representation learning from piece-wise stationary visual data: Variational Autoencoder with Shared Embeddings (VASE).

Alessandro Achille, Tom Eccles, Loic Matthey, Chris Burgess, Nick Watters, Alexander Lerchner, Irina Higgins
Near-Optimal Policies For Dynamic Multinomial Logit Assortment Selection Models

In this paper the authors consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model.

Yining Wang, Xi Chen, Yuan Zhou
The Everlasting Database: Statistical Validity At A Fair Price

The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries. The authors propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional samples.

Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nati Srebro
Scalar Posterior Sampling With Applications

The authors propose a practical non-episodic PSRL algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode-switching schedule.

Georgios Theocharous, Zheng Wen, Yasin Abbasi, Nikos Vlassis
Iterative Value-Aware Model Learning

This paper introduces a model-based reinforcement learning (MBRL) framework that incorporates the underlying decision problem in learning the transition model of the environment.

Amir-massoud Farahmand
A Lyapunov-based Approach To Safe Reinforcement Learning

In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. The authors define and present a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local linear constraints.

Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh
Temporal Regularization For Markov Decision Process

Several applications of Reinforcement Learning suffer from instability due to high variance.

Pierre Thodoroff, Audrey Durand, Joelle Pineau, Doina Precup
Maximum Causal Tsallis Entropy Imitation Learning

In this paper, the authors propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning which can efficiently learn a sparse multi-modal policy distribution from demonstrations.

Kyungjae Lee, Sungjoon Choi, Songhwai Oh
Policy Optimization Via Importance Sampling

Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate.

Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli
Reinforcement Learning Of Theorem Proving

The authors introduce a theorem proving algorithm that uses practically no domain heuristics for guiding its connection-style proof search.

Cezary Kaliszyk, Josef Urban, Henryk Michalewski, Miroslav Olšák
Simple Random Search Of Static Linear Policies Is Competitive For Reinforcement Learning

Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling dynamical systems without requiring models of the system dynamics. The authors introduce a model-free random search algorithm for training static, linear policies for continuous control problems.

Horia Mania, Aurelia Guy, Benjamin Recht
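The flavor of such a random-search method can be sketched as antithetic perturbations of a static linear policy on a toy linear system. Everything below (the environment, the horizon, the hyperparameters) is assumed for illustration; the paper's ARS variants additionally use state normalization and top-direction selection:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linear environment: s' = A s + B u, quadratic cost on state and action
A_env = np.array([[1.0, 0.1], [0.0, 1.0]])
B_env = np.array([[0.0], [0.1]])

def rollout(M, horizon=50):
    """Total reward of the static linear policy u = M @ s."""
    s = np.array([1.0, 0.0])
    total = 0.0
    for _ in range(horizon):
        u = M @ s
        total += -(s @ s) - 0.01 * float(u @ u)
        s = A_env @ s + B_env @ u
    return total

M = np.zeros((1, 2))
best_M, best_r = M.copy(), rollout(M)
nu, lr, n_dirs = 0.05, 0.05, 8
for it in range(200):
    deltas = [rng.normal(size=M.shape) for _ in range(n_dirs)]
    rewards, step = [], np.zeros_like(M)
    for dlt in deltas:
        r_plus, r_minus = rollout(M + nu * dlt), rollout(M - nu * dlt)
        rewards += [r_plus, r_minus]
        step += (r_plus - r_minus) * dlt  # finite-difference direction estimate
    # normalize the update by the reward standard deviation, as in ARS
    M += lr / (n_dirs * nu * (np.std(rewards) + 1e-8)) * step
    r = rollout(M)
    if r > best_r:
        best_M, best_r = M.copy(), r
print(best_r)
```

No gradients of the dynamics are ever computed; the only feedback is the scalar return of each perturbed policy, which is the sense in which the method is "simple random search".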
Meta-Gradient Reinforcement Learning

The goal of reinforcement learning algorithms is to estimate and/or optimise the value function.

Zhongwen Xu, Hado Van Hasselt, David Silver
Reinforcement Learning For Solving The Vehicle Routing Problem

The authors present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning.

MohammadReza Nazari, Afshin Oroojlooy, Lawrence Snyder, Martin Takac
Learn What Not To Learn: Action Elimination With Deep Reinforcement Learning

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In this work, the authors propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions.

Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J Mankowitz, Shie Mannor
REFUEL: Exploring Sparse Features In Deep Reinforcement Learning For Fast Disease Diagnosis

This paper proposes REFUEL, a reinforcement learning method with two techniques: \emph{reward shaping} and \emph{feature rebuilding}, to improve the performance of online symptom checking for disease diagnosis.

Yu-Shao Peng, Kai-Fu Tang, Hsuan-Tien Lin, Edward Chang
Learning Plannable Representations With Causal InfoGAN

Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel
Improving Exploration In Evolution Strategies For Deep Reinforcement Learning Via A Population Of Novelty-Seeking Agents

Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration.

Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, Joel Lehman, Kenneth Stanley, Jeff Clune
Transfer Of Deep Reactive Policies For MDP Planning

Domain-independent probabilistic planners input an MDP description in a factored representation language such as PPDDL or RDDL, and exploit the specifics of the representation for faster planning.

Aniket (Nick) Bajpai, Sankalp Garg, Mausam
Q-learning With Nearest Neighbors

The authors consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available.

Devavrat Shah, Qiaomin Xie
Distributed Multitask Reinforcement Learning With Quadratic Convergence

Multitask reinforcement learning (MTRL) suffers from scalability issues when the number of tasks or trajectories grows large. Recent methods exploited the connection between MTRL and general consensus to propose scalable solutions.

Rasul Tutunov, Dongho Kim, Haitham Bou Ammar
Breaking The Curse Of Horizon: Infinite-Horizon Off-Policy Estimation

The authors consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy.

Qiang Liu, Lihong Li, Ziyang Tang, Denny Zhou
Constrained Cross-Entropy Method For Safe Reinforcement Learning

The authors study a safe reinforcement learning problem in which the constraints are defined as the expected cost over finite-length trajectories.

Min Wen, Ufuk Topcu
Representation Balancing MDPs For Off-policy Policy Evaluation

The authors study the problem of off-policy policy evaluation (OPPE) in RL.

Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo A Faisal, Finale Doshi-Velez, Emma Brunskill
Dual Policy Iteration

Recently, a novel class of Approximate Policy Iteration (API) algorithms have demonstrated impressive practical performance (e.g., ExIt from [1], AlphaGo-Zero from [2]). In this work the authors study this Dual Policy Iteration (DPI) strategy in an alternating optimization framework and provide a convergence analysis that extends existing API theory.

Wen Sun, Geoffrey Gordon, Byron Boots, J. Bagnell
Occam's Razor Is Insufficient To Infer The Preferences Of Irrational Agents

Inverse reinforcement learning (IRL) attempts to infer human rewards or preferences from observed behavior.

Stuart Armstrong, Sören Mindermann
Transfer Of Value Functions Via Variational Methods

Andrea Tirinzoni, Rafael Rodriguez Sanchez, Marcello Restelli
Reinforcement Learning With Multiple Experts: A Bayesian Model Combination Approach

Potential-based reward shaping is a powerful technique for accelerating convergence of reinforcement learning algorithms.

Mike Gimelfarb, Scott Sanner, Chi-Guhn Lee
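Potential-based shaping itself has a simple closed form (Ng et al., 1999): the shaped reward adds γΦ(s') − Φ(s) for some potential function Φ, which provably leaves the optimal policy unchanged. A minimal sketch of that construction (the particular potential passed in is an illustrative assumption):

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * phi(s_next) - phi(s).
    For any potential phi this preserves the optimal policy (Ng et al., 1999)."""
    return r + gamma * phi(s_next) - phi(s)
```

Along a trajectory the shaping terms telescope, so the total added return depends only on the potentials at the trajectory's endpoints — this is why shaping cannot change which policy is optimal.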
Online Robust Policy Learning In The Presence Of Unknown Adversaries

The growing prospect of deep reinforcement learning (DRL) being used in cyber-physical systems has raised concerns around safety and robustness of autonomous agents. This paper introduces a Meta-Learned Advantage Hierarchy (MLAH) framework that is attack model-agnostic and more suited to reinforcement learning, via handling the attacks in the decision space (as opposed to data space) and directly mitigating learned bias introduced by the adversary.

Aaron Havens, Zhanhong Jiang, Soumik Sarkar
A Bayesian Approach To Generative Adversarial Imitation Learning

Generative adversarial training for imitation learning has shown promising results on high-dimensional and continuous control tasks. Although this approach has been shown to robustly learn to imitate even with scarce demonstrations, one must still address the inherent challenge that collecting trajectory samples in each iteration is a costly operation.

Wonseok Jeon, Seokin Seo, Kee-Eung Kim
Verifiable Reinforcement Learning Via Policy Extraction

While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. The authors propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured).

Osbert Bastani, Yewen Pu, Armando Solar-Lezama
Deep Reinforcement Learning Of Marked Temporal Point Processes

In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can the authors design online interventions that will help humans achieve certain goals in such an asynchronous setting?

Utkarsh Upadhyay, Abir De, Manuel Gomez Rodriguez
On Learning Intrinsic Rewards For Policy Gradient Methods

In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer.

Zeyu Zheng, Junhyuk Oh, Satinder Singh
Evolution-Guided Policy Gradient In Reinforcement Learning

Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. In this paper, the authors introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA.

Shauharda Khadka, Kagan Tumer
Meta-Reinforcement Learning Of Structured Exploration Strategies

Exploration is a fundamental challenge in reinforcement learning (RL). In this work, the authors study how prior tasks can inform an agent about how to explore effectively in new situations.

Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine
Diversity-Driven Exploration Strategy For Deep Reinforcement Learning

Efficient exploration remains a challenging research problem in reinforcement learning, especially when an environment contains large state spaces, deceptive local optima, or sparse rewards. To tackle this problem, the authors present a diversity-driven approach for exploration, which can be easily combined with both off- and on-policy reinforcement learning algorithms.

Zhang-Wei Hong, Ariel Shann, Shih-Yang Su, Yi-Hsiang Chang, Tsu-Jui Fu, Chun-Yi Lee
Genetic-Gated Networks For Deep Reinforcement Learning

The authors introduce the Genetic-Gated Networks (G2Ns), simple neural networks that incorporate a gate vector composed of binary genetic genes into the hidden layer(s) of the network.

Simyung Chang, John Yang, Jaeseok Choi, Nojun Kwak
Memory Augmented Policy Optimization For Program Synthesis And Semantic Parsing

The authors present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimate.

Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V Le, Ni Lao
Hardware Conditioned Policies For Multi-Robot Transfer Learning

Deep reinforcement learning could be used to learn dexterous robotic policies, but it is challenging to transfer them to new robots with vastly different hardware properties. The authors propose a novel approach called Hardware Conditioned Policies where the authors train a universal policy conditioned on a vector representation of robot hardware.

Tao Chen, Adithyavairavan Murali, Abhinav Gupta
Reward Learning From Human Preferences And Demonstrations In Atari

To solve complex real-world problems with reinforcement learning, the authors cannot rely on manually specified reward functions. Additionally, the authors investigate the fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.

Jan Leike, Borja Ibarz, Dario Amodei, Geoffrey Irving, Shane Legg
Graph Convolutional Policy Network For Goal-Directed Molecular Graph Generation

Generating novel graph structures that optimize given objectives while obeying some given underlying rules is fundamental for chemistry, biology and social science research. However, designing models that find molecules that optimize desired properties while incorporating highly complex and non-differentiable rules remains a challenging task.

Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, Jure Leskovec
Visual Reinforcement Learning With Imagined Goals

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. In this paper, the authors propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies.

Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine
Playing Hard Exploration Games By Watching YouTube

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. However, these demonstrations are typically collected under artificial conditions.

Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, Nando De Freitas
Unsupervised Video Object Segmentation For Deep Reinforcement Learning

The authors present a new technique for deep reinforcement learning that automatically detects moving objects and uses the relevant information for action selection.

Vik Goel, Jameson Weng, Pascal Poupart
Learning To Navigate In Cities Without A Map

Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence.

Piotr Mirowski, Matt Grimes, Mateusz Malinowski, Karl Moritz Hermann, Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell
Learning Abstract Options

Matthew Riemer, Miao Liu, Gerald Tesauro
Object-Oriented Dynamics Predictor

Generalization has been one of the major challenges for learning dynamics models in model-based reinforcement learning. In this paper, the authors present a novel object-oriented framework, called object-oriented dynamics predictor (OODP), which decomposes the environment into objects and predicts the dynamics of objects conditioned on both actions and object-to-object relations.

Guangxiang Zhu, Zhiao Huang, Chongjie Zhang
A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents

Yan Zheng, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, Changjie Fan
Learning Attentional Communication For Multi-Agent Cooperation

Communication could potentially be an effective way for multi-agent cooperation. To tackle these difficulties, in this paper, the authors propose an attentional communication model that learns when communication is needed and how to integrate shared information for cooperative decision making.

Jiechuan Jiang, Zongqing Lu
Deep Dynamical Modeling And Control Of Unsteady Fluid Flows

The design of flow control systems remains a challenge due to the nonlinear nature of the equations that govern fluid flow.

Jeremy Morton, Antony Jameson, Mykel J Kochenderfer, Freddie Witherden
Adaptive Skip Intervals: Temporal Abstraction For Recurrent Dynamical Models

The authors introduce a method which enables a recurrent dynamics model to be temporally abstract.

Alexander Neitz, Giambattista Parascandolo, Stefan Bauer, Bernhard Schölkopf
Zero-Shot Transfer With Deictic Object-Oriented Representation In Reinforcement Learning

Object-oriented representations in reinforcement learning have shown promise in transfer learning, with previous research introducing a propositional object-oriented framework that has provably efficient learning bounds with respect to sample complexity.

Ofir Marom, Benjamin Rosman
Total Stochastic Gradient Algorithms And Applications In Reinforcement Learning

Backpropagation and the chain rule of derivatives have been prominent; however, the total derivative rule has not enjoyed the same amount of attention. The authors show how the total derivative rule leads to an intuitive visual framework for creating gradient estimators on graphical models.

Paavo Parmas
Fighting Boredom In Recommender Systems With Linear Reinforcement Learning

A common assumption in recommender systems (RS) is the existence of a best fixed recommendation strategy. Such a strategy may be simple and work at the item level (e.g., in multi-armed bandit it is assumed one best fixed arm/item exists) or implement a more sophisticated RS (e.g., the objective of A/B testing is to find the best fixed RS and execute it thereafter).

Romain WARLOP, Alessandro Lazaric, Jérémie Mary
Randomized Prior Functions For Deep Reinforcement Learning

Ian Osband, John Aslanides, Albin Cassirer
Scalable Coordinated Exploration In Concurrent Reinforcement Learning

The authors consider a team of reinforcement learning agents that concurrently operate in a common environment, and the authors develop an approach to efficient coordinated exploration that is suitable for problems of practical scale.

Maria Dimakopoulou, Ian Osband, Benjamin Van Roy
Context-dependent Upper-confidence Bounds For Directed Exploration

Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment.

Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White
Multi-Agent Generative Adversarial Imitation Learning

Imitation learning algorithms can be used to learn a policy from expert demonstrations without access to a reward signal. However, most existing approaches are not applicable in multi-agent settings due to the existence of multiple (Nash) equilibria and non-stationary environments. The authors propose a new framework for multi-agent imitation learning for general Markov games, where the authors build upon a generalized notion of inverse reinforcement learning.

Jiaming Song, Hongyu Ren, Dorsa Sadigh, Stefano Ermon
Actor-Critic Policy Optimization In Partially Observable Multiagent Environments

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return.

Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling
Learning To Share And Hide Intentions Using Information Regularization

Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning.

DJ Strouse, Max Kleiman-Weiner, Josh Tenenbaum, Matt Botvinick, David Schwab
Credit Assignment For Collective Multiagent RL With Global Rewards

Scaling decision theoretic planning to large multiagent systems is challenging due to uncertainty and partial observability in the environment. The authors focus on a multiagent planning model subclass, relevant to urban settings, where agent interactions are dependent on their "collective influence" on each other, rather than their identities.

Duc Nguyen, Akshat Kumar, Hoong Chuin Lau
Multi-Agent Reinforcement Learning Via Double Averaging Primal-Dual Optimization

Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents. Motivated by decentralized applications such as sensor networks, swarm robotics, and power grids, the authors study policy evaluation in MARL, where agents with jointly observed state-action pairs and private local rewards collaborate to learn the value of a given policy.

Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong
Learning Others' Intentional Models In Multi-Agent Settings Using Interactive POMDPs

Interactive partially observable Markov decision processes (I-POMDPs) provide a principled framework for planning and acting in a partially observable, stochastic and multi-agent environment. In order to predict other agents' actions using I-POMDPs, the authors propose an approach that effectively uses Bayesian inference and sequential Monte Carlo sampling to learn others' intentional models, which ascribe to them beliefs, preferences and rationality in action selection.

Yanlin Han, Piotr Gmytrasiewicz
Bayesian Control Of Large MDPs With Unknown Dynamics In Data-Poor Environments

The authors propose a Bayesian decision making framework for control of Markov Decision Processes (MDPs) with unknown dynamics and large, possibly continuous, state, action, and parameter spaces in data-poor environments.

Mahdi Imani, Seyede Fatemeh Ghoreishi, Ulisses M. Braga-Neto
Negotiable Reinforcement Learning For Pareto Optimal Sequential Decision-Making

It is commonly believed that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto optimal policy. To gain insight into the dynamics of this new framework, the authors implement a simple NRL agent and empirically examine its behavior in a simple environment.

Nishant Desai, Andrew Critch, Stuart J Russell
Rho-POMDPs Have Lipschitz-Continuous Epsilon-Optimal Value Functions

Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a “fully observable” problem—a belief MDP—and exploiting the piece-wise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex ∆). Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds.

Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye
Learning Task Specifications From Demonstrations

Real-world applications often naturally decompose into several sub-tasks.

Marcell Vazquez-Chanlatte, Susmit Jha, Ashish Tiwari, Mark Ho, Sanjit Seshia
Teaching Inverse Reinforcement Learners Via Features And Demonstrations

Luis Haug, Sebastian Tschiatschek, Adish Singla
Single-Agent Policy Tree Search With Guarantees

The authors introduce two novel tree search algorithms that use a policy to guide search.

Laurent Orseau, Levi Lelis, Tor Lattimore, Theophane Weber
From Stochastic Planning To Marginal MAP

It is well known that the problems of stochastic planning and probabilistic inference are closely related. The authors introduce a new reduction from MMAP to maximum expected utility problems which are suitable for the symbolic computation in SOGBOFA.

Hao Cui, Radu Marinescu, Roni Khardon
Dual Principal Component Pursuit: Improved Analysis And Efficient Algorithms

Recent methods for learning a linear subspace from data corrupted by outliers are based on convex L1 and nuclear norm optimization and require the dimension of the subspace and the number of outliers to be sufficiently small [27]. In sharp contrast, the recently proposed Dual Principal Component Pursuit (DPCP) method [22] can provably handle subspaces of high dimension by solving a non-convex L1 optimization problem on the sphere.

Zhihui Zhu, Yifan Wang, Daniel Robinson, Daniel Naiman, Rene Vidal, Manolis Tsakiris
Experimental Design For Cost-Aware Learning Of Causal Graphs

The authors consider the minimum cost intervention design problem: Given the essential graph of a causal graph and a cost to intervene on a variable, identify the set of interventions with minimum total cost that can learn any causal graph with the given essential graph.

Erik Lindgren, Murat Kocaoglu, Alex Dimakis, Sriram Vishwanath
Removing Hidden Confounding By Experimental Grounding

Observational data is increasingly used as a means for making individual-level causal predictions and intervention recommendations. The authors introduce a novel method of using limited experimental data to correct the hidden confounding in causal effect models trained on larger observational data, even if the observational data does not fully overlap with the experimental data.

Nathan Kallus, Aahlad Manas Puli, Uri Shalit
Domain Adaptation By Using Causal Inference To Predict Invariant Conditional Distributions

Sara Magliacane, Thijs Van Ommen, Tom Claassen, Stephan Bongers, Philip Versteeg, Joris M Mooij
Structural Causal Bandits: Where To Intervene?

The authors study the problem of identifying the best action in a sequential decision-making setting when the reward distributions of the arms exhibit a non-trivial dependence structure, which is governed by the underlying causal model of the domain where the agent is deployed.

Sanghack Lee, Elias Bareinboim
Uplift Modeling From Separate Labels

Ikko Yamane, Florian Yger, Jamal Atif, Masashi Sugiyama
Causal Inference With Noisy And Missing Covariates Via Matrix Factorization

Valid causal inference in observational studies often requires controlling for confounders. The authors propose the use of matrix factorization to infer the confounders from noisy covariates.

Nathan Kallus, Xiaojie Mao, Madeleine Udell
Fast Estimation Of Causal Interactions Using Wold Processes

The authors here focus on the task of learning Granger causality matrices for multivariate point processes.

Flavio Figueiredo, Guilherme Resende Borges, Pedro O.S. Vaz De Melo, Renato Assunção
Learning And Testing Causal Models With Interventions

The authors consider testing and learning problems on causal Bayesian networks as defined by Pearl (Pearl, 2009).

Jayadev Acharya, Arnab Bhattacharyya, Constantinos Daskalakis, Saravanan Kandasamy
Causal Inference Via Kernel Deviance Measures

Discovering the causal structure among a set of variables is a fundamental problem in many areas of science. In this paper, the authors propose Kernel Conditional Deviance for Causal Inference (KCDC), a fully nonparametric causal discovery method based on purely observational data.

Jovana Mitrovic, Dino Sejdinovic, Yee Whye Teh
Multi-domain Causal Structure Learning In Linear Systems

The authors study the problem of causal structure learning in linear systems from observational data given in multiple domains, across which the causal coefficients and/or the distribution of the exogenous noises may vary.

AmirEmad Ghassami, Negar Kiyavash, Biwei Huang, Kun Zhang
Causal Inference And Mechanism Clustering Of A Mixture Of Additive Noise Models

Shoubo Hu, Zhitang Chen, Vahid Partovi Nia, Laiwan Chan, Yanhui Geng
Direct Estimation Of Differences In Causal Graphs

The authors consider the problem of estimating the differences between two causal directed acyclic graph (DAG) models with a shared topological order, given i.i.d. samples from each model.

Yuhao Wang, Chandler Squires, Anastasiya Belyaeva, Caroline Uhler
Identification And Estimation Of Causal Effects From Dependent Data

The assumption that data samples are independent and identically distributed (iid) is standard in many areas of statistics and machine learning. The authors use segregated graphs [15], a generalization of latent projection mixed graphs [23], to represent causal models of this type and provide a complete algorithm for non-parametric identification in these models.

Eli Sherman, Ilya Shpitser
Multilingual Anchoring: Interactive Topic Modeling And Alignment Across Languages

Multilingual topic models can reveal patterns in cross-lingual document collections.

Michelle Yuan, Benjamin Van Durme, Jordan Ying
Submodular Field Grammars: Representation, Inference, And Application To Image Parsing

Natural scenes contain many layers of part-subpart structure, and distributions over them are thus naturally represented by stochastic image grammars, with one production per decomposition of a part.

Abe L Friesen, Pedro Domingos
Autoconj: Recognizing And Exploiting Conjugacy Without A Domain-Specific Language

Deriving conditional and marginal distributions using conjugacy relationships can be time consuming and error prone. In this paper, the authors propose a strategy for automating such derivations.

Matthew D. Hoffman, Matthew Johnson, Dustin Tran
Distributionally Robust Graphical Models

In many structured prediction problems, complex relationships between variables are compactly defined using graphical structures. The authors present adversarial graphical models (AGM), a distributionally robust approach for constructing a predictor that performs robustly for a class of data distributions defined using a graphical structure.

Rizal Fathony, Ashkan Rezaei, Mohammad Ali Bashiri, Xinhua Zhang, Brian Ziebart
Flexible And Accurate Inference And Learning For Deep Generative Models

The authors introduce a new approach to learning in hierarchical latent-variable generative models called the “distributed distributional code Helmholtz machine”, which emphasises flexibility and accuracy in the inferential process.

Eszter Vértes, Maneesh Sahani
Provable Gaussian Embedding With One Observation

The success of machine learning methods heavily relies on having an appropriate representation for data at hand.

Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang
Learning And Inference In Hilbert Space With Quantum Graphical Models

Quantum Graphical Models (QGMs) generalize classical graphical models by adopting the formalism for reasoning about uncertainty from quantum mechanics. Unlike classical graphical models, QGMs represent uncertainty with density matrices in complex Hilbert spaces.

Siddarth Srinivasan, Carlton Downey, Byron Boots
Multi-value Rule Sets For Interpretable Classification With Feature-Efficient Representations

The authors present the Multi-value Rule Set (MRS) for interpretable classification with feature-efficient representations.

Tong Wang
Nonparametric Bayesian Lomax Delegate Racing For Survival Analysis With Competing Risks

The authors propose Lomax delegate racing (LDR) to explicitly model the mechanism of survival under competing risks and to interpret how the covariates accelerate or decelerate the time to event.

Quan Zhang, Mingyuan Zhou
Theoretical Guarantees For EM Under Misspecified Gaussian Mixture Models

Recent years have witnessed substantial progress in understanding the behavior of EM for mixture models that are correctly specified. The authors provide two classes of theoretical guarantees: first, the authors characterize the bias introduced due to the misspecification; and second, the authors prove that population EM converges at a geometric rate to the model projection under a suitable initialization condition.

Raaz Dwivedi, Nhật Hồ, Koulik Khamaru, Martin Wainwright, Michael I. Jordan
Nonparametric Learning From Bayesian Models With Randomized Objective Functions

Bayesian learning is built on an assumption that the model space contains a true reflection of the data generating mechanism. Here the authors present a Bayesian nonparametric approach to learning that makes use of statistical models, but does not assume that the model is true.

Simon Lyddon, Stephen Walker, Chris C Holmes
Rectangular Bounding Process

Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions.

Xuhui Fan, Bin Li, Scott Sisson
A Bayesian Nonparametric View On Count-Min Sketch

The count-min sketch is a time- and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The authors present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation.

Diana Cai, Michael Mitzenmacher, Ryan Adams
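The classical point estimate that the Bayesian view reinterprets is the minimum over hash rows, which can only overestimate the true count. A minimal sketch of the standard data structure (the MD5-based hash construction here is an illustrative choice, not the paper's):

```python
import hashlib


class CountMinSketch:
    """Classic count-min sketch: depth independent hash rows of fixed width."""

    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        # One deterministic hash per row (illustrative construction).
        h = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._hash(item, row)] += count

    def estimate(self, item):
        # Point estimate: the minimum over rows, which never underestimates.
        return min(self.table[row][self._hash(item, row)]
                   for row in range(self.depth))
```

Hash collisions can only inflate individual counters, so the row minimum is the tightest of the available upper bounds — the uncertainty the Bayesian treatment then quantifies with a posterior.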
Communication Efficient Parallel Algorithms For Optimization On Manifolds

Bayan Saparbayeva, Michael Zhang, Lizhen Lin
Lifted Weighted Mini-Bucket

Many graphical models, such as Markov Logic Networks (MLNs) with evidence, possess highly symmetric substructures but no exact symmetries. In this paper, the authors present a lifted variant of the Weighted Mini-Bucket elimination algorithm which provides a principled way to (i) exploit the highly symmetric substructure of MLN models, and (ii) incorporate high-order inference terms which are necessary for high quality approximate inference.

Nicholas Gallo, Alexander Ihler
Cluster Variational Approximations For Structure Learning Of Continuous-Time Bayesian Networks From Incomplete Data

Continuous-time Bayesian networks (CTBNs) constitute a general and powerful framework for modeling continuous-time stochastic processes on networks. Inspired by recent advances in statistical physics, the authors present a new approximation scheme based on cluster-variational methods that significantly improves upon existing variational approximations.

Dominik Linzner, Heinz Koeppl
Faithful Inversion Of Generative Models For Effective Amortized Inference

Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently. The authors introduce an algorithm for faithfully, and minimally, inverting the graphical model structure of any generative model.

Stefan Webb, Adam Golinski, Rob Zinkov, Siddharth Narayanaswamy, Tom Rainforth, Yee Whye Teh, Frank Wood
A Stein Variational Newton Method

Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm: it minimizes the Kullback–Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space [Liu & Wang, NIPS 2016].

Gianluca Detommaso, Tiangang Cui, Youssef Marzouk, Alessio Spantini, Robert Scheichl
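The functional gradient that SVGD descends has a compact particle form: each particle moves along a kernel-weighted average of score terms plus a repulsive kernel-gradient term. A 1-D sketch with an RBF kernel, for orientation only (the step size and bandwidth are illustrative choices, and the paper's contribution — a Newton-type second-order variant — is not shown):

```python
import math


def svgd_step(particles, grad_log_p, stepsize=0.1, h=1.0):
    """One SVGD update for 1-D particles with RBF kernel
    k(x, y) = exp(-(x - y)^2 / (2h))."""
    n = len(particles)
    new = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-(xj - xi) ** 2 / (2 * h))
            # driving (score) term + repulsive term (d/dxj of the kernel)
            phi += k * grad_log_p(xj) + k * (xi - xj) / h
        new.append(xi + stepsize * phi / n)
    return new
```

The first term pulls particles toward high-density regions; the second pushes them apart, which is what lets a finite particle set approximate the whole posterior rather than collapse to its mode.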
Reparameterization Gradient For Non-differentiable Models

The authors present a new algorithm for stochastic variational inference that targets models with non-differentiable densities.

Wonyeol Lee, Hangyeol Yu, Hongseok Yang
Implicit Reparameterization Gradients

By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. The authors introduce an alternative approach to computing reparameterization gradients based on implicit differentiation and demonstrate its broader applicability by applying it to Gamma, Beta, Dirichlet, and von Mises distributions, which cannot be used with the classic reparameterization trick.

Michael Figurnov, Shakir Mohamed, Andriy Mnih
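For the classic trick that this paper generalizes, the Gaussian case is one line: writing z = μ + σε with ε ~ N(0, 1) turns the gradient of an expectation into an expectation of a gradient. An illustrative Monte Carlo estimator (the test function and sample count are assumptions, and distributions like the Gamma admit no such explicit transform — hence the paper's implicit approach):

```python
import random


def reparam_grad(mu, sigma, f_grad, n_samples=10000, seed=0):
    """Monte Carlo estimate of d/dmu E_{z ~ N(mu, sigma^2)}[f(z)] via the
    reparameterization z = mu + sigma * eps with eps ~ N(0, 1); by the chain
    rule the gradient equals E[f'(mu + sigma * eps)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        total += f_grad(mu + sigma * eps)
    return total / n_samples
```

For f(z) = z², E[z²] = μ² + σ², so the true gradient with respect to μ is 2μ — a convenient sanity check for the estimator.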
SLANG: Fast Structured Covariance Approximations For Bayesian Deep Learning With Natural Gradient

Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. To address this issue, the authors propose a new stochastic, low-rank, approximate natural-gradient (SLANG) method for variational inference in large deep models.

Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Emtiyaz Khan
Wasserstein Variational Inference

This paper introduces Wasserstein variational inference, a new form of approximate Bayesian inference based on optimal transport theory.

Luca Ambrogioni, Umut Güçlü, Yağmur Güçlütürk, Max Hinne, Marcel A. J. Van Gerven, Eric Maris
Adaptive Path-Integral Autoencoders: Representation Learning And Planning For Dynamical Systems

The authors present a representation learning algorithm that learns a low-dimensional latent dynamical system from high-dimensional sequential raw data, e.g., video.

Jung-Su Ha, Young-Jin Park, Hyeok-Joo Chae, Soon-Seo Park, Han-Lim Choi
Variational Inference With Tail-adaptive F-Divergence

Variational inference with α-divergences has been widely used in modern probabilistic machine learning. In this paper, the authors propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments, while simultaneously achieving mass-covering properties.

Dilin Wang, Hao Liu, Qiang Liu
Boosting Black Box Variational Inference

Approximating a probability density in a tractable manner is a central task in Bayesian statistics. Specifically, they require a custom implementation of the greedy step (called the LMO) for every probabilistic model with respect to an unnatural variational family of truncated distributions.

Francesco Locatello, Gideon Dresdner, Rajiv Khanna, Isabel Valera, Gunnar Raetsch
Discretely Relaxing Continuous Variables For Tractable Variational Inference

The authors explore a new research direction in Bayesian variational inference with discrete latent variable priors where the authors exploit Kronecker matrix algebra for efficient and exact computations of the evidence lower bound (ELBO). The proposed "DIRECT" approach has several advantages over its predecessors; (i) it can exactly compute ELBO gradients.

Trefor Evans, Prasanth Nair
Using Large Ensembles Of Control Variates For Variational Inference

Variational inference is increasingly being addressed with stochastic optimization. The authors also present a Bayesian risk minimization framework in which the quality of a procedure for combining control variates is quantified by its effect on optimization convergence rates, which leads to a very simple combination rule.

Tomas Geffner, Justin Domke
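The basic construction underlying any such ensemble subtracts a correlated quantity with known mean from the Monte Carlo estimate; for a single control variate the variance-minimizing coefficient is Cov(f, h)/Var(h). A small sketch of that textbook rule (the target function and variate in the test are illustrative, and combining many variates — the paper's subject — is not shown):

```python
def cv_estimate(samples_f, samples_h, h_mean):
    """Control-variate mean estimate of f:
    f_hat = mean(f) - c * (mean(h) - E[h]), with the (near-)optimal
    coefficient c = Cov(f, h) / Var(h) estimated from the same samples."""
    n = len(samples_f)
    mf = sum(samples_f) / n
    mh = sum(samples_h) / n
    cov = sum((f - mf) * (h - mh) for f, h in zip(samples_f, samples_h)) / n
    var = sum((h - mh) ** 2 for h in samples_h) / n
    c = cov / var if var > 0 else 0.0
    return mf - c * (mh - h_mean)
```

Because E[h] is known exactly, the correction term is unbiased; the stronger the correlation between f and h, the larger the variance reduction.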
The Promises And Pitfalls Of Stochastic Gradient Langevin Dynamics

Stochastic Gradient Langevin Dynamics (SGLD) has emerged as a key MCMC algorithm for Bayesian learning from large-scale datasets. Several strategies have been suggested to reduce this effect; among them, SGLD Fixed Point (SGLDFP) uses carefully designed control variates to reduce the variance of the stochastic gradients.

Nicolas Brosse, Alain Durmus, Eric Moulines
Large-Scale Stochastic Sampling From The Probability Simplex

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. To avoid the biases caused by this discretization error, the authors propose the stochastic Cox-Ingersoll-Ross process (SCIR), which removes all discretization error, and the authors prove that samples from the SCIR process are asymptotically unbiased.

Jack Baker, Paul Fearnhead, Emily Fox, Chris Nemeth
Mirrored Langevin Dynamics

The authors consider the problem of sampling from constrained distributions, which has posed significant challenges to both non-asymptotic analysis and algorithmic design.

Ya-Ping Hsieh, Ali Kavis, Paul Rolland, Volkan Cevher
Thermostat-assisted Continuously-tempered Hamiltonian Monte Carlo For Bayesian Learning

In this paper, the authors propose a novel sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for the purpose of multimodal Bayesian learning.

Rui Luo, Jianhong Wang, Yaodong Yang, Jun WANG, Zhanxing Zhu
Dimensionally Tight Bounds For Second-Order Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) is a widely deployed method to sample from high-dimensional distributions in statistics and machine learning. HMC is known to run very efficiently in practice, and its popular second-order "leapfrog" implementation has long been conjectured to run in $d^{1/4}$ gradient evaluations.
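
The second-order leapfrog scheme analyzed here can be sketched in a few lines; the quadratic potential below is an illustrative choice, not from the paper:

```python
import numpy as np

# Minimal leapfrog integrator for a standard Gaussian target, U(q) = q^2 / 2.
def grad_U(q):
    return q

def leapfrog(q, p, step, n_steps):
    p = p - 0.5 * step * grad_U(q)      # initial half step for momentum
    for _ in range(n_steps - 1):
        q = q + step * p                # full position step
        p = p - step * grad_U(q)        # full momentum step
    q = q + step * p
    p = p - 0.5 * step * grad_U(q)      # final half step for momentum
    return q, p

# Leapfrog is symplectic: the Hamiltonian H = U(q) + p^2/2 is nearly conserved,
# which is what keeps HMC acceptance rates high over long trajectories.
q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, step=0.01, n_steps=1000)
H0 = 0.5 * q0**2 + 0.5 * p0**2
H1 = 0.5 * q1**2 + 0.5 * p1**2
print(abs(H1 - H0))  # small energy error even after many steps
```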

Oren Mangoubi, Nisheeth Vishnoi
Global Convergence Of Langevin Dynamics Based Algorithms For Nonconvex Optimization

The authors present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with $n$ component functions.

Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu
Meta-Learning MCMC Proposals

Effective implementations of sampling-based probabilistic inference often require manually constructed, model-specific proposals.

Tongzhou Wang, YI WU, Dave Moore, Stuart Russell
Posterior Concentration For Sparse Deep Learning

The authors introduce Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving generalizability of deep ReLU networks.

Veronika Rockova, Nicholas Polson
Analytic Solution And Stationary Phase Approximation For The Bayesian Lasso And Elastic Net

The lasso and elastic net linear regression models impose a double-exponential prior distribution on the model parameters to achieve regression shrinkage and variable selection, allowing the inference of robust models from large data sets.

Tom Michoel
Bayesian Model Selection Approach To Boundary Detection With Non-Local Priors

Based on non-local prior distributions, the authors propose a Bayesian model selection (BMS) procedure for boundary detection in a sequence of data with multiple systematic mean changes.

Fei Jiang, Guosheng Yin, Francesca Dominici
Graphical Model Inference: Sequential Monte Carlo Meets Deterministic Approximations

Approximate inference in probabilistic graphical models (PGMs) can be grouped into deterministic methods and Monte-Carlo-based methods. In this paper the authors present a way of bridging the gap between deterministic and stochastic inference.

Fredrik Lindsten, Jouni Helske, Matti Vihola
Implicit Probabilistic Integrators For ODEs

The authors introduce a family of implicit probabilistic integrators for initial value problems (IVPs), taking as a starting point the multistep Adams–Moulton method.

Onur Teymur, Han Lie, Tim Sullivan, Ben Calderhead
A Bayes-Sard Cubature Method

This paper focuses on the formulation of numerical integration as an inferential task. To address these drawbacks, the authors introduce Bayes-Sard cubature, a probabilistic framework that combines the flexibility of Bayesian cubature with the robustness of well-established classical cubatures.

Toni Karvonen, Chris J Oates, Simo Sarkka
Deep State Space Models For Time Series Forecasting

The authors present a novel approach to probabilistic time series forecasting that combines state space models with deep learning.

Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Bernie Wang, Tim Januschowski
BRUNO: A Deep Recurrent Model For Exchangeable Data

The authors present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations.

Iryna Korshunova, Jonas Degrave, Ferenc Huszar, Yarin Gal, Arthur Gretton, Joni Dambre
Scaling Gaussian Process Regression With Derivatives

Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction. The authors propose iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that cuts the iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction.
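
A minimal sketch of the matrix-free idea: plain conjugate gradients solve the kernel system using only matrix-vector products. The paper's derivative blocks and pivoted Cholesky preconditioner are omitted, and the kernel and data below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy GP system: solve K v = y using only matvecs with K.
n = 200
X = rng.uniform(-3, 3, size=n)
K = np.exp(-0.5 * (X[:, None] - X[None, :])**2) + 1e-2 * np.eye(n)  # RBF + noise
y = np.sin(X)

def matvec(v):
    return K @ v  # in a scalable implementation this product avoids forming K

def conjugate_gradients(matvec, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - matvec(x)        # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

v = conjugate_gradients(matvec, y)
print(np.max(np.abs(K @ v - y)))  # residual close to zero
```

Preconditioning, as in the paper, serves to shrink the number of such matvec iterations.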

David Eriksson, Kun Dong, Eric Lee, David Bindel, Andrew Wilson
Algebraic Tests Of General Gaussian Latent Tree Models

The authors consider general Gaussian latent tree models in which the observed variables are not restricted to be leaves of the tree. Illustrating with the star tree, the authors propose a new testing methodology that circumvents singularity issues by trading off some statistical estimation efficiency and handles cases with many constraints through recent advances on Gaussian approximation for maxima of sums of high-dimensional random vectors.

Dennis Leung, Mathias Drton
Differentially Private Bayesian Inference For Exponential Families

Garrett Bernstein, Dan Sheldon
Semi-crowdsourced Clustering With Deep Generative Models

Yucen Luo, TIAN TIAN, Jiaxin Shi, Jun Zhu, Bo Zhang
Deep Poisson Gamma Dynamical Systems

The authors develop deep Poisson-gamma dynamical systems (DPGDS) to model sequentially observed multivariate count data, improving previously proposed models by not only mining deep hierarchical latent structure from the data, but also capturing both first-order and long-range temporal dependencies.

Dandan Guo, Bo Chen, Hao Zhang, Mingyuan Zhou
Deep State Space Models For Unconditional Word Generation

Autoregressive feedback is considered a necessity for successful unconditional text generation using stochastic sequence models. However, such feedback is known to introduce systematic biases into the training process and it obscures a principle of generation: committing to global information and forgetting local nuances.

Florian Schmidt, Thomas Hofmann
Modular Networks: Learning To Decompose Neural Computation

Scaling model capacity has been vital in the success of deep learning. The authors propose a training algorithm that flexibly chooses neural modules based on the data to be processed.

Louis Kirsch, Julius Kunze, David Barber
Gaussian Process Prior Variational Autoencoders

Variational autoencoders (VAE) are a powerful and widely-used class of models to learn complex data distributions in an unsupervised fashion. One important limitation of VAEs is the prior assumption that latent sample representations are independent and identically distributed.

Francesco Paolo Casale, Adrian Dalca, Luca Saglietti, Jennifer Listgarten, Nicolo Fusi
Bayesian Semi-supervised Learning With Graph Gaussian Processes

The authors propose a data-efficient Gaussian process-based Bayesian approach to the semi-supervised learning problem on graphs.

Yin Cheng Ng, Nicolò Colombo, Ricardo Silva
Inference In Deep Gaussian Processes Using Stochastic Gradient Hamiltonian Monte Carlo

Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. To efficiently optimize the hyperparameters, the authors introduce the Moving Window MCEM algorithm.

Marton Havasi, Jose Miguel Hernández-Lobato, Juan José Murillo-Fuentes
Variational Bayesian Monte Carlo

Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to the gradient or a large number of likelihood evaluations. The authors introduce here a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC).

Luigi Acerbi
Bayesian Alignments Of Warped Multi-Output Gaussian Processes

The authors propose a novel Bayesian approach to modelling nonlinear alignments of time series based on latent shared information.

Markus Kaiser, Clemens Otte, Thomas Runkler, Carl Henrik Ek
Automating Bayesian Optimization With Bayesian Optimization

Bayesian optimization is a powerful tool for global optimization of expensive functions. In this work, the authors introduce a novel automated Bayesian optimization approach that dynamically selects promising models for explaining the observed data using Bayesian Optimization in the model space.

Gustavo Malkomes, Roman Garnett
Infinite-Horizon Gaussian Processes

Gaussian processes provide a flexible framework for forecasting, removing noise, and interpreting long temporal datasets. The authors show examples for large finite-length modelling problems, and present how the method runs in real-time on a smartphone on a continuous data stream updated at 100 Hz.

Arno Solin, James Hensman, Richard E Turner
Learning Gaussian Processes By Minimizing PAC-Bayesian Generalization Bounds

Gaussian Processes (GPs) are a generic modelling tool for supervised learning. To this end, the authors propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood.

David Reeb, Andreas Doerr, Sebastian Gerwinn, Barbara Rakitsch
Algorithmic Linearly Constrained Gaussian Processes

The authors algorithmically construct multi-output Gaussian process priors which satisfy linear differential equations.

Markus Lange-Hegermann
Efficient Projection Onto The Perfect Phylogeny Model

Several algorithms build on the perfect phylogeny model to infer evolutionary trees.

Bei Jia, Surjyendu Ray, Sam Safavi, José Bento
Distributed $k$-Clustering For Data With Heavy Noise

In this paper, the authors consider the $k$-center/median/means clustering with outliers problems (or the $(k, z)$-center/median/means problems) in the distributed setting.

Shi Li, Xiangyu Guo
Communication Compression For Decentralized Training

Optimizing distributed learning systems is an art of balancing between computation and communication. There have been two lines of research that try to deal with slower networks: communication compression for low-bandwidth networks, and decentralization for high-latency networks. Although the system implication of such a combination is trivial, the underlying theoretical principle and algorithm design is challenging: unlike centralized algorithms, simply compressing exchanged information, even in an unbiased stochastic way, within the decentralized network would accumulate the error and cause divergence.

Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu
Do Less, Get More: Streaming Submodular Maximization With Subsampling

In this paper, the authors develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once.

Moran Feldman, Amin Karbasi, Ehsan Kazemi
Optimal Algorithms For Continuous Non-monotone Submodular And DR-Submodular Maximization

In this paper the authors study the fundamental problems of maximizing a continuous non-monotone submodular function over a hypercube, with and without coordinate-wise concavity.

Rad Niazadeh, Tim Roughgarden, Joshua Wang
Provable Variational Inference For Constrained Log-Submodular Models

Submodular maximization problems appear in several areas of machine learning and data science, as many useful modelling concepts such as diversity and coverage satisfy this natural diminishing returns property. To perform inference in these models the authors design novel variational inference algorithms, which carefully leverage the combinatorial and probabilistic properties of these objects.

Josip Djolonga, Stefanie Jegelka, Andreas Krause
Fast Greedy Algorithms For Dictionary Selection With Generalized Sparsity Constraints

In dictionary selection, several atoms are selected from finite candidates that successfully approximate given data points in the sparse representation.

Kaito Fujii, Tasuku Soma
Boolean Decision Rules Via Column Generation

This paper considers the learning of Boolean rules in either disjunctive normal form (DNF, OR-of-ANDs, equivalent to decision rule sets) or conjunctive normal form (CNF, AND-of-ORs) as an interpretable model for classification. To handle large datasets, the authors propose an approximate CG algorithm using randomization.

Sanjeeb Dash, Oktay Gunluk, Dennis Wei
Computing Kantorovich-Wasserstein Distances On $d$-dimensional Histograms Using $(d+1)$-partite Graphs

This paper presents a novel method to compute the exact Kantorovich-Wasserstein distance between a pair of $d$-dimensional histograms having $n$ bins each.
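
As a hedged one-dimensional illustration only (the paper's contribution is the $d$-dimensional case via $(d+1)$-partite graphs), the Wasserstein-1 distance between two histograms on a common unit-spaced grid reduces to the L1 distance between their cumulative sums:

```python
import numpy as np

# Two normalized histograms on the same 3-bin grid (made-up numbers).
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.1, 0.5])

# In 1D, W1(p, q) = sum over bins of |CDF_p - CDF_q| times the bin spacing (1 here).
w1 = np.sum(np.abs(np.cumsum(p) - np.cumsum(q)))
print(w1)  # 0.4
```

In higher dimensions no such closed form exists, which is why exact computation requires solving a transportation problem on a graph.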

Gennaro Auricchio, Federico Bassetti, Stefano Gualandi, Marco Veneroni
Adaptive Negative Curvature Descent With Applications In Non-convex Optimization

Negative curvature descent (NCD) method has been utilized to design deterministic or stochastic algorithms for non-convex optimization aiming at finding second-order stationary points or local minima.

Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang
Implicit Bias Of Gradient Descent On Linear Convolutional Networks

The authors show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain.

Suriya Gunasekar, Jason Lee, Daniel Soudry, Nati Srebro
Deep Generative Models For Distribution-Preserving Lossy Compression

The authors propose and study the problem of distribution-preserving lossy compression.

Michael Tschannen, Eirikur Agustsson, Mario Lucic
Visual Object Networks: Image Generation With Disentangled 3D Representations

Recent progress in deep generative models has led to tremendous breakthroughs in image generation. Different from previous works built on 2D datasets and models, the authors present a new generative model, Visual Object Networks (VON), synthesizing natural images of objects with a disentangled 3D representation.

Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, Bill Freeman
Nonlocal Neural Networks, Nonlocal Diffusion And Nonlocal Modeling

Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space.

Yunzhe Tao, Qi Sun, Qiang Du, Wei Liu
Can We Gain More From Orthogonality Regularizations In Training Deep Networks?

This paper seeks to answer the question: as the (near-)orthogonality of weights is found to be a favorable property for training deep convolutional neural networks, how can the authors enforce it in more effective and easy-to-use ways? The authors observe consistent performance gains after applying the proposed regularizations, in terms of both the final accuracies achieved and faster, more stable convergence.

Nitin Bansal, Xiaohan Chen, Zhangyang Wang
Discrimination-aware Channel Pruning For Deep Neural Networks

Channel pruning is one of the predominant approaches for deep model compression. To this end, the authors introduce additional losses into the network to increase the discriminative power of intermediate layers and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error.

Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu
Probabilistic Model-Agnostic Meta-Learning

Meta-learning for few-shot learning entails acquiring a prior over previous tasks and experiences, such that new tasks can be learned from small amounts of data. In this paper, the authors propose a probabilistic meta-learning algorithm that can sample models for a new task from a model distribution.

Chelsea Finn, Kelvin Xu, Sergey Levine
FastGRNN: A Fast, Accurate, Stable And Tiny Kilobyte Sized Gated Recurrent Neural Network

This paper develops the FastRNN and FastGRNN algorithms to address the twin RNN limitations of inaccurate training and inefficient prediction.

Aditya Kusupati, Manish Singh, Kush Bhatia, Ashish Kumar, Prateek Jain, Manik Varma
Understanding Batch Normalization

Batch normalization (BN) is a technique to normalize activations in intermediate layers of deep neural networks.

Nils Bjorck, Carla P Gomes, Bart Selman, Kilian Weinberger
How Many Samples Are Needed To Estimate A Convolutional Neural Network?

A widespread folklore for explaining the success of Convolutional Neural Networks (CNNs) is that CNNs use a more compact representation than the Fully-connected Neural Network (FNN) and thus require fewer training samples to accurately estimate their parameters.

Simon Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Russ Salakhutdinov, Aarti Singh
Robust Detection Of Adversarial Attacks By Modeling The Intrinsic Properties Of Deep Neural Networks

It has been shown that deep neural network (DNN) based classifiers are vulnerable to human-imperceptible adversarial perturbations, which can cause DNN classifiers to output wrong predictions with high confidence. The authors propose an unsupervised learning approach to detect adversarial inputs without any knowledge of attackers.

Zhihao Zheng, Pengyu Hong
Combinatorial Optimization With Graph Convolutional Networks And Guided Tree Search

The authors present a learning-based approach to computing solutions for certain NP-hard problems.

Zhuwen Li, Qifeng Chen, Vladlen Koltun
Automatic Differentiation In ML: Where We Are And Where We Should Be Going

Bart Van Merrienboer, Olivier Breuleux, Arnaud Bergeron, Pascal Lamblin
Realistic Evaluation Of Deep Semi-Supervised Learning Algorithms

Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. After creating a unified reimplementation of various widely-used SSL techniques, the authors test them in a suite of experiments designed to address these issues.

Avital Oliver, Augustus Odena, Colin A Raffel, Dogus Cubuk, Ian Goodfellow
Toddler-Inspired Visual Object Learning

Real-world learning systems have practical limitations on the quality and quantity of the training datasets that they can collect and consider.

Sven Bambach, David Crandall, Linda Smith, Chen Yu
Generalisation In Humans And Deep Neural Networks

The authors compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations.

Robert Geirhos, Carlos R. M. Temme, Jonas Rauber, Heiko H. Schütt, Matthias Bethge, Felix A. Wichmann
Assessing The Scalability Of Biologically-Motivated Deep Learning Algorithms And Architectures

The backpropagation of error algorithm (BP) is impossible to implement in a real brain.

Sergey Bartunov, Adam Santoro, Blake Richards, Luke Marris, Geoffrey E Hinton, Timothy Lillicrap
Incorporating Context Into Language Encoding Models For FMRI

Language encoding models help explain language processing in the human brain by learning functions that predict brain responses from the language stimuli that elicited them. In this work the authors instead build encoding models using rich contextual representations derived from an LSTM language model.

Shailee Jain, Alexander Huth
Why So Gloomy? A Bayesian Explanation Of Human Pessimism Bias In The Multi-armed Bandit Task

How humans make repeated choices among options with imperfectly known reward outcomes is an important problem in psychology and neuroscience. The authors present data from a human stationary bandit experiment, in which the authors vary the average abundance and variability of reward availability (mean and variance of reward rate distributions).

Dalin Guo, Angela J Yu
Mental Sampling In Multimodal Representations

Jianqiao Zhu, Adam Sanborn, Nick Chater
Integrated Accounts Of Behavioral And Neuroimaging Data Using Flexible Recurrent Neural Network Models

Neuroscience studies of human decision-making abilities commonly involve subjects completing a decision-making task while BOLD signals are recorded using fMRI. To address this limitation, the authors introduce a new method using recurrent neural network models that are flexible enough to be jointly fitted to the behavioral and neural data.

Amir Dezfouli, Richard Morris, Fabio Ramos, Peter Dayan, Bernard Balleine
Efficient Inference For Time-varying Behavior During Learning

Nicholas Roy, Ji Hyun Bak, Athena Akrami, Carlos Brody, Jonathan W Pillow
Multivariate Convolutional Sparse Coding For Electromagnetic Brain Signals

Tom Dupré La Tour, Thomas Moreau, Mainak Jas, Alexandre Gramfort
Manifold-tiling Localized Receptive Fields Are Optimal In Similarity-preserving Neural Networks

Many neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space. What is the functional significance of such representations and how can they arise?

Anirvan Sengupta, Cengiz Pehlevan, Mariano Tepper, Alexander Genkin, Dmitri Chklovskii
Connectionist Temporal Classification With Maximum Entropy Regularization

The authors propose a regularization method based on maximum conditional entropy which penalizes peaky distributions and encourages exploration.

Hu Liu, Sheng Jin, Changshui Zhang
Removing The Feature Correlation Effect Of Multiplicative Noise

Multiplicative noise, including dropout, is widely used to regularize deep neural networks (DNNs), and is shown to be effective in a wide range of architectures and tasks. However, high feature correlation is undesirable, as it increases redundancy in representations.

Zijun Zhang, Yining Zhang, Zongpeng Li
Overfitting Or Perfect Fitting? Risk Bounds For Classification And Regression Rules That Interpolate

Many modern machine learning models are trained to achieve zero or near-zero training error in order to obtain near-optimal (but non-zero) test error.

Mikhail Belkin, Daniel Hsu, Partha Mitra
Smoothed Analysis Of Discrete Tensor Decomposition And Assemblies Of Neurons

The authors analyze linear independence of rank one tensors produced by tensor powers of randomly perturbed vectors.

Nima Anari, Constantinos Daskalakis, Wolfgang Maass, Christos Papadimitriou, Amin Saberi, Santosh Vempala
Entropy And Mutual Information In Models Of Deep Neural Networks

The authors examine a class of stochastic deep learning models with a tractable method to compute information-theoretic quantities. (ii) The authors extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layers networks with Gaussian random weights, using the recently introduced adaptive interpolation method.

Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová
The Committee Machine: Computational To Statistical Gaps In Learning A Two-layers Neural Network

Heuristic tools from statistical physics have been used in the past to compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. The authors also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows optimal learning to be performed in polynomial time for a large set of parameters.

Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová
A Unified Framework For Extensive-Form Game Abstraction With Bounds

Abstraction has long been a key component in the practical solving of large-scale extensive-form games. In this paper the authors present a unified framework for analyzing abstractions that can express all types of abstractions and solution concepts used in prior papers with performance guarantees, while maintaining comparable bounds on abstraction quality.

Christian Kroer, Tuomas Sandholm
Connecting Optimization And Regularization Paths

The authors study the implicit regularization properties of optimization techniques by explicitly connecting their optimization paths to the regularization paths of "corresponding" regularized problems.

Arun Suggala, Adarsh Prasad, Pradeep Ravikumar
Overlapping Clustering Models, And One (class) SVM To Bind Them All

People belong to multiple communities, words belong to multiple topics, and books cover multiple genres; overlapping clusters are commonplace.

Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti
Learning Latent Variable Structured Prediction Models With Gaussian Perturbations

The standard margin-based structured prediction commonly uses a maximum loss over all possible structured outputs. Recent work has proposed the use of the maximum loss over random structured outputs sampled independently from some proposal distribution, with theoretical guarantees.

Kevin Bello, Jean Honorio
Self-Supervised Generation Of Spatial Audio For 360° Video

The authors introduce an approach to convert mono audio recorded by a 360° video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere.

Pedro Morgado, Nuno Nvasconcelos, Timothy Langlois, Oliver Wang
Symbolic Graph Reasoning Meets Convolutions

Beyond local convolution networks, the authors explore how to harness various external human knowledge for endowing the networks with the capability of semantic global reasoning. Rather than graphical models (e.g., CRF) or constraints for modeling broader dependencies, the authors propose a new Symbolic Graph Reasoning (SGR) layer, which performs reasoning over a group of symbolic nodes whose outputs explicitly represent different properties of each semantic in a prior knowledge graph.

Xiaodan Liang, Zhiting Hu, Hao Zhang, Liang Lin, Eric Xing
Towards Deep Conversational Recommendations

Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, Chris Pal
Human-in-the-Loop Interpretability Prior

The authors optimize for interpretability by directly including humans in the optimization loop.

Isaac Lage, Andrew Ross, Samuel J Gershman, Been Kim, Finale Doshi-Velez
Why Is My Classifier Discriminatory?

Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy. In this work, the authors argue that the fairness of predictions should be evaluated in context of the data, and that unfairness induced by inadequate sample sizes or unmeasured predictive variables should be addressed through data collection, rather than by constraining the model.

Irene Chen, Fredrik Johansson, David Sontag
Link Prediction Based On Graph Neural Networks

Link prediction is a key problem for network-structured data. In this paper, the authors study this heuristic learning paradigm for link prediction.

Muhan Zhang, Yixin Chen
KONG: Kernels For Ordered-neighborhood Graphs

The authors present novel graph kernels for graphs with node and edge labels that have ordered neighborhoods, i.e.

Moez Draief, Konstantin Kutzkov, Kevin Scaman, Milan Vojnovic
Efficient Stochastic Gradient Hard Thresholding

Stochastic gradient hard thresholding methods have recently been shown to work favorably in solving large-scale empirical risk minimization problems under sparsity or rank constraint. To address these deficiencies, the authors propose an efficient hybrid stochastic gradient hard thresholding (HSG-HT) method that can be provably shown to have sample-size-independent gradient evaluation and hard thresholding complexity bounds.

Pan Zhou, Xiaotong Yuan, Jiashi Feng
Measures Of Distortion For Machine Learning

Given data from a general metric space, one of the standard machine learning pipelines is to first embed the data into a Euclidean space and subsequently apply out-of-the-box machine learning algorithms to analyze the data. The quality of such an embedding is typically described in terms of a distortion measure.

Leena Chennuru Vankadara, Ulrike Von Luxburg
Relating Leverage Scores And Density Using Regularized Christoffel Functions

Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature. Borrowing ideas from the orthogonal polynomial literature, the authors introduce the regularized Christoffel function associated to a positive definite kernel.

Edouard Pauwels, Francis Bach, Jean-Philippe Vert
Streaming Kernel PCA With $\tilde{O}(\sqrt{n})$ Random Features

The authors study the statistical and computational aspects of kernel principal component analysis using random Fourier features.

Enayat Ullah, Poorya Mianjy, Teodor Vanislavov Marinov, Raman Arora
Learning With SGD And Random Features

Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large-scale learning algorithms. More precisely, the authors study the estimator defined by stochastic gradient with mini-batches and random features.

Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco
But How Does It Work In Theory? Linear SVM With Random Features

The authors prove that, under low noise assumptions, the support vector machine with $N \ll m$ random features (RFSVM) can achieve a learning rate faster than $O(1/\sqrt{m})$ on a training set with $m$ samples when an optimized feature map is used.

Yitong Sun, Anna Gilbert, Ambuj Tewari
Statistical And Computational Trade-Offs In Kernel K-Means

The authors investigate the efficiency of k-means in terms of both statistical and computational requirements. More precisely, the authors study a Nyström approach to kernel k-means.
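
A minimal sketch of the Nyström idea underlying this entry: approximate the full kernel matrix from a small set of landmark columns. The landmark count, data, and kernel below are illustrative assumptions, and the paper's k-means-specific analysis is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and an RBF kernel matrix (made-up sizes).
n, m = 300, 50  # n points, m uniformly sampled landmark columns
X = rng.normal(size=(n, 2))

def rbf(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

K = rbf(X, X)

# Nyström approximation: K ≈ C W^+ C^T, where C holds m columns of K and
# W is the m-by-m block at the landmark rows/columns.
idx = rng.choice(n, size=m, replace=False)
C = K[:, idx]
W = K[np.ix_(idx, idx)]
K_approx = C @ np.linalg.pinv(W, rcond=1e-8) @ C.T

err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print(err)  # small relative Frobenius error with far fewer than n columns
```

Downstream, kernel k-means is then run against the cheap factorized approximation rather than the full n-by-n matrix.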

Daniele Calandriello, Lorenzo Rosasco
Quadrature-based Features For Kernel Approximation

The authors consider the problem of improving kernel approximation via randomized feature maps. These maps arise as Monte Carlo approximation to integral representations of kernel functions and scale up kernel methods for larger datasets.
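
The Monte Carlo baseline that quadrature-based features improve upon can be sketched directly: for the RBF kernel, sample frequencies from its Gaussian spectral measure and average cosines (the dimensions and inputs below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Fourier features for k(x, y) = exp(-||x - y||^2 / 2):
# frequencies are drawn from the kernel's spectral measure, N(0, I).
d, D = 5, 20_000                     # input dimension, number of random features
W = rng.normal(size=(D, d))          # Monte Carlo frequency samples
b = rng.uniform(0, 2 * np.pi, size=D)

def features(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.normal(size=d)
y = rng.normal(size=d)
exact = np.exp(-0.5 * np.sum((x - y) ** 2))
approx = features(x) @ features(y)   # unbiased estimate of k(x, y)
print(exact, approx)                 # agree closely for large D
```

The paper's quadrature-based constructions replace the plain Monte Carlo average above with more accurate integration rules.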

Marina Munkhoeva, Yermek Kapushev, Evgeny Burnaev, Ivan Oseledets
Processing Of Missing Data By Neural Networks

The authors propose a general, theoretically justified mechanism for processing missing data by neural networks.

Marek Śmieja, Łukasz Struski, Jacek Tabor, Bartosz Zieliński, Przemysław Spurek
Constructing Deep Neural Networks By Bayesian Network Structure Learning

The authors introduce a principled approach for unsupervised structure learning of deep neural networks.

Raanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz, Guy Koren, Gal Novik
Mallows Models For Top-k Lists

The classic Mallows model is a widely-used tool to realize distributions on permutations. The authors demonstrate this by studying two basic problems in this model, namely sampling and reconstruction, from both algorithmic and experimental points of view.

Flavio Chierichetti, Anirban Dasgupta, Shahrzad Haddadan, Ravi Kumar, Silvio Lattanzi
Cooperative Neural Networks (CoNN): Exploiting Prior Independence Structure For Improved Classification

The authors propose a new approach, called cooperative neural networks (CoNN), which use a set of cooperatively trained neural networks to capture latent representations that exploit prior given independence structure.

Harsh Shrivastava, Eugene Bart, Bob Price, Hanjun Dai, Bo Dai, Srinivas Aluru
Maximum-Entropy Fine Grained Classification

Fine-Grained Visual Classification (FGVC) is an important computer vision problem that involves small diversity within the different classes, and often requires expert annotators to collect data.

Abhimanyu Dubey, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
Efficient Loss-Based Decoding On Graphs For Extreme Classification

In extreme classification problems, learning algorithms are required to map instances to labels from an extremely large label set. The authors build on a recent extreme classification framework with logarithmic time and space (LTLS), and on a general approach for error correcting output coding (ECOC) with loss-based decoding, and introduce a flexible and efficient approach accompanied by theoretical bounds.

Itay Evron, Edward Moroshko, Koby Crammer
A No-regret Generalization Of Hierarchical Softmax To Extreme Multi-label Classification

Extreme multi-label classification (XMLC) is a problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. The authors investigate probabilistic label trees (PLTs) that have been recently devised for tackling XMLC problems.

Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, Róbert Busa-Fekete, Krzysztof Dembczynski
Efficient Gradient Computation For Structured Output Learning With Rational And Tropical Losses

Many structured prediction problems admit a natural loss function for evaluation, such as the edit-distance or $n$-gram loss. However, existing learning algorithms are typically designed to optimize alternative objectives such as the cross-entropy.

Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Dmitry Storcheus, Scott Yang
Deep Structured Prediction With Nonlinear Output Transformations

Deep structured models are widely used for tasks like semantic segmentation, where explicit correlations between variables provide important prior information which generally helps to reduce the data needs of deep nets. However, current deep structured models are restricted by oftentimes very local neighborhood structure, which cannot be increased for computational complexity reasons, and by the fact that the output configuration, or a representation thereof, cannot be transformed further.

Colin Graber, Ofer Meshi, Alex Schwing
Mapping Images To Scene Graphs With Permutation-Invariant Structured Prediction

Machine understanding of complex images is a key goal of artificial intelligence. However, it is unclear what principles should guide the design of a structured prediction model that utilizes the power of deep learning components.

Roei Herzig, Moshiko Raboh, Gal Chechik, Jonathan Berant, Amir Globerson
Large Margin Deep Networks For Classification

The authors present a formulation of deep learning that aims at producing a large margin classifier.

Gamaleldin Elsayed, Dilip Krishnan, Hossein Mobahi, Kevin Regan, Samy Bengio
Semi-supervised Deep Kernel Learning: Regression With Unlabeled Data By Minimizing Predictive Variance

Large amounts of labeled data are typically required to train deep learning models. The authors present semi-supervised deep kernel learning (SSDKL), a semi-supervised regression model based on minimizing predictive variance in the posterior regularization framework.

Neal Jean, Sang Michael Xie, Stefano Ermon
Multitask Boosting For Survival Analysis With Competing Risks

The co-occurrence of multiple diseases among the general population is an important problem as those patients have more risk of complications and represent a large share of health care expenditure.

Alexis Bellot, Mihaela Van Der Schaar
Multi-Layered Gradient Boosting Decision Trees

Multi-layered distributed representation is believed to be the key ingredient of deep neural networks especially in cognitive tasks like computer vision.

Ji Feng, Yang Yu, Zhi-Hua Zhou
Unsupervised Adversarial Invariance

Data representations that contain all the information about target variables but are invariant to nuisance factors benefit supervised learning algorithms by preventing them from learning associations between these factors and the targets, thus reducing overfitting.

Ayush Jaiswal, Rex Yue Wu, Wael Abd-Almageed, Prem Natarajan
Learning Deep Disentangled Embeddings With The F-Statistic Loss

Deep-embedding methods aim to discover representations of a domain. Disentangling methods aim to make compositional or factorial structure explicit. The authors combine these two active but independent lines of research and propose a new paradigm suitable for both goals.

Karl Ridgeway, Mike Mozer
Learning Latent Subspaces In Variational Autoencoders

Variational autoencoders (VAEs) are widely used deep generative models capable of learning unsupervised latent representations of data.

Jack Klys, Jake Snell, Richard Zemel
Dual Swap Disentangling

Learning interpretable disentangled representations is a crucial yet challenging task.

Zunlei Feng, Xinchao Wang, Chenglong Ke, An-Xiang Zeng, Dacheng Tao, Mingli Song
Joint Autoregressive And Hierarchical Priors For Learned Image Compression

Recent models for learned image compression are based on autoencoders that learn approximately invertible mappings from pixels to a quantized latent representation.

David Minnen, Johannes Ballé, George D Toderici
Group Equivariant Capsule Networks

The authors present group equivariant capsule networks, a framework to introduce guaranteed equivariance and invariance properties to the capsule network idea.

Jan Lenssen, Matthias Fey, Pascal Libuschewski
Learning Disentangled Joint Continuous And Discrete Representations

The authors present a framework for learning disentangled and interpretable jointly continuous and discrete representations in an unsupervised manner.

Emilien Dupont
Image-to-image Translation For Cross-domain Disentanglement

Deep image translation methods have recently shown excellent results, outputting high-quality images covering multiple modes of the data distribution. There has also been increased interest in disentangling the internal representations learned by deep methods to further improve their performance and achieve a finer control.

Abel Gonzalez-Garcia, Joost Van De Weijer, Yoshua Bengio
Cooperative Learning Of Audio And Video Models From Self-Supervised Synchronization

There is a natural correlation between the visual and auditive elements of a video. The authors demonstrate that a calibrated curriculum learning scheme, a careful choice of negative examples, and the use of a contrastive loss are critical ingredients to obtain powerful multi-sensory representations from models optimized to discern temporal synchronization of audio-video pairs.

Bruno Korbar, Du Tran, Lorenzo Torresani
Non-Adversarial Mapping With VAEs

The study of cross-domain mapping without supervision has recently attracted much attention.

Yedid Hoshen
Learning To Teach With Dynamic Loss Functions

Teaching is critical to human society: it is with teaching that prospective students are educated and human civilization can be inherited and advanced.

Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Lai Jian-Huang, Tie-Yan Liu
Maximizing Acquisition Functions For Bayesian Optimization

Bayesian optimization is a sample-efficient approach to global optimization that relies on theoretically motivated value heuristics (acquisition functions) to guide its search process.

James Wilson, Frank Hutter, Marc Deisenroth
MetaReg: Towards Domain Generalization Using Meta-Regularization

Training models that generalize to new domains at test time is a problem of fundamental importance in machine learning.

Yogesh Balaji, Swami Sankaranarayanan, Rama Chellappa
Transfer Learning With Neural AutoML

The authors reduce the computational cost of Neural AutoML with transfer learning. AutoML relieves human effort by automating the design of ML algorithms.

Catherine Wong, Neil Houlsby, Yifeng Lu, Andrea Gesmundo
Hierarchical Reinforcement Learning For Zero-shot Generalization With Subtask Dependencies

The authors introduce a new RL problem where the agent is required to generalize to a previously-unseen environment characterized by a subtask graph which describes a set of subtasks and their dependencies.

Sungryull Sohn, Junhyuk Oh, Honglak Lee
Lifelong Inverse Reinforcement Learning

Methods for learning from demonstration (LfD) have shown success in acquiring behavior policies by imitating a user. To address this challenge, the authors introduce the novel problem of lifelong learning from demonstration, which allows the agent to continually build upon knowledge learned from previously demonstrated tasks to accelerate the learning of new tasks, reducing the number of demonstrations required.

Jorge A Mendez, Shashank Shivkumar, Eric Eaton
Safe Active Learning For Time-Series Modeling With Gaussian Processes

Learning time-series models is useful for many applications, such as simulation and forecasting. In this study, the authors consider the problem of actively learning time-series models while taking given safety constraints into account.

Christoph Zimmer, Mona Meister, Duy Nguyen-Tuong
Online Structure Learning For Feed-Forward And Recurrent Sum-Product Networks

Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable.

Agastya Kalra, Abdullah Rashwan, Wei-Shou Hsu, Pascal Poupart, Prashant Doshi, George Trimponias
Preference Based Adaptation For Learning Objectives

In many real-world learning tasks, it is hard to directly optimize the true performance measures, and choosing the right surrogate objectives is also difficult. A novel sampling-based algorithm, DL^2M, is proposed to learn the optimal parameter; it enjoys strong theoretical guarantees and efficient empirical performance.

Yao-Xiang Ding, Zhi-Hua Zhou
Byzantine Stochastic Gradient Descent

This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of $m$ machines which allegedly compute stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and may behave adversarially.

Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li
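
The setting can be illustrated with a simple robust aggregation rule; the coordinate-wise median below is a standard baseline for tolerating Byzantine machines, not the paper's algorithm (which attains stronger guarantees):

```python
import numpy as np

def robust_aggregate(grads):
    """Aggregate per-machine gradients by the coordinate-wise median,
    a robust alternative to averaging that a minority of adversarial
    (Byzantine) machines cannot drag arbitrarily far."""
    return np.median(np.stack(grads), axis=0)

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])
honest = [true_grad + 0.01 * rng.normal(size=3) for _ in range(7)]
byzantine = [np.full(3, 1e6) for _ in range(3)]  # adversarial reports
agg = robust_aggregate(honest + byzantine)
print(np.allclose(agg, true_grad, atol=0.1))  # True: median ignores the outliers
```

With plain averaging the same three Byzantine machines would shift every coordinate by roughly 3e5.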
Contextual Bandits With Surrogate Losses: Margin Bounds And Efficient Algorithms

The authors use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning.

Dylan Foster, Akshay Krishnamurthy
Online Learning Of Quantum States

Suppose the authors have many copies of an unknown $n$-qubit state $\rho$.

Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, Ashwin Nayak
Horizon-Independent Minimax Linear Regression

The authors consider online linear regression: at each round, an adversary reveals a covariate vector, the learner predicts a real value, the adversary reveals a label, and the learner suffers the squared prediction error. Further, the authors show that this forward recursion remains optimal even against adaptively chosen labels and covariates, provided that the adversary adheres to a set of constraints that prevent misrepresentation of the scale of the problem.

Alan Malek, Peter Bartlett
Factored Bandits

The authors introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions.

Julian Zimmert, Yevgeny Seldin
A Model For Learned Bloom Filters And Optimizing By Sandwiching

Recent work has suggested enhancing Bloom filters by using a pre-filter, based on applying machine learning to determine a function that models the data set the Bloom filter is meant to represent.

Michael Mitzenmacher
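
The pre-filter idea being modeled can be sketched as follows: a learned score acts as a first filter, and a backup Bloom filter stores exactly the keys the model would miss, so false negatives are impossible. The paper's "sandwiched" variant adds another Bloom filter in front of the model; the sizes and toy score function here are illustrative:

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = 1

    def __contains__(self, item):
        return all(self.bits[h] for h in self._hashes(item))

class LearnedBloomFilter:
    """Learned pre-filter + backup Bloom filter for the model's misses."""
    def __init__(self, keys, score, threshold, m=10_000, k=5):
        self.score, self.threshold = score, threshold
        self.backup = BloomFilter(m, k)
        for key in keys:
            if score(key) < threshold:   # model would miss it: store exactly
                self.backup.add(key)

    def __contains__(self, item):
        return self.score(item) >= self.threshold or item in self.backup

keys = [f"key{i}" for i in range(100)]
lbf = LearnedBloomFilter(keys, score=lambda s: s.startswith("key"), threshold=1)
print(all(k in lbf for k in keys))  # True: no false negatives by construction
```

False positives still occur (from the model or the backup filter); the paper's model quantifies and optimizes that trade-off.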
Streamlining Variational Inference For Constraint Satisfaction Problems

Several algorithms for solving constraint satisfaction problems are based on survey propagation, a variational inference scheme used to obtain approximate marginal probability estimates for variable assignments. The authors introduce a more general branching strategy based on streamlining constraints, which sidestep hard assignments to variables.

Aditya Grover, Tudor Achim, Stefano Ermon
Robust Hypothesis Testing Using Wasserstein Uncertainty Sets

The authors develop a novel computationally efficient and general framework for robust hypothesis testing.

RUI GAO, Liyan Xie, Yao Xie, Huan Xu
Smoothed Analysis Of The Low-rank Approach For Smooth Semidefinite Programs

The authors consider semidefinite programs (SDPs) of size $n$ with equality constraints. In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over a matrix $Y$ of size $n \times k$ such that $X = YY^*$ is the SDP variable.

Thomas Pumir, Samy Jelassi, Nicolas Boumal
Convergence Of Cubic Regularization For Nonconvex Optimization Under KL Property

Cubic-regularized Newton's method (CR) is a popular algorithm guaranteed to produce a second-order stationary solution for nonconvex optimization problems. The authors explore the asymptotic convergence rate of CR by exploiting the ubiquitous Kurdyka-Lojasiewicz (KL) property of the nonconvex objective functions.

Yi Zhou, Zhe Wang, Yingbin Liang
A Simple Proximal Stochastic Gradient Method For Nonsmooth Nonconvex Optimization

The authors analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems.

Zhize Li, Jian Li
Stochastic Chebyshev Gradient Descent For Spectral Optimization

A large class of machine learning techniques requires the solution of optimization problems involving spectral functions of parametric matrices. A careful design of the truncation distribution allows the authors to offer distributions that are variance-optimal, which is crucial for fast and stable convergence of stochastic gradient methods.

Insu Han, Haim Avron, Jinwoo Shin
Proximal SCOPE For Distributed Sparse Learning

Distributed sparse learning with a cluster of multiple machines has attracted much attention in machine learning, especially for large-scale applications with high-dimensional data. One popular way to implement sparse learning is to use L1 regularization.

Shenyi Zhao, Gong-Duo Zhang, Ming-Wei Li, Wu-Jun Li
LAG: Lazily Aggregated Gradient For Communication-Efficient Distributed Learning

This paper presents a new class of gradient methods for distributed machine learning that adaptively skip the gradient calculations to learn with reduced communication and computation.

Tianyi Chen, Georgios Giannakis, Tao Sun, Wotao Yin
Direct Runge-Kutta Discretization Achieves Acceleration

The authors study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method.

Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie
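
The object being discretized can be sketched concretely: the Su-Boyd-Candès ODE x'' + (3/t) x' + grad f(x) = 0 is the continuous limit of Nesterov's method, and below it is integrated with classical RK4 on a 1-D quadratic. Step size and horizon are illustrative; the paper analyzes when such direct Runge-Kutta discretizations provably retain acceleration:

```python
import numpy as np

def nesterov_ode_rk4(grad, x0, t0=1.0, dt=0.01, steps=5000):
    """Integrate x'' + (3/t) x' + grad(x) = 0 with classical RK4,
    as a first-order system in z = (x, x')."""
    def rhs(t, z):
        x, v = z
        return np.array([v, -3.0 / t * v - grad(x)])
    z, t = np.array([x0, 0.0]), t0
    for _ in range(steps):
        k1 = rhs(t, z)
        k2 = rhs(t + dt / 2, z + dt / 2 * k1)
        k3 = rhs(t + dt / 2, z + dt / 2 * k2)
        k4 = rhs(t + dt, z + dt * k3)
        z = z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return z[0]

# minimize f(x) = x^2 / 2 (so grad(x) = x) starting from x0 = 5
x_final = nesterov_ode_rk4(lambda x: x, 5.0)
print(abs(x_final) < 0.5)  # True: the trajectory approaches the minimizer at 0
```

The ODE's objective value decays at the accelerated O(1/t^2) rate; the paper's question is how much of this survives direct discretization.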
Optimal Algorithms For Non-Smooth Distributed Optimization In Networks

In this work, the authors consider the distributed optimization of non-smooth convex functions using a network of computing units.

Kevin Scaman, Francis Bach, Sebastien Bubeck, Laurent Massoulié, Yin Tat Lee
Graph Oracle Models, Lower Bounds, And Gaps For Parallel Stochastic Optimization

The authors suggest a general oracle-based framework that captures parallel stochastic optimization in different parallelization settings described by a dependency graph, and derive generic lower bounds in terms of this graph.

Blake Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, Nati Srebro
(Probably) Concave Graph Matching

In this paper the authors address the graph matching problem. The authors introduce the concepts of conditionally concave and probably conditionally concave energies on polytopes and show that they encapsulate many instances of the graph matching problem, including matching Euclidean graphs and graphs on surfaces.

Haggai Maron, Yaron Lipman
Solving Non-smooth Constrained Programs With Lower Complexity Than $\mathcal{O}(1/\varepsilon)$: A Primal-Dual Homotopy Smoothing Approach

The authors propose a new primal-dual homotopy smoothing algorithm for a linearly constrained convex program, where neither the primal nor the dual function has to be smooth or strongly convex.

Xiaohan Wei, Hao Yu, Qing Ling, Michael Neely
Wasserstein Distributionally Robust Kalman Filtering

The authors study a distributionally robust mean square error estimation problem over a nonconvex Wasserstein ambiguity set containing only normal distributions.

Soroosh Shafieezadeh Abadeh, Viet Anh Nguyen, Daniel Kuhn, Peyman Mohajerin Esfahani
Decentralize And Randomize: Faster Algorithm For Wasserstein Barycenters

The authors study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network.

Pavel Dvurechenskii, Darina Dvinskikh, Alexander Gasnikov, Cesar Uribe, Angelia Nedich
Limited Memory Kelley's Method Converges For Composite Convex And Submodular Objectives

The original simplicial method (OSM), a variant of the classic Kelley’s cutting plane method, has been shown to converge to the minimizer of a composite convex and submodular objective, though no rate of convergence for this method was known. The authors propose a limited memory version of Kelley’s method (L-KM) and of OSM that requires limited memory (at most $n+1$ constraints for an $n$-dimensional problem) independent of the iteration.

Song Zhou, Swati Gupta, Madeleine Udell
Stochastic Spectral And Conjugate Descent Methods

The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper the authors introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions.

Dmitry Kovalev, Peter Richtarik, Eduard Gorbunov, Elnur Gasanov
Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms For Finding Local Minima

The authors propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.

Yaodong Yu, Pan Xu, Quanquan Gu
First-order Stochastic Algorithms For Escaping From Saddle Points In Almost Linear Time

(This is a theory paper) In this paper, the authors consider first-order methods for solving stochastic non-convex optimization problems. The key building block of the proposed algorithms is a set of first-order procedures to extract negative curvature from the Hessian matrix through a principled sequence starting from noise, which are referred to as NEgative-curvature-Originated-from-Noise (NEON) and are of independent interest.

Yi Xu, Jing Rong, Tianbao Yang
Gen-Oja: Simple & Efficient Algorithm For Streaming Generalized Eigenvector Computation

In this paper, the authors study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter Bartlett, Michael I. Jordan
Sparse DNNs With Improved Adversarial Robustness

Deep neural networks (DNNs) are computationally/memory-intensive and vulnerable to adversarial attacks, making them prohibitive in some real-world applications.

Yiwen Guo, Chao Zhang, Changshui Zhang, Yurong Chen
Constructing Fast Network Through Deconstruction Of Convolution

This study shows that naive convolution can be deconstructed into a shift operation and pointwise convolution.

Yunho Jeon, Junmo Kim
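
The deconstruction can be sketched in two stages: a parameter-free per-channel spatial shift, followed by a 1x1 (pointwise) convolution that mixes channels. Offsets and shapes below are illustrative:

```python
import numpy as np

def shift(x, offsets):
    """Shift each channel of x (C, H, W) spatially by its own (dy, dx),
    zero-padding at the borders; a parameter-free stand-in for the
    spatial part of a convolution."""
    out = np.zeros_like(x)
    for c, (dy, dx) in enumerate(offsets):
        src = x[c,
                max(0, -dy):x.shape[1] - max(0, dy),
                max(0, -dx):x.shape[2] - max(0, dx)]
        out[c,
            max(0, dy):x.shape[1] - max(0, -dy),
            max(0, dx):x.shape[2] - max(0, -dx)] = src
    return out

def pointwise_conv(x, W):
    """1x1 convolution: mix channels at each spatial location."""
    return np.einsum('oc,chw->ohw', W, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))
offsets = [(0, 0), (1, 0), (0, 1), (-1, -1)]   # one spatial offset per channel
W = rng.normal(size=(6, 4))                    # mixes 4 channels into 6
y = pointwise_conv(shift(x, offsets), W)
print(y.shape)  # (6, 8, 8)
```

All learnable parameters live in the cheap 1x1 convolution; the shift contributes spatial context for free.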
Learning Loop Invariants For Program Verification

A fundamental problem in program verification concerns inferring loop invariants. Inspired by how human experts construct loop invariants, the authors propose a reasoning framework Code2Inv that constructs the solution by multi-step decision making and querying an external program graph memory block.

Xujie Si, Hanjun Dai, Mukund Raghothaman, Mayur Naik, Le Song
Learning Libraries Of Subroutines For Neurally–Guided Bayesian Program Induction

Successful approaches to program induction require a hand-engineered domain-specific language (DSL), constraining the space of allowed programs and imparting prior knowledge of the domain.

Kevin Ellis, Lucas Morales, Mathias Sablé-Meyer, Armando Solar-Lezama, Josh Tenenbaum
Learning To Infer Graphics Programs From Hand-Drawn Images

The authors introduce a model that learns to convert simple hand drawings into graphics programs written in a subset of LaTeX. The model combines techniques from deep learning and program synthesis.

Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Josh Tenenbaum
Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn The Same Representation

It is widely believed that learning good representations is one of the main reasons for the success of deep neural networks.

Liwei Wang, Lunjia Hu, Jiayuan Gu, Zhiqiang Hu, Yue Wu, Kun He, John Hopcroft
Norm Matters: Efficient And Accurate Normalization Schemes In Deep Networks

The authors present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective.

Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry
ResNet With One-neuron Hidden Layers Is A Universal Approximator

The authors demonstrate that a very deep ResNet with stacked modules that have one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue-integrable function in $d$ dimensions, i.e. $\ell_1(\mathbb{R}^d)$.

Hongzhou Lin, Stefanie Jegelka
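
The architecture in question is easy to write down: each residual module has a single hidden ReLU neuron, x <- x + v * relu(w.x + b). The sketch below shows only the forward pass, with random (untrained) parameters:

```python
import numpy as np

def one_neuron_resnet(x, params):
    """A ResNet whose residual branches each contain one hidden ReLU
    neuron: x <- x + v * relu(w @ x + b). Stacks of such modules are
    what the paper proves universal for Lebesgue-integrable functions."""
    for w, b, v in params:
        x = x + v * max(0.0, float(np.dot(w, x)) + b)
    return x

rng = np.random.default_rng(0)
d = 1  # scalar input for illustration
params = [(rng.normal(size=d), rng.normal(), rng.normal(size=d))
          for _ in range(5)]
print(one_neuron_resnet(np.array([0.3]), params))
```

The contrast drawn in the paper is with width-d fully connected nets, which are not universal at this width; the identity skip connection is what restores expressive power.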
Hyperbolic Neural Networks

Hyperbolic spaces have recently gained momentum in the context of machine learning due to their high capacity and tree-likeliness properties.However, the representational power of hyperbolic geometry is not yet on par with Euclidean geometry, firstly because of the absence of corresponding hyperbolic neural network layers.

Octavian Ganea, Gary Becigneul, Thomas Hofmann
A Simple Unified Framework For Detecting Out-of-Distribution Samples And Adversarial Attacks

Detecting test samples drawn sufficiently far away from the training distribution statistically or adversarially is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. In this paper, the authors propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier.

Kimin Lee, Kibok Lee, Honglak Lee, Jinwoo Shin
Improving Neural Program Synthesis With Inferred Execution Traces

The task of program synthesis, or automatically generating programs that are consistent with a provided specification, remains a challenging task in artificial intelligence.

Richard Shin, Illia Polosukhin, Dawn Song
Scaling The Poisson GLM To Massive Neural Datasets Through Polynomial Approximations

The authors develop highly scalable approximate inference methods for Poisson generalized linear models (GLMs) that require only a single pass over the data.

David Zoltowski, Jonathan W Pillow
Discovery Of Latent 3D Keypoints Via End-to-end Geometric Reasoning

This paper presents KeypointNet, an end-to-end geometric reasoning framework to learn an optimal set of category-specific keypoints, along with their detectors to predict 3D keypoints in a single 2D input image.

Supasorn Suwajanakorn, Noah Snavely, Jonathan Tompson, Mohammad Norouzi
Learning To Reconstruct Shapes From Unseen Classes

From a single image, humans are able to perceive the full 3D shape of an object by exploiting learned shape priors from everyday life. Here the authors present an algorithm, Generalizable Reconstruction (GenRe), designed to capture more generic, class-agnostic shape priors.

Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Josh Tenenbaum, Bill Freeman, Jiajun Wu
Using Trusted Data To Train Deep Networks On Labels Corrupted By Severe Noise

The growing importance of massive datasets with the advent of deep learning makes robustness to label noise a critical property for classifiers to have. The authors demonstrate that robustness to label noise up to severe strengths can be achieved by using a set of trusted data with clean labels, and propose a loss correction that utilizes trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers.

Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel
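
One way such a trusted-data loss correction can work (a hedged sketch, not necessarily the paper's exact estimator): use the trusted examples to estimate a label-corruption matrix C, then score the model's clean-label prediction against the noisy label after pushing it through C:

```python
import numpy as np

def estimate_corruption_matrix(probs_trusted, true_labels, n_classes):
    """Estimate C[i, j] ~ p(noisy label j | true label i) by averaging a
    noisy-trained model's predicted noisy-label distribution over the
    trusted examples of each true class."""
    C = np.zeros((n_classes, n_classes))
    for i in range(n_classes):
        C[i] = probs_trusted[true_labels == i].mean(axis=0)
    return C

def corrected_nll(p_true, noisy_label, C):
    """Forward-corrected loss: push the predicted clean-label distribution
    through C before scoring it against the observed noisy label."""
    return -np.log((C.T @ p_true)[noisy_label])

# toy: labels flipped 0 <-> 1 with prob 0.3, and a perfectly calibrated model
C_true = np.array([[0.7, 0.3], [0.3, 0.7]])
C_hat = estimate_corruption_matrix(np.array([[0.7, 0.3], [0.3, 0.7]]),
                                   np.array([0, 1]), 2)
print(np.allclose(C_hat, C_true))  # True
print(corrected_nll(np.array([1.0, 0.0]), 0, C_hat))  # -log 0.7 ~ 0.357
```

Under this correction, a model that predicts the true clean label is no longer penalized as if it were wrong whenever the observed label happens to be flipped.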
A Reduction For Efficient LDA Topic Reconstruction

The authors present a novel approach for LDA (Latent Dirichlet Allocation) topic reconstruction.

Matteo Almanza, Flavio Chierichetti, Alessandro Panconesi, Andrea Vattani
A Unified View Of Piecewise Linear Neural Network Verification

The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models. First, the authors present a unified framework that encompasses previous methods.

Rudy Bunel, Ilker Turkaslan, Philip Torr, Pushmeet Kohli, Pawan K Mudigonda
Optimization Over Continuous And Multi-dimensional Decisions With Observational Data

The authors consider the optimization of an uncertain objective over continuous and multi-dimensional decision spaces in problems in which the authors are only provided with observational data. The authors propose a novel algorithmic framework that is tractable, asymptotically consistent, and superior to comparable methods on example problems.

Dimitris Bertsimas, Christopher McCord
The Convergence Of Sparsified Gradient Methods

Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization, large-batch methods, and gradient sparsification, have been proposed.

Dan Alistarh, Torsten Hoefler, Mikael Johansson, Nikola Konstantinov, Sarit Khirirat, Cedric Renggli
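
Gradient sparsification of the kind analyzed here can be sketched as top-k selection with error feedback: only the k largest-magnitude entries are communicated, and the dropped mass is remembered locally and added back before the next selection. Learning rate, k, and the toy quadratic below are illustrative:

```python
import numpy as np

def topk_sparsify(g, k):
    """Keep the k largest-magnitude entries of g; return the sparse part
    (what would be communicated) and the residual (kept locally)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out, g - out

residual = np.zeros(10)
x = np.full(10, 5.0)               # minimize ||x||^2 / 2, so grad = x
for _ in range(300):
    g = x + residual               # add back what was dropped earlier
    sparse_g, residual = topk_sparsify(g, k=2)
    x -= 0.1 * sparse_g            # only the sparse part is "communicated"
print(np.abs(x).max() < 1e-3)      # True: still converges to the minimizer
```

Without the residual memory, permanently dropping 80% of the gradient would stall progress on the un-selected coordinates; the error feedback is what the convergence analysis leans on.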
Estimating Learnability In The Sublinear Data Regime

The authors consider the problem of estimating how well a model class is capable of fitting a distribution of labeled data. This ability to estimate the explanatory value of a set of features (or dataset), even in the regime in which there is too little data to realize that explanatory value, may be relevant to the scientific and industrial settings for which data collection is expensive and there are many potentially relevant feature sets that could be collected.

Weihao Kong, Gregory Valiant
Learning Convex Polytopes With Margin

The authors present an improved algorithm for properly learning convex polytopes in the realizable PAC setting from data with a margin.

Lee-Ad Gottlieb, Eran Kaufman, Aryeh Kontorovich, Gabriel Nivasch
Ridge Regression And Provable Deterministic Ridge Leverage Score Sampling

Ridge leverage scores provide a balance between low-rank approximation and regularization, and are ubiquitous in randomized linear algebra and machine learning.

Shannon McCurdy
GIANT: Globally Improved Approximate Newton Method For Distributed Optimization

For a distributed computing environment, the authors consider the empirical risk minimization problem and propose a distributed and communication-efficient Newton-type optimization method.

Shusen Wang, Farbod Roosta-Khorasani, Peng Xu, Michael W Mahoney
Wavelet Regression And Additive Models For Irregularly Spaced Data

The authors present a novel approach for nonparametric regression using wavelet basis functions.

Asad Haris, Ali Shojaie, Noah Simon
New Insight Into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling And Convexity

As an incremental-gradient algorithm, hybrid stochastic gradient descent (HSGD) enjoys the merits of both stochastic and full gradient methods for finite-sum minimization problems.

Pan Zhou, Xiaotong Yuan, Jiashi Feng
Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method For Stochastic Sparse Linear Regression With Limited Attribute Observation

The authors develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting.

Tomoya Murata, Taiji Suzuki
Robust Subspace Approximation In A Stream

The authors study robust subspace estimation in the streaming and distributed settings.

Roie Levin, Anish Sevekari, David Woodruff
A Practical Algorithm For Distributed Clustering And Outlier Detection

The authors study classic k-means/median clustering, fundamental problems in unsupervised learning, in the setting where data are partitioned across multiple sites, and where the authors are allowed to discard a small portion of the data by labeling them as outliers.

Jiecao Chen, Erfan Sadeqi Azer, Qin Zhang
Compact Representation Of Uncertainty In Clustering

For many classic structured prediction problems, probability distributions over the dependent variables can be efficiently computed using widely-known algorithms and data structures (such as forward-backward, and its corresponding trellis for exact probability distributions in Markov models). However, the authors know of no previous work studying efficient representations of exact distributions over clusterings.

Craig Greenberg, Nicholas Monath, Ari Kobren, Patrick Flaherty, Andrew McGregor, Andrew McCallum
Bipartite Stochastic Block Models With Tiny Clusters

The authors study the problem of finding clusters in random bipartite graphs.

Stefan Neumann
Clustering Redemption–Beyond The Impossibility Of Kleinberg’s Axioms

Kleinberg (2002) stated three axioms that any clustering procedure should satisfy and showed there is no clustering procedure that simultaneously satisfies all three. The authors show that single linkage satisfies the modified axioms, and if the input is well-clusterable, some popular procedures such as k-means also satisfy the axioms, taking a step towards explaining the success of these objective functions for guiding the design of algorithms.

Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn
Understanding Regularized Spectral Clustering Via Graph Conductance

This paper uses the relationship between graph conductance and spectral clustering to study (i) the failures of spectral clustering and (ii) the benefits of regularization.

Yilin Zhang, Karl Rohe
Query K-means Clustering And The Double Dixie Cup Problem

The authors consider the problem of approximate $K$-means clustering with outliers and side information provided by same-cluster queries and possibly noisy answers. Compared to a handful of other known approaches that perform importance sampling to account for small cluster sizes, the proposed query technique reduces the number of queries by a factor of roughly $O(\frac{K^6}{\epsilon^3})$, at the cost of possibly missing very small clusters.

Eli Chien, Chao Pan, Olgica Milenkovic
How To Tell When A Clustering Is (approximately) Correct Using Convex Relaxations

The authors introduce the Sublevel Set (SS) method, a generic method to obtain sufficient guarantees of near-optimality and uniqueness (up to small perturbations) for a clustering.

Marina Meila
Derivative Estimation In Random Design

The authors propose a nonparametric derivative estimation method for random design without having to estimate the regression function.

Yu Liu, Kris De Brabanter
Exploiting Numerical Sparsity For Efficient Learning : Faster Eigenvector Computation And Regression

In this paper, the authors obtain improved running times for regression and top eigenvector computation for numerically sparse matrices.

Neha Gupta, Aaron Sidford
Boosted Sparse And Low-Rank Tensor Regression

The authors propose a sparse and low-rank tensor regression model to relate a univariate outcome to a feature tensor, in which each unit-rank tensor from the CP decomposition of the coefficient tensor is assumed to be sparse.

Lifang He, Kun Chen, Wanwan Xu, Jiayu Zhou, Fei Wang
An Efficient Pruning Algorithm For Robust Isotonic Regression

The authors study a generalization of the classic isotonic regression problem that allows separable nonconvex objective functions, focusing on the case of estimators used in robust regression.

Cong Han Lim
A Convex Program For Bilinear Inversion Of Sparse Vectors

The authors consider the bilinear inverse problem of recovering two vectors, $x\in\mathbb{R}^L$ and $w\in\mathbb{R}^L$, from their entrywise product. Here, K and N may be larger than, smaller than, or equal to L. The authors introduce L1-BranchHull, which is a convex program posed in the natural parameter space and does not require an approximate solution or initialization in order to be stated or solved.

Alireza Aghasi, Ali Ahmed, Paul Hand, Babhru Joshi
Efficient Convex Completion Of Coupled Tensors Using Coupled Nuclear Norms

Coupled norms have emerged as a convex method to solve coupled tensor completion. In this paper, the authors introduce a new set of coupled norms, known as coupled nuclear norms, obtained by constraining the CP rank of coupled tensors.

Kishan Wimalawarne, Hiroshi Mamitsuka
Support Recovery For Orthogonal Matching Pursuit: Upper And Lower Bounds

This paper studies the problem of sparse regression where the goal is to learn a sparse vector that best optimizes a given objective function.

Raghav Somani, Chirag Gupta, Prateek Jain, Praneeth Netrapalli
Learning Without The Phase: Regularized PhaseMax Achieves Optimal Sample Complexity

Fariborz Salehi, Ehsan Abbasi, Babak Hassibi
On Controllable Sparse Alternatives To Softmax

Converting an n-dimensional vector to a probability distribution over n objects is a commonly used component in many machine learning tasks, such as multiclass classification, multilabel classification, and attention mechanisms. For this purpose, several probability mapping functions have been proposed and employed in the literature, such as softmax, sum-normalization, spherical softmax, and sparsemax, but there is very little understanding of how they relate to each other.
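The mapping functions named above can be sketched concretely. The following is a minimal illustrative sketch (not the paper's formulation): softmax and sum-normalization as usually defined, and sparsemax as the Euclidean projection onto the probability simplex, following Martins & Astudillo (2016).

```python
import numpy as np

def softmax(z):
    # dense mapping: every coordinate receives non-zero probability
    e = np.exp(z - z.max())
    return e / e.sum()

def sum_normalize(z):
    # assumes non-negative inputs; divides by the total mass
    return z / z.sum()

def sparsemax(z):
    # Euclidean projection of z onto the probability simplex;
    # unlike softmax, it can return exact zeros
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cssv = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cssv    # coordinates kept in the support
    k_z = k[support][-1]
    tau = (cssv[support][-1] - 1) / k_z  # threshold subtracted from z
    return np.maximum(z - tau, 0.0)
```

On an input like [2, 1, 0.1], softmax spreads probability over all three entries, while sparsemax concentrates all mass on the first, illustrating the controllable sparsity the abstract refers to.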

Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik Sankaranarayanan, Harish Ramaswamy
Sparse PCA From Sparse Linear Regression

Sparse Principal Component Analysis (SPCA) and Sparse Linear Regression (SLR) have a wide range of applications and have attracted a tremendous amount of attention in the last two decades as canonical examples of statistical problems in high dimension. A variety of algorithms have been proposed for both SPCA and SLR, but an explicit connection between the two had not been made.

Guy Bresler, Sung Min Park, Madalina Persu
Efficient Anomaly Detection Via Matrix Sketching

The authors consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores.

Vatsal Sharan, Parikshit Gopalan, Udi Wieder
Dimensionality Reduction For Stationary Time Series Via Stochastic Nonconvex Optimization

Stochastic optimization naturally arises in machine learning, but efficient algorithms with provable guarantees are still largely missing. This paper studies this challenge through a streaming PCA problem for stationary time series data.

Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao
Contrastive Learning From Pairwise Measurements

The authors study a semiparametric model where the pairwise measurements follow a natural exponential family distribution with an unknown base measure.

Yi Chen, Zhuoran Yang, Yuchen Xie, Zhaoran Wang
Deep Functional Dictionaries: Learning Consistent Semantic Structures On 3D Models From Functions

Various 3D semantic attributes such as segmentation masks, geometric features, keypoints, and materials can be encoded as per-point probe functions on 3D geometries. Given a collection of related 3D shapes, the authors consider how to jointly analyze such probe functions over different shapes, and how to discover common latent structures using a neural network, even in the absence of any correspondence information.

Minhyuk Sung, Hao Su, Ronald Yu, Leonidas J Guibas
Large Scale Computation Of Means And Clusters For Persistence Diagrams Using Optimal Transport

Persistence diagrams (PDs) are now routinely used to summarize the underlying topology of complex data. The authors propose in this article a tractable framework to carry out standard tasks on PDs at scale, notably evaluating distances, estimating barycenters and performing clustering.

Theo Lacombe, Marco Cuturi, Steve OUDOT
Representation Learning Of Compositional Data

The authors consider the problem of learning a low dimensional representation for compositional data.

Marta Avalos, Richard Nock, Cheng Soon Ong, Julien Rouar, Ke Sun
How To Make The Gradients Small Stochastically: Even Faster Convex And Nonconvex SGD

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm $\varepsilon$, the authors design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate of $O(\varepsilon^{-8/3})$.

Zeyuan Allen-Zhu
Statistical Optimality Of Stochastic Gradient Descent On Hard Learning Problems Through Multiple Passes

The authors consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data.

Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach
Optimal Subsampling With Influence Functions

Subsampling is a common and often effective method to deal with the computational challenges of large datasets.

Daniel Ting, Eric Brochu
Metric On Nonlinear Dynamical Systems With Perron-Frobenius Operators

The development of a metric for structural data is a long-term problem in pattern recognition and machine learning. The authors develop a general metric for comparing nonlinear dynamical systems that is defined with Perron-Frobenius operators in reproducing kernel Hilbert spaces.

Isao Ishikawa, Keisuke Fujii, Masahiro Ikeda, Yuka Hashimoto, Yoshinobu Kawahara
Random Feature Stein Discrepancies

Computable Stein discrepancies have been deployed for a variety of applications, ranging from sampler selection in posterior inference to approximate Bayesian inference to goodness-of-fit testing. While linear-time Stein discrepancies have been proposed for goodness-of-fit testing, they exhibit avoidable degradations in testing power, even when power is explicitly optimized.

Jonathan Huggins, Lester Mackey
Informative Features For Model Comparison

Given two candidate models and a set of target observations, the authors address the problem of measuring the relative goodness of fit of the two models. The authors propose two new statistical tests that are nonparametric, computationally efficient (runtime complexity is linear in the sample size), and interpretable.

Wittawat Jitkrittum, Heishiro Kanagawa, Patsorn Sangkloy, James Hays, Bernhard Schölkopf, Arthur Gretton
On Fast Leverage Score Sampling And Optimal Learning

Leverage score sampling provides an appealing way to perform approximate computations for large matrices. In this paper, the authors study the problem of leverage score sampling for positive definite matrices defined by a kernel.

Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco
Persistence Fisher Kernel: A Riemannian Manifold Kernel For Persistence Diagrams

Algebraic topology methods have recently played an important role in statistical analysis of complicated geometric structured data such as shapes, linked twist maps, and material data. In this work, the authors rely upon the alternative Fisher information geometry to propose a positive definite kernel for PDs without approximation, namely the Persistence Fisher (PF) kernel.

Tam Le, Makoto Yamada
Learning Bounds For Greedy Approximation With Explicit Feature Maps From Multiple Kernels

Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization.

Shahin Shahrampour, Vahid Tarokh
RetGK: Graph Kernels Based On Return Probabilities Of Random Walks

The authors develop a framework for computing graph kernels, based on return probabilities of random walks.

Zhen Zhang, Mianzhi Wang, Yijian Xiang, Yan Huang, Arye Nehorai
Nonparametric Density Estimation Under Adversarial Losses

The authors study minimax convergence rates of nonparametric density estimation under a large class of loss functions called ``adversarial losses'', which, besides classical $L^p$ losses, includes maximum mean discrepancy (MMD), Wasserstein distance, and total variation distance.

Shashank Singh, Ananya Uppal, Boyue Li, Chun-Liang Li, Manzil Zaheer, Barnabas Poczos
Deep Homogeneous Mixture Models: Representation, Separation, And Approximation

At their core, many unsupervised learning models provide a compact representation of homogeneous density mixtures, but their similarities and differences are not always clearly understood.

Priyank Jaini, Pascal Poupart, Yaoliang Yu
Gaussian Process Conditional Density Estimation

Conditional Density Estimation (CDE) models deal with estimating conditional distributions. CDE is a challenging task, as there is a fundamental trade-off between model complexity, representational capacity, and overfitting.

Vincent Dutordoir, Hugh Salimbeni, James Hensman, Marc Deisenroth
Low-rank Interaction With Sparse Additive Effects Model For Large Data Frames

Many applications of machine learning involve the analysis of large data frames -- matrices collecting heterogeneous measurements (binary, numerical, counts, etc.)

Geneviève Robin, Hoi-To Wai, Julie Josse, Olga Klopp, Eric Moulines
Mixture Matrix Completion

Completing a data matrix X has become a ubiquitous problem in modern data science, with motivations in recommender systems, computer vision, and network inference, to name a few.

Daniel Pimentel-Alarcon
Multivariate Time Series Imputation With Generative Adversarial Networks

Multivariate time series usually contain a large number of missing values, which hinders the application of advanced analysis methods on multivariate time series data. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, the authors propose to learn the overall distribution of a multivariate time series dataset with GAN, which is further used to generate the missing values for each sample.

Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, Yuan Xiaojie
Fully Understanding The Hashing Trick

Feature hashing, also known as the hashing trick, was introduced by Weinberger et al.
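As a rough illustration of the trick (a minimal sketch, not the paper's analysis; `hash_features`, the CRC-based hashes, and the `"sign:"` salt are all illustrative choices), each named feature is hashed into one of m buckets, and a second, independent hash supplies a ±1 sign so that inner products are preserved in expectation:

```python
import zlib
import numpy as np

def hash_features(pairs, m):
    # map (feature_name, value) pairs into an m-dimensional vector:
    # one hash picks the bucket, a second (salted) hash picks the sign
    x = np.zeros(m)
    for name, value in pairs:
        idx = zlib.crc32(name.encode()) % m
        sign = 1.0 if zlib.crc32(b"sign:" + name.encode()) % 2 == 0 else -1.0
        x[idx] += sign * value
    return x
```

The dimension m is fixed in advance, so arbitrarily large (or growing) feature vocabularies map into a constant-size vector; collisions are the price paid, which is exactly what analyses of the hashing trick quantify.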

Lior Kamma, Casper B. Freksen, Kasper Green Larsen
Learning Semantic Similarity In A Continuous Space

The authors address the problem of learning semantic representation of questions to measure similarity between pairs as a continuous distance metric.

Michel Deudon
Bilevel Learning Of The Group Lasso Structure

Regression with a group-sparsity penalty plays a central role in high-dimensional prediction problems, but the group structure is usually not known in advance. To circumvent this issue, the authors present a method to estimate the group structure by means of a continuous bilevel optimization problem in which the data is split into training and validation sets.

Jordan Frecon, Saverio Salzo, Massimiliano Pontil
Bayesian Structure Learning By Recursive Bootstrap

The authors address the problem of Bayesian structure learning for domains with hundreds of variables by recursively employing a non-parametric bootstrap. The authors propose a method that covers both model averaging and model selection in the same framework.

Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov, Guy Koren, Gal Novik
Learning From Group Comparisons: Exploiting Higher Order Interactions

The authors study the problem of learning from group comparisons, with applications in predicting outcomes of sports and online games.

Yao Li, Minhao Cheng, Kevin Fujii, Fushing Hsieh, Cho-Jui Hsieh
A Structured Prediction Approach For Label Ranking

The authors propose to solve a label ranking problem as a structured output regression task.

Anna Korba, Alexandre Garcia, Florence D'Alché-Buc
The Sample Complexity Of Semi-Supervised Learning With Nonparametric Mixture Models

The authors study the sample complexity of semi-supervised learning (SSL) and introduce new assumptions based on the mismatch between a mixture model learned from unlabeled data and the true mixture model induced by the (unknown) class conditional distributions.

Chen Dan, Liu Leqi, Bryon Aragam, Pradeep Ravikumar, Eric Xing
Binary Classification From Positive-Confidence Data

Can a binary classifier be learned from only positive data, without any negative or unlabeled data? The authors theoretically establish the consistency and an estimation error bound, and demonstrate the usefulness of the proposed method for training deep neural networks through experiments.

Takashi Ishida, Gang Niu, Masashi Sugiyama
Learning SMaLL Predictors

The authors introduce a new framework for learning in severely resource-constrained settings.

Vikas Garg, Ofer Dekel, Lin Xiao
Contour Location Via Entropy Reduction Leveraging Multiple Information Sources

The authors introduce an algorithm to locate contours of functions that are expensive to evaluate.

Alexandre Marques, Remi Lam, Karen Willcox
Multi-Class Learning: From Theory To Algorithm

In this paper, the authors study the generalization performance of multi-class classification and obtain a sharper data-dependent generalization error bound with a fast convergence rate, substantially improving the state-of-the-art bounds in the existing data-dependent generalization analysis.

Jian Li, Yong Liu, Rong Yin, Hua Zhang, Lizhong Ding, Weiping Wang
Generalized Cross Entropy Loss For Training Deep Neural Networks With Noisy Labels

Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. To combat the problem of noisy labels, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly used categorical cross entropy (CCE) loss.

Zhilu Zhang, Mert Sabuncu
A Smoother Way To Train Structured Prediction Models

The authors present a framework to train a structured prediction model by performing smoothing on the inference algorithm it builds upon.

Krishna Pillutla, Vincent Roulet, Sham Kakade, Zaid Harchaoui
Constrained Graph Variational Autoencoders For Molecule Design

Graphs are ubiquitous data structures for representing interactions between entities.

Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, Alexander Gaunt
Learning Beam Search Policies Via Imitation Learning

Beam search is widely used for approximate decoding in structured prediction problems.

Renato Negrinho, Matt Gormley, Geoffrey Gordon
Loss Functions For Multiset Prediction

The authors study the problem of multiset prediction.

Sean Welleck, Zixin Yao, Yu Gai, Jialin Mao, Zheng Zhang, Kyunghyun Cho
Learning Confidence Sets Using Support Vector Machines

The goal of confidence-set learning in the binary classification setting is to construct two sets, each with a specific probability guarantee to cover a class. Instead of plug-in approaches, the authors propose a support vector classifier to construct confidence sets in a flexible manner.

Wenbo Wang, Xingye Qiao
Fast Similarity Search Via Optimal Sparse Lifting

Similarity search is a fundamental problem in computing science with various applications and has attracted significant research attention, especially in large-scale search with high dimensions.

Wenye Li, Jingwei Mao, Yin Zhang, Shuguang Cui
The Sparse Manifold Transform

The authors present a signal representation framework called the sparse manifold transform that combines key ideas from sparse coding, manifold learning, and slow feature analysis.

Yubei Chen, Dylan Paiton, Bruno Olshausen
When Do Random Forests Fail?

Random forests are learning algorithms that build large collections of random trees and make predictions by averaging the individual tree predictions. In this paper, the authors consider various tree constructions and examine how the choice of parameters affects the generalization error of the resulting random forests as the sample size goes to infinity.

Cheng Tang, Damien Garreau, Ulrike Von Luxburg
Diverse Ensemble Evolution: Curriculum Data-Model Marriage

The authors study a new method (``Diverse Ensemble Evolution (DivE$^2$)'') to train an ensemble of machine learning models that assigns data to models at each training epoch based on each model's current expertise and an intra- and inter-model diversity reward.

Tianyi Zhou, Shengjie Wang, Jeff Bilmes
The Pessimistic Limits And Possibilities Of Margin-based Losses In Semi-supervised Learning

Consider a classification problem in which both labeled and unlabeled data are available.

Jesse Krijthe, Marco Loog
Semi-Supervised Learning With Declaratively Specified Entropy Constraints

The authors propose a technique for declaratively specifying strategies for semi-supervised learning (SSL).

Haitian Sun, William Cohen, Lidong Bing
Learning To Multitask

Multitask learning has shown promising performance in many applications and many multitask models have been proposed.

Yu Zhang, Ying Wei, Qiang Yang
CatBoost: Unbiased Boosting With Categorical Features

This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit.

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin
Supervised Autoencoders: Improving Generalization Performance With Unsupervised Regularizers

Generalization performance is a central goal in machine learning, particularly when learning representations with large neural networks.

Lei Le, Andy Patterson, Martha White
Representation Learning For Treatment Effect Estimation From Observational Data

Estimating individual treatment effect (ITE) is a challenging problem in causal inference, due to missing counterfactuals and selection bias. In this paper, the authors propose a local similarity preserved individual treatment effect (SITE) estimation method based on deep representation learning.

Liuyi Yao, Sheng Li, Yaliang Li, Mengdi Huai, Jing Gao, Aidong Zhang
SimplE Embedding For Link Prediction In Knowledge Graphs

Knowledge graphs contain knowledge about the world and provide a structured representation of this knowledge.

Seyed Mehran Kazemi, David Poole
DeepProbLog: Neural Probabilistic Logic Programming

The authors introduce DeepProbLog, a probabilistic logic programming language that incorporates deep learning by means of neural predicates.

Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, Luc De Raedt
Watch Your Step: Learning Node Embeddings Via Graph Attention

Graph embedding methods represent nodes in a continuous vector space, preserving different types of relational information from the graph. There are many hyper-parameters to these methods.

Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi
Invariant Representations Without Adversarial Training

Representations of data that are invariant to changes in specified factors are useful for a wide range of problems: removing potential biases in prediction problems, controlling the effects of covariates, and disentangling meaningful factors of variation.

Daniel Moyer, Shuyang Gao, Rob Brekelmans, Aram Galstyan, Greg Ver Steeg
Domain-Invariant Projection Learning For Zero-Shot Recognition

Zero-shot learning (ZSL) aims to recognize unseen object classes without any training samples, which can be regarded as a form of transfer learning from seen classes to unseen ones. In this paper, the authors propose a novel ZSL model termed domain-invariant projection learning (DIPL).

An Zhao, Mingyu Ding, Abel Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen
Unsupervised Learning Of View-invariant Action Representations

The recent success of deep learning methods in human action recognition mostly relies on the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an expensive and time-consuming process.

Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli
Neural Architecture Optimization

Automatic neural architecture design has shown its potential in discovering powerful neural network architectures.

Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu
Scalable Hyperparameter Transfer Learning

Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. The authors propose a multi-task adaptive Bayesian linear regression model for transfer learning in BO, whose complexity is linear in the function evaluations: one Bayesian linear regression model is associated to each black-box function optimization problem (or task), while transfer learning is achieved by coupling the models through a shared deep neural net.

Valerio Perrone, Rodolphe Jenatton, Matthias W Seeger, Cedric Archambeau
Learning To Learn Around A Common Mean

The problem of learning-to-learn (LTL) or meta-learning is gaining increasing attention due to recent empirical evidence of its effectiveness in applications. In this work, the authors consider the family of algorithms given by a variant of Ridge Regression, in which the regularizer is the square distance to an unknown mean vector.

Giulia Denevi, Carlo Ciliberto, Dimitris Stamos, Massimiliano Pontil
Hybrid-MST: A Hybrid Active Sampling Strategy For Pairwise Preference Aggregation

In this paper the authors present a hybrid active sampling strategy for pairwise preference aggregation, which aims at recovering the underlying rating of the test candidates from sparse and noisy pairwise labeling.

Jing LI, Rafal Mantiuk, Junle Wang, Suiyi Ling, Patrick Le Callet
Algorithmic Assurance: An Active Approach To Algorithmic Testing Using Bayesian Optimisation

The authors introduce algorithmic assurance, the problem of testing whether machine learning algorithms are conforming to their intended design goal.

Shiva Gopakumar, Sunil Gupta, Santu Rana, Vu Nguyen, Svetha Venkatesh
Understanding The Role Of Adaptivity In Machine Teaching: The Case Of Version Space Learners

In real-world applications of education, an effective teacher adaptively chooses the next example to teach based on the learner's current state. In this paper, the authors study the case of teaching consistent, version space learners in an interactive setting.

Yuxin Chen, Adish Singla, Oisin Mac Aodha, Pietro Perona, Yisong Yue
Active Learning For Non-Parametric Regression Using Purely Random Trees

Active learning is the task of using labelled data to select additional points to label, with the goal of fitting the most accurate model with a fixed budget of labelled points. In this paper, the authors propose an intuitive tree-based active learning algorithm for non-parametric regression with provable improvement over random sampling.

Jack Goetz, Ambuj Tewari, Paul Zimmerman
Interactive Structure Learning With Structural Query-by-Committee

In this work, the authors introduce interactive structure learning, a framework that unifies many different interactive learning tasks.

Christopher Tosh, Sanjoy Dasgupta
Efficient Nonmyopic Batch Active Search

Active search is a learning paradigm for actively identifying as many members of a given class as possible. The authors also propose novel, efficient batch policies inspired by state-of-the-art sequential policies, and develop an aggressive pruning technique that can dramatically speed up computation.

Shali Jiang, Gustavo Malkomes, Matthew Abbott, Benjamin Moseley, Roman Garnett
Uncertainty Sampling Is Preconditioned Stochastic Gradient Descent On Zero-One Loss

Uncertainty sampling, a popular active learning algorithm, is used to reduce the amount of data required to learn a classifier, but it has been observed in practice to converge to different parameters depending on the initialization and sometimes to even better parameters than standard training on all the data.
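The heuristic being analyzed can be stated in a few lines. The sketch below is an illustrative binary-classification version (the name `uncertainty_sample` is hypothetical): it queries the unlabeled point whose predicted probability is closest to 0.5.

```python
import numpy as np

def uncertainty_sample(probs):
    # probs[i] = current model's predicted positive-class probability
    # for unlabeled point i; query the one the model is least sure about
    probs = np.asarray(probs)
    return int(np.argmin(np.abs(probs - 0.5)))
```

Each queried label then triggers a model update; it is this query-then-update loop that the paper interprets as a preconditioned stochastic gradient step on the zero-one loss.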

Stephen Mussmann, Percy Liang
Online Adaptive Methods, Universality And Acceleration

The authors present a novel method for convex unconstrained optimization that, without any modifications, ensures: (1) an accelerated convergence rate for smooth objectives, (2) the standard convergence rate in the general (non-smooth) setting, and (3) the standard convergence rate in the stochastic optimization setting.

Yehuda Kfir Levy, Alp Yurtsever, Volkan Cevher
Online Improper Learning With An Approximation Oracle

The authors study the following question: given an efficient approximation algorithm for an optimization problem, is it possible to learn efficiently in the same setting?

Elad Hazan, Wei Hu, Yuanzhi Li, Zhiyuan Li
Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting

The authors introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks.

Hippolyt Ritter, Alex Botev, David Barber
Approximating Real-Time Recurrent Learning With Random Kronecker Factors

Despite all the impressive advances of recurrent neural networks, sequential data is still in need of better modelling. In this paper, the authors propose the Kronecker Factored RTRL (KF-RTRL) algorithm that uses a Kronecker product decomposition to approximate the gradients for a large class of RNNs.

Asier Mujika, Florian Meier, Angelika Steger
Online Reciprocal Recommendation With Theoretical Performance Guarantees

The authors initiate a rigorous theoretical investigation of the reciprocal recommendation task in a specific framework of sequential learning.

Claudio Gentile, Nikos Parotsidis, Fabio Vitale
Generalized Inverse Optimization Through Online Learning

Inverse optimization is a powerful paradigm for learning preferences and restrictions that explain the behavior of a decision maker, based on a set of external signals and the corresponding decision pairs. However, most inverse optimization algorithms are designed specifically for the batch setting, where all the data is available in advance.

Chaosheng Dong, Yiran Chen, Bo Zeng
Adaptive Online Learning In Dynamic Environments

In this paper, the authors study online convex optimization in dynamic environments, and aim to bound the dynamic regret with respect to any sequence of comparators.

Lijun Zhang, Shiyin Lu, Zhi-Hua Zhou
Online Convex Optimization For Cumulative Constraints

The authors propose algorithms for online convex optimization which lead to cumulative squared constraint violations of the form $\sum_{t=1}^T \big([g(x_t)]_+\big)^2 = O(T^{1-\beta})$, where $\beta\in(0,1)$.

Jianjun Yuan, Andrew Lamperski
Efficient Online Algorithms For Fast-rate Regret Bounds Under Sparsity

The authors consider the problem of online convex optimization in two different settings: arbitrary and i.i.d.

Pierre Gaillard, Olivier Wintenberger
Regret Bounds For Online Portfolio Selection With A Cardinality Constraint

Online portfolio selection is a sequential decision-making problem in which a learner repetitively selects a portfolio over a set of assets, aiming to maximize long-term return. In this paper, the authors study the problem with the cardinality constraint that the number of assets in a portfolio is restricted to be at most k, and consider two scenarios: (i) in the full-feedback setting, the learner can observe price relatives (rates of return to cost) for all assets, and (ii) in the bandit-feedback setting, the learner can observe price relatives only for invested assets.
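For the full-feedback setting described above, a classical unconstrained baseline is the exponentiated-gradient portfolio update of Helmbold et al., sketched below for illustration only (it is not the paper's cardinality-constrained algorithm, and `eg_update` is an illustrative name):

```python
import numpy as np

def eg_update(w, x, eta=0.05):
    # w: current portfolio on the simplex; x: observed price relatives
    # multiplicative update toward assets with higher return this round
    g = x / np.dot(w, x)            # gradient of log(w . x) w.r.t. w
    w_new = w * np.exp(eta * g)
    return w_new / w_new.sum()      # renormalize onto the simplex
```

A cardinality constraint of at most k assets would additionally restrict w to k-sparse vectors, which is what makes the setting studied in the paper harder than this unconstrained baseline.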

Shinji Ito, Daisuke Hatano, Sumita Hanna, Akihiro Yabe, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi
Policy Regret In Repeated Games

The notion of ``policy regret'' in online learning is supposed to capture the reactions of the adversary to the actions taken by the learner, which more traditional notions such as external regret do not take into account.

Raman Arora, Michael Dinitz, Teodor Vanislavov Marinov, Mehryar Mohri
Query Complexity Of Bayesian Private Learning

The authors study the query complexity of Bayesian Private Learning.

Kuang Xu
The Limits Of Post-Selection Generalization

The authors show several limitations on the power of algorithms satisfying post hoc generalization.

Jonathan Ullman, Adam Smith, Kobbi Nissim, Uri Stemmer, Thomas Steinke
Sequential Test For The Lowest Mean: From Thompson To Murphy Sampling

Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-problem in planning, game tree search and reinforcement learning. The authors design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules.

Emilie Kaufmann, Wouter Koolen, Aurélien Garivier
Contextual Combinatorial Multi-armed Bandits With Volatile Arms And Submodular Reward

In this paper, the authors study the stochastic contextual combinatorial multi-armed bandit (CC-MAB) framework that is tailored for volatile arms and submodular reward functions.

Lixing Chen, Jie Xu, Zhuo Lu
TopRank: A Practical Algorithm For Online Stochastic Ranking

Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user. Many sample-efficient algorithms have been proposed for this problem that assume a specific click model connecting rankings and user behavior.

Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari
A Bandit Approach To Sequential Experimental Design With False Discovery Control

The authors propose a new adaptive sampling approach to multiple testing which aims to maximize statistical power while ensuring anytime false discovery control.

Kevin Jamieson, Lalit Jain
Adaptation To Easy Data In Prediction With Limited Advice

The authors derive an online learning algorithm with improved regret guarantees for ``easy'' loss sequences. While a number of algorithms have been proposed for ex